diff tools/sample_seqs/sample_seqs.xml @ 5:6b71ad5d43fb draft
v0.2.3 clarified help, internal cleanup of Python script
author | peterjc |
---|---|
date | Wed, 01 Feb 2017 09:39:36 -0500 |
parents | d3aa9f25c24c |
children | 31f5701cd2e9 |
--- a/tools/sample_seqs/sample_seqs.xml	Wed Aug 05 12:30:18 2015 -0400
+++ b/tools/sample_seqs/sample_seqs.xml	Wed Feb 01 09:39:36 2017 -0500
@@ -1,4 +1,4 @@
-<tool id="sample_seqs" name="Sub-sample sequences files" version="0.2.2">
+<tool id="sample_seqs" name="Sub-sample sequences files" version="0.2.3">
 <description>e.g. to reduce coverage</description>
 <requirements>
 <requirement type="package" version="1.65">biopython</requirement>
@@ -205,9 +205,13 @@
 For example using 20% would take every 5th pair of records, or you could
 request 1000 read pairs.
 
+If instead of interleaved paired reads you have two matched files (one
+for each pair), run the tool twice with the same sampling options to
+make to matched smaller files.
+
 .. class:: warningmark
 
-Note interleaves/pair mode does *not* actually check your read names
+Note interleaved/pair mode does *not* actually check your read names
 match a known pair naming scheme!
 
 **Example Usage**
@@ -215,8 +219,9 @@
 Suppose you have some Illumina paired end data as files ``R1.fastq`` and
 ``R2.fastq`` which give an estimated x200 coverage, and you wish to do a
 *de novo* assembly with a tool like MIRA which recommends lower coverage.
-Taking every 3rd read would reduce the estimated coverage to about x66,
-and would preserve the pairing as well.
+Running the tool twice (on ``R1.fastq`` and ``R2.fastq``) taking every
+3rd read would reduce the estimated coverage to about x66, and would
+preserve the pairing as well (as two smaller FASTQ files).
 
 Similarly, if you had some Illumina paired end data interleaved into one
 file with an estimated x200 coverage, you would run this tool in
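
The help text added in this changeset depends on the sampling being chosen purely by record position, so that the same options applied to two matched files keep mates together. The following is a minimal sketch of that idea only, not the tool's actual implementation: `take_every_nth` is a hypothetical helper, the read names are made-up stand-ins for the contents of ``R1.fastq`` and ``R2.fastq``, and which record in each group of N the real script keeps is an assumption here.

```python
def take_every_nth(records, n):
    """Yield every n-th record (the n-th, 2n-th, ...) from an iterable.

    Assumption: selection depends only on position, never on content.
    """
    for index, record in enumerate(records, start=1):
        if index % n == 0:
            yield record


# Hypothetical stand-ins for the reads in R1.fastq and R2.fastq,
# listed in the same order in both files.
r1 = ["read%d/1" % i for i in range(1, 10)]
r2 = ["read%d/2" % i for i in range(1, 10)]

# "Run the tool twice with the same sampling options".
r1_sample = list(take_every_nth(r1, 3))
r2_sample = list(take_every_nth(r2, 3))

# Mates still line up because both runs kept the same positions.
pairs_match = all(a[:-2] == b[:-2] for a, b in zip(r1_sample, r2_sample))

# Keeping one read in three scales the estimated x200 coverage by 1/3.
estimated_coverage = 200 / 3
```

Under these assumptions both samples contain reads 3, 6 and 9, and the estimated coverage drops to roughly x66, in line with the revised example. As the warning in the help text says, nothing here checks that the read names really follow a pair naming scheme; the pairing survives only because the two inputs are in matching order.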