diff tools/sample_seqs/sample_seqs.xml @ 5:6b71ad5d43fb draft
v0.2.3 clarified help, internal cleanup of Python script
author: peterjc
date: Wed, 01 Feb 2017 09:39:36 -0500
parents: d3aa9f25c24c
children: 31f5701cd2e9
--- a/tools/sample_seqs/sample_seqs.xml	Wed Aug 05 12:30:18 2015 -0400
+++ b/tools/sample_seqs/sample_seqs.xml	Wed Feb 01 09:39:36 2017 -0500
@@ -1,4 +1,4 @@
-<tool id="sample_seqs" name="Sub-sample sequences files" version="0.2.2">
+<tool id="sample_seqs" name="Sub-sample sequences files" version="0.2.3">
     <description>e.g. to reduce coverage</description>
     <requirements>
         <requirement type="package" version="1.65">biopython</requirement>
@@ -205,9 +205,13 @@
 For example using 20% would take every 5th pair of records, or you
 could request 1000 read pairs.
 
+If instead of interleaved paired reads you have two matched files (one
+for each read of the pair), run the tool twice with the same sampling
+options to make two matched smaller files.
+
 .. class:: warningmark
 
-Note interleaves/pair mode does *not* actually check your read names
+Note interleaved/pair mode does *not* actually check your read names
 match a known pair naming scheme!
 
 **Example Usage**
@@ -215,8 +219,9 @@
 Suppose you have some Illumina paired end data as files ``R1.fastq`` and
 ``R2.fastq`` which give an estimated x200 coverage, and you wish to do a
 *de novo* assembly with a tool like MIRA which recommends lower coverage.
-Taking every 3rd read would reduce the estimated coverage to about x66,
-and would preserve the pairing as well.
+Running the tool twice (on ``R1.fastq`` and ``R2.fastq``) taking every
+3rd read would reduce the estimated coverage to about x66, and would
+preserve the pairing as well (as two smaller FASTQ files).
 
 Similarly, if you had some Illumina paired end data interleaved into one
 file with an estimated x200 coverage, you would run this tool in
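
The two-file workflow described in the help text above can be illustrated with a minimal sketch. It is not the tool's actual implementation: ``take_every_nth`` is a hypothetical helper, the read names are made up, and starting from the first record is an assumption. It only shows why applying the same every-Nth rule to two matched files keeps the pairs aligned, and how the x200 to roughly x66 estimate arises.

```python
from itertools import islice


def take_every_nth(records, n):
    """Return an iterator over every n-th record, starting with the first.

    Hypothetical helper for illustration; which record of each group of n
    the real tool keeps is an assumption here.
    """
    return islice(records, 0, None, n)


# Made-up read names standing in for the records of R1.fastq and R2.fastq.
r1 = ["read%d/1" % i for i in range(1, 10)]
r2 = ["read%d/2" % i for i in range(1, 10)]

# Run the same sampling rule once per file, as the help text suggests.
r1_sample = list(take_every_nth(r1, 3))
r2_sample = list(take_every_nth(r2, 3))

# The same positions are kept in both files, so the pairing is preserved.
pairs_match = all(a[:-2] == b[:-2] for a, b in zip(r1_sample, r2_sample))

# Keeping one read in three scales an estimated x200 coverage to about x66.
estimated_coverage = 200 / 3
```

Because the selection depends only on record position, it works for two matched files only if both list their reads in the same order. As the warning in the help text notes, read names are not checked.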