changeset 8:415a165d92bb draft

Uploaded v0.36.4.
author pjbriggs
date Thu, 22 Jun 2017 09:07:16 -0400
parents 6eeacf19a38e
children 53af7b5b1b56
files README.rst test-data/trimmomatic_pe_r1_paired_out1_clip.fastq test-data/trimmomatic_pe_r2_unpaired_out1_clip.fastq trimmomatic.xml
diffstat 4 files changed, 132 insertions(+), 13 deletions(-)
--- a/README.rst	Tue Mar 21 08:42:05 2017 -0400
+++ b/README.rst	Thu Jun 22 09:07:16 2017 -0400
@@ -71,6 +71,9 @@
 ========== ======================================================================
 Version    Changes
 ---------- ----------------------------------------------------------------------
+0.36.4     - Add option to provide custom adapter sequences for ILLUMINACLIP
+           - Add options ``minAdapterLength`` and ``keepBothReads`` for ILLUMINACLIP
+             in palindrome mode
 0.36.3     - Fix naming of output collections. Instead of all outputs being called
              "Trimmomatic on collection NN" these will now be called "Trimmomatic
              on collection NN: paired" or "Trimmomatic on collection NN: unpaired".
@@ -106,7 +109,8 @@
 
 This wrapper has been developed and is maintained by Peter Briggs (@pjbriggs).
 Peter van Heusden (@pvanheus) and Marius van den Beek (@mvdbeek) contributed
-support for gz compressed FastQ files.
+support for gz compressed FastQ files. Charles Girardot (@cgirardot) and
+Jelle Scholtalbers (@scholtalbers) contributed additional options to ILLUMINACLIP.
 
 
 Developers
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/trimmomatic_pe_r1_paired_out1_clip.fastq	Thu Jun 22 09:07:16 2017 -0400
@@ -0,0 +1,32 @@
+@MISEQ:1:000000000-A2Y1L:1:1101:19264:2433 1:N:0:NAAGGCGATAGATCGC
+AGATAGCCGAAGATAAAGAGNT
++
+?????BBB@BBBB?BBFFFF#6
+@MISEQ:1:000000000-A2Y1L:1:1101:18667:2435 1:N:0:NAAGGCGATAGATCGC
+ATATATTCATCCGCCATTATNA
++
+?????BBBDDDDADDDE@FF#6
+@MISEQ:1:000000000-A2Y1L:1:1101:17523:2436 1:N:0:NAAGGCGATAGATCGC
+CATCACTACCGCTCAGGAATNTGACGGCAGTCTTAGCGGCGCTCTAGTGCGGGAGGCCGTATCTTGGAATAAGGTGTCGTCAAATGCAAGGCAGGTAACCCTACACGCCGAGG
++
+?<,<?BBBBBBBBBBBFFFF#6ACECCEC78CE=CGHEH7CHCHFGHADGHHHCCCHHE<DFHFFFFDFFFFFFDFDFDDEEEEEEEEECAEEEE;?BEEEBEEBCE;C;48;
+@MISEQ:1:000000000-A2Y1L:1:1101:15489:2437 1:N:0:NAAGGCGATAGATCGC
+GAGCAGTCGGGCTCAGCGCTNT
++
+5====>/<@@@@@@>@CCCE#6
+@MISEQ:1:000000000-A2Y1L:1:1101:18851:2442 1:N:0:NAAGGCGATAGATCGC
+GGTATCCCCCGGCAGTGAGGATGGAGCCATGGTCTGCATCATACTCACCGTAGGTGAGAATATCCACGTCCTTCGACTCCTGGGTGCCGTCTATGGTGCCCTCTGTTACCAGGCAGTGACGGAGGACATGGTGAGGTTTCAGTACCTCTGGCCCGGCCTGG
++
+??,<?BBBDDDDDDD<FFF@FC;FFFBEFHHHCDDHHGHHHDGHHHFHHEA?EEAEEHDEFHHHHHFECFHHHFHDHEEHHCFH7CEFHDEHHCFHHFHHF=FFFDFDFFFFEEEEDDEFEEE<BBCEBCE,==AE1::AAEECEE*?*AAEFF??>D?)8
+@MISEQ:1:000000000-A2Y1L:1:1101:15290:2442 1:N:0:NAAGGCGATAGATCGC
+AAAATAATCCTAAAAAATAACCTCTATGCCGCCGAACGCTCCGCCTCTATCTTCGTAAAAACTATCTTCTCCTCCTCACCTCCATAATCAAGCATCAAGCGATCGCCCTCCGCCAACTCACCCCGCAAAATCTTATCCGCTAGCGGATTCTCAATCTCCGTCTGAATGACCCGCCTCAGCGGCCGCGCCCCATAAACCGAATCAAATCCACGC
++
+?????BBBDDDDDDDDGGGGGGIIIHHFFHHHHHHHHHHEHHEHHHHHIIHHHHHFEHIIIHHIHHIHIHIIIIIHHHHHHHHHHHHHHHHDHHHHHHHHGEDFGGGGGGG;CEGEGCEGGGGG8>GGGGEGGEECEGGGGD8EDGGAEGEEGGCE:CGG8CEEGG???CEE<DG8CC*??>DG.8<AGGGGCEEG*C2<GCCECE*:?CE?C
+@MISEQ:1:000000000-A2Y1L:1:1101:15892:2446 1:N:0:NAAGGCGATAGATCGC
+CTTCCCCACGGCCCAGACACAAGAGACGACCTCCATAAATCTTTTAGAGGGTGACCGCATCTCCGACGCAAACCAGGACGCCGATACCCTCGTGGTGGTGTTCGACCGTACGGATGGCGCAGACACCGACGGCACTAGTGCCACGGTATCAGGTACCACCATAACGTATGATTCGGGCACGCTCAAGGGCCAGCGTGACGGAATCGATAGAATACACTACACGGTGACTGATGGGG
++
+?????BBBDBDDDDDDFFFFFFHIHIHHHHHHIHIFGGHFHHHHIIFHIHH?EEGHHHHHH-EGEHHCEHHHHH@FDFFEFF5@EEEFFEFE;AECCE;AEEEEEA?8?AEDDEEDFFDE2>>EEFF<<<2>D?DEEE*:C?AA<>8AEFCEE:?C?EEE?CEFEE0?:E?ACEECD8>EE>)8>E:CEEEEEED.)?AE??A?:A?*??:C0?CCE?AAA:88.88?::C:C?*8
+@MISEQ:1:000000000-A2Y1L:1:1101:17903:2450 1:N:0:TAAGGCGATAGATCGC
+GTGCAGGGGG
++
+=5===<>+5<
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/trimmomatic_pe_r2_unpaired_out1_clip.fastq	Thu Jun 22 09:07:16 2017 -0400
@@ -0,0 +1,8 @@
+@MISEQ:1:000000000-A2Y1L:1:1101:18106:2444 2:N:0:NAAGGCGATAGATCGC
+GAGTTACTATTACAGAGTGGAGCTAATGTACTGGCAGTTGACTGCAAAAGAAATGCTCCAATTCATGTTGCATGTGCAAATATGAGCTTAGAGTGTTTAAAGATAATTTTGTGCCACAAAAAATGCAACCCAAACCAACAAAATGCAGTGGGAGACACTCCACTCCACACCTTATGCAGTTTGGGGACTTGTGATATGAGAATACTAC
++
+?????BBBDDDDDDDDGCFGFGIIIIIHFHIIIIHIHFG=EHHIGIIIIFHIHDGHGHHHIHIH=CGHGGHFHHHFHFGHHFH/ACDFGG?FE?CDFFHHHHHIIHFHHI>CEDGFHHHHHHHHDFHHFHHHFFAFFGGGGGGEDEG>>DACC;?EGG>CEEA>AEACCEE?:C::CC:::C:CE<C<9C:?C?*?CEECCC*:C?C?
+@MISEQ:1:000000000-A2Y1L:1:1101:15113:2451 2:N:0:TAAGGCGATAGATCGC
+GAGGGGAGGAGGGGAAGGGAGAGGGGAAGAGAGGAGAGGAG
++
+?????@9@B?B?BBBBEEEFB@@EEHEC?BF-CE@DDEH,5
--- a/trimmomatic.xml	Tue Mar 21 08:42:05 2017 -0400
+++ b/trimmomatic.xml	Thu Jun 22 09:07:16 2017 -0400
@@ -1,4 +1,4 @@
-<tool id="trimmomatic" name="Trimmomatic" version="0.36.3">
+<tool id="trimmomatic" name="Trimmomatic" version="0.36.4">
   <description>flexible read trimming tool for Illumina NGS data</description>
   <macros>
     <import>trimmomatic_macros.xml</import>
@@ -33,7 +33,19 @@
   #end if
   ## ILLUMINACLIP option
   #if $illuminaclip.do_illuminaclip
-    ILLUMINACLIP:\$TRIMMOMATIC_ADAPTERS_PATH/$illuminaclip.adapter_fasta:$illuminaclip.seed_mismatches:$illuminaclip.palindrome_clip_threshold:$illuminaclip.simple_clip_threshold
+    #if $illuminaclip.adapter_type.standard_or_custom == "custom"
+      #if $readtype.single_or_paired in ["pair_of_files","collection"]
+        ILLUMINACLIP:$adapter_file_from_text:$illuminaclip.seed_mismatches:$illuminaclip.palindrome_clip_threshold:$illuminaclip.simple_clip_threshold:$illuminaclip.min_adapter_len:$illuminaclip.keep_both_reads
+      #else
+        ILLUMINACLIP:$adapter_file_from_text:$illuminaclip.seed_mismatches:$illuminaclip.palindrome_clip_threshold:$illuminaclip.simple_clip_threshold
+      #end if
+    #else
+      #if $readtype.single_or_paired in ["pair_of_files","collection"]
+        ILLUMINACLIP:\$TRIMMOMATIC_ADAPTERS_PATH/$illuminaclip.adapter_type.adapter_fasta:$illuminaclip.seed_mismatches:$illuminaclip.palindrome_clip_threshold:$illuminaclip.simple_clip_threshold:$illuminaclip.min_adapter_len:$illuminaclip.keep_both_reads
+      #else
+        ILLUMINACLIP:\$TRIMMOMATIC_ADAPTERS_PATH/$illuminaclip.adapter_type.adapter_fasta:$illuminaclip.seed_mismatches:$illuminaclip.palindrome_clip_threshold:$illuminaclip.simple_clip_threshold
+      #end if
+    #end if
   #end if
   ## Other operations
   #for $op in $operations
@@ -81,6 +93,14 @@
     mv fastq_out.'$fastq_in.extension' '${fastq_out}'
   #end if
   ]]></command>
+  <configfiles>
+    <configfile name="adapter_file_from_text">#set from_text_area = ''
+#if str( $illuminaclip.do_illuminaclip ) == "yes" and str( $illuminaclip.adapter_type.standard_or_custom ) == "custom":
+#set from_text_area = $illuminaclip.adapter_type.adapter_text
+#end if
+${from_text_area}</configfile>
+  </configfiles>
+
   <inputs>
     <conditional name="readtype">
       <param name="single_or_paired" type="select" label="Single-end or paired-end reads?">
@@ -104,17 +124,37 @@
     <conditional name="illuminaclip">
       <param name="do_illuminaclip" type="boolean" label="Perform initial ILLUMINACLIP step?" help="Cut adapter and other illumina-specific sequences from the read" truevalue="yes" falsevalue="no" checked="False" />
       <when value="yes">
-        <param name="adapter_fasta" type="select" label="Adapter sequences to use">
-          <option value="TruSeq2-SE.fa">TruSeq2 (single-ended, for Illumina GAII)</option>
-          <option value="TruSeq3-SE.fa">TruSeq3 (single-ended, for MiSeq and HiSeq)</option>
-          <option value="TruSeq2-PE.fa">TruSeq2 (paired-ended, for Illumina GAII)</option>
-          <option value="TruSeq3-PE.fa">TruSeq3 (paired-ended, for MiSeq and HiSeq)</option>
-          <option value="TruSeq3-PE-2.fa">TruSeq3 (additional seqs) (paired-ended, for MiSeq and HiSeq)</option>
-          <option value="NexteraPE-PE.fa">Nextera (paired-ended)</option>
-        </param>
+        <conditional name="adapter_type">
+          <param name="standard_or_custom" type="select" label="Select standard adapter sequences or provide custom?">
+            <option value="standard" selected="true">Standard</option>
+            <option value="custom">Custom</option>
+          </param>
+          <when value="standard">
+            <param name="adapter_fasta" type="select" label="Adapter sequences to use">
+              <option value="TruSeq2-SE.fa">TruSeq2 (single-ended, for Illumina GAII)</option>
+              <option value="TruSeq3-SE.fa">TruSeq3 (single-ended, for MiSeq and HiSeq)</option>
+              <option value="TruSeq2-PE.fa">TruSeq2 (paired-ended, for Illumina GAII)</option>
+              <option value="TruSeq3-PE.fa">TruSeq3 (paired-ended, for MiSeq and HiSeq)</option>
+              <option value="TruSeq3-PE-2.fa">TruSeq3 (additional seqs) (paired-ended, for MiSeq and HiSeq)</option>
+              <option value="NexteraPE-PE.fa">Nextera (paired-ended)</option>
+            </param>
+          </when>
+          <when value="custom">
+            <param name="adapter_text" type="text" area="True" size="10x30" value=""
+                   label="Custom adapter sequences in fasta format" help="Write sequences in the fasta format.">
+              <sanitizer>
+                  <valid initial="string.printable"></valid>
+                  <mapping initial="none"/>
+              </sanitizer>
+            </param>
+          </when>
+        </conditional>
         <param name="seed_mismatches" type="integer" label="Maximum mismatch count which will still allow a full match to be performed" value="2" />
         <param name="palindrome_clip_threshold" type="integer" label="How accurate the match between the two 'adapter ligated' reads must be for PE palindrome read alignment" value="30" />
         <param name="simple_clip_threshold" type="integer" label="How accurate the match between any adapter etc. sequence must be against a read" value="10" />
+        <param name="min_adapter_len" type="integer" label="Minimum length of adapter that needs to be detected (PE specific/palindrome mode)" value="8" />
+        <param name="keep_both_reads" type="boolean" label="Always keep both reads (PE specific/palindrome mode)?" truevalue="true" falsevalue="false" checked="true"
+               help="See help below"/>
       </when>
       <when value="no" /> <!-- empty clause to satisfy planemo lint -->
     </conditional>
@@ -287,6 +327,35 @@
       <param name="operations_0|operation|strictness" value="0.8" />
       <output name="fastq_out" file="trimmomatic_maxinfo.fastq" />
     </test>
+    <test>
+      <!-- Paired-end ILLUMINACLIP - this does not check valid clipping -->
+      <param name="single_or_paired" value="pair_of_files" />
+      <param name="fastq_r1_in" value="Illumina_SG_R1.fastq" ftype="fastqsanger" />
+      <param name="fastq_r2_in" value="Illumina_SG_R2.fastq" ftype="fastqsanger" />
+      <param name="do_illuminaclip" value="true"/>
+      <param name="adapter_fasta" value="TruSeq2-PE.fa"/>
+      <param name="operations_0|operation|name" value="SLIDINGWINDOW" />
+      <output name="fastq_out_r1_paired" file="trimmomatic_pe_r1_paired_out1_clip.fastq" />
+      <output name="fastq_out_r1_unpaired" file="trimmomatic_pe_r1_unpaired_out1.fastq" />
+      <output name="fastq_out_r2_paired" file="trimmomatic_pe_r2_paired_out1.fastq" />
+      <output name="fastq_out_r2_unpaired" file="trimmomatic_pe_r2_unpaired_out1_clip.fastq" />
+    </test>
+    <test>
+      <!-- Paired-end ILLUMINACLIP providing 'custom' adapters - this does not check valid clipping -->
+      <param name="single_or_paired" value="pair_of_files" />
+      <param name="fastq_r1_in" value="Illumina_SG_R1.fastq" ftype="fastqsanger" />
+      <param name="fastq_r2_in" value="Illumina_SG_R2.fastq" ftype="fastqsanger" />
+      <param name="do_illuminaclip" value="true"/>
+      <param name="standard_or_custom" value="custom"/>
+      <param name="adapter_text"
+             value=">PrefixPE/1&#10;AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT&#10;>PrefixPE/2&#10;CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT&#10;>PCR_Primer1&#10;AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT&#10;>PCR_Primer1_rc&#10;AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT&#10;>PCR_Primer2&#10;CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT&#10;>PCR_Primer2_rc&#10;AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG&#10;>FlowCell1&#10;TTTTTTTTTTAATGATACGGCGACCACCGAGATCTACAC&#10;>FlowCell2&#10;TTTTTTTTTTCAAGCAGAAGACGGCATACGA&#10;"/>
+      <param name="adapter_fasta" value="TruSeq2-PE.fa"/>
+      <param name="operations_0|operation|name" value="SLIDINGWINDOW" />
+      <output name="fastq_out_r1_paired" file="trimmomatic_pe_r1_paired_out1_clip.fastq" />
+      <output name="fastq_out_r1_unpaired" file="trimmomatic_pe_r1_unpaired_out1.fastq" />
+      <output name="fastq_out_r2_paired" file="trimmomatic_pe_r2_paired_out1.fastq" />
+      <output name="fastq_out_r2_unpaired" file="trimmomatic_pe_r2_unpaired_out1_clip.fastq" />
+    </test>
   </tests>
   <help><![CDATA[
 .. class:: infomark
@@ -299,6 +368,12 @@
 This tool allows the following trimming steps to be performed:
 
  * **ILLUMINACLIP:** Cut adapter and other illumina-specific sequences from the read
+
+   * If **Always keep both reads (PE specific/palindrome mode)** is True, the reverse read will also be retained in palindrome mode.
+     After read-through has been detected by palindrome mode, and the adapter sequence removed,
+     the reverse read contains the same sequence information as the forward read, albeit in reverse complement.
+     For this reason, the default behaviour is to entirely drop the reverse read.
+     Retaining the reverse read may be useful e.g. if the downstream tools cannot handle a combination of paired and unpaired reads.
  * **SLIDINGWINDOW:** Perform a sliding window trimming, cutting once the average
    quality within the window falls below a threshold
  * **MINLEN:** Drop the read if it is below a specified length
@@ -359,8 +434,8 @@
 **Credits**
 
 This Galaxy tool has been developed within the Bioinformatics Core Facility at the
-University of Manchester, with contributions from Peter van Heusden and Marius
-van den Beek.
+University of Manchester, with contributions from Peter van Heusden, Marius
+van den Beek, Jelle Scholtalbers and Charles Girardot.
 
 It runs the Trimmomatic program which has been developed
 within Bjorn Usadel's group at RWTH Aachen university.
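
For reference, the way the updated command template assembles the ILLUMINACLIP step can be sketched in Python. This is an illustrative sketch only and is not part of the wrapper: the function name `illuminaclip_step` and its keyword arguments are hypothetical, and the defaults simply mirror the parameter defaults declared in trimmomatic.xml in this changeset. The caller is assumed to pass the full adapter file path (the wrapper itself prefixes standard adapter files with `$TRIMMOMATIC_ADAPTERS_PATH/`).

```python
def illuminaclip_step(adapter_path, seed_mismatches=2,
                      palindrome_clip_threshold=30, simple_clip_threshold=10,
                      paired=False, min_adapter_len=8, keep_both_reads=True):
    """Build an ILLUMINACLIP step string (hypothetical helper, for illustration).

    Mirrors the template logic in this changeset: the two palindrome-mode
    fields are appended only for paired-end input.
    """
    fields = [adapter_path, seed_mismatches,
              palindrome_clip_threshold, simple_clip_threshold]
    if paired:
        # Palindrome-mode extras: minimum adapter length and keepBothReads flag
        fields += [min_adapter_len, "true" if keep_both_reads else "false"]
    return "ILLUMINACLIP:" + ":".join(str(f) for f in fields)


# Single-end input: four fields after the step name
print(illuminaclip_step("TruSeq3-SE.fa"))
# Paired-end input: min_adapter_len and keep_both_reads are appended
print(illuminaclip_step("TruSeq2-PE.fa", paired=True))
```

In the wrapper, "paired" corresponds to `single_or_paired` being `pair_of_files` or `collection`; single-end runs keep the original four-field form.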