changeset 8:415a165d92bb draft
Uploaded v0.36.4.
author | pjbriggs |
---|---|
date | Thu, 22 Jun 2017 09:07:16 -0400 |
parents | 6eeacf19a38e |
children | 53af7b5b1b56 |
files | README.rst test-data/trimmomatic_pe_r1_paired_out1_clip.fastq test-data/trimmomatic_pe_r2_unpaired_out1_clip.fastq trimmomatic.xml |
diffstat | 4 files changed, 132 insertions(+), 13 deletions(-) |
--- a/README.rst Tue Mar 21 08:42:05 2017 -0400
+++ b/README.rst Thu Jun 22 09:07:16 2017 -0400
@@ -71,6 +71,9 @@
 ========== ======================================================================
 Version    Changes
 ---------- ----------------------------------------------------------------------
+0.36.4     - Add option to provide custom adapter sequences for ILLUMINACLIP
+           - Add options ``minAdapterLength`` and ``keepBothReads`` for ILLUMINACLIP
+             in palindrome mode
 0.36.3     - Fix naming of output collections. Instead of all outputs being called
              "Trimmomatic on collection NN" these will now be called
              "Trimmomatic on collection NN: paired" or "Trimmomatic on collection NN: unpaired".
@@ -106,7 +109,8 @@
 This wrapper has been developed and is maintained by Peter Briggs
 (@pjbriggs). Peter van Heusden (@pvanheus) and Marius van den Beek
 (@mvdbeek) contributed
-support for gz compressed FastQ files.
+support for gz compressed FastQ files. Charles Girardot (@cgirardot) and
+Jelle Scholtalbers (@scholtalbers) contributed additional options to ILLUMINACLIP.
 
 Developers
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/trimmomatic_pe_r1_paired_out1_clip.fastq Thu Jun 22 09:07:16 2017 -0400 @@ -0,0 +1,32 @@ +@MISEQ:1:000000000-A2Y1L:1:1101:19264:2433 1:N:0:NAAGGCGATAGATCGC +AGATAGCCGAAGATAAAGAGNT ++ +?????BBB@BBBB?BBFFFF#6 +@MISEQ:1:000000000-A2Y1L:1:1101:18667:2435 1:N:0:NAAGGCGATAGATCGC +ATATATTCATCCGCCATTATNA ++ +?????BBBDDDDADDDE@FF#6 +@MISEQ:1:000000000-A2Y1L:1:1101:17523:2436 1:N:0:NAAGGCGATAGATCGC +CATCACTACCGCTCAGGAATNTGACGGCAGTCTTAGCGGCGCTCTAGTGCGGGAGGCCGTATCTTGGAATAAGGTGTCGTCAAATGCAAGGCAGGTAACCCTACACGCCGAGG ++ +?<,<?BBBBBBBBBBBFFFF#6ACECCEC78CE=CGHEH7CHCHFGHADGHHHCCCHHE<DFHFFFFDFFFFFFDFDFDDEEEEEEEEECAEEEE;?BEEEBEEBCE;C;48; +@MISEQ:1:000000000-A2Y1L:1:1101:15489:2437 1:N:0:NAAGGCGATAGATCGC +GAGCAGTCGGGCTCAGCGCTNT ++ +5====>/<@@@@@@>@CCCE#6 +@MISEQ:1:000000000-A2Y1L:1:1101:18851:2442 1:N:0:NAAGGCGATAGATCGC +GGTATCCCCCGGCAGTGAGGATGGAGCCATGGTCTGCATCATACTCACCGTAGGTGAGAATATCCACGTCCTTCGACTCCTGGGTGCCGTCTATGGTGCCCTCTGTTACCAGGCAGTGACGGAGGACATGGTGAGGTTTCAGTACCTCTGGCCCGGCCTGG ++ +??,<?BBBDDDDDDD<FFF@FC;FFFBEFHHHCDDHHGHHHDGHHHFHHEA?EEAEEHDEFHHHHHFECFHHHFHDHEEHHCFH7CEFHDEHHCFHHFHHF=FFFDFDFFFFEEEEDDEFEEE<BBCEBCE,==AE1::AAEECEE*?*AAEFF??>D?)8 +@MISEQ:1:000000000-A2Y1L:1:1101:15290:2442 1:N:0:NAAGGCGATAGATCGC +AAAATAATCCTAAAAAATAACCTCTATGCCGCCGAACGCTCCGCCTCTATCTTCGTAAAAACTATCTTCTCCTCCTCACCTCCATAATCAAGCATCAAGCGATCGCCCTCCGCCAACTCACCCCGCAAAATCTTATCCGCTAGCGGATTCTCAATCTCCGTCTGAATGACCCGCCTCAGCGGCCGCGCCCCATAAACCGAATCAAATCCACGC ++ +?????BBBDDDDDDDDGGGGGGIIIHHFFHHHHHHHHHHEHHEHHHHHIIHHHHHFEHIIIHHIHHIHIHIIIIIHHHHHHHHHHHHHHHHDHHHHHHHHGEDFGGGGGGG;CEGEGCEGGGGG8>GGGGEGGEECEGGGGD8EDGGAEGEEGGCE:CGG8CEEGG???CEE<DG8CC*??>DG.8<AGGGGCEEG*C2<GCCECE*:?CE?C +@MISEQ:1:000000000-A2Y1L:1:1101:15892:2446 1:N:0:NAAGGCGATAGATCGC 
+CTTCCCCACGGCCCAGACACAAGAGACGACCTCCATAAATCTTTTAGAGGGTGACCGCATCTCCGACGCAAACCAGGACGCCGATACCCTCGTGGTGGTGTTCGACCGTACGGATGGCGCAGACACCGACGGCACTAGTGCCACGGTATCAGGTACCACCATAACGTATGATTCGGGCACGCTCAAGGGCCAGCGTGACGGAATCGATAGAATACACTACACGGTGACTGATGGGG ++ +?????BBBDBDDDDDDFFFFFFHIHIHHHHHHIHIFGGHFHHHHIIFHIHH?EEGHHHHHH-EGEHHCEHHHHH@FDFFEFF5@EEEFFEFE;AECCE;AEEEEEA?8?AEDDEEDFFDE2>>EEFF<<<2>D?DEEE*:C?AA<>8AEFCEE:?C?EEE?CEFEE0?:E?ACEECD8>EE>)8>E:CEEEEEED.)?AE??A?:A?*??:C0?CCE?AAA:88.88?::C:C?*8 +@MISEQ:1:000000000-A2Y1L:1:1101:17903:2450 1:N:0:TAAGGCGATAGATCGC +GTGCAGGGGG ++ +=5===<>+5<
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/trimmomatic_pe_r2_unpaired_out1_clip.fastq Thu Jun 22 09:07:16 2017 -0400 @@ -0,0 +1,8 @@ +@MISEQ:1:000000000-A2Y1L:1:1101:18106:2444 2:N:0:NAAGGCGATAGATCGC +GAGTTACTATTACAGAGTGGAGCTAATGTACTGGCAGTTGACTGCAAAAGAAATGCTCCAATTCATGTTGCATGTGCAAATATGAGCTTAGAGTGTTTAAAGATAATTTTGTGCCACAAAAAATGCAACCCAAACCAACAAAATGCAGTGGGAGACACTCCACTCCACACCTTATGCAGTTTGGGGACTTGTGATATGAGAATACTAC ++ +?????BBBDDDDDDDDGCFGFGIIIIIHFHIIIIHIHFG=EHHIGIIIIFHIHDGHGHHHIHIH=CGHGGHFHHHFHFGHHFH/ACDFGG?FE?CDFFHHHHHIIHFHHI>CEDGFHHHHHHHHDFHHFHHHFFAFFGGGGGGEDEG>>DACC;?EGG>CEEA>AEACCEE?:C::CC:::C:CE<C<9C:?C?*?CEECCC*:C?C? +@MISEQ:1:000000000-A2Y1L:1:1101:15113:2451 2:N:0:TAAGGCGATAGATCGC +GAGGGGAGGAGGGGAAGGGAGAGGGGAAGAGAGGAGAGGAG ++ +?????@9@B?B?BBBBEEEFB@@EEHEC?BF-CE@DDEH,5
--- a/trimmomatic.xml Tue Mar 21 08:42:05 2017 -0400 +++ b/trimmomatic.xml Thu Jun 22 09:07:16 2017 -0400 @@ -1,4 +1,4 @@ -<tool id="trimmomatic" name="Trimmomatic" version="0.36.3"> +<tool id="trimmomatic" name="Trimmomatic" version="0.36.4"> <description>flexible read trimming tool for Illumina NGS data</description> <macros> <import>trimmomatic_macros.xml</import> @@ -33,7 +33,19 @@ #end if ## ILLUMINACLIP option #if $illuminaclip.do_illuminaclip - ILLUMINACLIP:\$TRIMMOMATIC_ADAPTERS_PATH/$illuminaclip.adapter_fasta:$illuminaclip.seed_mismatches:$illuminaclip.palindrome_clip_threshold:$illuminaclip.simple_clip_threshold + #if $illuminaclip.adapter_type.standard_or_custom == "custom" + #if $readtype.single_or_paired in ["pair_of_files","collection"] + ILLUMINACLIP:$adapter_file_from_text:$illuminaclip.seed_mismatches:$illuminaclip.palindrome_clip_threshold:$illuminaclip.simple_clip_threshold:$illuminaclip.min_adapter_len:$illuminaclip.keep_both_reads + #else + ILLUMINACLIP:$adapter_file_from_text:$illuminaclip.seed_mismatches:$illuminaclip.palindrome_clip_threshold:$illuminaclip.simple_clip_threshold + #end if + #else + #if $readtype.single_or_paired in ["pair_of_files","collection"] + ILLUMINACLIP:\$TRIMMOMATIC_ADAPTERS_PATH/$illuminaclip.adapter_type.adapter_fasta:$illuminaclip.seed_mismatches:$illuminaclip.palindrome_clip_threshold:$illuminaclip.simple_clip_threshold:$illuminaclip.min_adapter_len:$illuminaclip.keep_both_reads + #else + ILLUMINACLIP:\$TRIMMOMATIC_ADAPTERS_PATH/$illuminaclip.adapter_type.adapter_fasta:$illuminaclip.seed_mismatches:$illuminaclip.palindrome_clip_threshold:$illuminaclip.simple_clip_threshold + #end if + #end if #end if ## Other operations #for $op in $operations @@ -81,6 +93,14 @@ mv fastq_out.'$fastq_in.extension' '${fastq_out}' #end if ]]></command> + <configfiles> + <configfile name="adapter_file_from_text">#set from_text_area = '' +#if str( $illuminaclip.do_illuminaclip ) == "yes" and str( 
$illuminaclip.adapter_type.standard_or_custom ) == "custom": +#set from_text_area = $illuminaclip.adapter_type.adapter_text +#end if +${from_text_area}</configfile> + </configfiles> + <inputs> <conditional name="readtype"> <param name="single_or_paired" type="select" label="Single-end or paired-end reads?"> @@ -104,17 +124,37 @@ <conditional name="illuminaclip"> <param name="do_illuminaclip" type="boolean" label="Perform initial ILLUMINACLIP step?" help="Cut adapter and other illumina-specific sequences from the read" truevalue="yes" falsevalue="no" checked="False" /> <when value="yes"> - <param name="adapter_fasta" type="select" label="Adapter sequences to use"> - <option value="TruSeq2-SE.fa">TruSeq2 (single-ended, for Illumina GAII)</option> - <option value="TruSeq3-SE.fa">TruSeq3 (single-ended, for MiSeq and HiSeq)</option> - <option value="TruSeq2-PE.fa">TruSeq2 (paired-ended, for Illumina GAII)</option> - <option value="TruSeq3-PE.fa">TruSeq3 (paired-ended, for MiSeq and HiSeq)</option> - <option value="TruSeq3-PE-2.fa">TruSeq3 (additional seqs) (paired-ended, for MiSeq and HiSeq)</option> - <option value="NexteraPE-PE.fa">Nextera (paired-ended)</option> - </param> + <conditional name="adapter_type"> + <param name="standard_or_custom" type="select" label="Select standard adapter sequences or provide custom?"> + <option value="standard" selected="true">Standard</option> + <option value="custom">Custom</option> + </param> + <when value="standard"> + <param name="adapter_fasta" type="select" label="Adapter sequences to use"> + <option value="TruSeq2-SE.fa">TruSeq2 (single-ended, for Illumina GAII)</option> + <option value="TruSeq3-SE.fa">TruSeq3 (single-ended, for MiSeq and HiSeq)</option> + <option value="TruSeq2-PE.fa">TruSeq2 (paired-ended, for Illumina GAII)</option> + <option value="TruSeq3-PE.fa">TruSeq3 (paired-ended, for MiSeq and HiSeq)</option> + <option value="TruSeq3-PE-2.fa">TruSeq3 (additional seqs) (paired-ended, for MiSeq and HiSeq)</option> + 
<option value="NexteraPE-PE.fa">Nextera (paired-ended)</option> + </param> + </when> + <when value="custom"> + <param name="adapter_text" type="text" area="True" size="10x30" value="" + label="Custom adapter sequences in fasta format" help="Write sequences in the fasta format."> + <sanitizer> + <valid initial="string.printable"></valid> + <mapping initial="none"/> + </sanitizer> + </param> + </when> + </conditional> <param name="seed_mismatches" type="integer" label="Maximum mismatch count which will still allow a full match to be performed" value="2" /> <param name="palindrome_clip_threshold" type="integer" label="How accurate the match between the two 'adapter ligated' reads must be for PE palindrome read alignment" value="30" /> <param name="simple_clip_threshold" type="integer" label="How accurate the match between any adapter etc. sequence must be against a read" value="10" /> + <param name="min_adapter_len" type="integer" label="Minimum length of adapter that needs to be detected (PE specific/palindrome mode)" value="8" /> + <param name="keep_both_reads" type="boolean" label="Always keep both reads (PE specific/palindrome mode)?" 
truevalue="true" falsevalue="false" checked="true" + help="See help below"/> </when> <when value="no" /> <!-- empty clause to satisfy planemo lint --> </conditional> @@ -287,6 +327,35 @@ <param name="operations_0|operation|strictness" value="0.8" /> <output name="fastq_out" file="trimmomatic_maxinfo.fastq" /> </test> + <test> + <!-- Paired-end ILLUMINACLIP - this does not check valid clipping --> + <param name="single_or_paired" value="pair_of_files" /> + <param name="fastq_r1_in" value="Illumina_SG_R1.fastq" ftype="fastqsanger" /> + <param name="fastq_r2_in" value="Illumina_SG_R2.fastq" ftype="fastqsanger" /> + <param name="do_illuminaclip" value="true"/> + <param name="adapter_fasta" value="TruSeq2-PE.fa"/> + <param name="operations_0|operation|name" value="SLIDINGWINDOW" /> + <output name="fastq_out_r1_paired" file="trimmomatic_pe_r1_paired_out1_clip.fastq" /> + <output name="fastq_out_r1_unpaired" file="trimmomatic_pe_r1_unpaired_out1.fastq" /> + <output name="fastq_out_r2_paired" file="trimmomatic_pe_r2_paired_out1.fastq" /> + <output name="fastq_out_r2_unpaired" file="trimmomatic_pe_r2_unpaired_out1_clip.fastq" /> + </test> + <test> + <!-- Paired-end ILLUMINACLIP providing 'custom' adapters - this does not check valid clipping --> + <param name="single_or_paired" value="pair_of_files" /> + <param name="fastq_r1_in" value="Illumina_SG_R1.fastq" ftype="fastqsanger" /> + <param name="fastq_r2_in" value="Illumina_SG_R2.fastq" ftype="fastqsanger" /> + <param name="do_illuminaclip" value="true"/> + <param name="standard_or_custom" value="custom"/> + <param name="adapter_text" + value=">PrefixPE/1 AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT >PrefixPE/2 CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT >PCR_Primer1 AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT >PCR_Primer1_rc AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT >PCR_Primer2 CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT >PCR_Primer2_rc 
AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG >FlowCell1 TTTTTTTTTTAATGATACGGCGACCACCGAGATCTACAC >FlowCell2 TTTTTTTTTTCAAGCAGAAGACGGCATACGA "/> + <param name="adapter_fasta" value="TruSeq2-PE.fa"/> + <param name="operations_0|operation|name" value="SLIDINGWINDOW" /> + <output name="fastq_out_r1_paired" file="trimmomatic_pe_r1_paired_out1_clip.fastq" /> + <output name="fastq_out_r1_unpaired" file="trimmomatic_pe_r1_unpaired_out1.fastq" /> + <output name="fastq_out_r2_paired" file="trimmomatic_pe_r2_paired_out1.fastq" /> + <output name="fastq_out_r2_unpaired" file="trimmomatic_pe_r2_unpaired_out1_clip.fastq" /> + </test> </tests> <help><![CDATA[ .. class:: infomark @@ -299,6 +368,12 @@ This tool allows the following trimming steps to be performed: * **ILLUMINACLIP:** Cut adapter and other illumina-specific sequences from the read + + * If **Always keep both reads (PE specific/palindrome mode)** is True, the reverse read will also be retained in palindrome mode. + After read-though has been detected by palindrome mode, and the adapter sequence removed, + the reverse read contains the same sequence information as the forward read, albeit in reverse complement. + For this reason, the default behaviour is to entirely drop the reverse read. + Retaining the reverse read may be useful e.g. if the downstream tools cannot handle a combination of paired and unpaired reads. * **SLIDINGWINDOW:** Perform a sliding window trimming, cutting once the average quality within the window falls below a threshold * **MINLEN:** Drop the read if it is below a specified length @@ -359,8 +434,8 @@ **Credits** This Galaxy tool has been developed within the Bioinformatics Core Facility at the -University of Manchester, with contributions from Peter van Heusden and Marius -van den Beek. +University of Manchester, with contributions from Peter van Heusden, Marius +van den Beek, Jelle Scholtalbers and Charles Girardot. 
It runs the Trimmomatic program which has been developed within Bjorn Usadel's group at RWTH Aachen university.
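
The branching added to the command template in trimmomatic.xml can be summarised with a small Python sketch. It is an illustration under stated assumptions, not code from the wrapper: the function name `illuminaclip_step` and its argument names are hypothetical, and the caller is assumed to have already resolved the adapter file path (a standard file under the adapters directory, or the config file written from the custom text area). Only the field order and the default values are taken from the diff above.

```python
def illuminaclip_step(adapters, seed_mismatches=2, palindrome_clip_threshold=30,
                      simple_clip_threshold=10, paired=False,
                      min_adapter_len=8, keep_both_reads=True):
    """Assemble an ILLUMINACLIP step string (hypothetical helper).

    `adapters` is the already-resolved path to an adapter FASTA file.
    The two palindrome-mode fields are appended only for paired-end input,
    mirroring the "pair_of_files"/"collection" branch in the template.
    """
    fields = [adapters, seed_mismatches, palindrome_clip_threshold,
              simple_clip_threshold]
    if paired:
        # The wrapper's boolean parameter maps to the literals "true"/"false".
        fields += [min_adapter_len, "true" if keep_both_reads else "false"]
    return "ILLUMINACLIP:" + ":".join(str(f) for f in fields)


# Single-end: four fields after the step name.
print(illuminaclip_step("TruSeq3-SE.fa"))
# Paired-end: minimum adapter length and keep-both-reads are appended.
print(illuminaclip_step("TruSeq3-PE.fa", paired=True))
```

With the defaults shown, the paired-end call yields `ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:8:true`, while the single-end call stops after the simple clip threshold.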