comparison dada2_removeBimeraDenovo.xml @ 0:1937c2b4da7a draft

"planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/dada2 commit f8b6b6e72914ad6bcca8423dfa03f59bde80992e"
author iuc
date Fri, 08 Nov 2019 18:49:03 -0500
parents
children 7f8f0e9fcb37
-1:000000000000 0:1937c2b4da7a
1 <tool id="dada2_removeBimeraDenovo" name="dada2: removeBimeraDenovo" version="@DADA2_VERSION@+galaxy@WRAPPER_VERSION@" profile="19.09">
2 <description>Remove bimeras from collections of unique sequences</description>
3 <macros>
4 <import>macros.xml</import>
5 </macros>
6 <expand macro="requirements"/>
7 <expand macro="stdio"/>
8 <expand macro="version_command"/>
9 <command detect_errors="exit_code"><![CDATA[
10 Rscript '$dada2_script'
11 ]]></command>
12 <configfiles>
13 <configfile name="dada2_script"><![CDATA[
14 library(dada2, quietly=T)
15 @READ_FOO@
16 @WRITE_FOO@
17
18 unqs <- $read_data($unqs)
19
20 seqtab.nochim <- removeBimeraDenovo(unqs, method = "$method")
21
22 ## - output is data.frame (mergepairs) or sequencetable (if this was the input)
23 ## in the former case the (multi item) list is stored as RDS
24 ## in the latter case the named int matrix is stored as tabular (rows=samples, columns=ASVs)
25 ## - otherwise uniques-vector, i.e. a named integer vector
26
27 #if $unqs.is_of_type('dada2_dada')
28 write.data( seqtab.nochim, '$stable_uniques', "dada2_uniques" )
29 #else if $unqs.is_of_type('dada2_sequencetable')
30 write.data( seqtab.nochim, '$stable_sequencetable', "dada2_sequencetable" )
31 #else if $unqs.is_of_type('dada2_mergepairs')
32 write.data( seqtab.nochim, '$stable_mergepairs', "dada2_mergepairs" )
33 #end if
34
35 ## dada input
36 if(inherits(unqs, "list")){
37 unqssum<-lapply(lapply(unqs, getUniques),sum)
38 nonchimsum<-lapply(seqtab.nochim,sum)
39 nonchimpct<-mapply(function(X,Y) { 100*X/Y }, X=nonchimsum, Y=unqssum)
40 cat("remaining nonchimeric: ", nonchimpct, "%")
41 }else{
42 cat("remaining nonchimeric: ", 100*sum(getUniques(seqtab.nochim))/sum(getUniques(unqs)), "%")
43 }
44 ]]></configfile>
45 </configfiles>
46 <inputs>
47 <param name="unqs" type="data" format="@DADA_UNIQUES@,dada2_sequencetable" label="sequence table" help=""/>
48 <param argument="method" type="select" label="Method">
49 <option value="consensus">check samples independently for bimeras and make a consensus decision on each sequence variant</option>
50 <option value="pooled">pool samples for bimera identification</option>
51 <option value="per-sample">check samples independently for bimeras and remove (0-out) sequence variants from samples independently</option>
52 </param>
53 </inputs>
54 <outputs>
55 <!-- fix output filters in a later release https://github.com/galaxyproject/galaxy/issues/7464 -->
56 <data name="stable_uniques" format="dada2_uniques" label="${tool.name} on ${on_string}" from_work_dir="nonchim.uniques">
57 <filter>unqs.ext == "dada2_dada"</filter>
58 </data>
59 <data name="stable_mergepairs" format="dada2_mergepairs" label="${tool.name} on ${on_string}" from_work_dir="nonchim.mergepairs">
60 <filter>unqs.ext == "dada2_mergepairs"</filter>
61 </data>
62 <data name="stable_sequencetable" format="dada2_sequencetable" label="${tool.name} on ${on_string}" from_work_dir="nonchim.mergepairs">
63 <filter>unqs.ext == "dada2_sequencetable"</filter>
64 </data>
65 </outputs>
66 <tests>
67 <test expect_num_outputs="1">
68 <param name="unqs" ftype="dada2_sequencetable" value="makeSequenceTable_F3D0.tab"/>
69 <output name="stable_sequencetable" value="removeBimeraDenovo_F3D0.tab" ftype="dada2_sequencetable" />
70 </test>
71 <!-- dada input -->
72 <test expect_num_outputs="1">
73 <param name="unqs" ftype="dada2_dada" value="dada_F3D0_R1.Rdata"/>
74 <output name="stable_uniques" value="removeBimeraDenovo_F3D0_dada_uniques.tab" ftype="dada2_uniques" />
75 </test>
76 <!-- mergepairs input + non default-->
77 <test expect_num_outputs="1">
78 <param name="unqs" ftype="dada2_mergepairs" value="mergePairs_F3D0.Rdata"/>
79 <param name="method" value="pooled"/>
80 <output name="stable_mergepairs" value="removeBimeraDenovo_F3D0_mergepairs.Rdata" ftype="dada2_mergepairs" />
81 </test>
82 </tests>
83 <help><![CDATA[
84 Description
85 ...........
86
87 This tool removes chimeric sequences, i.e. sequences that can be constructed by combining a left segment and a right segment from two more abundant “parent” sequences. Three methods to identify chimeras are supported: identification from pooled sequences, identification by consensus across samples, and per-sample identification, in which samples are checked independently and bimeras are removed (zeroed out) from each sample separately.
88
89 - from **pooled** sequences: Each sequence is evaluated against a set of "parents" drawn from the sequence collection that are sufficiently more abundant than the sequence being evaluated. Sequences identified as bimeras, i.e. two-parent chimeras in which the left side matches one parent sequence and the right side matches a second parent sequence, are removed.
90 - by **consensus**: Bimeric sequences are flagged on a sample-by-sample basis. Then a vote is taken for each sequence across all samples in which it appears. If the sequence is flagged in a sufficiently high fraction of samples, it is identified as a bimera and removed.
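The bimera definition and the consensus vote described above can be sketched in a few lines of Python. This is an illustration only and not DADA2's implementation: it checks exact matches only, and the function names and the `min_fold` and `min_fraction` thresholds are hypothetical values chosen for the example.

```python
# Illustrative sketch only, NOT DADA2's implementation.
# All names and thresholds (min_fold, min_fraction) are hypothetical.

def shared_prefix(a, b):
    """Length of the common prefix of a and b."""
    k = 0
    while k < min(len(a), len(b)) and a[k] == b[k]:
        k += 1
    return k


def shared_suffix(a, b):
    """Length of the common suffix of a and b."""
    k = 0
    while k < min(len(a), len(b)) and a[-1 - k] == b[-1 - k]:
        k += 1
    return k


def is_exact_bimera(seq, parents):
    """True if seq is a left segment of one parent joined to a right segment of another."""
    n = len(seq)
    for i, left in enumerate(parents):
        for j, right in enumerate(parents):
            if i == j:
                continue
            lp = shared_prefix(seq, left)
            rs = shared_suffix(seq, right)
            # Both segments together must cover seq, and neither parent may cover it alone.
            if lp < n and rs < n and lp + rs >= n:
                return True
    return False


def flag_bimeras(abundance, min_fold=2.0):
    """Pooled check: test each sequence against sufficiently more abundant parents."""
    flags = {}
    for seq, count in abundance.items():
        parents = [p for p, c in abundance.items() if p != seq and c >= min_fold * count]
        flags[seq] = is_exact_bimera(seq, parents)
    return flags


def consensus_flags(samples, min_fraction=0.9):
    """Consensus check: flag per sample, then vote across the samples containing each sequence."""
    votes = {}
    for abundance in samples:
        for seq, flagged in flag_bimeras(abundance).items():
            votes.setdefault(seq, []).append(flagged)
    return {seq: sum(v) / len(v) >= min_fraction for seq, v in votes.items()}
```

For example, `flag_bimeras({"AAAAAAAA": 100, "CCCCCCCC": 80, "AAAACCCC": 5})` flags only `AAAACCCC`, because its left half matches the first parent and its right half matches the second.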
91
92 Usage
93 .....
94
95 **Input**
96
97 - the results of makeSequenceTable (the results of dada and mergePairs are also accepted)
98
99 **Output**
100
101 A data set of type:
102 - dada2_sequencetable (resp. dada2_mergepairs) if the input is of type dada2_sequencetable (resp. dada2_mergepairs)
103 - dada2_uniques otherwise
104
105 Details
106 .......
107
108 The frequency of chimeric sequences varies substantially from dataset to dataset, and depends on factors including experimental procedures and sample complexity. In the DADA2 tutorial example, chimeras make up about 21% of the merged sequence variants, but once the abundances of those variants are taken into account they make up only about 4% of the merged sequence reads.
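The difference between the share of variants and the share of reads can be made concrete with a small Python sketch. The variant names and counts below are made up for illustration and are not taken from any dataset; `chimera_fractions` is a hypothetical helper, not part of DADA2.

```python
# Hypothetical helper and made-up counts, for illustration only.

def chimera_fractions(counts, is_chimera):
    """Return (fraction of variants flagged, fraction of reads flagged)."""
    n_flagged = sum(1 for seq in counts if is_chimera[seq])
    reads_flagged = sum(c for seq, c in counts.items() if is_chimera[seq])
    return n_flagged / len(counts), reads_flagged / sum(counts.values())


# Five variants; the two flagged ones are rare.
counts = {"v1": 600, "v2": 300, "v3": 60, "v4": 25, "v5": 15}
is_chimera = {"v1": False, "v2": False, "v3": False, "v4": True, "v5": True}

variant_fraction, read_fraction = chimera_fractions(counts, is_chimera)
print(f"{variant_fraction:.0%} of variants, {read_fraction:.0%} of reads")
```

Because chimeras tend to be low-abundance variants, a large share of variants can be removed while only a small share of reads is lost.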
109
110 Considerations for your own data: Most of your reads should remain after chimera removal (it is not uncommon for a majority of sequence variants to be removed though). If most of your reads were removed as chimeric, upstream processing may need to be revisited. In almost all cases this is caused by primer sequences with ambiguous nucleotides that were not removed prior to beginning the DADA2 pipeline.
111
112 @HELP_OVERVIEW@
113 ]]></help>
114 <expand macro="citations"/>
115 </tool>