annotate fastaptamer_cluster_1.xml @ 0:307254415eb1 draft

Uploaded
author fastaptamer
date Tue, 10 Feb 2015 14:30:29 -0500
parents
children
<tool id="fastaptamer_cluster_1_0_2" name="FASTAptamer-Cluster" version="1.0.2">

  <description>Cluster closely-related sequences using Levenshtein edit distance.</description>

  <version_command>fastaptamer_cluster -v</version_command>

  <command interpreter="perl">fastaptamer_cluster -i $input -o $output -d $distance -f $filter > $report
  </command>

  <inputs>
    <param name="input" type="data" format="fasta" label="Input file" help="Must use FASTA output from FASTAptamer-Count"></param>
    <param name="distance" type="integer" label="Levenshtein Edit Distance" value="1" help="Minimum number of insertions, deletions, or substitutions required to transform one sequence into another"></param>
    <param name="filter" type="integer" label="Read Filter" optional="true" value="1" help="Only sequences with total reads greater than the value supplied will be clustered."></param>
  </inputs>

  <outputs>
    <data name="output" format="fasta" label="FASTAptamer-Cluster output file"></data>
    <data name="report" format="txt" label="FASTAptamer-Cluster Report"></data>
  </outputs>

  <help>
.. class:: warningmark

FASTAptamer-Cluster requires a FASTA-formatted input file generated by FASTAptamer-Count.

.. class:: warningmark

FASTAptamer-Cluster uses an exhaustive approach to clustering and can take *several* hours to process. For faster processing, use the "Read Filter" option to exclude low-read sequences.

------

**FASTAptamer-Cluster** uses the Levenshtein algorithm to cluster closely related sequences based on a user-defined edit distance (*the minimum number of insertions, deletions, or substitutions required to transform one string into another*).

FASTAptamer-Cluster begins with the most abundant sequence in a population, referred to as the "seed sequence," and clusters with it every sequence in the file whose edit distance from the seed is less than or equal to the specified edit distance (Cluster #1). The next most abundant unclustered sequence then serves as the seed sequence for assembling the second cluster from the remaining sequences (Cluster #2), followed by the next most abundant unclustered sequence (Cluster #3), and so on. This process repeats until every sequence is clustered.

Output is FASTA-formatted, with the following information on the FASTA identifier line:

>Rank-Reads-RPM-Cluster#-RankWithinCluster-EditDistanceFromSeedSequence

.. class:: infomark

The "Read Filter" excludes from the clustering process sequences with a total number of reads less than or equal to the integer supplied. Because of the computational complexity of clustering large datasets, the default filter setting of 1 is designed to eliminate singleton sequences from clustering.

------

.. image:: http://burkelab.missouri.edu/images/fastaptamer-logo-xs.png
   :height: 98
   :width: 300

For more information on FASTAptamer, visit our website_.

FASTAptamer is distributed under a GNU GPL v3.0 license. For the complete license, click here_.

.. _here: http://burkelab.missouri.edu/fastaptamer/LICENSE.txt
.. _website: http://burkelab.missouri.edu/fastaptamer.html
  </help>

  <citations>
    <citation type="doi">doi:10.1038/mtna.2015.4</citation>
  </citations>

</tool>
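
The clustering procedure described in the help text above can be illustrated with a small Python sketch. This is not the FASTAptamer-Cluster source (the tool itself is a Perl script); it is a minimal re-implementation of the behaviour the help text describes. The function names `levenshtein` and `cluster`, the example records, and the tie-breaking rule for equally abundant sequences are all hypothetical. The sketch also assumes that input identifiers have the form Rank-Reads-RPM, which is inferred from the first three fields of the documented output identifier.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, or substitutions turning a into b."""
    if len(a) > len(b):
        a, b = b, a  # keep the dynamic-programming row as short as possible
    previous = list(range(len(a) + 1))
    for j, cb in enumerate(b, start=1):
        current = [j]
        for i, ca in enumerate(a, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[i] + 1,          # skip a character of b
                               current[i - 1] + 1,       # skip a character of a
                               previous[i - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def cluster(records: List[Tuple[str, str]], distance: int = 1,
            read_filter: int = 1) -> List[Tuple[str, str]]:
    """Greedy seed clustering of (identifier, sequence) records.

    Assumption: each identifier looks like "Rank-Reads-RPM".
    """
    def reads(header: str) -> int:
        return int(header.split("-")[1])

    # Read filter: only sequences with MORE reads than the filter value are kept.
    pool = [r for r in records if reads(r[0]) > read_filter]
    # Most abundant first; sorted() is stable, so ties keep input order
    # (the tie-breaking rule is an assumption of this sketch).
    pool = sorted(pool, key=lambda r: -reads(r[0]))

    clustered = []
    cluster_id = 0
    while pool:
        cluster_id += 1
        seed = pool[0][1]  # most abundant sequence not yet clustered
        rank_in_cluster = 0
        remaining = []
        for header, seq in pool:
            d = levenshtein(seed, seq)
            if d <= distance:
                rank_in_cluster += 1
                # Rank-Reads-RPM-Cluster#-RankWithinCluster-EditDistanceFromSeed
                clustered.append((f"{header}-{cluster_id}-{rank_in_cluster}-{d}", seq))
            else:
                remaining.append((header, seq))
        pool = remaining
    return clustered


# Made-up records for illustration only.
example_records = [
    ("1-100-500000.00", "ACGTACGT"),
    ("2-60-300000.00", "ACGTACGA"),
    ("3-30-150000.00", "TTTTGGGG"),
    ("4-9-45000.00", "TTTTGGG"),
    ("5-1-5000.00", "CCCCCCCC"),
]

if __name__ == "__main__":
    for header, seq in cluster(example_records, distance=1, read_filter=1):
        print(f">{header}\n{seq}")
```

With an edit distance of 1 and the default filter of 1, the two most abundant example sequences form Cluster #1, the next two form Cluster #2, and the singleton is excluded before clustering starts. Each pass compares the seed against every remaining sequence, so the work grows quickly with the number of distinct sequences, which is why the Read Filter matters on large datasets.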