annotate ExtractSeqsFromFasta.xml @ 0:163892325845 draft default tip

Initial commit.
author galaxyp
date Fri, 10 May 2013 17:15:08 -0400
<!--
# =====================================================
# $Id: ExtractSeqsFromFasta.xml 90 2011-01-19 13:20:31Z pieter.neerincx@gmail.com $
# $URL: https://trac.nbic.nl/svn/galaxytools/trunk/tools/general/FastaTools/ExtractSeqsFromFasta.xml $
# $LastChangedDate: 2011-01-19 07:20:31 -0600 (Wed, 19 Jan 2011) $
# $LastChangedRevision: 90 $
# $LastChangedBy: pieter.neerincx@gmail.com $
# =====================================================
-->
<tool id="ExtractSeqsFromFasta1" version="1.1" name="ExtractSeqsFromFasta">
  <description>Extract sequences from a FASTA file based on a list of IDs</description>
  <command interpreter="perl">ExtractSeqsFromFasta.pl $ignore_accession_number_versions -f $identifiers -i $input -o $output -l WARN</command>
  <inputs>
    <param format="fasta" name="input" type="data" label="FASTA sequences"/>
    <param format="txt" name="identifiers" type="data" label="List of IDs to extract sequences for"/>
    <param name="ignore_accession_number_versions" type="boolean" truevalue="-u" falsevalue="" optional="true" label="Ignore accession number versions"/>
  </inputs>
  <outputs>
    <data format="fasta" name="output" label="FASTA sequences for ${identifiers.name}"/>
  </outputs>
  <!--
  <tests>
    <test>
      <param name="input" value="*.fasta"/>
      <param name="identifiers" value="*.txt"/>
      <output name="output" file="*.fasta"/>
    </test>
  </tests>
  -->
<help>

.. class:: infomark

**What it does**

This tool filters a set of FASTA sequences for certain identifiers (IDs) or accession numbers. \
Only sequences whose ID or accession number is present in the supplied list will remain in the filtered FASTA output. \
The list of IDs or accession numbers to filter for must be a flat text file with one ID or accession number per line.

This tool can match IDs with and without colon-prefixed database namespaces in the FASTA sequence header line. \
Hence your FASTA header can start with either &gt;UniProtKB:Q86Y46 or just plain &gt;Q86Y46. \
Database namespace prefixes should not be present in the list of IDs that you want to extract sequences for.

FASTA headers may contain multiple IDs separated by pipe symbols (|) or semicolons (;). \
If multiple IDs are supplied, they should not contain any whitespace, as everything after the \
first whitespace is considered to be the (optional) description, which will not be matched against the list \
of IDs to extract.

If your FASTA file contains versioned IDs / accessions, your list of IDs / accessions to extract must also contain \
versioned IDs / accessions and the version numbers must match, unless *Ignore accession number versions* is enabled.

-----

**Example**

If the FASTA header is this::

  &gt;IPI:CON_IPI00174775.2|TREMBL:Q32MB2;Q86Y46 Tax_Id=9606 Gene_Symbol=KRT73 Keratin-73

The following IDs / accession numbers will match this sequence header::

  CON_IPI00174775.2
  Q32MB2
  Q86Y46

These will not match::

  IPI:CON_IPI00174775.2 (prefix should be removed)
  KRT73 (part of the description and not part of the list of IDs,
         which is everything up to the first whitespace)

And finally these will not match unless *Ignore accession number versions* is enabled::

  CON_IPI00174775 (no version number, while the FASTA file does contain versioned accession numbers)
  CON_IPI00174775.1 (wrong version number)

</help>
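<!--
Maintainer note (illustrative only, not part of the tool): the matching rules
described in the help text above can be sketched in Python roughly as follows.
This is a minimal sketch under stated assumptions, not the logic of
ExtractSeqsFromFasta.pl. The function names header_ids and header_matches are
hypothetical, and treating a trailing ".digits" suffix as the version is an
assumption based on the example header. The header is passed without its
leading FASTA marker character.

```python
import re

# Assumption: a version is a trailing ".<digits>" suffix, as in CON_IPI00174775.2.
VERSION_SUFFIX = re.compile(r"\.\d+$")


def header_ids(header, ignore_versions=False):
    """Return the set of IDs in a FASTA header (marker character already removed)."""
    # Everything after the first whitespace is the description and is never matched.
    fields = header.split(None, 1)
    if not fields:
        return set()
    ids = set()
    # Multiple IDs may be separated by pipe symbols or semicolons.
    for token in re.split(r"[|;]", fields[0]):
        if not token:
            continue
        # Drop an optional colon-prefixed database namespace such as "IPI:".
        identifier = token.split(":", 1)[-1]
        if ignore_versions:
            identifier = VERSION_SUFFIX.sub("", identifier)
        ids.add(identifier)
    return ids


def header_matches(header, wanted, ignore_versions=False):
    """Return True if any ID in the header is present in the wanted list."""
    if ignore_versions:
        wanted = {VERSION_SUFFIX.sub("", w) for w in wanted}
    else:
        wanted = set(wanted)
    return not header_ids(header, ignore_versions).isdisjoint(wanted)


header = "IPI:CON_IPI00174775.2|TREMBL:Q32MB2;Q86Y46 Tax_Id=9606 Gene_Symbol=KRT73 Keratin-73"
print(header_matches(header, ["Q32MB2"]))  # True
print(header_matches(header, ["KRT73"]))  # False: only appears in the description
print(header_matches(header, ["CON_IPI00174775.1"], ignore_versions=True))  # True
```

The ignore_versions argument mirrors the "Ignore accession number versions"
checkbox: when set, version suffixes are stripped on both sides before comparing.
-->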
</tool>