comparison ExtractSeqsFromFasta.xml @ 0:163892325845 draft default tip
Initial commit.
author | galaxyp |
---|---|
date | Fri, 10 May 2013 17:15:08 -0400 |
parents | |
children |
<!--
# =====================================================
# $Id: ExtractSeqsFromFasta.xml 90 2011-01-19 13:20:31Z pieter.neerincx@gmail.com $
# $URL: https://trac.nbic.nl/svn/galaxytools/trunk/tools/general/FastaTools/ExtractSeqsFromFasta.xml $
# $LastChangedDate: 2011-01-19 07:20:31 -0600 (Wed, 19 Jan 2011) $
# $LastChangedRevision: 90 $
# $LastChangedBy: pieter.neerincx@gmail.com $
# =====================================================
-->
<tool id="ExtractSeqsFromFasta1" version="1.1" name="ExtractSeqsFromFasta">
  <description>Extract sequences from a FASTA file based on a list of IDs</description>
  <command interpreter="perl">ExtractSeqsFromFasta.pl $ignore_accession_number_versions -f $identifiers -i $input -o $output -l WARN</command>
  <inputs>
    <param format="fasta" name="input" type="data" label="FASTA sequences"/>
    <param format="txt" name="identifiers" type="data" label="List of IDs to extract sequences for"/>
    <param name="ignore_accession_number_versions" type="boolean" truevalue="-u" falsevalue="" optional="true" label="Ignore accession number versions"/>
  </inputs>
  <outputs>
    <data format="fasta" name="output" label="FASTA sequences for ${identifiers.name}"/>
  </outputs>
<!--
  <tests>
    <test>
      <param name="input" value="*.fasta"/>
      <param name="identifiers" value="*.txt"/>
      <output name="output" file="*.fasta"/>
    </test>
  </tests>
-->
  <help>

.. class:: infomark

**What it does**

This tool filters a set of FASTA sequences for certain identifiers (IDs) or accession numbers. \
Only sequences whose ID or accession number is present in the supplied list will remain in the filtered FASTA output. \
The list of IDs or accession numbers to filter for must be a flat text file with one ID or accession per line.

This tool can match IDs with and without colon-prefixed database namespaces in the FASTA sequence header line. \
Hence your FASTA header can contain either >UniProtKB:Q86Y46 ... or just plain >Q86Y46 ... . \
Database namespace prefixes should not be present in the list of IDs that you want to extract sequences for.

FASTA headers may contain multiple IDs separated by pipe symbols (|) or semicolons (;). \
If multiple IDs are supplied, they must not contain any white space, because everything after the \
first white space is considered to be the (optional) description, which is not matched against the list \
of IDs to extract.

If your FASTA file contains versioned IDs / accessions, your list of IDs / accessions to extract must also contain \
versioned IDs / accessions and the version numbers must match, unless *Ignore accession number versions* is enabled.

-----

**Example**

If the FASTA header is this::

    >IPI:CON_IPI00174775.2|TREMBL:Q32MB2;Q86Y46 Tax_Id=9606 Gene_Symbol=KRT73 Keratin-73

The following IDs / accession numbers will match this sequence header::

    CON_IPI00174775.2
    Q32MB2
    Q86Y46

These will not match::

    IPI:CON_IPI00174775.2 (prefix should be removed)
    KRT73 (part of the description, not of the list of IDs,
    which is everything up to the first white space)

Finally, these will not match unless *Ignore accession number versions* is enabled::

    CON_IPI00174775 (no version number, while FASTA file does contain versioned accession numbers)
    CON_IPI00174775.1 (wrong version number)

  </help>
</tool>