annotate tools/ncbi_blast_plus/ncbi_blastdbcmd_wrapper.xml @ 27:6f8ea4b9a2c4 draft
"planemo upload for repository https://github.com/peterjc/galaxy_blast/tree/master/tools/ncbi_blast_plus commit 3f9f39ad808325a11d9967980d2cb82c96d69324"
author | peterjc |
---|---|
date | Wed, 09 Sep 2020 15:32:17 +0000 |
parents | 2889433c7ae1 |
children | 87a7ee4cb36f |
rev | line source |
---|---|
27 | 1 <tool id="ncbi_blastdbcmd_wrapper" name="NCBI BLAST+ blastdbcmd entry(s)" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@"> |
5 | 2 <description>Extract sequence(s) from BLAST database</description> |
11 (changeset 4c4a0da938ff by peterjc, parent 10: Uploaded v0.0.22, now wraps BLAST+ 2.2.28 allowing extended tabular output to include the hit descriptions as column 25.) | 3 <macros> |
11 | 4 <token name="@BINARY@">blastdbcmd</token> |
11 | 5 <import>ncbi_macros.xml</import> |
11 | 6 </macros> |
15 (changeset c16c30e9ad5b by peterjc, parent 14: Uploaded v0.1.03 (internal changes); v0.1.02 (BLAST+ 2.2.30 etc)) | 7 <expand macro="preamble" /> |
22 (changeset 6f386c5dc4fb by peterjc, parent 21: v0.2.01 add -max_hsps, -use_sw_tback; lists args; internal updates) | 8 <command detect_errors="aggressive" strict="true"> |
5 | 9 ## The command is a Cheetah template which allows some Python based syntax. |
5 | 10 ## Lines starting hash hash are comments. Galaxy will turn newlines into spaces |
27 | 11 blastdbcmd |
27 | 12 @DBCMD_OPTS@ |
5 | 13 |
5 | 14 ##TODO: What about -ctrl_a and -target_only as advanced options? |
5 | 15 |
5 | 16 #if $id_opts.id_type=="file": |
22 | 17 -entry_batch '$id_opts.entries' |
5 | 18 #else: |
5 | 19 ##Perform some simple search/replaces to remove whitespace |
20 (changeset 3034ce97dd33 by peterjc, parent 19: Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.) | 20 ##and make it comma separated. Quoted so don't escape pipes. |
20 | 21 -entry "$id_opts.entries.replace('\r',',').replace('\n',',').replace(' ','').replace(',,',',').replace(',,',',').strip(',')" |
5 | 22 #end if |
5 | 23 |
5 | 24 ##When building a BLAST database, to ensure unique IDs makeblastdb will |
5 | 25 ##do things like turning a FASTA entry with ID of ERP44 into lcl|ERP44 |
5 | 26 ##(if using -parse_seqids) or simply assign it an ID using the record |
5 | 27 ##number like gnl|BL_ORD_ID|123 (to cope with duplicate IDs in the FASTA |
5 | 28 ##file). In -parse_seqids mode, a duplicate FASTA ID gives an error. |
5 | 29 ## |
5 | 30 ##The BLAST plain text and XML output will contain these BLAST IDs, but |
5 | 31 ##the tabular output does not (at least, not in BLAST 2.2.25+). |
5 | 32 ##Therefore in general, Galaxy users won't care about the (internal) |
5 | 33 ##BLAST identifiers. |
5 | 34 ## |
5 | 35 ##The blastdbcmd FASTA output will also contain these IDs, but in the |
5 | 36 ##context of the BLAST tabular output they are not helpful. Therefore |
5 | 37 ##to recover the original ID as used in the FASTA file for makeblastdb |
5 | 38 ##we need a little post processing. |
5 | 39 ## |
5 | 40 ##We remove the NCBI's lcl|... or gnl|BL_ORD_ID|123 prefixes |
5 | 41 ##using sed, however the exact syntax differs for Mac OS X's sed |
5 | 42 |
5 | 43 #if str($outfmt)=="blastid": |
22 | 44 -out '$seq' |
5 | 45 #else if sys.platform == "darwin": |
22 | 46 | sed -E 's/^>(lcl\||gnl\|BL_ORD_ID\|[0-9]* )/>/1' > "$seq" |
5 | 47 #else: |
22 | 48 | sed 's/>\(lcl|\|gnl|BL_ORD_ID|[0-9]* \)/>/1' > "$seq" |
5 | 49 #end if |
5 | 50 </command> |
5 | 51 <inputs> |
11 | 52 <expand macro="input_conditional_choose_db_type" /> |
5 | 53 <conditional name="id_opts"> |
5 | 54 <param name="id_type" type="select" label="Type of identifier list"> |
5 | 55 <option value="file">From file</option> |
5 | 56 <option value="prompt">User entered</option> |
5 | 57 </param> |
5 | 58 <when value="file"> |
26 | 59 <param name="entries" argument="-entry_batch" type="data" format="txt,tabular" label="Sequence identifier(s)" help="Plain text file with one ID per line, optionally with space separated range, strand, and algorithm."/> |
5 | 60 </when> |
5 | 61 <when value="prompt"> |
22 | 62 <param name="entries" argument="-entry" type="text" optional="false" area="true" size="10x30" label="Sequence identifier(s)" help="Comma or new line separated list"/> |
5 | 63 </when> |
5 | 64 </conditional> |
5 | 65 <param name="outfmt" type="select" label="Output format"> |
5 | 66 <option value="original">FASTA with original identifiers</option> |
5 | 67 <option value="blastid">FASTA with BLAST assigned identifiers</option> |
5 | 68 </param> |
5 | 69 </inputs> |
5 | 70 <outputs> |
27 | 71 |
27 | 72 <data name="seq" format="fasta" label="Sequences from blastdbcmd @ON_DBCMD_OPTS@"> |
27 | 73 </data> |
5 | 74 </outputs> |
15 | 75 <tests> |
15 | 76 <test> |
15 | 77 <param name="db_opts|db_type" value="prot" /> |
27 | 78 <param name="db_opts|db_origin|database" value="four_human_proteins" /> |
27 | 79 <param name="db_opts|db_origin|db_origin_selector" value="db" /> |
15 | 80 <param name="id_opts|id_type" value="prompt" /> |
15 | 81 <param name="id_opts|entries" value="all" /> |
15 | 82 <param name="outfmt" value="original" /> |
15 | 83 <output name="seq" file="four_human_proteins.fasta" ftype="fasta" /> |
15 | 84 </test> |
20 | 85 <test> |
21 (changeset 7538e2bfcd41 by peterjc, parent 20: v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml) | 86 <!-- This used to recover the original FASTA file, but had GI numbers --> |
20 | 87 <param name="db_opts|db_type" value="nucl" /> |
27 | 88 <param name="db_opts|db_origin|database" value="rhodopsin_nucs" /> |
20 | 89 <param name="id_opts|id_type" value="prompt" /> |
20 | 90 <param name="id_opts|entries" value="all" /> |
20 | 91 <param name="outfmt" value="original" /> |
21 | 92 <output name="seq" file="rhodopsin_nucs.no_gi.fasta" ftype="fasta" /> |
20 | 93 </test> |
20 | 94 <test> |
26 | 95 <!-- This uses various start end frame combinations but all recover full sequence --> |
26 | 96 <param name="db_opts|db_type" value="nucl" /> |
27 | 97 <param name="db_opts|db_origin|database" value="rhodopsin_nucs" /> |
26 | 98 <param name="id_opts|id_type" value="file" /> |
26 | 99 <param name="id_opts|entries" value="rhodopsin_nucs.blastdbcmd.txt" ftype="txt" /> |
26 | 100 <param name="outfmt" value="original" /> |
26 | 101 <output name="seq" file="rhodopsin_nucs.no_gi.fasta" ftype="fasta" /> |
26 | 102 </test> |
26 | 103 <test> |
20 | 104 <param name="db_opts|db_type" value="nucl" /> |
27 | 105 <param name="db_opts|db_origin|database" value="rhodopsin_nucs" /> |
20 | 106 <param name="id_opts|id_type" value="prompt" /> |
20 | 107 <param name="id_opts|entries" value="U59921.1" /> |
20 | 108 <param name="outfmt" value="original" /> |
20 | 109 <output name="seq" file="rhodopsin_bufo.fasta" ftype="fasta" /> |
20 | 110 </test> |
20 | 111 <test> |
20 | 112 <param name="db_opts|db_type" value="nucl" /> |
27 | 113 <!-- look in two databases for this entry --> |
27 | 114 <param name="db_opts|db_origin|database" value="rhodopsin_nucs,three_human_mRNA" /> |
20 | 115 <param name="id_opts|id_type" value="prompt" /> |
20 | 116 <param name="id_opts|entries" value="gi|2734705|gb|U59921.1|BBU59921" /> |
20 | 117 <param name="outfmt" value="original" /> |
20 | 118 <output name="seq" file="rhodopsin_bufo.fasta" ftype="fasta" /> |
20 | 119 </test> |
15 | 120 </tests> |
5 | 121 <help> |
22 | 122 |
5 | 123 **What it does** |
5 | 124 |
5 | 125 Extracts FASTA formatted sequences from a BLAST database |
5 | 126 using the NCBI BLAST+ blastdbcmd command line tool. |
5 | 127 |
26 | 128 When giving a text file of entries, use one line per sequence. |
26 | 129 Optional values should be space separated - the simplest syntax |
26 | 130 is ``identifier start-end`` (where ``end`` can be just ``-``), |
26 | 131 or ``identifier start-end strand`` (where the strand is given as |
26 | 132 either ``+`` or ``-``). |
26 | 133 |
5 | 134 .. class:: warningmark |
5 | 135 |
5 | 136 **BLAST assigned identifiers** |
5 | 137 |
5 | 138 When a BLAST database is constructed from a FASTA file, the |
5 | 139 original identifiers can be replaced with BLAST assigned |
5 | 140 identifiers, partly to ensure uniqueness. e.g. Sometimes |
5 | 141 a prefix of 'lcl|' is added (lcl is short for local), |
5 | 142 or an arbitrary name starting 'gnl|BL_ORD_ID|' is created. |
5 | 143 |
5 | 144 If you are using the tabular output from BLAST, it will contain |
5 | 145 the original identifiers - not the BLAST assigned identifiers |
5 | 146 suitable for use with the blastdbcmd tool. |
5 | 147 |
5 | 148 If you are using the XML or plain text output, this will also |
5 | 149 contain the BLAST assigned identifiers. However, this means |
5 | 150 getting a list of BLAST assigned identifiers isn't straightforward. |
5 | 151 |
5 | 152 ------- |
5 | 153 |
26 | 154 @CLI_OPTIONS@ |
23 | 155 |
23 | 156 ------- |
23 | 157 |
5 | 158 **References** |
5 | 159 |
10 (changeset 70e7dcbf6573 by peterjc, parent 9: Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.) | 160 If you use this Galaxy tool in work leading to a scientific publication please |
10 | 161 cite the following papers: |
10 | 162 |
11 | 163 @REFERENCES@ |
5 | 164 </help> |
22 | 165 <expand macro="blast_citations" /> |
5 | 166 </tool> |
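
The two text transformations in the command template (the chained `.replace(...)` calls on the `-entry` line and the `sed` expression that strips BLAST assigned prefixes) can be tried outside Galaxy. The following is a minimal Python sketch, not part of the wrapper: the function names `normalize_entries` and `strip_blast_prefix` and the sample headers are hypothetical, and the regular expression is a direct transcription of the `sed -E` pattern used on the darwin branch.

```python
import re


def normalize_entries(text: str) -> str:
    """Mirror the Cheetah expression on the -entry line (hypothetical helper).

    Line breaks become commas, spaces are dropped, doubled commas are
    collapsed in two passes, and leading/trailing commas are stripped.
    """
    return (
        text.replace("\r", ",")
        .replace("\n", ",")
        .replace(" ", "")
        .replace(",,", ",")
        .replace(",,", ",")
        .strip(",")
    )


# Transcription of: sed -E 's/^>(lcl\||gnl\|BL_ORD_ID\|[0-9]* )/>/1'
# Either an "lcl|" prefix, or "gnl|BL_ORD_ID|<digits> " including the space.
_BLAST_PREFIX = re.compile(r"^>(lcl\||gnl\|BL_ORD_ID\|[0-9]* )")


def strip_blast_prefix(line: str) -> str:
    """Remove a BLAST assigned prefix from a FASTA header line, if present."""
    return _BLAST_PREFIX.sub(">", line, count=1)


if __name__ == "__main__":
    # Sample inputs are made up for illustration only.
    print(normalize_entries("U59921.1\r\n ERP44 \n"))
    for header in (">lcl|ERP44 example", ">gnl|BL_ORD_ID|123 ERP44 example", "MKVL"):
        print(strip_blast_prefix(header))
```

One observation from the sketch: because the doubled-comma replacement runs only twice, a run of five or more consecutive separators (for example several blank lines in the text box) is not fully collapsed and still leaves a doubled comma in the `-entry` value.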