comparison tools/ncbi_blast_plus/README.rst @ 13:623f727cdff1 draft
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
author | peterjc |
---|---|
date | Fri, 14 Mar 2014 07:40:46 -0400 |
parents | 4c4a0da938ff |
children | 2fe07f50a41e |
comparison
12:6560192c5098 | 13:623f727cdff1 |
---|---|
1 Galaxy wrappers for NCBI BLAST+ suite | 1 Galaxy wrappers for NCBI BLAST+ suite |
2 ===================================== | 2 ===================================== |
3 | 3 |
4 These wrappers are copyright 2010-2013 by Peter Cock, The James Hutton Institute | 4 These wrappers are copyright 2010-2013 by Peter Cock (The James Hutton Institute, |
5 (formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved. | 5 UK) and additional contributors. All rights reserved. See the licence text below. |
6 See the licence text below. | |
7 | 6 |
8 Currently tested with NCBI BLAST 2.2.28+ (i.e. version 2.2.28 of BLAST+), | 7 Currently tested with NCBI BLAST 2.2.28+ (i.e. version 2.2.28 of BLAST+), |
9 and does not work with the NCBI 'legacy' BLAST suite (e.g. ``blastall``). | 8 and does not work with the NCBI 'legacy' BLAST suite (e.g. ``blastall``). |
10 | 9 |
11 Note that these wrappers (and the associated datatypes) were originally | 10 Note that these wrappers (and the associated datatypes) were originally |
24 Galaxy should be able to automatically install the dependencies, i.e. the | 23 Galaxy should be able to automatically install the dependencies, i.e. the |
25 ``blast_datatypes`` repository which defines the BLAST XML file format | 24 ``blast_datatypes`` repository which defines the BLAST XML file format |
26 (``blastxml``) and protein and nucleotide BLAST databases (``blastdbp`` and | 25 (``blastxml``) and protein and nucleotide BLAST databases (``blastdbp`` and |
27 ``blastdbn``). | 26 ``blastdbn``). |
28 | 27 |
29 You must tell Galaxy about any system level BLAST databases using configuration | 28 See the configuration notes below. |
30 files blastdb.loc (nucleotide databases like NT) and blastdb_p.loc (protein | |
31 databases like NR), and blastdb_d.loc (protein domain databases like CDD or | |
32 SMART) which are located in the tool-data/ folder. Sample files are included | |
33 which explain the tab-based format to use. | |
34 | |
35 You can download the NCBI provided databases as tar-balls from here: | |
36 | |
37 * ftp://ftp.ncbi.nlm.nih.gov/blast/db/ (nucleotide and protein databases like NR) | |
38 * ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/little_endian/ (domain databases like CDD) | |
39 | |
40 | 29 |
41 Manual Installation | 30 Manual Installation |
42 =================== | 31 =================== |
43 | 32 |
44 For those not using Galaxy's automated installation from the Tool Shed, put | 33 For those not using Galaxy's automated installation from the Tool Shed, put |
77 Run the functional tests (adjusting the section identifier to match your | 66 Run the functional tests (adjusting the section identifier to match your |
78 ``tool_conf.xml.sample`` file):: | 67 ``tool_conf.xml.sample`` file):: |
79 | 68 |
80 ./run_functional_tests.sh -sid NCBI_BLAST+-ncbi_blast_plus_tools | 69 ./run_functional_tests.sh -sid NCBI_BLAST+-ncbi_blast_plus_tools |
81 | 70 |
71 Configuration | |
72 ============= | |
73 | |
74 You must tell Galaxy about any system level BLAST databases using configuration | |
75 files blastdb.loc (nucleotide databases like NT) and blastdb_p.loc (protein | |
76 databases like NR), and blastdb_d.loc (protein domain databases like CDD or | |
77 SMART) which are located in the tool-data/ folder. Sample files are included | |
78 which explain the tab-based format to use. | |
79 | |
80 You can download the NCBI provided databases as tar-balls from here: | |
81 | |
82 * ftp://ftp.ncbi.nlm.nih.gov/blast/db/ (nucleotide and protein databases like NR) | |
83 * ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/little_endian/ (domain databases like CDD) | |
84 | |
85 If using the optional taxonomy columns, you will also need to download the | |
86 NCBI taxonomy files (``taxdb.btd`` and ``taxdb.bti`` from ``taxdb.tar.gz`` on | |
87 the BLAST database FTP site). Currently explicit version tracking of the | |
88 taxonomy is not supported, and in order to use this you must set the | |
89 ``$BLASTDB`` environment variable to include the path where you unzipped the | |
90 taxonomy files. If this is not done, the taxonomy columns like species name | |
91 will appear as ``N/A`` in the tabular output. | |
92 | |
93 The BLAST+ binaries support multi-threaded operation, which is handled via the | |
94 $GALAXY_SLOTS environment variable. This should be set automatically by Galaxy | |
95 via your job runner settings, which allows you to (for example) allocate four | |
96 cores to each BLAST job. | |
97 | |
98 In addition, the BLAST+ wrappers also support high level parallelism by task | |
99 splitting if ``use_tasked_jobs = True`` is enabled in your ``universe_wsgi.ini`` | |
100 configuration file. Essentially, the FASTA input query files are broken up into | |
101 batches of 1000 sequences, a separate BLAST child job is run for each chunk, | |
102 and then the BLAST output files are merged (in order). This is transparent | |
103 for the end user. | |
82 | 104 |
83 History | 105 History |
84 ======= | 106 ======= |
85 | 107 |
86 ======= ====================================================================== | 108 ======= ====================================================================== |
104 'blast_datatypes' repository from the Tool Shed. | 126 'blast_datatypes' repository from the Tool Shed. |
105 v0.0.17 - The BLAST+ search tools now default to extended tabular output | 127 v0.0.17 - The BLAST+ search tools now default to extended tabular output |
106 (all too often our users where having to re-run searches just to | 128 (all too often our users where having to re-run searches just to |
107 get one of the missing columns like query or subject length) | 129 get one of the missing columns like query or subject length) |
108 v0.0.18 - Defensive quoting of filenames in case of spaces (where possible, | 130 v0.0.18 - Defensive quoting of filenames in case of spaces (where possible, |
109 BLAST+ handling of some mult-file arguments is problematic). | 131 BLAST+ handling of some multi-file arguments is problematic). |
110 v0.0.19 - Added wrappers for rpsblast and rpstblastn, and new blastdb_d.loc | 132 v0.0.19 - Added wrappers for rpsblast and rpstblastn, and new blastdb_d.loc |
111 for the domain databases they use (e.g. CDD, PFAM or SMART). | 133 for the domain databases they use (e.g. CDD, PFAM or SMART). |
112 - Correct case of exception regular expression (for error handling | 134 - Correct case of exception regular expression (for error handling |
113 fall-back in case the return code is not set properly). | 135 fall-back in case the return code is not set properly). |
114 - Clearer naming of output files. | 136 - Clearer naming of output files. |
120 - Dependency on new package_blast_plus_2_2_26 in Tool Shed. | 142 - Dependency on new package_blast_plus_2_2_26 in Tool Shed. |
121 - Adopted standard MIT License. | 143 - Adopted standard MIT License. |
122 - Development moved to GitHub, https://github.com/peterjc/galaxy_blast | 144 - Development moved to GitHub, https://github.com/peterjc/galaxy_blast |
123 - Updated citation information (Cock et al. 2013). | 145 - Updated citation information (Cock et al. 2013). |
124 v0.0.21 - Use macros to simplify the XML wrappers. | 146 v0.0.21 - Use macros to simplify the XML wrappers. |
125 - Added wrapper for dustmasker | 147 - Added wrapper for dustmasker. |
126 - Enabled masking for makeblastdb | 148 - Enabled masking for makeblastdb. |
127 - Requires 'maskinfo-asn1' and 'maskinfo-asn1-binary' datatypes | 149 - Requires 'maskinfo-asn1' and 'maskinfo-asn1-binary' datatypes. |
128 defined in updated blast_datatypes on Galaxy ToolShed. | 150 defined in updated blast_datatypes on Galaxy ToolShed. |
129 - Tests updated for BLAST+ 2.2.27 instead of BLAST+ 2.2.26 | 151 - Tests updated for BLAST+ 2.2.27 instead of BLAST+ 2.2.26. |
130 - Now depends on package_blast_plus_2_2_27 in ToolShed | 152 - Now depends on package_blast_plus_2_2_27 in ToolShed. |
131 v0.0.22 - More use macros to simplify the wrappers | 153 v0.0.22 - More use macros to simplify the wrappers. |
132 - Set number of threads via $GALAXY_SLOTS environment variable | 154 - Set number of threads via $GALAXY_SLOTS environment variable. |
133 - More descriptive default output names | 155 - More descriptive default output names. |
134 - Tests require updated BLAST DB definitions (blast_datatypes v0.0.18) | 156 - Tests require updated BLAST DB definitions (blast_datatypes v0.0.18). |
135 - Pre-check for duplicate identifiers in makeblastdb wrapper. | 157 - Pre-check for duplicate identifiers in makeblastdb wrapper. |
136 - Tests updated for BLAST+ 2.2.28 instead of BLAST+ 2.2.27 | 158 - Tests updated for BLAST+ 2.2.28 instead of BLAST+ 2.2.27. |
137 - Now depends on package_blast_plus_2_2_28 in ToolShed | 159 - Now depends on package_blast_plus_2_2_28 in ToolShed. |
138 - Extended tabular output includes 'salltitles' as column 25. | 160 - Extended tabular output includes 'salltitles' as column 25. |
161 v0.1.00 - Now depends on package_blast_plus_2_2_29 in ToolShed. | |
162 - Tabular output now includes option to pick specific columns, | |
163 including previously unavailable taxonomy columns. | |
164 - BLAST XML to tabular tool supports multiple input files. | |
165 - More detailed descriptions for BLASTN and BLASTP task option. | |
166 - Wrappers for segmasker, dustmasker and convert2blastmask. | |
167 - Supports using maskinfo with makeblastdb wrapper. | |
168 - Supports setting a taxonomy ID in makeblastdb wrapper. | |
169 - Subtle changes like new conditional settings will require some old | |
170 workflows be updated to cope. | |
139 ======= ====================================================================== | 171 ======= ====================================================================== |
140 | 172 |
141 | 173 |
142 Bug Reports | 174 Bug Reports |
143 =========== | 175 =========== |