Mercurial > repos > peterjc > blast2go
changeset 6:872cf247c899 draft
Uploaded v0.0.8, auto-installation, RST README, MIT licence, citation information, development moved to GitHub, split out XML formatter into standalone script
| author | peterjc | 
|---|---|
| date | Mon, 23 Sep 2013 05:50:58 -0400 | 
| parents | e4419efbefad | 
| children | 0ac3ef59ea93 | 
| files | blast2go/README.rst blast2go/blast2go.py blast2go/blast2go.xml blast2go/massage_xml_for_blast2go.py blast2go/repository_dependencies.xml blast2go/tool_dependencies.xml test-data/blastp_sample.blast2go.tabular test-data/blastp_sample.xml tool-data/blast2go.loc.sample tools/blast2go/blast2go.py tools/blast2go/blast2go.txt tools/blast2go/blast2go.xml tools/blast2go/repository_dependencies.xml | 
| diffstat | 13 files changed, 893 insertions(+), 434 deletions(-) | 
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/blast2go/README.rst Mon Sep 23 05:50:58 2013 -0400 @@ -0,0 +1,208 @@ +Galaxy wrapper for Blast2GO for pipelines, b2g4pipe +=================================================== + +This wrapper is copyright 2011-2013 by Peter Cock, The James Hutton Institute +(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved. +See the licence text below (MIT licence). + +This is a wrapper for the command line Java tool b2g4pipe v2.5, +Blast2GO for pipelines. It is available from the Galaxy Tool Shed at: +http://toolshed.g2.bx.psu.edu/view/peterjc/blast2go + + +References +========== + +Peter Cock, Bjoern Gruening, Konrad Paszkiewicz and Leighton Pritchard (2013). +Galaxy tools and workflows for sequence analysis with applications +in molecular plant pathology. PeerJ 1:e167 +http://dx.doi.org/10.7717/peerj.167 + +S. Götz et al. (2008). +High-throughput functional annotation and data mining with the Blast2GO suite. +Nucleic Acids Res. 36(10):3420-3435. +http://dx.doi.org/10.1093/nar/gkn176 + +A. Conesa and S. Götz (2008). +Blast2GO: A Comprehensive Suite for Functional Analysis in Plant Genomics. +International Journal of Plant Genomics. 619832. +http://dx.doi.org/10.1155/2008/619832 + +A. Conesa et al. (2005). +Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research. +Bioinformatics 21:3674-3676. +http://dx.doi.org/10.1093/bioinformatics/bti610 + +See also http://www.blast2go.com/ + + +Automated Installation +====================== + +Installation via the Galaxy Tool Shed should take care of the Galaxy side of +things, including the dependency on 'blast_datatypes' which defines the +'blastxml' file format. However, you will also probably need to configure +the Blast2GO property file(s), for example if you have a local Blast2GO +database (which we recommend for speed). 
+ + +Manual Installation +=================== + +The main dependency is b2g4pipe which must be installed manually. We also +strongly recommend installing a local Blast2GO database (see the +instructions below about the blast2go.loc file). At the time of writing, +the current version is b2g4pipe v2.5 which is available here: + +* http://www.blast2go.com/data/blast2go/b2g4pipe_v2.5.zip + +You can change the path by setting the B2G4PIPE environment variable to +the desired folder, but by default the script looks for the JAR file here:: + + /opt/b2g4pipe_v2.5/blast2go.jar + +To install the wrapper manually, first install 'blast_datatypes', then +copy or move the following files under the Galaxy tools folder, e.g. in a +tools/blast2go/ folder: + +* blast2go.xml (the Galaxy tool definition) +* blast2go.py (the Python wrapper script) +* massage_xml_for_blast2go.py (Python XML reformatting script) +* README.rst (this file) + +For a manual installation of the wrapper you will also need to modify the +tool_conf.xml file to tell Galaxy to offer the tool. We suggest putting +it next to the NCBI BLAST+ wrappers. Just add the line:: + + <tool file="blast2go/blast2go.xml" /> + +If you wish to run the unit tests, also add this to tool_conf.xml.sample +and move/copy the test-data files under Galaxy's test-data folder. Then:: + + $ ./run_functional_tests.sh -id blast2go + + +Configuration +============= + +As part of setting up b2g4pipe you will need to set up one or more Blast2GO +property files which tell the tool which database to use etc. The example +b2gPipe.properties provided with b2g4pipe is often out of date. The current +server IP address and database name may be given on the Blast2GO website, or +can be found by running the latest GUI version via Java web-start, and +looking under the tools/options menu. These property files can be anywhere +accessible to the Galaxy Unix user; we put them with the JAR file. 
+ +You must tell Galaxy about these Blast2GO property files so that they can be +offered to the user. Copy file blast2go.loc.sample to tool-data/blast2go.loc +under the Galaxy folder and edit this to match your installation. This must +be plain text, tab separated, with three columns: + +1. ID for the setup, e.g. Spain_2012_August +2. Description for the setup, e.g. Database in Spain (August 2012) +3. Properties filename for the setup, e.g. Spain_2012_August.properties + relative to the main JAR file, or with a full path + e.g. /opt/b2g4pipe/Spain_2012_August.properties + +Avoid including "Blast2GO" in the description (column 2) as this text will be +included in the automatically assigned output dataset name. The blast2go.loc +file allows you to customise the database setup. If for example you have a local +Blast2GO server running (which we recommend for speed), and you want this to be +the default setting, include it as the first line in your blast2go.loc file. + +Consult the Blast2GO documentation for details about the property files and +setting up a local MySQL Blast2GO database. + + +History +======= + +======= ====================================================================== +Version Changes +------- ---------------------------------------------------------------------- +v0.0.1 - Initial public release +v0.0.2 - Documentation clarifications, e.g. concatenated BLAST XML is allowed. + - Fixed error handler in wrapper script (for when b2g4pipe fails). + - Reformats the XML to use old NCBI-style concatenated BLAST XML since + b2g4pipe crashes with a heap space error on large files using + current NCBI output. 
+v0.0.3 - Include sample loc file, tool-data/blast2go.loc.sample +v0.0.4 - Include repository_dependencies.xml file for 'blastxml' format + (previously included in the core Galaxy installation) +v0.0.5 - Quote arguments in case of spaces in filenames (internal change) + - Last release supporting b2g4pipe v2.3.5 +v0.0.6 - Support for b2g4pipe v2.5 instead of v2.3.5 + + - Now invoked with a class path and es.blast2go.prog.B2GAnnotPipe + rather than simply calling the jar file + - Now uses the switch -annot instead of -a (this change breaks + support for b2g4pipe v2.3.5 unfortunately) + + - Catch a few error messages and treat them explicitly as errors. +v0.0.7 - Update output description in XML file (b2g4pipe v2.3.5 included + the sequence description, b2g4pipe v2.5 omits this). +v0.0.8 - Automated installation via the Galaxy Tool Shed. + - Added unit test. + - Explain how to load the tabular file into the Blast2GO GUI. + - Link to Tool Shed added to help text and this documentation. + - Switch to standard MIT licence. + - Use reStructuredText for this README file. + - Updated citation information (Cock et al. 2013). + - Development moved to GitHub, https://github.com/peterjc/galaxy_blast + - Split out massage_xml_for_blast2go.py as a standalone file. 
+======= ====================================================================== + + +Developers +========== + +This script and related tools were originally developed on the 'tools' branch +of the following BitBucket Mercurial repository: +https://bitbucket.org/peterjc/galaxy-central/ + +As of September 2013, development is continuing on a dedicated GitHub repository: +https://github.com/peterjc/galaxy_blast + +For making the "Galaxy Tool Shed" http://toolshed.g2.bx.psu.edu/ tarball I use +the following command from the Galaxy root folder:: + + $ tar -czf blast2go.tar.gz blast2go/README.rst blast2go/blast2go.xml blast2go/blast2go.py blast2go/massage_xml_for_blast2go.py blast2go/repository_dependencies.xml blast2go/tool_dependencies.xml tool-data/blast2go.loc.sample test-data/blastp_sample.xml test-data/blastp_sample.blast2go.tabular + +Check this worked:: + + $ tar -tzf blast2go.tar.gz + blast2go/README.rst + blast2go/blast2go.xml + blast2go/blast2go.py + blast2go/massage_xml_for_blast2go.py + blast2go/repository_dependencies.xml + blast2go/tool_dependencies.xml + tool-data/blast2go.loc.sample + test-data/blastp_sample.xml + test-data/blastp_sample.blast2go.tabular + + +Licence (MIT) +============= + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. + + +NOTE: This is the licence for the Galaxy Wrapper only. Blast2GO and +associated data files are available and licenced separately.
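The three-column blast2go.loc layout described in the README's Configuration section can be illustrated with a short Python sketch. This is not part of the wrapper (Galaxy reads the file itself through the tool XML); the function name `parse_loc` and the skipping of blank and `#` comment lines are assumptions made for this example.

```python
def parse_loc(text):
    """Parse blast2go.loc-style text into (id, description, path) tuples."""
    entries = []
    for line in text.splitlines():
        # Assumption: blank lines and lines starting with '#' are ignored.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError("Expected 3 tab-separated columns: %r" % line)
        entries.append(tuple(fields))
    return entries


# Example content using the values quoted in the README.
example = (
    "# ID\tDescription\tProperties file\n"
    "Spain_2012_August\tDatabase in Spain (August 2012)\t"
    "Spain_2012_August.properties\n"
)
entries = parse_loc(example)
for setup_id, description, path in entries:
    print(setup_id, "->", path)
```

The first entry returned corresponds to the first line of the file, which the README says is the default setting offered to the user.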
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/blast2go/blast2go.py Mon Sep 23 05:50:58 2013 -0400 @@ -0,0 +1,134 @@ +#!/usr/bin/env python +"""Galaxy wrapper for Blast2GO for pipelines, b2g4pipe v2.5. + +This script takes exactly three command line arguments: + * Input BLAST XML filename + * Blast2GO properties filename (settings file) + * Output tabular filename + +The properties filename can be a fully qualified path, but if not +this will look next to the blast2go.jar file. + +Sadly b2g4pipe (at least v2.3.5 to v2.5.0) cannot cope with current +style large BLAST XML files (e.g. from BLAST 2.2.25+), so we reformat +these to avoid it crashing with a Java heap space OutOfMemoryError. + +As part of this reformatting, we check for BLASTP or BLASTX output +(otherwise raise an error), and print the query count. + +It then calls the Java command line tool, and moves the output file to +the location Galaxy is expecting, and removes the temporary XML file. + +This script is called from my Galaxy wrapper for Blast2GO for pipelines, +available from the Galaxy Tool Shed here: +http://toolshed.g2.bx.psu.edu/view/peterjc/blast2go + +This script is under version control here: +https://github.com/peterjc/galaxy_blast/tree/master/blast2go +""" +import sys +import os +import subprocess + +#You may need to edit this to match your local setup, +blast2go_dir = os.environ.get("B2G4PIPE", "/opt/b2g4pipe_v2.5/") +blast2go_jar = os.path.join(blast2go_dir, "blast2go.jar") + +def stop_err(msg, error_level=1): + """Print error message to stderr and quit with given error level.""" + sys.stderr.write("%s\n" % msg) + sys.exit(error_level) + +try: + from massage_xml_for_blast2go import prepare_xml +except ImportError: + stop_err("Missing sister file massage_xml_for_blast2go.py") + +if len(sys.argv) != 4: + stop_err("Require three arguments: XML filename, properties filename, output tabular filename") + +xml_file, prop_file, tabular_file = sys.argv[1:] + +#We should have write access here: 
+tmp_xml_file = tabular_file + ".tmp.xml" + +if not os.path.isfile(blast2go_jar): + stop_err("Blast2GO JAR file not found: %s" % blast2go_jar) + +if not os.path.isfile(xml_file): + stop_err("Input BLAST XML file not found: %s" % xml_file) + +if not os.path.isfile(prop_file): + tmp = os.path.join(os.path.split(blast2go_jar)[0], prop_file) + if os.path.isfile(tmp): + #The properties file seems to have been given relative to the JAR + prop_file = tmp + else: + stop_err("Blast2GO configuration file not found: %s" % prop_file) + del tmp + + +def run(cmd): + #Avoid using shell=True when we call subprocess to ensure if the Python + #script is killed, so too is the child process. + try: + child = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE) + except Exception, err: + stop_err("Error invoking command:\n%s\n\n%s\n" % (" ".join(cmd), err)) + #Use .communicate as can get deadlocks with .wait(), + stdout, stderr = child.communicate() + return_code = child.returncode + + #keep stdout minimal as shown prominently in Galaxy + #Record it in case a silent error needs diagnosis + if stdout: + sys.stderr.write("Standard out:\n%s\n\n" % stdout) + if stderr: + sys.stderr.write("Standard error:\n%s\n\n" % stderr) + + error_msg = None + if return_code: + cmd_str = " ".join(cmd) + error_msg = "Return code %i from command:\n%s" % (return_code, cmd_str) + elif "Database or network connection (timeout) error" in stdout+stderr: + error_msg = "Database or network connection (timeout) error" + elif "Annotation of 0 seqs with 0 annots finished." in stdout+stderr: + error_msg = "No sequences processed!" 
+ + if error_msg: + print error_msg + stop_err(error_msg) + + +blast2go_classpath = os.path.split(blast2go_jar)[0] +assert os.path.isdir(blast2go_classpath) +blast2go_classpath = "%s/*:%s/ext/*:" % (blast2go_classpath, blast2go_classpath) + +prepare_xml(xml_file, tmp_xml_file) +#print "XML file prepared for Blast2GO" + +#We will have write access wherever the output should be, +#so we'll ask Blast2GO to use that as the stem for its output +#(it will append .annot to the filename) +cmd = ["java", "-cp", blast2go_classpath, "es.blast2go.prog.B2GAnnotPipe", + "-in", tmp_xml_file, + "-prop", prop_file, + "-out", tabular_file, #Used as base name for output files + "-annot", # Generate *.annot tabular file + #NOTE: For v2.3.5 must use -a, for v2.5 must use -annot instead + #"-img", # Generate images, feature not in v2.3.5 + ] +#print " ".join(cmd) +run(cmd) + +#Remove the temp XML file +os.remove(tmp_xml_file) + +out_file = tabular_file + ".annot" +if not os.path.isfile(out_file): + stop_err("ERROR - No output annotation file from Blast2GO") + +#Move the output file where Galaxy expects it to be: +os.rename(out_file, tabular_file) + +print "Done"
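The command construction and error checks in blast2go.py above can be restated as two small pure functions, which makes them easy to try without Java installed. This is a sketch in Python 3 rather than the wrapper's Python 2; the function names `build_b2g_command` and `find_error` are hypothetical, while the class name, switches and message strings are copied from the script.

```python
import os


def build_b2g_command(jar_path, xml_file, prop_file, out_stem):
    """Return the argument list for b2g4pipe v2.5, mirroring blast2go.py."""
    jar_dir = os.path.dirname(jar_path)
    # Class path covers the JAR files next to blast2go.jar and under ext/.
    classpath = "%s/*:%s/ext/*:" % (jar_dir, jar_dir)
    return ["java", "-cp", classpath, "es.blast2go.prog.B2GAnnotPipe",
            "-in", xml_file,
            "-prop", prop_file,
            "-out", out_stem,   # b2g4pipe appends .annot to this stem
            "-annot"]


def find_error(return_code, stdout, stderr):
    """Return an error message, or None if the run looks successful."""
    if return_code:
        return "Return code %i" % return_code
    combined = stdout + stderr
    # b2g4pipe can exit with zero despite these failures, so scan the text.
    if "Database or network connection (timeout) error" in combined:
        return "Database or network connection (timeout) error"
    if "Annotation of 0 seqs with 0 annots finished." in combined:
        return "No sequences processed!"
    return None


cmd = build_b2g_command("/opt/b2g4pipe_v2.5/blast2go.jar",
                        "input.tmp.xml", "Spain_2012_August.properties",
                        "output.tabular")
print(" ".join(cmd))
```

Passing the command as a list (no shell=True) matches the script's stated reason: the child Java process should die with the Python script.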
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/blast2go/blast2go.xml Mon Sep 23 05:50:58 2013 -0400 @@ -0,0 +1,118 @@ +<tool id="blast2go" name="Blast2GO" version="0.0.8"> + <description>Maps BLAST results to GO annotation terms</description> + <requirements> + <requirement type="package" version="2.5">b2g4pipe</requirement> + </requirements> + <command interpreter="python"> + blast2go.py "${xml}" "${prop.fields.path}" "${tab}" + </command> + <stdio> + <!-- Wrapper ensures anything other than zero is an error --> + <exit_code range="1:" /> + <exit_code range=":-1" /> + </stdio> + <inputs> + <param name="xml" type="data" format="blastxml" label="BLAST XML results" description="You must have run BLAST against a protein database such as the NCBI non-redundant (NR) database. Use BLASTX for nucleotide queries, BLASTP for protein queries." /> + <param name="prop" type="select" label="Blast2GO settings" description="One or more configurations can be setup, such as using the Blast2GO team's server in Spain, or a local database."> + <options from_file="blast2go.loc"> + <column name="value" index="0"/> + <column name="name" index="1"/> + <column name="path" index="2"/> + </options> + </param> + </inputs> + <outputs> + <data name="tab" format="tabular" label="Blast2GO ${prop.fields.name}" /> + </outputs> + <tests> + <test> + <param name="xml" value="blastp_sample.xml" ftype="blastxml"/> + <param name="prop" value="Spain_2011_June"/> + <output name="tab" file="blastp_sample.blast2go.tabular" ftype="tabular"/> + </test> + </tests> + <help> +.. class:: warningmark + +**Note**. Blast2GO may take a substantial amount of time, especially if +running against the public server in Spain. For large input datasets it +is advisable to allow overnight processing, or consider subdividing. + +----- + +**What it does** + +This runs b2g4Pipe v2.5, which is the command line (no GUI) version of +Blast2GO designed for use in pipelines. 
+ +It takes as input BLAST XML results against a protein database, typically +the NCBI non-redundant (NR) database. This tool will accept concatenated +BLAST XML files (although they are technically invalid XML), which is very +useful if you have sub-divided your protein FASTA files and run BLAST on +them in batches. + +The BLAST matches are used to assign Gene Ontology (GO) annotation terms +to each query sequence. + +The output from this tool is a tabular file containing three columns, with +the order taken from query order in the original BLAST XML file: + +====== ==================== +Column Description +------ -------------------- + 1 ID of query sequence + 2 GO term + 3 GO description +====== ==================== + +Note that if no GO terms are assigned to a sequence (e.g. if it had no +BLAST matches), then it will not be present in the output file. + +This tabular file is called an "Annotation File" in the Blast2GO GUI. +If you download the tabular file, and rename it to use the extension +".annot", then it can be opened with the Blast2GO GUI via the "File", +"Load Annotation (.annot)" menu (keyboard shortcut ALT+L). You can +then run some of the interactive analyses offered in the GUI tool. + + +**Advanced Settings** + +Blast2GO has a properties setting file which includes which database +server to connect to (e.g. the public server in Valencia, Spain, or a +local server), as well as more advanced options such as thresholds and +evidence code weights. To change these settings, your Galaxy administrator +must create a new properties file, and add it to the drop down menu above. + + +**References** + +If you use this Galaxy tool in work leading to a scientific publication please +cite the following papers: + +Peter Cock, Bjoern Gruening, Konrad Paszkiewicz and Leighton Pritchard (2013). +Galaxy tools and workflows for sequence analysis with applications +in molecular plant pathology. PeerJ 1:e167 +http://dx.doi.org/10.7717/peerj.167 + +S. Götz et al. (2008). 
+High-throughput functional annotation and data mining with the Blast2GO suite. +Nucleic Acids Res. 36(10):3420–3435. +http://dx.doi.org/10.1093/nar/gkn176 + +A. Conesa and S. Götz (2008). +Blast2GO: A Comprehensive Suite for Functional Analysis in Plant Genomics. +International Journal of Plant Genomics. 619832. +http://dx.doi.org/10.1155/2008/619832 + +A. Conesa et al. (2005). +Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research. +Bioinformatics 21:3674-3676. +http://dx.doi.org/10.1093/bioinformatics/bti610 + +See also http://www.blast2go.com/ + +This wrapper is available to install into other Galaxy Instances via the Galaxy +Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/blast2go + + </help> +</tool>
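The three-column output described in the tool help above (query ID, GO term, description) is straightforward to post-process. The following Python sketch groups GO terms by query sequence; the function name `group_go_terms` is hypothetical, and it assumes the columns are tab separated as is usual for Galaxy tabular files.

```python
from collections import OrderedDict


def group_go_terms(tabular_text):
    """Map each query ID to a list of (GO term, description) pairs."""
    terms = OrderedDict()  # keeps the query order of the original file
    for line in tabular_text.splitlines():
        if not line.strip():
            continue
        # Assumption: exactly three tab-separated columns per line.
        query_id, go_term, description = line.split("\t", 2)
        terms.setdefault(query_id, []).append((go_term, description))
    return terms


# One line modelled on test-data/blastp_sample.blast2go.tabular.
sample = "Sample\tGO:0005488\ttail tape measure protein\n"
grouped = group_go_terms(sample)
for query_id, annotations in grouped.items():
    print(query_id, [go for go, _ in annotations])
```

Queries with no GO terms are absent from the file, so they will not appear as keys in the result.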
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/blast2go/massage_xml_for_blast2go.py Mon Sep 23 05:50:58 2013 -0400 @@ -0,0 +1,92 @@ +#!/usr/bin/env python +"""Script for reformatting Blast XML to suit Blast2GO. + +This script takes exactly two command line arguments: + * Input BLAST XML filename + * Output BLAST XML filename + +Sadly b2g4pipe (at least v2.3.5 to v2.5.0) cannot cope with current +style large BLAST XML files (e.g. from BLAST 2.2.25+), so we reformat +these to avoid it crashing with a Java heap space OutOfMemoryError. + +As part of this reformatting, we check for BLASTP or BLASTX output +(otherwise raise an error), and print the query count. + +This script is called from my Galaxy wrapper for Blast2GO for pipelines, +available from the Galaxy Tool Shed here: +http://toolshed.g2.bx.psu.edu/view/peterjc/blast2go + +This script is under version control here: +https://github.com/peterjc/galaxy_blast/tree/master/blast2go +""" +import sys +import os +import subprocess + +def stop_err(msg, error_level=1): + """Print error message to stderr and quit with given error level.""" + sys.stderr.write("%s\n" % msg) + sys.exit(error_level) + +def prepare_xml(original_xml, mangled_xml): + """Reformat BLAST XML to suit Blast2GO. + + Blast2GO can't cope with 1000s of <Iteration> tags within a + single <BlastOutput> tag, so instead split this into one + full XML record per iteration (i.e. per query). This gives + a concatenated XML file mimicking old versions of BLAST. + + This also checks for BLASTP or BLASTX output, and outputs + the number of queries. Galaxy will show this as "info". + """ + in_handle = open(original_xml) + footer = " </BlastOutput_iterations>\n</BlastOutput>\n" + header = "" + while True: + line = in_handle.readline() + if not line: + #No hits? 
+ stop_err("Problem with XML file?") + if line.strip() == "<Iteration>": + break + header += line + + if "<BlastOutput_program>blastx</BlastOutput_program>" in header: + print "BLASTX output identified" + elif "<BlastOutput_program>blastp</BlastOutput_program>" in header: + print "BLASTP output identified" + else: + in_handle.close() + stop_err("Expect BLASTP or BLASTX output") + + out_handle = open(mangled_xml, "w") + out_handle.write(header) + out_handle.write(line) + count = 1 + while True: + line = in_handle.readline() + if not line: + break + elif line.strip() == "<Iteration>": + #Insert footer/header + out_handle.write(footer) + out_handle.write(header) + count += 1 + out_handle.write(line) + + out_handle.close() + in_handle.close() + print "Input has %i queries" % count + + +if __name__ == "__main__": + # Run the conversion... + if len(sys.argv) != 3: + stop_err("Require two arguments: XML input filename, XML output filename") + + xml_file, out_xml_file = sys.argv[1:] + + if not os.path.isfile(xml_file): + stop_err("Input BLAST XML file not found: %s" % xml_file) + + prepare_xml(xml_file, out_xml_file)
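The header/footer splitting performed by prepare_xml above can be exercised on in-memory data. This is a Python 3 sketch of the same idea working on file-like handles; the name `split_iterations` is hypothetical, and unlike the original it raises ValueError instead of exiting and omits the BLASTP/BLASTX check.

```python
import io


def split_iterations(in_handle, out_handle):
    """Write one full XML record per <Iteration>; return the query count."""
    footer = " </BlastOutput_iterations>\n</BlastOutput>\n"
    header = ""
    line = in_handle.readline()
    # Everything before the first <Iteration> line is the shared header.
    while line and line.strip() != "<Iteration>":
        header += line
        line = in_handle.readline()
    if not line:
        raise ValueError("No <Iteration> tag found")
    out_handle.write(header)
    out_handle.write(line)
    count = 1
    for line in in_handle:
        if line.strip() == "<Iteration>":
            # Close the previous record and start a new one.
            out_handle.write(footer)
            out_handle.write(header)
            count += 1
        out_handle.write(line)
    return count


# A heavily cut-down stand-in for real BLAST XML with two queries.
sample = (
    "<BlastOutput>\n"
    "<BlastOutput_program>blastp</BlastOutput_program>\n"
    "<BlastOutput_iterations>\n"
    "<Iteration>\n<Iteration_iter-num>1</Iteration_iter-num>\n</Iteration>\n"
    "<Iteration>\n<Iteration_iter-num>2</Iteration_iter-num>\n</Iteration>\n"
    "</BlastOutput_iterations>\n"
    "</BlastOutput>\n"
)
out = io.StringIO()
count = split_iterations(io.StringIO(sample), out)
print("Input has %i queries" % count)
```

Because only one line is held in memory at a time (plus the short header), the approach scales to large BLAST XML files, which is the point of the reformatting.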
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/blast2go/repository_dependencies.xml Mon Sep 23 05:50:58 2013 -0400 @@ -0,0 +1,4 @@ +<?xml version="1.0"?> +<repositories description="Requires BLAST XML and database datatype definitions."> +<repository changeset_revision="a44a7a5456e1" name="blast_datatypes" owner="devteam" toolshed="http://testtoolshed.g2.bx.psu.edu" /> +</repositories>
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/blast2go/tool_dependencies.xml Mon Sep 23 05:50:58 2013 -0400 @@ -0,0 +1,32 @@ +<?xml version="1.0"?> +<tool_dependency> + <package name="b2g4pipe" version="2.5"> + <install version="1.0"> + <actions> + <!-- If used, download_by_url must be the first action --> + <!-- The ZIP file decompresses to give a folder b2g4pipe --> + <action type="download_by_url">http://www.blast2go.com/data/blast2go/b2g4pipe_v2.5.zip</action> + <!-- Galaxy moves into the unzipped folder b2g4pipe --> + <action type="shell_command"> +cp b2gPipe.properties Spain_2012_August.properties && +sed -i "s/Dbacces.dbname=b2g_apr12/Dbacces.dbname=b2g_aug12/g" Spain_2012_August.properties && +sed -i "s/Dbacces.dbhost=10.10.100.203/Dbacces.dbhost=publicdb.blast2go.com/g" Spain_2012_August.properties + </action> + <action type="shell_command"> +cp b2gPipe.properties Spain_2011_June.properties && +sed -i "s/Dbacces.dbname=b2g_apr12/Dbacces.dbname=b2g_jun11/g" Spain_2011_June.properties && +sed -i "s/Dbacces.dbhost=10.10.100.203/Dbacces.dbhost=publicdb.blast2go.com/g" Spain_2011_June.properties + </action> + <action type="move_directory_files"><source_directory>.</source_directory><destination_directory>$INSTALL_DIR/</destination_directory></action> + <!-- Set environment variable $B2G4PIPE so Python script knows where to look --> + <action type="set_environment"> + <environment_variable name="B2G4PIPE" action="set_to">$INSTALL_DIR</environment_variable> + </action> + </actions> + </install> + <readme> +Downloads b2g4pipe v2.5 + </readme> + </package> +</tool_dependency> +
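For readers less familiar with sed, the two substitutions applied to each copied properties file in tool_dependencies.xml above amount to the following Python sketch. The function name `customise_properties` is hypothetical, and it uses literal string replacement, which is slightly stricter than the sed patterns (where each "." matches any character). The two-line input is a stand-in for the real b2gPipe.properties, showing only the keys the sed commands touch.

```python
def customise_properties(text, dbname, dbhost):
    """Swap the shipped database name and host for the given values."""
    # Literal defaults that the sed commands in tool_dependencies.xml target.
    text = text.replace("Dbacces.dbname=b2g_apr12",
                        "Dbacces.dbname=" + dbname)
    text = text.replace("Dbacces.dbhost=10.10.100.203",
                        "Dbacces.dbhost=" + dbhost)
    return text


shipped = "Dbacces.dbname=b2g_apr12\nDbacces.dbhost=10.10.100.203\n"
august = customise_properties(shipped, "b2g_aug12", "publicdb.blast2go.com")
print(august)
```

If the shipped defaults ever change, neither this sketch nor the sed commands would match, and the copied file would silently keep the original values.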
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/blastp_sample.blast2go.tabular Mon Sep 23 05:50:58 2013 -0400 @@ -0,0 +1,1 @@ +Sample GO:0005488 tail tape measure protein
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/blastp_sample.xml Mon Sep 23 05:50:58 2013 -0400 @@ -0,0 +1,293 @@ +<?xml version="1.0"?> +<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd"> +<BlastOutput> + <BlastOutput_program>blastp</BlastOutput_program> + <BlastOutput_version>BLASTP 2.2.24+</BlastOutput_version> + <BlastOutput_reference>Stephen F. Altschul, Thomas L. Madden, Alejandro A. Sch&auml;ffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference> + <BlastOutput_db>nr</BlastOutput_db> + <BlastOutput_query-ID>Query_1</BlastOutput_query-ID> + <BlastOutput_query-def>Sample</BlastOutput_query-def> + <BlastOutput_query-len>516</BlastOutput_query-len> + <BlastOutput_param> + <Parameters> + <Parameters_matrix>BLOSUM62</Parameters_matrix> + <Parameters_expect>1e-30</Parameters_expect> + <Parameters_gap-open>11</Parameters_gap-open> + <Parameters_gap-extend>1</Parameters_gap-extend> + <Parameters_filter>F</Parameters_filter> + </Parameters> + </BlastOutput_param> + <BlastOutput_iterations> + <Iteration> + <Iteration_iter-num>1</Iteration_iter-num> + <Iteration_query-ID>Query_1</Iteration_query-ID> + <Iteration_query-def>Sample</Iteration_query-def> + <Iteration_query-len>516</Iteration_query-len> + <Iteration_hits> + <Hit> + <Hit_num>1</Hit_num> + <Hit_id>gi|119953746|ref|YP_950551.1|</Hit_id> + <Hit_def>tail tape measure protein [Streptococcus phage SMP] >gi|118430558|gb|ABK91882.1| tail tape measure protein [Streptococcus suis phage SMP]</Hit_def> + <Hit_accession>YP_950551</Hit_accession> + <Hit_len>659</Hit_len> + <Hit_hsps> + <Hsp> + <Hsp_num>1</Hsp_num> + <Hsp_bit-score>949.117592429394</Hsp_bit-score> + <Hsp_score>2452</Hsp_score> + <Hsp_evalue>0</Hsp_evalue> + <Hsp_query-from>1</Hsp_query-from> + <Hsp_query-to>516</Hsp_query-to> + 
<Hsp_hit-from>27</Hsp_hit-from> + <Hsp_hit-to>542</Hsp_hit-to> + <Hsp_query-frame>0</Hsp_query-frame> + <Hsp_hit-frame>0</Hsp_hit-frame> + <Hsp_identity>500</Hsp_identity> + <Hsp_positive>500</Hsp_positive> + <Hsp_gaps>0</Hsp_gaps> + <Hsp_align-len>516</Hsp_align-len> + <Hsp_qseq>FHLLNSGGSALSVMFAKLVGIIAGISAPIWXXXXXXXXXXXXXXXXYNTNEEFRTKVQAAWEAIKSAISTAVEAVVSFVMDLWGQMVAWWNENQELIRQTAETVWNAIRTVVETVMTALIPIVQTAWDLILAVVTTVLNVIKTVVDTGLKVVLGIIKAVMQMINGDWSGAWETLKGVAGTIWEGIKSLVQVAIDGLVQIFQTGLAFLKSIWDTVWGTIMAVVGPIWDWIKTTVSNAITAVWEIIQNIMTSIQTTWDTVWNAISTVASNIWTAISTTVMSVLTTIWGYIQTYLELIKTVWSAAWEIIKAVFAAILLTIVGLVTGNFDLIKQAISNAWEIIKTKTSEIWNAITTFLSGIWEGIKTAASTAWEWIKTTISNVMTTIKSNIETAWNNIKTSISNALNNIKSAAENAWNNIKSAISTAIENIKSTVSNGWNNLVSTVTNAGPRIVSAVRTGFDNAVNAARNFISNAISVGGDLINGFVEGVKGAAGRLIDAVGGAVSGAIDWAKGLLGIKS</Hsp_qseq> + <Hsp_hseq>FHLLNSGGSALSVMFAKLVGIIAGISAPIWAVIGVIAALVAGFVLLYNTNEEFRTKVQAAWEAIKSAISTAVEAVVSFVMDLWGQMVAWWNENQELIRQTAETVWNAIRTVVETVMTALIPIVQTAWDLILAVVTTVLNVIKTVVDTGLKVVLGIIKAVMQMINGDWSGAWETLKGVAGTIWEGIKSLVQVAIDGLVQIFQTGLAFLKSIWDTVWGTIMAVVGPIWDWIKTTVSNAITAVWEIIQNIMTSIQTTWDTVWNAISTVASNIWTAISTTVMSVLTTIWGYIQTYLELIKTVWSAAWEIIKAVFAAILLTIVGLVTGNFDLIKQAISNAWEIIKTKTSEIWNAITTFLSGIWEGIKTAASTAWEWIKTTISNVMTTIKSNIETAWNNIKTSISNALNNIKSAAENAWNNIKSAISTAIENIKSTVSNGWNNLVSTVTNAGPRIVSAVRTGFDNAVNAARNFISNAISVGGDLINGFVEGVKGAAGRLIDAVGGAVSGAIDWAKGLLGIKS</Hsp_hseq> + <Hsp_midline>FHLLNSGGSALSVMFAKLVGIIAGISAPIW YNTNEEFRTKVQAAWEAIKSAISTAVEAVVSFVMDLWGQMVAWWNENQELIRQTAETVWNAIRTVVETVMTALIPIVQTAWDLILAVVTTVLNVIKTVVDTGLKVVLGIIKAVMQMINGDWSGAWETLKGVAGTIWEGIKSLVQVAIDGLVQIFQTGLAFLKSIWDTVWGTIMAVVGPIWDWIKTTVSNAITAVWEIIQNIMTSIQTTWDTVWNAISTVASNIWTAISTTVMSVLTTIWGYIQTYLELIKTVWSAAWEIIKAVFAAILLTIVGLVTGNFDLIKQAISNAWEIIKTKTSEIWNAITTFLSGIWEGIKTAASTAWEWIKTTISNVMTTIKSNIETAWNNIKTSISNALNNIKSAAENAWNNIKSAISTAIENIKSTVSNGWNNLVSTVTNAGPRIVSAVRTGFDNAVNAARNFISNAISVGGDLINGFVEGVKGAAGRLIDAVGGAVSGAIDWAKGLLGIKS</Hsp_midline> + </Hsp> + </Hit_hsps> + </Hit> + <Hit> + <Hit_num>2</Hit_num> + <Hit_id>gi|148986157|ref|ZP_01819143.1|</Hit_id> + 
<Hit_def>unknown phage protein [Streptococcus pneumoniae SP3-BS71] >gi|147921871|gb|EDK72998.1| unknown phage protein [Streptococcus pneumoniae SP3-BS71]</Hit_def> + <Hit_accession>ZP_01819143</Hit_accession> + <Hit_len>1031</Hit_len> + <Hit_hsps> + <Hsp> + <Hsp_num>1</Hsp_num> + <Hsp_bit-score>174.481245259597</Hsp_bit-score> + <Hsp_score>441</Hsp_score> + <Hsp_evalue>1.54640812741294e-41</Hsp_evalue> + <Hsp_query-from>49</Hsp_query-from> + <Hsp_query-to>300</Hsp_query-to> + <Hsp_hit-from>679</Hsp_hit-from> + <Hsp_hit-to>897</Hsp_hit-to> + <Hsp_query-frame>0</Hsp_query-frame> + <Hsp_hit-frame>0</Hsp_hit-frame> + <Hsp_identity>104</Hsp_identity> + <Hsp_positive>148</Hsp_positive> + <Hsp_gaps>33</Hsp_gaps> + <Hsp_align-len>252</Hsp_align-len> + <Hsp_qseq>TNEEFRTKVQAAWEAIKSAISTAVEAVVSFVMDLWGQMVAWWNENQELIRQTAETVWNAIRTVVETVMTALIPIVQTAWDLILAVVTTVLNVIKTVVDTGLKVVLGIIKAVMQMINGDWSGAWETLKGVAGTIWEGIKSLVQVAIDGLVQIFQTGLAFLKSIWDTVWGTIMAVVGPIWDWIKTTVSNAITAVWEIIQNIMTSIQTTWDTVWNAISTVASNIWTAISTTVMSVLTTIWGYIQTYLELIKTVWS</Hsp_qseq> + <Hsp_hseq>TNEGFRDAVTTVWNAILEVINAVVSEISNFVMSIFGTVVTWWTENQELIRTSAETVWNAIYTVISTILDILGPLLQAGWDNIQLIITTTWEIIKIVVETAINVVLGVIQAVMQIITGDWSGAWETIKGVFSTVWQAIQSIVQT-------IFSAIQSYISNILNGISGT----VSNIWNSIKDTVSN----------------------VLNAISSTVSSVWEGIKSTISSAINGARDAVSSAIEAIKGLFN</Hsp_hseq> + <Hsp_midline>TNE FR V W AI I+ V + +FVM ++G +V WW ENQELIR +AETVWNAI TV+ T++ L P++Q WD I ++TT +IK VV+T + VVLG+I+AVMQ+I GDWSGAWET+KGV T+W+ I+S+VQ IF +++ +I + + GT V IW+ IK TVSN V NAIS+ S++W I +T+ S + + + +E IK +++</Hsp_midline> + </Hsp> + </Hit_hsps> + </Hit> + <Hit> + <Hit_num>3</Hit_num> + <Hit_id>gi|77411259|ref|ZP_00787609.1|</Hit_id> + <Hit_def>tail tape meausure protein [Streptococcus agalactiae CJB111] >gi|77162685|gb|EAO73646.1| tail tape meausure protein [Streptococcus agalactiae CJB111]</Hit_def> + <Hit_accession>ZP_00787609</Hit_accession> + <Hit_len>1039</Hit_len> + <Hit_hsps> + <Hsp> + <Hsp_num>1</Hsp_num> + <Hsp_bit-score>165.621655013498</Hsp_bit-score> + 
<Hsp_score>418</Hsp_score> + <Hsp_evalue>7.61538823982138e-39</Hsp_evalue> + <Hsp_query-from>50</Hsp_query-from> + <Hsp_query-to>310</Hsp_query-to> + <Hsp_hit-from>655</Hsp_hit-from> + <Hsp_hit-to>904</Hsp_hit-to> + <Hsp_query-frame>0</Hsp_query-frame> + <Hsp_hit-frame>0</Hsp_hit-frame> + <Hsp_identity>107</Hsp_identity> + <Hsp_positive>158</Hsp_positive> + <Hsp_gaps>11</Hsp_gaps> + <Hsp_align-len>261</Hsp_align-len> + <Hsp_qseq>NEEFRTKVQAAWEAIKSAISTAVEAVVSFVMDLWGQMVAWWNENQELIRQTAETVWNAIRTVVETVMTALIPIVQTAWDLILAVVTTVLNVIKTVVDTGLKVVLGIIKAVMQMINGDWSGAWETLKGVAGTIWEGIKSLVQVAIDGLVQIFQTGLAFLKSIWDTVWGTIMAVVGPIWDWIKTTVSNAITAVWEIIQNIMTSIQTTWDTVWNAISTVASNIWTAISTTVMSVLTTIWGYIQTYLELIKTVWSAAWEIIKAVF</Hsp_qseq> + <Hsp_hseq>HEGFRTAVTEIWNAIYAFLSVIIQQISSFVMSIWGTLTTWWTENQQLILNAANTVWTAISTVIQTIMTILGPYLQASWENIKLIITTAWDIIKVVVETAINVVLGIIKAVMQIITGDWSGAWETIKQVVSTVWEAIKSLISIVLSAIAQ-------FISNSWNGIKGTMTNLL----NSIKSVVSNVWNSIKSTISSILSSIGSTVSSVWNGMKATISGVLSGISNTVSSVWNGVKSTITNAINGAKNAVSSAINAIKNLF</Hsp_hseq> + <Hsp_midline>+E FRT V W AI + +S ++ + SFVM +WG + WW ENQ+LI A TVW AI TV++T+MT L P +Q +W+ I ++TT ++IK VV+T + VVLGIIKAVMQ+I GDWSGAWET+K V T+WE IKSL+ + + + Q F+ + W+ + GT+ ++ + IK+ VSN ++ I +I++SI +T +VWN + S + + IS TV SV + I + K S+A IK +F</Hsp_midline> + </Hsp> + </Hit_hsps> + </Hit> + <Hit> + <Hit_num>4</Hit_num> + <Hit_id>gi|76786754|ref|YP_329383.1|</Hit_id> + <Hit_def>prophage LambdaSa04, tail tape measure protein, TP901 family [Streptococcus agalactiae A909] >gi|76561811|gb|ABA44395.1| prophage LambdaSa04, tail tape measure protein, TP901 family [Streptococcus agalactiae A909]</Hit_def> + <Hit_accession>YP_329383</Hit_accession> + <Hit_len>1039</Hit_len> + <Hit_hsps> + <Hsp> + <Hsp_num>1</Hsp_num> + <Hsp_bit-score>159.073262222903</Hsp_bit-score> + <Hsp_score>401</Hsp_score> + <Hsp_evalue>6.55719737745379e-37</Hsp_evalue> + <Hsp_query-from>50</Hsp_query-from> + <Hsp_query-to>310</Hsp_query-to> + <Hsp_hit-from>655</Hsp_hit-from> + <Hsp_hit-to>904</Hsp_hit-to> + 
<Hsp_query-frame>0</Hsp_query-frame> + <Hsp_hit-frame>0</Hsp_hit-frame> + <Hsp_identity>103</Hsp_identity> + <Hsp_positive>156</Hsp_positive> + <Hsp_gaps>11</Hsp_gaps> + <Hsp_align-len>261</Hsp_align-len> + <Hsp_qseq>NEEFRTKVQAAWEAIKSAISTAVEAVVSFVMDLWGQMVAWWNENQELIRQTAETVWNAIRTVVETVMTALIPIVQTAWDLILAVVTTVLNVIKTVVDTGLKVVLGIIKAVMQMINGDWSGAWETLKGVAGTIWEGIKSLVQVAIDGLVQIFQTGLAFLKSIWDTVWGTIMAVVGPIWDWIKTTVSNAITAVWEIIQNIMTSIQTTWDTVWNAISTVASNIWTAISTTVMSVLTTIWGYIQTYLELIKTVWSAAWEIIKAVF</Hsp_qseq> + <Hsp_hseq>HEGFRTAVTEIWNAIYAFLTVIIQQISSFVMSIWGTLITWWTENQQLILNATNTVWTAISTVIQTIMTILAPYLQASWENIKLIITTAWDIIKVVVETAINVVLGIIKAVMQIITGDWSGAWETIKQVVSTVWEVIKSLISIVLSAIAQ-------FISNSWNGIKGTMTNLL----NSIKGVVSNVWNGIKSTISSILSSIGSTVSSIWNGMKATISGVLSGISSTVSFVWNGVKSTITNAINGAKNAVSSAINAIKNLF</Hsp_hseq> + <Hsp_midline>+E FRT V W AI + ++ ++ + SFVM +WG ++ WW ENQ+LI TVW AI TV++T+MT L P +Q +W+ I ++TT ++IK VV+T + VVLGIIKAVMQ+I GDWSGAWET+K V T+WE IKSL+ + + + Q F+ + W+ + GT+ ++ + IK VSN + I +I++SI +T ++WN + S + + IS+TV V + I + K S+A IK +F</Hsp_midline> + </Hsp> + </Hit_hsps> + </Hit> + <Hit> + <Hit_num>5</Hit_num> + <Hit_id>gi|153811333|ref|ZP_01964001.1|</Hit_id> + <Hit_def>hypothetical protein RUMOBE_01725 [Ruminococcus obeum ATCC 29174] >gi|149832460|gb|EDM87544.1| hypothetical protein RUMOBE_01725 [Ruminococcus obeum ATCC 29174]</Hit_def> + <Hit_accession>ZP_01964001</Hit_accession> + <Hit_len>1228</Hit_len> + <Hit_hsps> + <Hsp> + <Hsp_num>1</Hsp_num> + <Hsp_bit-score>157.147264343316</Hsp_bit-score> + <Hsp_score>396</Hsp_score> + <Hsp_evalue>2.33083876931167e-36</Hsp_evalue> + <Hsp_query-from>3</Hsp_query-from> + <Hsp_query-to>516</Hsp_query-to> + <Hsp_hit-from>573</Hsp_hit-from> + <Hsp_hit-to>1059</Hsp_hit-to> + <Hsp_query-frame>0</Hsp_query-frame> + <Hsp_hit-frame>0</Hsp_hit-frame> + <Hsp_identity>167</Hsp_identity> + <Hsp_positive>247</Hsp_positive> + <Hsp_gaps>113</Hsp_gaps> + <Hsp_align-len>557</Hsp_align-len> + 
<Hsp_qseq>LLNSGGSALSVMFAKLVGIIAGISAPIWXXXXXXXXXXXXXXXXYNTNEEFRTKVQAAWEAIKSAISTAVEAVVSFVMDLWGQMVAWWNENQELIRQTAETVWNAIRTVVETVMTALIPIVQTAWDLILAVVTTVLNVIKTVVDTGLKVVLGIIKAVMQMINGDWSGAWETLKGVAGTIWEGIKSLVQV---AIDGLVQIFQTGLAFLKSIWDTVWGTIMAVVGPIWDWIKTTVSNAITAVWEIIQNIMTSIQTTWDTVWNAISTVASNIWTAISTTVMSVLTTIWGYIQTYLELIKTVWSAAWEIIKAVFAAILLTIVGLVTGNFDLI-----------KQAISNAWEIIKTKT-----------------------SEIWNAITTFLSGIWEGIKTAASTAWEWIKTT-ISNVMTTIKSNIETAWNNIKTSISNALNNIKSAAENAWNNIKSAISTAIEN-IKSTVSNGWNNL---VSTVTNAGPRIVSAVRTGFDNAVNAARNFISNAISVGGDLI-NGFVEGVKGAAGRLIDAVGGAVSGAIDWAKGLLGIKS</Hsp_qseq> + <Hsp_hseq>LVKAGG--FSGVFTKALGLI---TSPAAIVVGVIAAITAVIIHLWNTNEDFRNTITAIWQKIKDAFTT---------------FAAGISERLSALGITFSDVTSAIKTIWDGFCNLLAPVLEAAFSTIAIALQTAFNVI-----------LGIWDVFSAVFSGDWSGAWEAIKGIFSSIWDGLKEYFSTIIGAVKGVADVF---LGWFGTNWETVWNGVKTFFEGIW--------NGISSFFEGI--------------WNGISTFCTTVWNGIVTNVTAFCTTVHDTISTIFNAVKDVVSNVWETIKNVVQVAIMFIVEVVKAAFELITVPFRFIWENCRDTIISVWETIKSAVQTAINFVKDNIITPVMNAISATITTVWNAIQTTFTTVINAIKSAVQTAWNFMKDNVVTPVMNAISTTISTVWNTIKTTFTTVINAIKSAVQTAWNFMKNSVITPVMNGIKTVITTVWNAIKTAVQTVVNA---IKTTVQTVF-NAVKTTVTTIWNAIKTGTSTAWN----AVKTAVTTPINAAKSAVTSAIN------GIKS</Hsp_hseq> + <Hsp_midline>L+ +GG S +F K +G+I ++P +NTNE+FR + A W+ IK A +T A +E + T V +AI+T+ + L P+++ A+ I + T NVI LGI + +GDWSGAWE +KG+ +IW+G+K A+ G+ +F L + + W+TVW + IW N I++ +E I WN IST + +W I T V + TT+ I T +K V S WE IK V ++ IV +V F+LI + I + WE IK+ + +WNAI T + + IK+A TAW ++K ++ VM I + I T WN IKT+ + +N IKSA + AWN +K+++ T + N IK+ ++ WN + V TV NA I + V+T F NAV I NAI G N VK A I+A AV+ AI+ GIKS</Hsp_midline> + </Hsp> + </Hit_hsps> + </Hit> + <Hit> + <Hit_num>6</Hit_num> + <Hit_id>gi|56962696|ref|YP_174422.1|</Hit_id> + <Hit_def>hypothetical protein ABC0922 [Bacillus clausii KSM-K16] >gi|56908934|dbj|BAD63461.1| phage-related protein [Bacillus clausii KSM-K16]</Hit_def> + <Hit_accession>YP_174422</Hit_accession> + <Hit_len>593</Hit_len> + <Hit_hsps> + <Hsp> + <Hsp_num>1</Hsp_num> + <Hsp_bit-score>146.746875793547</Hsp_bit-score> + 
<Hsp_score>369</Hsp_score> + <Hsp_evalue>3.12404663750498e-33</Hsp_evalue> + <Hsp_query-from>48</Hsp_query-from> + <Hsp_query-to>433</Hsp_query-to> + <Hsp_hit-from>123</Hsp_hit-from> + <Hsp_hit-to>465</Hsp_hit-to> + <Hsp_query-frame>0</Hsp_query-frame> + <Hsp_hit-frame>0</Hsp_hit-frame> + <Hsp_identity>112</Hsp_identity> + <Hsp_positive>187</Hsp_positive> + <Hsp_gaps>49</Hsp_gaps> + <Hsp_align-len>389</Hsp_align-len> + <Hsp_qseq>NTNEEFRTKVQAAWEAIKSAISTAVEAVVSFVMDLWGQMVAWWNENQELIRQTAETVWNAIRTVVETVMTALIPIVQTAWDLILAVVTTVLNVIKTVVDTGLKVVLGIIKAVMQMINGDWSGAWETLKGVAGTIWEGIKSLVQVAIDGL---VQIFQTGLAFLKSIWDTVWGTIMAVVGPIWDWIKTTVSNAITAVWEIIQNIMTSIQTTWDTVWNAISTVASNIWTAISTTVMSVLTTIWGYIQTYLELIKTVWSAAWEIIKAVFAAILLTIVGLVTGNFDLIKQAISNAWEIIKTKTSEIWNAITTFLSGIWEGIKTAASTAWEWIKTTISNVMTTIKSNIETAWNNIKTSISNALNNIKSAAENAWNNIKSAISTAIENIKSTVSN</Hsp_qseq> + <Hsp_hseq>QTNETFRNGVIQAWEAIKTTMETVVATIVTFVSEKLAQIKAFWDEHGAAVMQAVTNIFNGIKSIIEPVMNGILAIMQFVWPFIVSLIQMVWGNIQGVISGALNIIMGLVKAFAGLFTGDFS-----------LMWEGIKQLFSGALEAIWNVVQLLLFGR--LLKIASSLFTGLMGVFSKMWGAISNLFLTALNGIRSFFSTIFTPIQ-------NVVMTVMGFIRNAISTG----LTTASNVVQTVLTAIRTVFLTVFNAVRNV-----------VTTAISFVQNFISTGISAARTAVTSALNAIKTTFTTIFNAVRSSVTTAMTNIKTAISN-------GIQSAWQ----AVLNFVGRFREAGKNIVNSIAEGITSAIGAVKNAISN</Hsp_hseq> + <Hsp_midline> TNE FR V AWEAIK+ + T V +V+FV + Q+ A+W+E+ + Q ++N I++++E VM ++ I+Q W I++++ V I+ V+ L +++G++KA + GD+S +WEGIK L A++ + VQ+ G L I +++ +M V +W I A+ + I T IQ N + TV I AIST LTT +QT L I+TV+ + ++ V VT ++ IS +T + NAI T + I+ ++++ +TA IKT ISN I++AW ++ N + + A +N N+I I++AI +K+ +SN</Hsp_midline> + </Hsp> + </Hit_hsps> + </Hit> + <Hit> + <Hit_num>7</Hit_num> + <Hit_id>gi|50914476|ref|YP_060448.1|</Hit_id> + <Hit_def>unknown phage protein [Streptococcus pyogenes MGAS10394] >gi|40218580|gb|AAR83234.1| prophage pi2 protein [Streptococcus pyogenes] >gi|50261625|gb|AAT72393.1| unknown [Streptococcus pyogenes] >gi|50903550|gb|AAT87265.1| unknown phage protein [Streptococcus pyogenes MGAS10394]</Hit_def> + <Hit_accession>YP_060448</Hit_accession> + 
<Hit_len>1039</Hit_len> + <Hit_hsps> + <Hsp> + <Hsp_num>1</Hsp_num> + <Hsp_bit-score>146.36167621763</Hsp_bit-score> + <Hsp_score>368</Hsp_score> + <Hsp_evalue>4.74132513340056e-33</Hsp_evalue> + <Hsp_query-from>50</Hsp_query-from> + <Hsp_query-to>227</Hsp_query-to> + <Hsp_hit-from>655</Hsp_hit-from> + <Hsp_hit-to>832</Hsp_hit-to> + <Hsp_query-frame>0</Hsp_query-frame> + <Hsp_hit-frame>0</Hsp_hit-frame> + <Hsp_identity>78</Hsp_identity> + <Hsp_positive>112</Hsp_positive> + <Hsp_gaps>0</Hsp_gaps> + <Hsp_align-len>178</Hsp_align-len> + <Hsp_qseq>NEEFRTKVQAAWEAIKSAISTAVEAVVSFVMDLWGQMVAWWNENQELIRQTAETVWNAIRTVVETVMTALIPIVQTAWDLILAVVTTVLNVIKTVVDTGLKVVLGIIKAVMQMINGDWSGAWETLKGVAGTIWEGIKSLVQVAIDGLVQIFQTGLAFLKSIWDTVWGTIMAVVGPIWD</Hsp_qseq> + <Hsp_hseq>NEGFRTAVIEIWNAIYAFISVIIQEISTFIMTIWGTLTTWWTENQALIQAAVETVWNAISTVIQTVMSLIGPYLEAAWANIQLIITTAWEIIKTVVETAITVVLGIIKAIMQAITGDWSGAWETIKGVLQRVWQAIQQIVTTILSAIGQFISNTWNGIKNTFSNILSAISGIVSSIWN</Hsp_hseq> + <Hsp_midline>NE FRT V W AI + IS ++ + +F+M +WG + WW ENQ LI+ ETVWNAI TV++TVM+ + P ++ AW I ++TT +IKTVV+T + VVLGIIKA+MQ I GDWSGAWET+KGV +W+ I+ +V + + Q +K+ + + I +V IW+</Hsp_midline> + </Hsp> + </Hit_hsps> + </Hit> + <Hit> + <Hit_num>8</Hit_num> + <Hit_id>gi|29374987|ref|NP_814140.1|</Hit_id> + <Hit_def>tail protein [Enterococcus faecalis V583] >gi|29342445|gb|AAO80211.1| tail protein [Enterococcus faecalis V583]</Hit_def> + <Hit_accession>NP_814140</Hit_accession> + <Hit_len>1049</Hit_len> + <Hit_hsps> + <Hsp> + <Hsp_num>1</Hsp_num> + <Hsp_bit-score>139.0428842752</Hsp_bit-score> + <Hsp_score>349</Hsp_score> + <Hsp_evalue>6.84844401007043e-31</Hsp_evalue> + <Hsp_query-from>73</Hsp_query-from> + <Hsp_query-to>482</Hsp_query-to> + <Hsp_hit-from>545</Hsp_hit-from> + <Hsp_hit-to>920</Hsp_hit-to> + <Hsp_query-frame>0</Hsp_query-frame> + <Hsp_hit-frame>0</Hsp_hit-frame> + <Hsp_identity>110</Hsp_identity> + <Hsp_positive>196</Hsp_positive> + <Hsp_gaps>78</Hsp_gaps> + <Hsp_align-len>432</Hsp_align-len> + 
<Hsp_qseq>EAVVSFVMDLWGQMVAWWNENQELIRQ-------TAETVWNAIRTVVETVMTALIPIVQTAWDLILAVVTTVL----NVIKTVVDTGLKVVLGIIKAVMQMINGDWSGAWETLKGVAGTIWEGIKSLVQVAIDGLVQIFQTGLAFLKSIWDTVWGTIMAVVGPIWDWIKTTVSNAITAVWEIIQNIMTSIQTTWDTVWNAISTVASNIWTAISTTVMSVLTTIWGYIQTYLELIKTVWSAAWEIIKAVFAAILLTIVGLVTGNFDLIKQAISNAWEIIKTKTSEIWNAITTFLSGIWEGIKTAASTAWEWIKTTISNVMTTIKSNIETAWNNIKTSIS-----------NALNNIKSAAENAWNNIKSAISTAIENIKSTVSNGWNNLVSTVTNAGPRIVSAVRTGFDNAVNAARNFISNAISVGGDLINGF</Hsp_qseq> + <Hsp_hseq>DSIVKTASGLKGSLVKTWNDITAKVSEIWKKFTDAGKKTFDGFKKTVENVFNGIKNFLQTVWNVIYAVVGAIIVNTINIWKGIFDG--------FKAYFQYL-------WDLIKAIATGVWEKIGDTVTGIINGFIGVIKGIFDAFKTFFQQIWDAVVYSVTIAWNGIKNTVTSVSTAIKNFVTPIFNAIKTTITNVFNAIKNTATNVWNAIKTTISNVVQTILNF---------------------------------VTPIFNTMKNTITNIFNAIRNTASSVWNSIKTTISNIVTSVKNTVINIFNALKNSITNIFNAIRNTASTVWNSIKSTVSNIVSATVNTVKNLFNGMKNTVSSIWDGVRNTISNVVNAVKNTISNVWGGITGTVSN----IFNGVKNAIDGPMNAAKNLVKNVV----DAIKGF</Hsp_hseq> + <Hsp_midline>+++V L G +V WN+ + + + ++ + VE V + +QT W++I AVV ++ N+ K + D KA Q + W+ +K +A +WE I V I+G + + + K+ + +W ++ V W+ IK TV++ TA+ + I +I+TT V+NAI A+N+W AI TT+ +V+ TI + VT F+ +K I+N + I+ S +WN+I T +S I +K + +K +I+N+ I++ T WN+IK+++S N N +K+ + W+ +++ IS + +K+T+SN W + TV+N I + V+ D +NAA+N + N + D I GF</Hsp_midline> + </Hsp> + </Hit_hsps> + </Hit> + <Hit> + <Hit_num>9</Hit_num> + <Hit_id>gi|163941333|ref|YP_001646217.1|</Hit_id> + <Hit_def>prophage LambdaBa01, membrane protein, putative [Bacillus weihenstephanensis KBAB4] >gi|163863530|gb|ABY44589.1| prophage LambdaBa01, membrane protein, putative [Bacillus weihenstephanensis KBAB4]</Hit_def> + <Hit_accession>YP_001646217</Hit_accession> + <Hit_len>725</Hit_len> + <Hit_hsps> + <Hsp> + <Hsp_num>1</Hsp_num> + <Hsp_bit-score>138.657684699283</Hsp_bit-score> + <Hsp_score>348</Hsp_score> + <Hsp_evalue>8.15996781441799e-31</Hsp_evalue> + <Hsp_query-from>61</Hsp_query-from> + <Hsp_query-to>480</Hsp_query-to> + <Hsp_hit-from>142</Hsp_hit-from> + <Hsp_hit-to>560</Hsp_hit-to> + <Hsp_query-frame>0</Hsp_query-frame> + 
<Hsp_hit-frame>0</Hsp_hit-frame> + <Hsp_identity>118</Hsp_identity> + <Hsp_positive>203</Hsp_positive> + <Hsp_gaps>29</Hsp_gaps> + <Hsp_align-len>434</Hsp_align-len> + <Hsp_qseq>WEAIKSAISTAVEAVVSFVMDLWGQMVAWWNENQELIRQTAETVWNAIRTVVETVMTALIPIVQTAWDLILAVVTTVLNVIKTVVDTGLKVVLGIIK---AVMQMINGDWSGAWETLKGVAGTIWEGIKSLVQVAIDGLVQIFQTGLAFLKSIWDTVWGTIMAVVGPIWDWIKTTVSNAITAVWEIIQNIMTSIQTTWDTVWNAISTVASNIWTAISTTVMSVLTTIWGYIQTYLELIKT----VWS-------AAWEIIKAVFAAILLTIVGLVTGNFDLIKQAISNAWEIIKTKTSEIWNAITTFLSGIWEGIKTAASTAWEWIKTTISNVMTTIKSNIETAWNNIKTSISNALNNIKSAAENAWNNIKSAISTAIENIKSTVSNGWNNLVSTVTNAGPRIVSAVRTGFDNAVNAARNFISNAISVGGDLIN</Hsp_qseq> + <Hsp_hseq>WDAIKQWTIDAWNAIGEFLVGIWDGIVQWASEAWNSISESTSAVWNSIKEFLIGIWNGIVEFVVT-WGT--AILETYVGIWTSIFNFCMEIWNGIVEYLTSVLQGIATFFTEIWTSISTFFQEIWNGLVAFITPVLQGIADFFAM-----------IWNGISTVIQTVWNFITQYLQAIWTAILYFATPLFESIKNFISECWNKISSTTSLVWETIKNFLVSCWNGLVSFVTPIFEKIKSWIISVWDTISSATMAVWNAVKNFLQACWNGLVSIVTPIFDAIKNWIVNVWNAISSTTSAVWNAIKSYLSSLWNSIVSTASSIFNSIKSAISTVWNMISSASSSVWNGIKSTLSSIWNGIKSTASSVWNGLKDAIMTPVRWVTSAVSGAFNGMKSAVLGVWDGIKSGIRTAINGIIRIINKFI-DGFNTPAELLN</Hsp_hseq> + <Hsp_midline>W+AIK A A+ F++ +W +V W +E I ++ VWN+I+ + + ++ V T W A++ T + + ++ + +++ GI++ +V+Q I ++ W ++ IW G+ + + + G+ F +W I V+ +W++I + TA+ + SI+ WN IS+ S +W I ++S + ++ E IK+ VW A W +K A +V +VT FD IK I N W I + TS +WNAI ++LS +W I + AS+ + IK+ IS V I S + WN IK+++S+ N IKS A + WN +K AI T + + S VS +N + S V I S +RT + + FI + + +L+N</Hsp_midline> + </Hsp> + </Hit_hsps> + </Hit> + </Iteration_hits> + <Iteration_stat> + <Statistics> + <Statistics_db-num>6589360</Statistics_db-num> + <Statistics_db-len>-2041834015</Statistics_db-len> + <Statistics_hsp-len>0</Statistics_hsp-len> + <Statistics_eff-space>504129014857</Statistics_eff-space> + <Statistics_kappa>0.041</Statistics_kappa> + <Statistics_lambda>0.267</Statistics_lambda> + <Statistics_entropy>0.14</Statistics_entropy> + </Statistics> + </Iteration_stat> + </Iteration> + </BlastOutput_iterations> +</BlastOutput>
--- a/tool-data/blast2go.loc.sample Fri Feb 22 08:47:27 2013 -0500 +++ b/tool-data/blast2go.loc.sample Mon Sep 23 05:50:58 2013 -0400 @@ -6,19 +6,22 @@ # Column 3 - Filename, Galaxy will use this when calling the tool # # Probably the most important setting in the properties file is the -# Blast2GO database to use. Currently b2g4pipe v2.3.5 ships with an -# old configuration so consult http://blast2go.org for the latest -# public database they host in Spain. We also strongly recommend +# Blast2GO database to use. Currently b2g4pipe v2.5 ships with an +# old configuration so consult http://www.blast2go.com for the latest +# public database they host in Spain (or find this by running the GUI +# version of Blast2GO via Java Web Start under the menu entry "Tools", +# "General Settings", "DataAccess setting"). We also strongly recommend # configuring a local Blast2GO database. # -# The property filenames can be fullied qualified paths like +# The property filenames can be fully qualified paths like # /opt/b2g4pipe/Spain_2012_August.properties or provided they are # in the same folder as the Blast2GO JAR file, just the filename # like Spain_2012_August.properties instead. This is intended to -# make migrating between versions of Blast2GO easier (as the +# make migrating between future versions of Blast2GO easier (as the # property files change between versions), and simpler overall. # -Local_2011_May Local database (May 2011) Local_2011_May.properties -Spain_2010_May Database in Spain (May 2010) Spain_2010_May.properties +#Local_2011_May Local database (May 2011) Local_2011_May.properties +#Spain_2010_May Database in Spain (May 2010) Spain_2010_May.properties +Spain_2012_August Database in Spain (August 2012) Spain_2012_August.properties Spain_2011_June Database in Spain (June 2011) Spain_2011_June.properties -Spain_2012_August Database in Spain (August 2012) Spain_2012_August.properties +#default Default settings b2gPipe.properties
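Reviewer note: the sample file above is plain text with three tab-separated columns (ID, description, properties filename), and the entries this changeset disables are simply prefixed with `#`. The layout can be illustrated with a minimal Python sketch. This is not Galaxy's own loader; the function name `parse_loc` is hypothetical, and it assumes that any line starting with `#` is a comment.

```python
def parse_loc(lines):
    """Parse blast2go.loc-style lines into (id, description, filename) tuples.

    Assumption: blank lines and lines starting with '#' are ignored, and
    every remaining line has exactly three tab-separated columns.
    """
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and commented-out entries
        if not line.strip() or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            raise ValueError(
                "Expected 3 tab-separated columns, got %i: %r" % (len(parts), line)
            )
        entries.append(tuple(part.strip() for part in parts))
    return entries
```

With the new sample file, the first uncommented entry (Spain_2012_August) would be the first tuple returned, which matches the README's advice to list the preferred setup first.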
--- a/tools/blast2go/blast2go.py Fri Feb 22 08:47:27 2013 -0500 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,173 +0,0 @@ -#!/usr/bin/env python -"""Galaxy wrapper for Blast2GO for pipelines, b2g4pipe v2.5. - -This script takes exactly three command line arguments: - * Input BLAST XML filename - * Blast2GO properties filename (settings file) - * Output tabular filename - -The properties filename can be a fully qualified path, but if not -this will look next to the blast2go.jar file. - -Sadly b2g4pipe (at least v2.3.5 to v2.5.0) cannot cope with current -style large BLAST XML files (e.g. from BLAST 2.2.25+), so we reformat -these to avoid it crashing with a Java heap space OutOfMemoryError. - -As part of this reformatting, we check for BLASTP or BLASTX output -(otherwise raise an error), and print the query count. - -It then calls the Java command line tool, and moves the output file to -the location Galaxy is expecting, and removes the tempory XML file. -""" -import sys -import os -import subprocess - -#You may need to edit this to match your local setup, -#blast2go_jar = "/opt/b2g4pipe/blast2go.jar" -blast2go_jar = "/opt/b2g4pipe_v2.5/blast2go.jar" - - -def stop_err(msg, error_level=1): - """Print error message to stdout and quit with given error level.""" - sys.stderr.write("%s\n" % msg) - sys.exit(error_level) - -if len(sys.argv) != 4: - stop_err("Require three arguments: XML filename, properties filename, output tabular filename") - -xml_file, prop_file, tabular_file = sys.argv[1:] - -#We should have write access here: -tmp_xml_file = tabular_file + ".tmp.xml" - -if not os.path.isfile(blast2go_jar): - stop_err("Blast2GO JAR file not found: %s" % blast2go_jar) - -if not os.path.isfile(xml_file): - stop_err("Input BLAST XML file not found: %s" % xml_file) - -if not os.path.isfile(prop_file): - tmp = os.path.join(os.path.split(blast2go_jar)[0], prop_file) - if os.path.isfile(tmp): - #The properties file seems to have been given relative to the JAR - prop_file 
= tmp - else: - stop_err("Blast2GO configuration file not found: %s" % prop_file) - del tmp - -def prepare_xml(original_xml, mangled_xml): - """Reformat BLAST XML to suit Blast2GO. - - Blast2GO can't cope with 1000s of <Iteration> tags within a - single <BlastResult> tag, so instead split this into one - full XML record per interation (i.e. per query). This gives - a concatenated XML file mimicing old versions of BLAST. - - This also checks for BLASTP or BLASTX output, and outputs - the number of queries. Galaxy will show this as "info". - """ - in_handle = open(original_xml) - footer = " </BlastOutput_iterations>\n</BlastOutput>\n" - header = "" - while True: - line = in_handle.readline() - if not line: - #No hits? - stop_err("Problem with XML file?") - if line.strip() == "<Iteration>": - break - header += line - - if "<BlastOutput_program>blastx</BlastOutput_program>" in header: - print "BLASTX output identified" - elif "<BlastOutput_program>blastp</BlastOutput_program>" in header: - print "BLASTP output identified" - else: - in_handle.close() - stop_err("Expect BLASTP or BLASTX output") - - out_handle = open(mangled_xml, "w") - out_handle.write(header) - out_handle.write(line) - count = 1 - while True: - line = in_handle.readline() - if not line: - break - elif line.strip() == "<Iteration>": - #Insert footer/header - out_handle.write(footer) - out_handle.write(header) - count += 1 - out_handle.write(line) - - out_handle.close() - in_handle.close() - print "Input has %i queries" % count - - -def run(cmd): - #Avoid using shell=True when we call subprocess to ensure if the Python - #script is killed, so too is the child process. 
- try: - child = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE) - except Exception, err: - stop_err("Error invoking command:\n%s\n\n%s\n" % (" ".join(cmd), err)) - #Use .communicate as can get deadlocks with .wait(), - stdout, stderr = child.communicate() - return_code = child.returncode - - #keep stdout minimal as shown prominently in Galaxy - #Record it in case a silent error needs diagnosis - if stdout: - sys.stderr.write("Standard out:\n%s\n\n" % stdout) - if stderr: - sys.stderr.write("Standard error:\n%s\n\n" % stderr) - - error_msg = None - if return_code: - cmd_str = " ".join(cmd) - error_msg = "Return code %i from command:\n%s" % (return_code, cmd_str) - elif "Database or network connection (timeout) error" in stdout+stderr: - error_msg = "Database or network connection (timeout) error" - elif "Annotation of 0 seqs with 0 annots finished." in stdout+stderr: - error_msg = "No sequences processed!" - - if error_msg: - print error_msg - stop_err(error_msg) - - -blast2go_classpath = os.path.split(blast2go_jar)[0] -assert os.path.isdir(blast2go_classpath) -blast2go_classpath = "%s/*:%s/ext/*:" % (blast2go_classpath, blast2go_classpath) - -prepare_xml(xml_file, tmp_xml_file) -#print "XML file prepared for Blast2GO" - -#We will have write access wherever the output should be, -#so we'll ask Blast2GO to use that as the stem for its output -#(it will append .annot to the filename) -cmd = ["java", "-cp", blast2go_classpath, "es.blast2go.prog.B2GAnnotPipe", - "-in", tmp_xml_file, - "-prop", prop_file, - "-out", tabular_file, #Used as base name for output files - "-annot", # Generate *.annot tabular file - #NOTE: For v2.3.5 must use -a, for v2.5 must use -annot instead - #"-img", # Generate images, feature not in v2.3.5 - ] -#print " ".join(cmd) -run(cmd) - -#Remove the temp XML file -os.remove(tmp_xml_file) - -out_file = tabular_file + ".annot" -if not os.path.isfile(out_file): - stop_err("ERROR - No output annotation file from Blast2GO") - 
-#Move the output file where Galaxy expects it to be: -os.rename(out_file, tabular_file) - -print "Done"
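Reviewer note: the removed `prepare_xml` function splits a multi-query BLAST XML file into one complete XML record per `<Iteration>`, repeating the header and footer around each query. The commit message says this logic moves to a standalone script, which is not shown in this section. The following is a rough Python 3 sketch of the same idea, not the code from that script: the name `split_iterations` is hypothetical, it works on file-like handles rather than filenames, it omits the BLASTP/BLASTX check, and the exact whitespace in the footer is an assumption.

```python
import io

# Closing tags written after each query's <Iteration> block
FOOTER = " </BlastOutput_iterations>\n</BlastOutput>\n"


def split_iterations(in_handle, out_handle):
    """Write one full XML record per <Iteration>; return the query count."""
    # Everything before the first <Iteration> line is the shared header
    header = ""
    while True:
        line = in_handle.readline()
        if not line:
            raise ValueError("No <Iteration> tag found in BLAST XML")
        if line.strip() == "<Iteration>":
            break
        header += line

    out_handle.write(header)
    out_handle.write(line)
    count = 1
    for line in in_handle:
        if line.strip() == "<Iteration>":
            # Close the current record and start a new one
            out_handle.write(FOOTER)
            out_handle.write(header)
            count += 1
        out_handle.write(line)
    return count


if __name__ == "__main__":
    demo = (
        "<BlastOutput>\n<BlastOutput_iterations>\n"
        "<Iteration>\nfirst\n</Iteration>\n"
        "<Iteration>\nsecond\n</Iteration>\n"
        "</BlastOutput_iterations>\n</BlastOutput>\n"
    )
    result = io.StringIO()
    print("Input has %i queries" % split_iterations(io.StringIO(demo), result))
```

Like the original, this streams line by line, so memory use stays flat for large inputs, and it relies on `<Iteration>` appearing on a line of its own.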
--- a/tools/blast2go/blast2go.txt Fri Feb 22 08:47:27 2013 -0500 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,153 +0,0 @@ -Galaxy wrapper for Blast2GO for pipelines, b2g4pipe -=================================================== - -This wrapper is copyright 2011-2013 by Peter Cock, The James Hutton Institute -(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved. -See the licence text below. - -This is a wrapper for the command line Java tool b2g4pipe v2.5, -Blast2GO for pipelines. See: - -S. Götz et al. -High-throughput functional annotation and data mining with the Blast2GO suite. -Nucleic Acids Res. 36(10):3420–3435, 2008. -http://dx.doi.org/10.1093/nar/gkn176 - -A. Conesa and S. Götz. -Blast2GO: A Comprehensive Suite for Functional Analysis in Plant Genomics. -Int. J. Plant Genomics. 619832, 2008. -http://dx.doi.org/10.1155/2008/619832 - -A. Conesa et al. -Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research. -Bioinformatics 21:3674-3676, 2005. -http://dx.doi.org/10.1093/bioinformatics/bti610 - -http://www.blast2go.org/ - - - -Installation -============ - -The main dependency is b2g4pipe which must be installed manually. Also we -strongly recommend installing a local Blast2GO database as well (see the -intructions below about the blast2go.loc file). At the time of writing, -the current version is b2g4pipe v2.5 which is available here: - -http://www.blast2go.com/data/blast2go/b2g4pipe_v2.5.zip - -You can change the path by editing the definition near the start of the Python -script blast2go.py, but by default it expects the underlying tool to be here: - -/opt/b2g4pipe_v2.5/blast2go.jar - -Installation of the Galaxy wrapper should work automatically via the Galaxy -Tool Shed, including the dependency on 'blast_datatypes' for the 'blastxml' -file format definition. 
To install the wrapper manually, first install -'blast_datatypes', then copy or move the following files under the Galaxy -tools folder, e.g. in a tools/blast2go/ folder: - -* blast2go.xml (the Galaxy tool definition) -* blast2go.py (the Python wrapper script) -* blast2go.txt (this README file) - -For a manual installation of the wrapper you will also need to modify the -tools_conf.xml file to tell Galaxy to offer the tool. We suggest putting -it next to the NCBI BLAST+ wrappers. Just add the line: - -<tool file="blast2go/blast2go.xml" /> - -As part of setting up b2g4pipe you will need to setup one or more Blast2GO -property files which tell the tool which database to use etc. The example -b2gPipe.properties provided with b2g4pipe is often out of date. The current -server IP address and database name may given on the Blast2GO website, or -can be found by running the latest GUI version via Java web-start, and -looking under the tools/options menu. These property files can be anywhere -accessable to the Galaxy Unix user, we put them with the JAR file etc. - -You must tell Galaxy about these Blast2GO property files so that they can be -offered to the user. Copy file blast2go.loc.sample to tool-data/blast2go.loc -under the Galaxy folder and edit this to match your installation. This must -be plain text, tab separated, with three columns: - -(1) ID for the setup, e.g. Spain_2012_August -(2) Description for the setup, e.g. Database in Spain (August 2012) -(3) Properties filename for the setup, e.g. /opt/b2g4pipe/Spain_2012_August.properties - -Avoid including "Blast2GO" in the description (column 2) as this text will be -included in the automatically assigned output dataset name. The blast2go.loc -file allows you to customise the database setup. If for example you have a local -Blast2GO server running (which we recommend for speed), and you want this to be -the default setting, include it as the first line in your blast2go.loc file. 
- -Consult the Blast2GO documentation for details about the property files and -setting up a local MySQL Blast2GO database. - - -History -======= - -v0.0.1 - Initial public release -v0.0.2 - Documentation clarifications, e.g. concatenated BLAST XML is allowed. - - Fixed error handler in wrapper script (for when b2g4pipe fails). - - Reformats the XML to use old NCBI-style concatenated BLAST XML since - b2g4pipe crashes with heap space error on with large files using - current NCBI output. -v0.0.3 - Include sample loc file, tool-data/blast2go.loc.sample -v0.0.4 - Include repository_dependencies.xml file for 'blastxml' format - (previously included in the core Galaxy installation) -v0.0.5 - Quote arguments in case of spaces in filenames (internal change) - - Last release supporting b2g4pipe v2.3.5 -v0.0.6 - Support for b2g4pipe v2.5 instead of v2.3.5 - - Now invoked with a class path and es.blast2go.prog.B2GAnnotPipe - rather then simply calling the jar file - - Now uses the switch -annot instead of -a (this change breaks - support for b2g4pipe v2.3.5 unfortunately) - - Catch a few error messages and treat them explicitly as errors. 
- - -Developers -========== - -This script and related tools are being developed on the following hg branch: -http://bitbucket.org/peterjc/galaxy-central/src/tools - -For making the "Galaxy Tool Shed" http://community.g2.bx.psu.edu/ tarball I use -the following command from the Galaxy root folder: - -$ tar -czf blast2go.tar.gz tools/blast2go/blast2go.xml tools/blast2go/blast2go.py tools/blast2go/blast2go.txt tools/blast2go/repository_dependencies.xml tool-data/blast2go.loc.sample - -Check this worked: - -$ tar -tzf blast2go.tar.gz -tools/blast2go/blast2go.xml -tools/blast2go/blast2go.py -tools/blast2go/blast2go.txt -tools/blast2go/repository_dependencies.xml -tool-data/blast2go.loc.sample - - -Licence (MIT/BSD style) -======================= - -Permission to use, copy, modify, and distribute this software and its -documentation with or without modifications and for any purpose and -without fee is hereby granted, provided that any copyright notices -appear in all copies and that both those copyright notices and this -permission notice appear in supporting documentation, and that the -names of the contributors or copyright holders not be used in -advertising or publicity pertaining to distribution of the software -without specific prior permission. - -THE CONTRIBUTORS AND COPYRIGHT HOLDERS OF THIS SOFTWARE DISCLAIM ALL -WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED -WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE -CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT -OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS -OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE -OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE -OR PERFORMANCE OF THIS SOFTWARE. - -NOTE: This is the licence for the Galaxy Wrapper only. Blast2GO and -associated data files are available and licenced separately.
--- a/tools/blast2go/blast2go.xml Fri Feb 22 08:47:27 2013 -0500 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,95 +0,0 @@ -<tool id="blast2go" name="Blast2GO" version="0.0.6"> - <description>Maps BLAST results to GO annotation terms</description> - <command interpreter="python"> - blast2go.py "${xml}" "${prop.fields.path}" "${tab}" - </command> - <stdio> - <!-- Wrapper ensures anything other than zero is an error --> - <exit_code range="1:" /> - <exit_code range=":-1" /> - </stdio> - <inputs> - <param name="xml" type="data" format="blastxml" label="BLAST XML results" description="You must have run BLAST against a protein database such as the NCBI non-redundant (NR) database. Use BLASTX for nucleotide queries, BLASTP for protein queries." /> - <param name="prop" type="select" label="Blast2GO settings" description="One or more configurations can be setup, such as using the Blast2GO team's server in Spain, or a local database."> - <options from_file="blast2go.loc"> - <column name="value" index="0"/> - <column name="name" index="1"/> - <column name="path" index="2"/> - </options> - </param> - </inputs> - <outputs> - <data name="tab" format="tabular" label="Blast2GO ${prop.fields.name}" /> - </outputs> - <requirements> - </requirements> - <tests> - </tests> - <help> -.. class:: warningmark - -**Note**. Blast2GO may take a substantial amount of time, especially if -running against the public server in Spain. For large input datasets it -is advisable to allow overnight processing, or consider subdividing. - ------ - -**What it does** - -This runs b2g4Pipe, the command line (no GUI) version of Blast2GO designed -for use in pipelines. - -It takes as input BLAST XML results against a protein database, typically -the NCBI non-redundant (NR) database. This tool will accept concatenated -BLAST XML files (although they are technically invalid XML), which is very -useful if you have sub-divided your protein FASTA files and run BLAST on -them in batches. 
- -The BLAST matches are used to assign Gene Ontology (GO) annotation terms -to each query sequence. - -The output from this tool is a tabular file containing three columns, with -the order taken from query order in the original BLAST XML file: - -====== ==================================== -Column Description ------- ------------------------------------ - 1 ID and description of query sequence - 2 GO term - 3 GO description -====== ==================================== - -Note that if no GO terms are assigned to a sequence (e.g. if it had no -BLAST matches), then it will not be present in the output file. - - -**Advanced Settings** - -Blast2GO has a properties setting file which includes which database -server to connect to (e.g. the public server in Valencia, Spain, or a -local server), as well as more advanced options such as thresholds and -evidence code weights. To change these settings, your Galaxy administrator -must create a new properties file, and add it to the drop down menu above. - - -**References** - -S. Götz et al. -High-throughput functional annotation and data mining with the Blast2GO suite. -Nucleic Acids Res. 36(10):3420–3435, 2008. -http://dx.doi.org/10.1093/nar/gkn176 - -A. Conesa and S. Götz. -Blast2GO: A Comprehensive Suite for Functional Analysis in Plant Genomics. -Int. J. Plant Genomics. 619832, 2008. -http://dx.doi.org/10.1155/2008/619832 - -A. Conesa et al. -Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research. -Bioinformatics 21:3674-3676, 2005. -http://dx.doi.org/10.1093/bioinformatics/bti610 - -http://www.blast2go.org/ - - </help> -</tool>
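Reviewer note: the help text above describes the output as a three-column tabular file (query ID and description, GO term, GO description) with one row per assigned term. A minimal Python sketch of how a downstream script might group those rows by query follows. The function name `group_go_terms` is hypothetical, and treating a missing third column as an empty description is an assumption rather than documented b2g4pipe behaviour.

```python
def group_go_terms(lines):
    """Map each query (column 1) to a list of (GO term, GO description) pairs.

    Queries keep the order in which they first appear, which the tool help
    says follows the query order of the original BLAST XML file.
    """
    annotations = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        parts = line.split("\t")
        if len(parts) < 2:
            raise ValueError("Expected at least 2 columns: %r" % line)
        query, go_term = parts[0], parts[1]
        # Assumption: tolerate a missing description column
        description = parts[2] if len(parts) > 2 else ""
        annotations.setdefault(query, []).append((go_term, description))
    return annotations
```

Queries with no assigned GO terms never appear in the file, so they will be absent from the returned mapping as well.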
--- a/tools/blast2go/repository_dependencies.xml Fri Feb 22 08:47:27 2013 -0500 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,5 +0,0 @@ -<?xml version="1.0"?> -<repositories description="This requires the BLAST datatype definitions (e.g. the BLAST XML format)."> -<!-- Revision 4:f9a7783ed7b6 on the main tool shed is v0.0.14 which added BLAST databases --> -<repository toolshed="http://toolshed.g2.bx.psu.edu" name="blast_datatypes" owner="devteam" changeset_revision="f9a7783ed7b6" /> -</repositories>
