# HG changeset patch
# User peterjc
# Date 1379929858 14400
# Node ID 872cf247c899798a1ce27eead0b67fed5bd21572
# Parent e4419efbefad36dfc4c9894f84191e4eeda4ca8c
Uploaded v0.0.8, auto-installation, RST README, MIT licence, citation information, development moved to GitHub, split out XML formatter into standalone script
diff -r e4419efbefad -r 872cf247c899 blast2go/README.rst
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/blast2go/README.rst Mon Sep 23 05:50:58 2013 -0400
@@ -0,0 +1,208 @@
+Galaxy wrapper for Blast2GO for pipelines, b2g4pipe
+===================================================
+
+This wrapper is copyright 2011-2013 by Peter Cock, The James Hutton Institute
+(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
+See the licence text below (MIT licence).
+
+This is a wrapper for the command line Java tool b2g4pipe v2.5,
+Blast2GO for pipelines. It is available from the Galaxy Tool Shed at:
+http://toolshed.g2.bx.psu.edu/view/peterjc/blast2go
+
+
+References
+==========
+
+Peter Cock, Bjoern Gruening, Konrad Paszkiewicz and Leighton Pritchard (2013).
+Galaxy tools and workflows for sequence analysis with applications
+in molecular plant pathology. PeerJ 1:e167
+http://dx.doi.org/10.7717/peerj.167
+
+S. Götz et al. (2008).
+High-throughput functional annotation and data mining with the Blast2GO suite.
+Nucleic Acids Res. 36(10):3420-3435.
+http://dx.doi.org/10.1093/nar/gkn176
+
+A. Conesa and S. Götz (2008).
+Blast2GO: A Comprehensive Suite for Functional Analysis in Plant Genomics.
+International Journal of Plant Genomics. 619832.
+http://dx.doi.org/10.1155/2008/619832
+
+A. Conesa et al. (2005).
+Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research.
+Bioinformatics 21:3674-3676.
+http://dx.doi.org/10.1093/bioinformatics/bti610
+
+See also http://www.blast2go.com/
+
+
+Automated Installation
+======================
+
+Installation via the Galaxy Tool Shed should take care of the Galaxy side of
+things, including the dependency on 'blast_datatypes' which defines the
+'blastxml' file format. However, you will probably also need to configure
+the Blast2GO property file(s), for example if you have a local Blast2GO
+database (which we recommend for speed).
+
+
+Manual Installation
+===================
+
+The main dependency is b2g4pipe, which must be installed manually. We also
+strongly recommend installing a local Blast2GO database (see the
+instructions below about the blast2go.loc file). At the time of writing,
+the current version is b2g4pipe v2.5 which is available here:
+
+* http://www.blast2go.com/data/blast2go/b2g4pipe_v2.5.zip
+
+You can change the path by setting the B2G4PIPE environment variable to
+the desired folder, but by default the script looks for the JAR file here::
+
+ /opt/b2g4pipe_v2.5/blast2go.jar
+
+To install the wrapper manually, first install 'blast_datatypes', then
+copy or move the following files under the Galaxy tools folder, e.g. in a
+tools/blast2go/ folder:
+
+* blast2go.xml (the Galaxy tool definition)
+* blast2go.py (the Python wrapper script)
+* massage_xml_for_blast2go.py (Python XML reformatting script)
+* README.rst (this file)
+
+For a manual installation of the wrapper you will also need to modify the
+tools_conf.xml file to tell Galaxy to offer the tool. We suggest putting
+it next to the NCBI BLAST+ wrappers. Just add the line::
+
+    <tool file="blast2go/blast2go.xml" />
+
+If you wish to run the unit tests, also add this to tools_conf.xml.sample
+and move/copy the test-data files under Galaxy's test-data folder. Then::
+
+ $ ./run_functional_tests.sh -id blast2go
+
+
+Configuration
+=============
+
+As part of setting up b2g4pipe you will need to set up one or more Blast2GO
+property files, which tell the tool which database to use, etc. The example
+b2gPipe.properties provided with b2g4pipe is often out of date. The current
+server IP address and database name may be given on the Blast2GO website, or
+can be found by running the latest GUI version via Java Web Start and
+looking under the tools/options menu. These property files can be anywhere
+accessible to the Galaxy Unix user; we put them alongside the JAR file.
+
+You must tell Galaxy about these Blast2GO property files so that they can be
+offered to the user. Copy file blast2go.loc.sample to tool-data/blast2go.loc
+under the Galaxy folder and edit this to match your installation. This must
+be plain text, tab separated, with three columns:
+
+1. ID for the setup, e.g. Spain_2012_August
+2. Description for the setup, e.g. Database in Spain (August 2012)
+3. Properties filename for the setup, e.g. Spain_2012_August.properties
+   relative to the main JAR file, or with a full path,
+   e.g. /opt/b2g4pipe/Spain_2012_August.properties
+
+Avoid including "Blast2GO" in the description (column 2) as this text will be
+included in the automatically assigned output dataset name. The blast2go.loc
+file allows you to customise the database setup. If for example you have a local
+Blast2GO server running (which we recommend for speed), and you want this to be
+the default setting, include it as the first line in your blast2go.loc file.
+
+Consult the Blast2GO documentation for details about the property files and
+setting up a local MySQL Blast2GO database.
+
+
+History
+=======
+
+======= ======================================================================
+Version Changes
+------- ----------------------------------------------------------------------
+v0.0.1  - Initial public release
+v0.0.2  - Documentation clarifications, e.g. concatenated BLAST XML is allowed.
+        - Fixed error handler in wrapper script (for when b2g4pipe fails).
+        - Reformats the XML to use old NCBI-style concatenated BLAST XML since
+          b2g4pipe crashes with a heap space error on large files using
+          current NCBI output.
+v0.0.3  - Include sample loc file, tool-data/blast2go.loc.sample
+v0.0.4  - Include repository_dependencies.xml file for 'blastxml' format
+          (previously included in the core Galaxy installation)
+v0.0.5  - Quote arguments in case of spaces in filenames (internal change)
+        - Last release supporting b2g4pipe v2.3.5
+v0.0.6  - Support for b2g4pipe v2.5 instead of v2.3.5
+
+          - Now invoked with a class path and es.blast2go.prog.B2GAnnotPipe
+            rather than simply calling the JAR file
+          - Now uses the switch -annot instead of -a (this change breaks
+            support for b2g4pipe v2.3.5 unfortunately)
+
+        - Catch a few error messages and treat them explicitly as errors.
+v0.0.7  - Update output description in XML file (b2g4pipe v2.3.5 included
+          the sequence description, b2g4pipe v2.5 omits this).
+v0.0.8  - Automated installation via the Galaxy Tool Shed.
+        - Added unit test.
+        - Explain how to load the tabular file into the Blast2GO GUI.
+        - Link to Tool Shed added to help text and this documentation.
+        - Switch to standard MIT licence.
+        - Use reStructuredText for this README file.
+        - Updated citation information (Cock et al. 2013).
+        - Development moved to GitHub, https://github.com/peterjc/galaxy_blast
+        - Split out massage_xml_for_blast2go.py as a standalone file.
+======= ======================================================================
+
+
+Developers
+==========
+
+This script and related tools were originally developed on the 'tools' branch
+of the following BitBucket Mercurial repository:
+https://bitbucket.org/peterjc/galaxy-central/
+
+As of September 2013, development is continuing on a dedicated GitHub repository:
+https://github.com/peterjc/galaxy_blast
+
+For making the "Galaxy Tool Shed" http://toolshed.g2.bx.psu.edu/ tarball I use
+the following command from the Galaxy root folder::
+
+ $ tar -czf blast2go.tar.gz blast2go/README.rst blast2go/blast2go.xml blast2go/blast2go.py blast2go/massage_xml_for_blast2go.py blast2go/repository_dependencies.xml blast2go/tool_dependencies.xml tool-data/blast2go.loc.sample test-data/blastp_sample.xml test-data/blastp_sample.blast2go.tabular
+
+Check this worked::
+
+ $ tar -tzf blast2go.tar.gz
+ blast2go/README.rst
+ blast2go/blast2go.xml
+ blast2go/blast2go.py
+ blast2go/massage_xml_for_blast2go.py
+ blast2go/repository_dependencies.xml
+ blast2go/tool_dependencies.xml
+ tool-data/blast2go.loc.sample
+ test-data/blastp_sample.xml
+ test-data/blastp_sample.blast2go.tabular
+
+
+Licence (MIT)
+=============
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
+
+
+NOTE: This is the licence for the Galaxy Wrapper only. Blast2GO and
+associated data files are available and licenced separately.
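The README above describes blast2go.loc as three tab-separated columns. The third column is a properties filename, either absolute or relative to the folder holding blast2go.jar. A file like this is easy to break by typing spaces instead of tabs, so it can be worth checking before Galaxy reads it.

What follows is a minimal Python 3 sketch and is not part of the wrapper. The helper names parse_blast2go_loc and resolve_properties are hypothetical. The sketch also assumes, as is usual for Galaxy .loc files, that blank lines and lines starting with # are ignored.

```python
import os


def parse_blast2go_loc(lines):
    """Return (id, description, properties) tuples from blast2go.loc lines.

    Assumption: blank lines and lines starting with '#' are skipped, as is
    usual for Galaxy .loc files.
    """
    entries = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError("Line %i: expected 3 tab-separated columns, got %i"
                             % (number, len(fields)))
        entries.append(tuple(fields))
    return entries


def resolve_properties(prop_file, jar_dir):
    """Mirror blast2go.py: use the path if it exists, else look in jar_dir."""
    if os.path.isfile(prop_file):
        return prop_file
    # os.path.join ignores jar_dir when prop_file is already absolute
    candidate = os.path.join(jar_dir, prop_file)
    if os.path.isfile(candidate):
        return candidate
    raise FileNotFoundError("Blast2GO configuration file not found: %s" % prop_file)


if __name__ == "__main__":
    sample = ["#ID\tDescription\tProperties file\n",
              "Spain_2012_August\tDatabase in Spain (August 2012)\t"
              "Spain_2012_August.properties\n"]
    for loc_id, description, properties in parse_blast2go_loc(sample):
        print(loc_id, "->", properties)
```

The parser fails fast on a wrong column count. This reports a mistyped line by number, rather than letting the entry show up oddly in the Galaxy drop-down menu.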
diff -r e4419efbefad -r 872cf247c899 blast2go/blast2go.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/blast2go/blast2go.py Mon Sep 23 05:50:58 2013 -0400
@@ -0,0 +1,134 @@
+#!/usr/bin/env python
+"""Galaxy wrapper for Blast2GO for pipelines, b2g4pipe v2.5.
+
+This script takes exactly three command line arguments:
+ * Input BLAST XML filename
+ * Blast2GO properties filename (settings file)
+ * Output tabular filename
+
+The properties filename can be a fully qualified path, but if not
+this will look next to the blast2go.jar file.
+
+Sadly b2g4pipe (at least v2.3.5 to v2.5.0) cannot cope with current
+style large BLAST XML files (e.g. from BLAST 2.2.25+), so we reformat
+these to avoid it crashing with a Java heap space OutOfMemoryError.
+
+As part of this reformatting, we check for BLASTP or BLASTX output
+(otherwise raise an error), and print the query count.
+
+It then calls the Java command line tool, and moves the output file to
+the location Galaxy is expecting, and removes the temporary XML file.
+
+This script is called from my Galaxy wrapper for Blast2GO for pipelines,
+available from the Galaxy Tool Shed here:
+http://toolshed.g2.bx.psu.edu/view/peterjc/blast2go
+
+This script is under version control here:
+https://github.com/peterjc/galaxy_blast/tree/master/blast2go
+"""
+import sys
+import os
+import subprocess
+
+#You may need to edit this to match your local setup:
+blast2go_dir = os.environ.get("B2G4PIPE", "/opt/b2g4pipe_v2.5/")
+blast2go_jar = os.path.join(blast2go_dir, "blast2go.jar")
+
+def stop_err(msg, error_level=1):
+    """Print error message to stderr and quit with given error level."""
+    sys.stderr.write("%s\n" % msg)
+    sys.exit(error_level)
+
+try:
+    from massage_xml_for_blast2go import prepare_xml
+except ImportError:
+    stop_err("Missing sister file massage_xml_for_blast2go.py")
+
+if len(sys.argv) != 4:
+    stop_err("Require three arguments: XML filename, properties filename, output tabular filename")
+
+xml_file, prop_file, tabular_file = sys.argv[1:]
+
+#We should have write access here:
+tmp_xml_file = tabular_file + ".tmp.xml"
+
+if not os.path.isfile(blast2go_jar):
+    stop_err("Blast2GO JAR file not found: %s" % blast2go_jar)
+
+if not os.path.isfile(xml_file):
+    stop_err("Input BLAST XML file not found: %s" % xml_file)
+
+if not os.path.isfile(prop_file):
+    tmp = os.path.join(os.path.split(blast2go_jar)[0], prop_file)
+    if os.path.isfile(tmp):
+        #The properties file seems to have been given relative to the JAR
+        prop_file = tmp
+    else:
+        stop_err("Blast2GO configuration file not found: %s" % prop_file)
+    del tmp
+
+
+def run(cmd):
+    #Avoid using shell=True when we call subprocess to ensure if the Python
+    #script is killed, so too is the child process.
+    try:
+        child = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+    except Exception as err:
+        stop_err("Error invoking command:\n%s\n\n%s\n" % (" ".join(cmd), err))
+    #Use .communicate() rather than .wait(), which can deadlock with pipes
+    stdout, stderr = child.communicate()
+    return_code = child.returncode
+
+    #Keep stdout minimal as it is shown prominently in Galaxy;
+    #record it in case a silent error needs diagnosis
+    if stdout:
+        sys.stderr.write("Standard out:\n%s\n\n" % stdout)
+    if stderr:
+        sys.stderr.write("Standard error:\n%s\n\n" % stderr)
+
+    error_msg = None
+    if return_code:
+        cmd_str = " ".join(cmd)
+        error_msg = "Return code %i from command:\n%s" % (return_code, cmd_str)
+    elif "Database or network connection (timeout) error" in stdout+stderr:
+        error_msg = "Database or network connection (timeout) error"
+    elif "Annotation of 0 seqs with 0 annots finished." in stdout+stderr:
+        error_msg = "No sequences processed!"
+
+    if error_msg:
+        print error_msg
+        stop_err(error_msg)
+
+
+blast2go_classpath = os.path.split(blast2go_jar)[0]
+assert os.path.isdir(blast2go_classpath)
+blast2go_classpath = "%s/*:%s/ext/*:" % (blast2go_classpath, blast2go_classpath)
+
+prepare_xml(xml_file, tmp_xml_file)
+#print "XML file prepared for Blast2GO"
+
+#We will have write access wherever the output should be,
+#so we'll ask Blast2GO to use that as the stem for its output
+#(it will append .annot to the filename)
+cmd = ["java", "-cp", blast2go_classpath, "es.blast2go.prog.B2GAnnotPipe",
+       "-in", tmp_xml_file,
+       "-prop", prop_file,
+       "-out", tabular_file, #Used as base name for output files
+       "-annot", # Generate *.annot tabular file
+       #NOTE: For v2.3.5 must use -a, for v2.5 must use -annot instead
+       #"-img", # Generate images, feature not in v2.3.5
+       ]
+#print " ".join(cmd)
+run(cmd)
+
+#Remove the temp XML file
+os.remove(tmp_xml_file)
+
+out_file = tabular_file + ".annot"
+if not os.path.isfile(out_file):
+    stop_err("ERROR - No output annotation file from Blast2GO")
+
+#Move the output file where Galaxy expects it to be:
+os.rename(out_file, tabular_file)
+
+print "Done"
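blast2go.py above does three things around the Java call:

* It locates the JAR via the B2G4PIPE environment variable.
* It builds a wildcard class path.
* It treats two known log messages as failures even when Java exits with status 0.

The sketch below restates that logic as small, testable Python 3 functions. The function names are hypothetical, and this is an illustration rather than a drop-in replacement for the wrapper.

```python
import os

DEFAULT_DIR = "/opt/b2g4pipe_v2.5/"

# Log messages that blast2go.py treats as errors despite a zero exit code
KNOWN_FAILURES = [
    ("Database or network connection (timeout) error",
     "Database or network connection (timeout) error"),
    ("Annotation of 0 seqs with 0 annots finished.",
     "No sequences processed!"),
]


def find_jar(environ):
    """Return the blast2go.jar path, honouring B2G4PIPE as blast2go.py does."""
    return os.path.join(environ.get("B2G4PIPE", DEFAULT_DIR), "blast2go.jar")


def build_command(jar_dir, xml_file, prop_file, out_stem):
    """Return the b2g4pipe v2.5 argument list (no shell, so no quoting needed)."""
    # Java expands "dir/*" to every JAR file in that directory
    classpath = os.pathsep.join([os.path.join(jar_dir, "*"),
                                 os.path.join(jar_dir, "ext", "*")])
    return ["java", "-cp", classpath, "es.blast2go.prog.B2GAnnotPipe",
            "-in", xml_file, "-prop", prop_file,
            "-out", out_stem,  # b2g4pipe appends .annot to this stem
            "-annot"]


def classify_result(return_code, stdout, stderr):
    """Return an error message, or None if the run looks successful."""
    if return_code:
        return "Return code %i" % return_code
    combined = stdout + stderr
    for marker, message in KNOWN_FAILURES:
        if marker in combined:
            return message
    return None


if __name__ == "__main__":
    jar = find_jar(os.environ)
    print(" ".join(build_command(os.path.dirname(jar), "in.xml",
                                 "Spain_2012_August.properties", "out.tabular")))
```

There is one deliberate difference from the script. The original hard-codes ':' as the class path separator and adds a trailing separator, whereas this sketch uses os.pathsep and no trailing entry.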
diff -r e4419efbefad -r 872cf247c899 blast2go/blast2go.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/blast2go/blast2go.xml Mon Sep 23 05:50:58 2013 -0400
@@ -0,0 +1,118 @@
+
+ Maps BLAST results to GO annotation terms
+
+ b2g4pipe
+
+
+ blast2go.py "${xml}" "${prop.fields.path}" "${tab}"
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+.. class:: warningmark
+
+**Note**. Blast2GO may take a substantial amount of time, especially if
+running against the public server in Spain. For large input datasets it
+is advisable to allow overnight processing, or consider subdividing.
+
+-----
+
+**What it does**
+
+This runs b2g4pipe v2.5, which is the command line (no GUI) version of
+Blast2GO designed for use in pipelines.
+
+It takes as input BLAST XML results against a protein database, typically
+the NCBI non-redundant (NR) database. This tool will accept concatenated
+BLAST XML files (although they are technically invalid XML), which is very
+useful if you have sub-divided your protein FASTA files and run BLAST on
+them in batches.
+
+The BLAST matches are used to assign Gene Ontology (GO) annotation terms
+to each query sequence.
+
+The output from this tool is a tabular file containing three columns, with
+the order taken from query order in the original BLAST XML file:
+
+====== ====================
+Column Description
+------ --------------------
+     1 ID of query sequence
+     2 GO term
+     3 GO description
+====== ====================
+
+Note that if no GO terms are assigned to a sequence (e.g. if it had no
+BLAST matches), then it will not be present in the output file.
+
+This tabular file is called an "Annotation File" in the Blast2GO GUI.
+If you download the tabular file, and rename it to use the extension
+".annot", then it can be opened with the Blast2GO GUI via the "File",
+"Load Annotation (.annot)" menu (keyboard shortcut ALT+L). You can
+then run some of the interactive analyses offered in the GUI tool.
+
+
+**Advanced Settings**
+
+Blast2GO has a properties file which specifies the database
+server to connect to (e.g. the public server in Valencia, Spain, or a
+local server), as well as more advanced options such as thresholds and
+evidence code weights. To change these settings, your Galaxy administrator
+must create a new properties file, and add it to the drop down menu above.
+
+
+**References**
+
+If you use this Galaxy tool in work leading to a scientific publication please
+cite the following papers:
+
+Peter Cock, Bjoern Gruening, Konrad Paszkiewicz and Leighton Pritchard (2013).
+Galaxy tools and workflows for sequence analysis with applications
+in molecular plant pathology. PeerJ 1:e167
+http://dx.doi.org/10.7717/peerj.167
+
+S. Götz et al. (2008).
+High-throughput functional annotation and data mining with the Blast2GO suite.
+Nucleic Acids Res. 36(10):3420–3435.
+http://dx.doi.org/10.1093/nar/gkn176
+
+A. Conesa and S. Götz (2008).
+Blast2GO: A Comprehensive Suite for Functional Analysis in Plant Genomics.
+International Journal of Plant Genomics. 619832.
+http://dx.doi.org/10.1155/2008/619832
+
+A. Conesa et al. (2005).
+Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research.
+Bioinformatics 21:3674-3676.
+http://dx.doi.org/10.1093/bioinformatics/bti610
+
+See also http://www.blast2go.com/
+
+This wrapper is available to install into other Galaxy instances via the Galaxy
+Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/blast2go
+
+
+
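The help text above describes the tool's output as a tabular file with three columns, in BLAST query order. Queries with no GO terms are simply absent. A minimal Python 3 sketch for grouping those rows per query, for example before counting terms downstream, might look like the following.

The name group_annotations is hypothetical. The sketch assumes tab-separated rows, and it tolerates a missing or empty third column as an assumption rather than a documented guarantee.

```python
def group_annotations(lines):
    """Group tabular rows into {query_id: [(go_term, description), ...]}.

    Relies on dicts keeping insertion order (Python 3.7+), so queries come
    out in the same order as in the file.
    """
    grouped = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError("Expected at least 2 tab-separated columns: %r" % line)
        query_id, go_term = fields[0], fields[1]
        # Assumption: tolerate a missing or empty third column
        description = fields[2] if len(fields) > 2 else ""
        grouped.setdefault(query_id, []).append((go_term, description))
    return grouped


if __name__ == "__main__":
    rows = ["Sample\tGO:0005488\ttail tape measure protein\n"]
    print(group_annotations(rows))
```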
diff -r e4419efbefad -r 872cf247c899 blast2go/massage_xml_for_blast2go.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/blast2go/massage_xml_for_blast2go.py Mon Sep 23 05:50:58 2013 -0400
@@ -0,0 +1,92 @@
+#!/usr/bin/env python
+"""Script for reformatting BLAST XML to suit Blast2GO.
+
+This script takes exactly two command line arguments:
+ * Input BLAST XML filename
+ * Output BLAST XML filename
+
+Sadly b2g4pipe (at least v2.3.5 to v2.5.0) cannot cope with current
+style large BLAST XML files (e.g. from BLAST 2.2.25+), so we reformat
+these to avoid it crashing with a Java heap space OutOfMemoryError.
+
+As part of this reformatting, we check for BLASTP or BLASTX output
+(otherwise raise an error), and print the query count.
+
+This script is called from my Galaxy wrapper for Blast2GO for pipelines,
+available from the Galaxy Tool Shed here:
+http://toolshed.g2.bx.psu.edu/view/peterjc/blast2go
+
+This script is under version control here:
+https://github.com/peterjc/galaxy_blast/tree/master/blast2go
+"""
+import sys
+import os
+import subprocess
+
+def stop_err(msg, error_level=1):
+    """Print error message to stderr and quit with given error level."""
+    sys.stderr.write("%s\n" % msg)
+    sys.exit(error_level)
+
+def prepare_xml(original_xml, mangled_xml):
+    """Reformat BLAST XML to suit Blast2GO.
+
+    Blast2GO can't cope with 1000s of <Iteration> tags within a
+    single <BlastOutput> tag, so instead split this into one
+    full XML record per iteration (i.e. per query). This gives
+    a concatenated XML file mimicking old versions of BLAST.
+
+    This also checks for BLASTP or BLASTX output, and outputs
+    the number of queries. Galaxy will show this as "info".
+    """
+    in_handle = open(original_xml)
+    footer = "  </BlastOutput_iterations>\n</BlastOutput>\n"
+    header = ""
+    while True:
+        line = in_handle.readline()
+        if not line:
+            #No hits?
+            stop_err("Problem with XML file?")
+        if line.strip() == "<Iteration>":
+            break
+        header += line
+
+    if "blastx" in header:
+        print "BLASTX output identified"
+    elif "blastp" in header:
+        print "BLASTP output identified"
+    else:
+        in_handle.close()
+        stop_err("Expect BLASTP or BLASTX output")
+
+    out_handle = open(mangled_xml, "w")
+    out_handle.write(header)
+    out_handle.write(line)
+    count = 1
+    while True:
+        line = in_handle.readline()
+        if not line:
+            break
+        elif line.strip() == "<Iteration>":
+            #Insert footer/header
+            out_handle.write(footer)
+            out_handle.write(header)
+            count += 1
+        out_handle.write(line)
+
+    out_handle.close()
+    in_handle.close()
+    print "Input has %i queries" % count
+
+
+if __name__ == "__main__":
+    # Run the conversion...
+    if len(sys.argv) != 3:
+        stop_err("Require two arguments: XML input filename, XML output filename")
+
+    xml_file, out_xml_file = sys.argv[1:]
+
+    if not os.path.isfile(xml_file):
+        stop_err("Input BLAST XML file not found: %s" % xml_file)
+
+    prepare_xml(xml_file, out_xml_file)
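prepare_xml above works line by line:

* It copies everything before the first <Iteration> line as a header.
* Before every later <Iteration> line, it closes the current record and repeats that header.

Below is the same idea as a Python 3 generator, which is convenient for testing on in-memory lines. Like prepare_xml, it assumes each <Iteration> opening tag sits alone on its own line. The names split_iterations and count_queries are hypothetical.

```python
FOOTER = ["  </BlastOutput_iterations>\n", "</BlastOutput>\n"]


def split_iterations(lines):
    """Yield BLAST XML lines rewritten as one complete record per query.

    Like prepare_xml, this assumes each <Iteration> tag is on a line of its
    own and only accepts output whose header mentions blastp or blastx.
    """
    lines = iter(lines)
    header = []
    for line in lines:
        if line.strip() == "<Iteration>":
            break
        header.append(line)
    else:
        raise ValueError("No <Iteration> found; problem with XML file?")
    header_text = "".join(header)
    if "blastx" not in header_text and "blastp" not in header_text:
        raise ValueError("Expect BLASTP or BLASTX output")
    yield from header
    yield line  # the first <Iteration>
    for line in lines:
        if line.strip() == "<Iteration>":
            # Close the previous record, then start a new one
            yield from FOOTER
            yield from header
        yield line
    # The input's own closing tags end the final record


def count_queries(lines):
    """Count <Iteration> lines, i.e. the query count prepare_xml prints."""
    return sum(1 for line in lines if line.strip() == "<Iteration>")
```

Because this is a generator, its errors only surface once iteration starts. The program check is a substring test, exactly as in prepare_xml, so "blastx" also matches tblastx output.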
diff -r e4419efbefad -r 872cf247c899 blast2go/repository_dependencies.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/blast2go/repository_dependencies.xml Mon Sep 23 05:50:58 2013 -0400
@@ -0,0 +1,4 @@
+
+
+
+
diff -r e4419efbefad -r 872cf247c899 blast2go/tool_dependencies.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/blast2go/tool_dependencies.xml Mon Sep 23 05:50:58 2013 -0400
@@ -0,0 +1,32 @@
+
+
+
+
+
+
+
+ http://www.blast2go.com/data/blast2go/b2g4pipe_v2.5.zip
+
+
+cp b2gPipe.properties Spain_2012_August.properties &&
+sed -i "s/Dbacces.dbname=b2g_apr12/Dbacces.dbname=b2g_aug12/g" Spain_2012_August.properties &&
+sed -i "s/Dbacces.dbhost=10.10.100.203/Dbacces.dbhost=publicdb.blast2go.com/g" Spain_2012_August.properties
+
+
+cp b2gPipe.properties Spain_2011_June.properties &&
+sed -i "s/Dbacces.dbname=b2g_apr12/Dbacces.dbname=b2g_jun11/g" Spain_2011_June.properties &&
+sed -i "s/Dbacces.dbhost=10.10.100.203/Dbacces.dbhost=publicdb.blast2go.com/g" Spain_2011_June.properties
+
+ .$INSTALL_DIR/
+
+
+ $INSTALL_DIR
+
+
+
+
+Downloads b2g4pipe v2.5
+
+
+
+
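The install actions in tool_dependencies.xml above derive two properties files from the b2gPipe.properties template. They use sed to swap the Dbacces.dbname and Dbacces.dbhost values. If you maintain your own properties files, for example for a local Blast2GO database as the README recommends, the same edit can be scripted.

This Python 3 sketch uses a hypothetical helper, set_properties. It only rewrites simple key=value lines, which is all those sed commands touch; it is not a full Java properties parser.

```python
def set_properties(text, updates):
    """Return text with the values of key=value lines replaced per updates.

    Only plain key=value lines whose key is in updates are rewritten;
    every other line is copied unchanged.
    """
    out = []
    for line in text.splitlines(True):  # keep line endings
        key, sep, _ = line.partition("=")
        key = key.strip()
        if sep and key in updates:
            ending = line[len(line.rstrip("\r\n")):]  # preserve \n or \r\n
            line = "%s=%s%s" % (key, updates[key], ending)
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    template = "Dbacces.dbname=b2g_apr12\nDbacces.dbhost=10.10.100.203\n"
    print(set_properties(template, {"Dbacces.dbname": "b2g_aug12",
                                    "Dbacces.dbhost": "publicdb.blast2go.com"}))
```

The sed commands match on the old value, whereas this helper matches on the key. That means it keeps working even if a future b2gPipe.properties ships a different default database name or host.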
diff -r e4419efbefad -r 872cf247c899 test-data/blastp_sample.blast2go.tabular
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/blastp_sample.blast2go.tabular Mon Sep 23 05:50:58 2013 -0400
@@ -0,0 +1,1 @@
+Sample GO:0005488 tail tape measure protein
diff -r e4419efbefad -r 872cf247c899 test-data/blastp_sample.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/blastp_sample.xml Mon Sep 23 05:50:58 2013 -0400
@@ -0,0 +1,293 @@
+
+
+
+ blastp
+ BLASTP 2.2.24+
+ Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.
+ nr
+ Query_1
+ Sample
+ 516
+
+
+ BLOSUM62
+ 1e-30
+ 11
+ 1
+ F
+
+
+
+
+ 1
+ Query_1
+ Sample
+ 516
+
+
+ 1
+ gi|119953746|ref|YP_950551.1|
+ tail tape measure protein [Streptococcus phage SMP] >gi|118430558|gb|ABK91882.1| tail tape measure protein [Streptococcus suis phage SMP]
+ YP_950551
+ 659
+
+
+ 1
+ 949.117592429394
+ 2452
+ 0
+ 1
+ 516
+ 27
+ 542
+ 0
+ 0
+ 500
+ 500
+ 0
+ 516
+ FHLLNSGGSALSVMFAKLVGIIAGISAPIWXXXXXXXXXXXXXXXXYNTNEEFRTKVQAAWEAIKSAISTAVEAVVSFVMDLWGQMVAWWNENQELIRQTAETVWNAIRTVVETVMTALIPIVQTAWDLILAVVTTVLNVIKTVVDTGLKVVLGIIKAVMQMINGDWSGAWETLKGVAGTIWEGIKSLVQVAIDGLVQIFQTGLAFLKSIWDTVWGTIMAVVGPIWDWIKTTVSNAITAVWEIIQNIMTSIQTTWDTVWNAISTVASNIWTAISTTVMSVLTTIWGYIQTYLELIKTVWSAAWEIIKAVFAAILLTIVGLVTGNFDLIKQAISNAWEIIKTKTSEIWNAITTFLSGIWEGIKTAASTAWEWIKTTISNVMTTIKSNIETAWNNIKTSISNALNNIKSAAENAWNNIKSAISTAIENIKSTVSNGWNNLVSTVTNAGPRIVSAVRTGFDNAVNAARNFISNAISVGGDLINGFVEGVKGAAGRLIDAVGGAVSGAIDWAKGLLGIKS
+ FHLLNSGGSALSVMFAKLVGIIAGISAPIWAVIGVIAALVAGFVLLYNTNEEFRTKVQAAWEAIKSAISTAVEAVVSFVMDLWGQMVAWWNENQELIRQTAETVWNAIRTVVETVMTALIPIVQTAWDLILAVVTTVLNVIKTVVDTGLKVVLGIIKAVMQMINGDWSGAWETLKGVAGTIWEGIKSLVQVAIDGLVQIFQTGLAFLKSIWDTVWGTIMAVVGPIWDWIKTTVSNAITAVWEIIQNIMTSIQTTWDTVWNAISTVASNIWTAISTTVMSVLTTIWGYIQTYLELIKTVWSAAWEIIKAVFAAILLTIVGLVTGNFDLIKQAISNAWEIIKTKTSEIWNAITTFLSGIWEGIKTAASTAWEWIKTTISNVMTTIKSNIETAWNNIKTSISNALNNIKSAAENAWNNIKSAISTAIENIKSTVSNGWNNLVSTVTNAGPRIVSAVRTGFDNAVNAARNFISNAISVGGDLINGFVEGVKGAAGRLIDAVGGAVSGAIDWAKGLLGIKS
+ FHLLNSGGSALSVMFAKLVGIIAGISAPIW YNTNEEFRTKVQAAWEAIKSAISTAVEAVVSFVMDLWGQMVAWWNENQELIRQTAETVWNAIRTVVETVMTALIPIVQTAWDLILAVVTTVLNVIKTVVDTGLKVVLGIIKAVMQMINGDWSGAWETLKGVAGTIWEGIKSLVQVAIDGLVQIFQTGLAFLKSIWDTVWGTIMAVVGPIWDWIKTTVSNAITAVWEIIQNIMTSIQTTWDTVWNAISTVASNIWTAISTTVMSVLTTIWGYIQTYLELIKTVWSAAWEIIKAVFAAILLTIVGLVTGNFDLIKQAISNAWEIIKTKTSEIWNAITTFLSGIWEGIKTAASTAWEWIKTTISNVMTTIKSNIETAWNNIKTSISNALNNIKSAAENAWNNIKSAISTAIENIKSTVSNGWNNLVSTVTNAGPRIVSAVRTGFDNAVNAARNFISNAISVGGDLINGFVEGVKGAAGRLIDAVGGAVSGAIDWAKGLLGIKS
+
+
+
+
+ 2
+ gi|148986157|ref|ZP_01819143.1|
+ unknown phage protein [Streptococcus pneumoniae SP3-BS71] >gi|147921871|gb|EDK72998.1| unknown phage protein [Streptococcus pneumoniae SP3-BS71]
+ ZP_01819143
+ 1031
+
+
+ 1
+ 174.481245259597
+ 441
+ 1.54640812741294e-41
+ 49
+ 300
+ 679
+ 897
+ 0
+ 0
+ 104
+ 148
+ 33
+ 252
+ TNEEFRTKVQAAWEAIKSAISTAVEAVVSFVMDLWGQMVAWWNENQELIRQTAETVWNAIRTVVETVMTALIPIVQTAWDLILAVVTTVLNVIKTVVDTGLKVVLGIIKAVMQMINGDWSGAWETLKGVAGTIWEGIKSLVQVAIDGLVQIFQTGLAFLKSIWDTVWGTIMAVVGPIWDWIKTTVSNAITAVWEIIQNIMTSIQTTWDTVWNAISTVASNIWTAISTTVMSVLTTIWGYIQTYLELIKTVWS
+ TNEGFRDAVTTVWNAILEVINAVVSEISNFVMSIFGTVVTWWTENQELIRTSAETVWNAIYTVISTILDILGPLLQAGWDNIQLIITTTWEIIKIVVETAINVVLGVIQAVMQIITGDWSGAWETIKGVFSTVWQAIQSIVQT-------IFSAIQSYISNILNGISGT----VSNIWNSIKDTVSN----------------------VLNAISSTVSSVWEGIKSTISSAINGARDAVSSAIEAIKGLFN
+ TNE FR V W AI I+ V + +FVM ++G +V WW ENQELIR +AETVWNAI TV+ T++ L P++Q WD I ++TT +IK VV+T + VVLG+I+AVMQ+I GDWSGAWET+KGV T+W+ I+S+VQ IF +++ +I + + GT V IW+ IK TVSN V NAIS+ S++W I +T+ S + + + +E IK +++
+
+
+
+
+ 3
+ gi|77411259|ref|ZP_00787609.1|
+ tail tape meausure protein [Streptococcus agalactiae CJB111] >gi|77162685|gb|EAO73646.1| tail tape meausure protein [Streptococcus agalactiae CJB111]
+ ZP_00787609
+ 1039
+
+
+ 1
+ 165.621655013498
+ 418
+ 7.61538823982138e-39
+ 50
+ 310
+ 655
+ 904
+ 0
+ 0
+ 107
+ 158
+ 11
+ 261
+ NEEFRTKVQAAWEAIKSAISTAVEAVVSFVMDLWGQMVAWWNENQELIRQTAETVWNAIRTVVETVMTALIPIVQTAWDLILAVVTTVLNVIKTVVDTGLKVVLGIIKAVMQMINGDWSGAWETLKGVAGTIWEGIKSLVQVAIDGLVQIFQTGLAFLKSIWDTVWGTIMAVVGPIWDWIKTTVSNAITAVWEIIQNIMTSIQTTWDTVWNAISTVASNIWTAISTTVMSVLTTIWGYIQTYLELIKTVWSAAWEIIKAVF
+ HEGFRTAVTEIWNAIYAFLSVIIQQISSFVMSIWGTLTTWWTENQQLILNAANTVWTAISTVIQTIMTILGPYLQASWENIKLIITTAWDIIKVVVETAINVVLGIIKAVMQIITGDWSGAWETIKQVVSTVWEAIKSLISIVLSAIAQ-------FISNSWNGIKGTMTNLL----NSIKSVVSNVWNSIKSTISSILSSIGSTVSSVWNGMKATISGVLSGISNTVSSVWNGVKSTITNAINGAKNAVSSAINAIKNLF
+ +E FRT V W AI + +S ++ + SFVM +WG + WW ENQ+LI A TVW AI TV++T+MT L P +Q +W+ I ++TT ++IK VV+T + VVLGIIKAVMQ+I GDWSGAWET+K V T+WE IKSL+ + + + Q F+ + W+ + GT+ ++ + IK+ VSN ++ I +I++SI +T +VWN + S + + IS TV SV + I + K S+A IK +F
+
+
+
+
+ 4
+ gi|76786754|ref|YP_329383.1|
+ prophage LambdaSa04, tail tape measure protein, TP901 family [Streptococcus agalactiae A909] >gi|76561811|gb|ABA44395.1| prophage LambdaSa04, tail tape measure protein, TP901 family [Streptococcus agalactiae A909]
+ YP_329383
+ 1039
+
+
+ 1
+ 159.073262222903
+ 401
+ 6.55719737745379e-37
+ 50
+ 310
+ 655
+ 904
+ 0
+ 0
+ 103
+ 156
+ 11
+ 261
+ NEEFRTKVQAAWEAIKSAISTAVEAVVSFVMDLWGQMVAWWNENQELIRQTAETVWNAIRTVVETVMTALIPIVQTAWDLILAVVTTVLNVIKTVVDTGLKVVLGIIKAVMQMINGDWSGAWETLKGVAGTIWEGIKSLVQVAIDGLVQIFQTGLAFLKSIWDTVWGTIMAVVGPIWDWIKTTVSNAITAVWEIIQNIMTSIQTTWDTVWNAISTVASNIWTAISTTVMSVLTTIWGYIQTYLELIKTVWSAAWEIIKAVF
+ HEGFRTAVTEIWNAIYAFLTVIIQQISSFVMSIWGTLITWWTENQQLILNATNTVWTAISTVIQTIMTILAPYLQASWENIKLIITTAWDIIKVVVETAINVVLGIIKAVMQIITGDWSGAWETIKQVVSTVWEVIKSLISIVLSAIAQ-------FISNSWNGIKGTMTNLL----NSIKGVVSNVWNGIKSTISSILSSIGSTVSSIWNGMKATISGVLSGISSTVSFVWNGVKSTITNAINGAKNAVSSAINAIKNLF
+ +E FRT V W AI + ++ ++ + SFVM +WG ++ WW ENQ+LI TVW AI TV++T+MT L P +Q +W+ I ++TT ++IK VV+T + VVLGIIKAVMQ+I GDWSGAWET+K V T+WE IKSL+ + + + Q F+ + W+ + GT+ ++ + IK VSN + I +I++SI +T ++WN + S + + IS+TV V + I + K S+A IK +F
+
+
+
+
+ 5
+ gi|153811333|ref|ZP_01964001.1|
+ hypothetical protein RUMOBE_01725 [Ruminococcus obeum ATCC 29174] >gi|149832460|gb|EDM87544.1| hypothetical protein RUMOBE_01725 [Ruminococcus obeum ATCC 29174]
+ ZP_01964001
+ 1228
+
+
+ 1
+ 157.147264343316
+ 396
+ 2.33083876931167e-36
+ 3
+ 516
+ 573
+ 1059
+ 0
+ 0
+ 167
+ 247
+ 113
+ 557
+ LLNSGGSALSVMFAKLVGIIAGISAPIWXXXXXXXXXXXXXXXXYNTNEEFRTKVQAAWEAIKSAISTAVEAVVSFVMDLWGQMVAWWNENQELIRQTAETVWNAIRTVVETVMTALIPIVQTAWDLILAVVTTVLNVIKTVVDTGLKVVLGIIKAVMQMINGDWSGAWETLKGVAGTIWEGIKSLVQV---AIDGLVQIFQTGLAFLKSIWDTVWGTIMAVVGPIWDWIKTTVSNAITAVWEIIQNIMTSIQTTWDTVWNAISTVASNIWTAISTTVMSVLTTIWGYIQTYLELIKTVWSAAWEIIKAVFAAILLTIVGLVTGNFDLI-----------KQAISNAWEIIKTKT-----------------------SEIWNAITTFLSGIWEGIKTAASTAWEWIKTT-ISNVMTTIKSNIETAWNNIKTSISNALNNIKSAAENAWNNIKSAISTAIEN-IKSTVSNGWNNL---VSTVTNAGPRIVSAVRTGFDNAVNAARNFISNAISVGGDLI-NGFVEGVKGAAGRLIDAVGGAVSGAIDWAKGLLGIKS
+ LVKAGG--FSGVFTKALGLI---TSPAAIVVGVIAAITAVIIHLWNTNEDFRNTITAIWQKIKDAFTT---------------FAAGISERLSALGITFSDVTSAIKTIWDGFCNLLAPVLEAAFSTIAIALQTAFNVI-----------LGIWDVFSAVFSGDWSGAWEAIKGIFSSIWDGLKEYFSTIIGAVKGVADVF---LGWFGTNWETVWNGVKTFFEGIW--------NGISSFFEGI--------------WNGISTFCTTVWNGIVTNVTAFCTTVHDTISTIFNAVKDVVSNVWETIKNVVQVAIMFIVEVVKAAFELITVPFRFIWENCRDTIISVWETIKSAVQTAINFVKDNIITPVMNAISATITTVWNAIQTTFTTVINAIKSAVQTAWNFMKDNVVTPVMNAISTTISTVWNTIKTTFTTVINAIKSAVQTAWNFMKNSVITPVMNGIKTVITTVWNAIKTAVQTVVNA---IKTTVQTVF-NAVKTTVTTIWNAIKTGTSTAWN----AVKTAVTTPINAAKSAVTSAIN------GIKS
+ L+ +GG S +F K +G+I ++P +NTNE+FR + A W+ IK A +T A +E + T V +AI+T+ + L P+++ A+ I + T NVI LGI + +GDWSGAWE +KG+ +IW+G+K A+ G+ +F L + + W+TVW + IW N I++ +E I WN IST + +W I T V + TT+ I T +K V S WE IK V ++ IV +V F+LI + I + WE IK+ + +WNAI T + + IK+A TAW ++K ++ VM I + I T WN IKT+ + +N IKSA + AWN +K+++ T + N IK+ ++ WN + V TV NA I + V+T F NAV I NAI G N VK A I+A AV+ AI+ GIKS
+
+
+
+
+ 6
+ gi|56962696|ref|YP_174422.1|
+ hypothetical protein ABC0922 [Bacillus clausii KSM-K16] >gi|56908934|dbj|BAD63461.1| phage-related protein [Bacillus clausii KSM-K16]
+ YP_174422
+ 593
+
+
+ 1
+ 146.746875793547
+ 369
+ 3.12404663750498e-33
+ 48
+ 433
+ 123
+ 465
+ 0
+ 0
+ 112
+ 187
+ 49
+ 389
+ NTNEEFRTKVQAAWEAIKSAISTAVEAVVSFVMDLWGQMVAWWNENQELIRQTAETVWNAIRTVVETVMTALIPIVQTAWDLILAVVTTVLNVIKTVVDTGLKVVLGIIKAVMQMINGDWSGAWETLKGVAGTIWEGIKSLVQVAIDGL---VQIFQTGLAFLKSIWDTVWGTIMAVVGPIWDWIKTTVSNAITAVWEIIQNIMTSIQTTWDTVWNAISTVASNIWTAISTTVMSVLTTIWGYIQTYLELIKTVWSAAWEIIKAVFAAILLTIVGLVTGNFDLIKQAISNAWEIIKTKTSEIWNAITTFLSGIWEGIKTAASTAWEWIKTTISNVMTTIKSNIETAWNNIKTSISNALNNIKSAAENAWNNIKSAISTAIENIKSTVSN
+ QTNETFRNGVIQAWEAIKTTMETVVATIVTFVSEKLAQIKAFWDEHGAAVMQAVTNIFNGIKSIIEPVMNGILAIMQFVWPFIVSLIQMVWGNIQGVISGALNIIMGLVKAFAGLFTGDFS-----------LMWEGIKQLFSGALEAIWNVVQLLLFGR--LLKIASSLFTGLMGVFSKMWGAISNLFLTALNGIRSFFSTIFTPIQ-------NVVMTVMGFIRNAISTG----LTTASNVVQTVLTAIRTVFLTVFNAVRNV-----------VTTAISFVQNFISTGISAARTAVTSALNAIKTTFTTIFNAVRSSVTTAMTNIKTAISN-------GIQSAWQ----AVLNFVGRFREAGKNIVNSIAEGITSAIGAVKNAISN
+ TNE FR V AWEAIK+ + T V +V+FV + Q+ A+W+E+ + Q ++N I++++E VM ++ I+Q W I++++ V I+ V+ L +++G++KA + GD+S +WEGIK L A++ + VQ+ G L I +++ +M V +W I A+ + I T IQ N + TV I AIST LTT +QT L I+TV+ + ++ V VT ++ IS +T + NAI T + I+ ++++ +TA IKT ISN I++AW ++ N + + A +N N+I I++AI +K+ +SN
+
+
+
+
+ 7
+ gi|50914476|ref|YP_060448.1|
+ unknown phage protein [Streptococcus pyogenes MGAS10394] >gi|40218580|gb|AAR83234.1| prophage pi2 protein [Streptococcus pyogenes] >gi|50261625|gb|AAT72393.1| unknown [Streptococcus pyogenes] >gi|50903550|gb|AAT87265.1| unknown phage protein [Streptococcus pyogenes MGAS10394]
+ YP_060448
+ 1039
+
+
+ 1
+ 146.36167621763
+ 368
+ 4.74132513340056e-33
+ 50
+ 227
+ 655
+ 832
+ 0
+ 0
+ 78
+ 112
+ 0
+ 178
+ NEEFRTKVQAAWEAIKSAISTAVEAVVSFVMDLWGQMVAWWNENQELIRQTAETVWNAIRTVVETVMTALIPIVQTAWDLILAVVTTVLNVIKTVVDTGLKVVLGIIKAVMQMINGDWSGAWETLKGVAGTIWEGIKSLVQVAIDGLVQIFQTGLAFLKSIWDTVWGTIMAVVGPIWD
+ NEGFRTAVIEIWNAIYAFISVIIQEISTFIMTIWGTLTTWWTENQALIQAAVETVWNAISTVIQTVMSLIGPYLEAAWANIQLIITTAWEIIKTVVETAITVVLGIIKAIMQAITGDWSGAWETIKGVLQRVWQAIQQIVTTILSAIGQFISNTWNGIKNTFSNILSAISGIVSSIWN
+ NE FRT V W AI + IS ++ + +F+M +WG + WW ENQ LI+ ETVWNAI TV++TVM+ + P ++ AW I ++TT +IKTVV+T + VVLGIIKA+MQ I GDWSGAWET+KGV +W+ I+ +V + + Q +K+ + + I +V IW+
+
+
+
+
+ 8
+ gi|29374987|ref|NP_814140.1|
+ tail protein [Enterococcus faecalis V583] >gi|29342445|gb|AAO80211.1| tail protein [Enterococcus faecalis V583]
+ NP_814140
+ 1049
+
+
+ 1
+ 139.0428842752
+ 349
+ 6.84844401007043e-31
+ 73
+ 482
+ 545
+ 920
+ 0
+ 0
+ 110
+ 196
+ 78
+ 432
+ EAVVSFVMDLWGQMVAWWNENQELIRQ-------TAETVWNAIRTVVETVMTALIPIVQTAWDLILAVVTTVL----NVIKTVVDTGLKVVLGIIKAVMQMINGDWSGAWETLKGVAGTIWEGIKSLVQVAIDGLVQIFQTGLAFLKSIWDTVWGTIMAVVGPIWDWIKTTVSNAITAVWEIIQNIMTSIQTTWDTVWNAISTVASNIWTAISTTVMSVLTTIWGYIQTYLELIKTVWSAAWEIIKAVFAAILLTIVGLVTGNFDLIKQAISNAWEIIKTKTSEIWNAITTFLSGIWEGIKTAASTAWEWIKTTISNVMTTIKSNIETAWNNIKTSIS-----------NALNNIKSAAENAWNNIKSAISTAIENIKSTVSNGWNNLVSTVTNAGPRIVSAVRTGFDNAVNAARNFISNAISVGGDLINGF
+ DSIVKTASGLKGSLVKTWNDITAKVSEIWKKFTDAGKKTFDGFKKTVENVFNGIKNFLQTVWNVIYAVVGAIIVNTINIWKGIFDG--------FKAYFQYL-------WDLIKAIATGVWEKIGDTVTGIINGFIGVIKGIFDAFKTFFQQIWDAVVYSVTIAWNGIKNTVTSVSTAIKNFVTPIFNAIKTTITNVFNAIKNTATNVWNAIKTTISNVVQTILNF---------------------------------VTPIFNTMKNTITNIFNAIRNTASSVWNSIKTTISNIVTSVKNTVINIFNALKNSITNIFNAIRNTASTVWNSIKSTVSNIVSATVNTVKNLFNGMKNTVSSIWDGVRNTISNVVNAVKNTISNVWGGITGTVSN----IFNGVKNAIDGPMNAAKNLVKNVV----DAIKGF
+ +++V L G +V WN+ + + + ++ + VE V + +QT W++I AVV ++ N+ K + D KA Q + W+ +K +A +WE I V I+G + + + K+ + +W ++ V W+ IK TV++ TA+ + I +I+TT V+NAI A+N+W AI TT+ +V+ TI + VT F+ +K I+N + I+ S +WN+I T +S I +K + +K +I+N+ I++ T WN+IK+++S N N +K+ + W+ +++ IS + +K+T+SN W + TV+N I + V+ D +NAA+N + N + D I GF
+
+
+
+
+ 9
+ gi|163941333|ref|YP_001646217.1|
+ prophage LambdaBa01, membrane protein, putative [Bacillus weihenstephanensis KBAB4] >gi|163863530|gb|ABY44589.1| prophage LambdaBa01, membrane protein, putative [Bacillus weihenstephanensis KBAB4]
+ YP_001646217
+ 725
+
+
+ 1
+ 138.657684699283
+ 348
+ 8.15996781441799e-31
+ 61
+ 480
+ 142
+ 560
+ 0
+ 0
+ 118
+ 203
+ 29
+ 434
+ WEAIKSAISTAVEAVVSFVMDLWGQMVAWWNENQELIRQTAETVWNAIRTVVETVMTALIPIVQTAWDLILAVVTTVLNVIKTVVDTGLKVVLGIIK---AVMQMINGDWSGAWETLKGVAGTIWEGIKSLVQVAIDGLVQIFQTGLAFLKSIWDTVWGTIMAVVGPIWDWIKTTVSNAITAVWEIIQNIMTSIQTTWDTVWNAISTVASNIWTAISTTVMSVLTTIWGYIQTYLELIKT----VWS-------AAWEIIKAVFAAILLTIVGLVTGNFDLIKQAISNAWEIIKTKTSEIWNAITTFLSGIWEGIKTAASTAWEWIKTTISNVMTTIKSNIETAWNNIKTSISNALNNIKSAAENAWNNIKSAISTAIENIKSTVSNGWNNLVSTVTNAGPRIVSAVRTGFDNAVNAARNFISNAISVGGDLIN
+ WDAIKQWTIDAWNAIGEFLVGIWDGIVQWASEAWNSISESTSAVWNSIKEFLIGIWNGIVEFVVT-WGT--AILETYVGIWTSIFNFCMEIWNGIVEYLTSVLQGIATFFTEIWTSISTFFQEIWNGLVAFITPVLQGIADFFAM-----------IWNGISTVIQTVWNFITQYLQAIWTAILYFATPLFESIKNFISECWNKISSTTSLVWETIKNFLVSCWNGLVSFVTPIFEKIKSWIISVWDTISSATMAVWNAVKNFLQACWNGLVSIVTPIFDAIKNWIVNVWNAISSTTSAVWNAIKSYLSSLWNSIVSTASSIFNSIKSAISTVWNMISSASSSVWNGIKSTLSSIWNGIKSTASSVWNGLKDAIMTPVRWVTSAVSGAFNGMKSAVLGVWDGIKSGIRTAINGIIRIINKFI-DGFNTPAELLN
+ W+AIK A A+ F++ +W +V W +E I ++ VWN+I+ + + ++ V T W A++ T + + ++ + +++ GI++ +V+Q I ++ W ++ IW G+ + + + G+ F +W I V+ +W++I + TA+ + SI+ WN IS+ S +W I ++S + ++ E IK+ VW A W +K A +V +VT FD IK I N W I + TS +WNAI ++LS +W I + AS+ + IK+ IS V I S + WN IK+++S+ N IKS A + WN +K AI T + + S VS +N + S V I S +RT + + FI + + +L+N
+
+
+
+
+
+
+ 6589360
+ -2041834015
+ 0
+ 504129014857
+ 0.041
+ 0.267
+ 0.14
+
+
+
+
+
diff -r e4419efbefad -r 872cf247c899 tool-data/blast2go.loc.sample
--- a/tool-data/blast2go.loc.sample Fri Feb 22 08:47:27 2013 -0500
+++ b/tool-data/blast2go.loc.sample Mon Sep 23 05:50:58 2013 -0400
@@ -6,19 +6,22 @@
# Column 3 - Filename, Galaxy will use this when calling the tool
#
# Probably the most important setting in the properties file is the
-# Blast2GO database to use. Currently b2g4pipe v2.3.5 ships with an
-# old configuration so consult http://blast2go.org for the latest
-# public database they host in Spain. We also strongly recommend
+# Blast2GO database to use. Currently b2g4pipe v2.5 ships with an
+# old configuration so consult http://www.blast2go.com for the latest
+# public database they host in Spain (or find this by running the GUI
+# version of Blast2GO via Java Web Start under the menu entry "Tools",
+# "General Settings", "DataAccess setting"). We also strongly recommend
# configuring a local Blast2GO database.
#
-# The property filenames can be fullied qualified paths like
+# The property filenames can be fully qualified paths like
# /opt/b2g4pipe/Spain_2012_August.properties or provided they are
# in the same folder as the Blast2GO JAR file, just the filename
# like Spain_2012_August.properties instead. This is intended to
-# make migrating between versions of Blast2GO easier (as the
+# make migrating between future versions of Blast2GO easier (as the
# property files change between versions), and simpler overall.
#
-Local_2011_May Local database (May 2011) Local_2011_May.properties
-Spain_2010_May Database in Spain (May 2010) Spain_2010_May.properties
+#Local_2011_May Local database (May 2011) Local_2011_May.properties
+#Spain_2010_May Database in Spain (May 2010) Spain_2010_May.properties
+Spain_2012_August Database in Spain (August 2012) Spain_2012_August.properties
Spain_2011_June Database in Spain (June 2011) Spain_2011_June.properties
-Spain_2012_August Database in Spain (August 2012) Spain_2012_August.properties
+#default Default settings b2gPipe.properties
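The sample .loc file above is plain text with three tab-separated columns, `#` comment lines, and a properties filename that may be either a full path or a bare name found next to the Blast2GO JAR file. As a rough sketch only, this is how a Python 3 script might read such a file and resolve the third column. The helper names `read_loc` and `resolve_properties` are hypothetical and are not part of Galaxy or of this wrapper; the lookup rule mirrors the one in the removed blast2go.py.

```python
import os


def read_loc(lines):
    """Parse blast2go.loc style lines into (id, description, filename) tuples.

    Hypothetical helper: skips blank lines and '#' comment lines, and
    expects exactly three tab-separated columns per remaining entry.
    """
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            raise ValueError("Expected 3 tab-separated columns: %r" % line)
        entries.append(tuple(parts))
    return entries


def resolve_properties(prop_file, blast2go_jar):
    """Return prop_file if it exists, else look for it next to the JAR.

    Like the removed blast2go.py, a bare name such as
    Spain_2012_August.properties is searched for in the JAR's folder.
    """
    if os.path.isfile(prop_file):
        return prop_file
    candidate = os.path.join(os.path.dirname(blast2go_jar), prop_file)
    if os.path.isfile(candidate):
        return candidate
    raise FileNotFoundError("Blast2GO configuration file not found: %s" % prop_file)


if __name__ == "__main__":
    sample = [
        "#default\tDefault settings\tb2gPipe.properties\n",
        "Spain_2012_August\tDatabase in Spain (August 2012)\tSpain_2012_August.properties\n",
        "Spain_2011_June\tDatabase in Spain (June 2011)\tSpain_2011_June.properties\n",
    ]
    # The commented-out "#default" entry is skipped
    for loc_id, desc, filename in read_loc(sample):
        print(loc_id, "->", filename)
```

Keeping relative names relative to the JAR folder is what lets the same .loc entries survive a move to a new b2g4pipe install, which is the motivation given in the sample file's comments.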
diff -r e4419efbefad -r 872cf247c899 tools/blast2go/blast2go.py
--- a/tools/blast2go/blast2go.py Fri Feb 22 08:47:27 2013 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,173 +0,0 @@
-#!/usr/bin/env python
-"""Galaxy wrapper for Blast2GO for pipelines, b2g4pipe v2.5.
-
-This script takes exactly three command line arguments:
- * Input BLAST XML filename
- * Blast2GO properties filename (settings file)
- * Output tabular filename
-
-The properties filename can be a fully qualified path, but if not
-this will look next to the blast2go.jar file.
-
-Sadly b2g4pipe (at least v2.3.5 to v2.5.0) cannot cope with current
-style large BLAST XML files (e.g. from BLAST 2.2.25+), so we reformat
-these to avoid it crashing with a Java heap space OutOfMemoryError.
-
-As part of this reformatting, we check for BLASTP or BLASTX output
-(otherwise raise an error), and print the query count.
-
-It then calls the Java command line tool, and moves the output file to
-the location Galaxy is expecting, and removes the temporary XML file.
-"""
-import sys
-import os
-import subprocess
-
-#You may need to edit this to match your local setup,
-#blast2go_jar = "/opt/b2g4pipe/blast2go.jar"
-blast2go_jar = "/opt/b2g4pipe_v2.5/blast2go.jar"
-
-
-def stop_err(msg, error_level=1):
- """Print error message to stderr and quit with given error level."""
- sys.stderr.write("%s\n" % msg)
- sys.exit(error_level)
-
-if len(sys.argv) != 4:
- stop_err("Require three arguments: XML filename, properties filename, output tabular filename")
-
-xml_file, prop_file, tabular_file = sys.argv[1:]
-
-#We should have write access here:
-tmp_xml_file = tabular_file + ".tmp.xml"
-
-if not os.path.isfile(blast2go_jar):
- stop_err("Blast2GO JAR file not found: %s" % blast2go_jar)
-
-if not os.path.isfile(xml_file):
- stop_err("Input BLAST XML file not found: %s" % xml_file)
-
-if not os.path.isfile(prop_file):
- tmp = os.path.join(os.path.split(blast2go_jar)[0], prop_file)
- if os.path.isfile(tmp):
- #The properties file seems to have been given relative to the JAR
- prop_file = tmp
- else:
- stop_err("Blast2GO configuration file not found: %s" % prop_file)
- del tmp
-
-def prepare_xml(original_xml, mangled_xml):
- """Reformat BLAST XML to suit Blast2GO.
-
- Blast2GO can't cope with 1000s of tags within a
- single tag, so instead split this into one
- full XML record per iteration (i.e. per query). This gives
- a concatenated XML file mimicking old versions of BLAST.
-
- This also checks for BLASTP or BLASTX output, and outputs
- the number of queries. Galaxy will show this as "info".
- """
- in_handle = open(original_xml)
- footer = " \n\n"
- header = ""
- while True:
- line = in_handle.readline()
- if not line:
- #No hits?
- stop_err("Problem with XML file?")
- if line.strip() == "":
- break
- header += line
-
- if "blastx" in header:
- print "BLASTX output identified"
- elif "blastp" in header:
- print "BLASTP output identified"
- else:
- in_handle.close()
- stop_err("Expect BLASTP or BLASTX output")
-
- out_handle = open(mangled_xml, "w")
- out_handle.write(header)
- out_handle.write(line)
- count = 1
- while True:
- line = in_handle.readline()
- if not line:
- break
- elif line.strip() == "":
- #Insert footer/header
- out_handle.write(footer)
- out_handle.write(header)
- count += 1
- out_handle.write(line)
-
- out_handle.close()
- in_handle.close()
- print "Input has %i queries" % count
-
-
-def run(cmd):
- #Avoid using shell=True when we call subprocess to ensure if the Python
- #script is killed, so too is the child process.
- try:
- child = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
- except Exception, err:
- stop_err("Error invoking command:\n%s\n\n%s\n" % (" ".join(cmd), err))
- #Use .communicate as we can get deadlocks with .wait()
- stdout, stderr = child.communicate()
- return_code = child.returncode
-
- #keep stdout minimal as shown prominently in Galaxy
- #Record it in case a silent error needs diagnosis
- if stdout:
- sys.stderr.write("Standard out:\n%s\n\n" % stdout)
- if stderr:
- sys.stderr.write("Standard error:\n%s\n\n" % stderr)
-
- error_msg = None
- if return_code:
- cmd_str = " ".join(cmd)
- error_msg = "Return code %i from command:\n%s" % (return_code, cmd_str)
- elif "Database or network connection (timeout) error" in stdout+stderr:
- error_msg = "Database or network connection (timeout) error"
- elif "Annotation of 0 seqs with 0 annots finished." in stdout+stderr:
- error_msg = "No sequences processed!"
-
- if error_msg:
- print error_msg
- stop_err(error_msg)
-
-
-blast2go_classpath = os.path.split(blast2go_jar)[0]
-assert os.path.isdir(blast2go_classpath)
-blast2go_classpath = "%s/*:%s/ext/*:" % (blast2go_classpath, blast2go_classpath)
-
-prepare_xml(xml_file, tmp_xml_file)
-#print "XML file prepared for Blast2GO"
-
-#We will have write access wherever the output should be,
-#so we'll ask Blast2GO to use that as the stem for its output
-#(it will append .annot to the filename)
-cmd = ["java", "-cp", blast2go_classpath, "es.blast2go.prog.B2GAnnotPipe",
- "-in", tmp_xml_file,
- "-prop", prop_file,
- "-out", tabular_file, #Used as base name for output files
- "-annot", # Generate *.annot tabular file
- #NOTE: For v2.3.5 must use -a, for v2.5 must use -annot instead
- #"-img", # Generate images, feature not in v2.3.5
- ]
-#print " ".join(cmd)
-run(cmd)
-
-#Remove the temp XML file
-os.remove(tmp_xml_file)
-
-out_file = tabular_file + ".annot"
-if not os.path.isfile(out_file):
- stop_err("ERROR - No output annotation file from Blast2GO")
-
-#Move the output file where Galaxy expects it to be:
-os.rename(out_file, tabular_file)
-
-print "Done"
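The removed prepare_xml function works around b2g4pipe's memory problems by rewriting one large BLAST+ XML file as a concatenation of single-query XML documents, repeating the header before each query. A Python 3 sketch of the same idea follows, working on a list of lines so it can be tested without files. It assumes the NCBI BLAST XML layout, where each query is an `<Iteration>` element inside `<BlastOutput_iterations>`; the footer string and the name `split_iterations` are assumptions for illustration, not the wrapper's own code.

```python
def split_iterations(lines):
    """Rewrite single-document BLAST XML as concatenated per-query documents.

    Assumes NCBI BLAST XML layout: a header ending with the opening
    <BlastOutput_iterations> tag, one <Iteration> element per query, then
    closing </BlastOutput_iterations> and </BlastOutput> tags.
    Returns (output_lines, query_count).
    """
    # Assumed footer closing one document before the header is repeated
    footer = ["  </BlastOutput_iterations>\n", "</BlastOutput>\n", "\n"]
    it = iter(lines)
    header = []
    for line in it:
        if line.strip() == "<Iteration>":
            break
        header.append(line)
    else:
        # Ran out of lines without finding any query
        raise ValueError("No <Iteration> found; problem with XML file?")

    text = "".join(header)
    if "blastx" not in text and "blastp" not in text:
        raise ValueError("Expect BLASTP or BLASTX output")

    out = header + [line]  # 'line' is the first <Iteration> line
    count = 1
    for line in it:
        if line.strip() == "<Iteration>":
            # Close the previous document, then start a new one
            out.extend(footer)
            out.extend(header)
            count += 1
        out.append(line)
    return out, count
```

Unlike the original, which streamed line by line to an output handle, this sketch builds the whole output in memory; for very large BLAST output a generator writing straight to a file would be the better choice.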
diff -r e4419efbefad -r 872cf247c899 tools/blast2go/blast2go.txt
--- a/tools/blast2go/blast2go.txt Fri Feb 22 08:47:27 2013 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,153 +0,0 @@
-Galaxy wrapper for Blast2GO for pipelines, b2g4pipe
-===================================================
-
-This wrapper is copyright 2011-2013 by Peter Cock, The James Hutton Institute
-(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
-See the licence text below.
-
-This is a wrapper for the command line Java tool b2g4pipe v2.5,
-Blast2GO for pipelines. See:
-
-S. Götz et al.
-High-throughput functional annotation and data mining with the Blast2GO suite.
-Nucleic Acids Res. 36(10):3420–3435, 2008.
-http://dx.doi.org/10.1093/nar/gkn176
-
-A. Conesa and S. Götz.
-Blast2GO: A Comprehensive Suite for Functional Analysis in Plant Genomics.
-Int. J. Plant Genomics. 619832, 2008.
-http://dx.doi.org/10.1155/2008/619832
-
-A. Conesa et al.
-Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research.
-Bioinformatics 21:3674-3676, 2005.
-http://dx.doi.org/10.1093/bioinformatics/bti610
-
-http://www.blast2go.org/
-
-
-
-Installation
-============
-
-The main dependency is b2g4pipe, which must be installed manually. We also
-strongly recommend installing a local Blast2GO database (see the
-instructions below about the blast2go.loc file). At the time of writing,
-the current version is b2g4pipe v2.5 which is available here:
-
-http://www.blast2go.com/data/blast2go/b2g4pipe_v2.5.zip
-
-You can change the path by editing the definition near the start of the Python
-script blast2go.py, but by default it expects the underlying tool to be here:
-
-/opt/b2g4pipe_v2.5/blast2go.jar
-
-Installation of the Galaxy wrapper should work automatically via the Galaxy
-Tool Shed, including the dependency on 'blast_datatypes' for the 'blastxml'
-file format definition. To install the wrapper manually, first install
-'blast_datatypes', then copy or move the following files under the Galaxy
-tools folder, e.g. in a tools/blast2go/ folder:
-
-* blast2go.xml (the Galaxy tool definition)
-* blast2go.py (the Python wrapper script)
-* blast2go.txt (this README file)
-
-For a manual installation of the wrapper you will also need to modify the
-tools_conf.xml file to tell Galaxy to offer the tool. We suggest putting
-it next to the NCBI BLAST+ wrappers. Just add the line:
-
-
-
-As part of setting up b2g4pipe you will need to setup one or more Blast2GO
-property files which tell the tool which database to use etc. The example
-b2gPipe.properties provided with b2g4pipe is often out of date. The current
-server IP address and database name may be given on the Blast2GO website, or
-can be found by running the latest GUI version via Java Web Start and
-looking under the tools/options menu. These property files can be anywhere
-accessible to the Galaxy Unix user; we put them with the JAR file.
-
-You must tell Galaxy about these Blast2GO property files so that they can be
-offered to the user. Copy file blast2go.loc.sample to tool-data/blast2go.loc
-under the Galaxy folder and edit this to match your installation. This must
-be plain text, tab separated, with three columns:
-
-(1) ID for the setup, e.g. Spain_2012_August
-(2) Description for the setup, e.g. Database in Spain (August 2012)
-(3) Properties filename for the setup, e.g. /opt/b2g4pipe/Spain_2012_August.properties
-
-Avoid including "Blast2GO" in the description (column 2) as this text will be
-included in the automatically assigned output dataset name. The blast2go.loc
-file allows you to customise the database setup. If for example you have a local
-Blast2GO server running (which we recommend for speed), and you want this to be
-the default setting, include it as the first line in your blast2go.loc file.
-
-Consult the Blast2GO documentation for details about the property files and
-setting up a local MySQL Blast2GO database.
-
-
-History
-=======
-
-v0.0.1 - Initial public release
-v0.0.2 - Documentation clarifications, e.g. concatenated BLAST XML is allowed.
- - Fixed error handler in wrapper script (for when b2g4pipe fails).
- - Reformats the XML to use old NCBI-style concatenated BLAST XML since
- b2g4pipe crashes with a heap space error on large files using
- current NCBI output.
-v0.0.3 - Include sample loc file, tool-data/blast2go.loc.sample
-v0.0.4 - Include repository_dependencies.xml file for 'blastxml' format
- (previously included in the core Galaxy installation)
-v0.0.5 - Quote arguments in case of spaces in filenames (internal change)
- - Last release supporting b2g4pipe v2.3.5
-v0.0.6 - Support for b2g4pipe v2.5 instead of v2.3.5
- - Now invoked with a class path and es.blast2go.prog.B2GAnnotPipe
- rather than simply calling the jar file
- - Now uses the switch -annot instead of -a (this change breaks
- support for b2g4pipe v2.3.5 unfortunately)
- - Catch a few error messages and treat them explicitly as errors.
-
-
-Developers
-==========
-
-This script and related tools are being developed on the following hg branch:
-http://bitbucket.org/peterjc/galaxy-central/src/tools
-
-For making the "Galaxy Tool Shed" http://community.g2.bx.psu.edu/ tarball I use
-the following command from the Galaxy root folder:
-
-$ tar -czf blast2go.tar.gz tools/blast2go/blast2go.xml tools/blast2go/blast2go.py tools/blast2go/blast2go.txt tools/blast2go/repository_dependencies.xml tool-data/blast2go.loc.sample
-
-Check this worked:
-
-$ tar -tzf blast2go.tar.gz
-tools/blast2go/blast2go.xml
-tools/blast2go/blast2go.py
-tools/blast2go/blast2go.txt
-tools/blast2go/repository_dependencies.xml
-tool-data/blast2go.loc.sample
-
-
-Licence (MIT/BSD style)
-=======================
-
-Permission to use, copy, modify, and distribute this software and its
-documentation with or without modifications and for any purpose and
-without fee is hereby granted, provided that any copyright notices
-appear in all copies and that both those copyright notices and this
-permission notice appear in supporting documentation, and that the
-names of the contributors or copyright holders not be used in
-advertising or publicity pertaining to distribution of the software
-without specific prior permission.
-
-THE CONTRIBUTORS AND COPYRIGHT HOLDERS OF THIS SOFTWARE DISCLAIM ALL
-WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED
-WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE
-CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT
-OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS
-OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE
-OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE
-OR PERFORMANCE OF THIS SOFTWARE.
-
-NOTE: This is the licence for the Galaxy Wrapper only. Blast2GO and
-associated data files are available and licenced separately.
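The v0.0.6 history entry mentions catching a few error messages and treating them explicitly as errors. In the removed run() function this means scanning the combined stdout and stderr for known b2g4pipe messages even when the exit code is zero. Below is a minimal Python 3.7+ sketch of that check using `subprocess.run` with `capture_output` and `text`; the function name `run_and_check` is hypothetical, while the two message strings are copied from the removed script.

```python
import subprocess
import sys

# (text to look for, error to report) pairs from the removed blast2go.py
KNOWN_ERRORS = [
    ("Database or network connection (timeout) error",
     "Database or network connection (timeout) error"),
    ("Annotation of 0 seqs with 0 annots finished.",
     "No sequences processed!"),
]


def run_and_check(cmd):
    """Run cmd (a list, no shell) and return an error message, or None."""
    child = subprocess.run(cmd, capture_output=True, text=True)
    # Record the tool's output on stderr in case a silent error needs diagnosis
    if child.stdout:
        sys.stderr.write("Standard out:\n%s\n\n" % child.stdout)
    if child.stderr:
        sys.stderr.write("Standard error:\n%s\n\n" % child.stderr)
    if child.returncode:
        return "Return code %i from command:\n%s" % (child.returncode, " ".join(cmd))
    combined = child.stdout + child.stderr
    for marker, message in KNOWN_ERRORS:
        if marker in combined:
            return message
    return None


if __name__ == "__main__":
    # Simulate a zero exit code from a run that annotated nothing
    fake = [sys.executable, "-c", "print('Annotation of 0 seqs with 0 annots finished.')"]
    print(run_and_check(fake))  # prints: No sequences processed!
```

Passing a list rather than using `shell=True` keeps the same property the original comment points out: if the Python script is killed, the child process does not linger behind a shell.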
diff -r e4419efbefad -r 872cf247c899 tools/blast2go/blast2go.xml
--- a/tools/blast2go/blast2go.xml Fri Feb 22 08:47:27 2013 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,95 +0,0 @@
-
- Maps BLAST results to GO annotation terms
-
- blast2go.py "${xml}" "${prop.fields.path}" "${tab}"
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-.. class:: warningmark
-
-**Note**. Blast2GO may take a substantial amount of time, especially if
-running against the public server in Spain. For large input datasets it
-is advisable to allow overnight processing, or consider subdividing.
-
------
-
-**What it does**
-
-This runs b2g4Pipe, the command line (no GUI) version of Blast2GO designed
-for use in pipelines.
-
-It takes as input BLAST XML results against a protein database, typically
-the NCBI non-redundant (NR) database. This tool will accept concatenated
-BLAST XML files (although they are technically invalid XML), which is very
-useful if you have sub-divided your protein FASTA files and run BLAST on
-them in batches.
-
-The BLAST matches are used to assign Gene Ontology (GO) annotation terms
-to each query sequence.
-
-The output from this tool is a tabular file containing three columns, with
-the order taken from query order in the original BLAST XML file:
-
-====== ====================================
-Column Description
------- ------------------------------------
- 1 ID and description of query sequence
- 2 GO term
- 3 GO description
-====== ====================================
-
-Note that if no GO terms are assigned to a sequence (e.g. if it had no
-BLAST matches), then it will not be present in the output file.
-
-
-**Advanced Settings**
-
-Blast2GO has a properties setting file which includes which database
-server to connect to (e.g. the public server in Valencia, Spain, or a
-local server), as well as more advanced options such as thresholds and
-evidence code weights. To change these settings, your Galaxy administrator
-must create a new properties file, and add it to the drop down menu above.
-
-
-**References**
-
-S. Götz et al.
-High-throughput functional annotation and data mining with the Blast2GO suite.
-Nucleic Acids Res. 36(10):3420–3435, 2008.
-http://dx.doi.org/10.1093/nar/gkn176
-
-A. Conesa and S. Götz.
-Blast2GO: A Comprehensive Suite for Functional Analysis in Plant Genomics.
-Int. J. Plant Genomics. 619832, 2008.
-http://dx.doi.org/10.1155/2008/619832
-
-A. Conesa et al.
-Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research.
-Bioinformatics 21:3674-3676, 2005.
-http://dx.doi.org/10.1093/bioinformatics/bti610
-
-http://www.blast2go.org/
-
-
-
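The help text describes the output as a three-column tabular file (query ID and description, GO term, GO description) in the original query order, with unannotated queries simply absent. A small Python 3 sketch that groups the GO terms per query, assuming exactly three tab-separated columns per row; `group_go_terms` is a hypothetical helper and the example rows are made up for illustration.

```python
from collections import OrderedDict


def group_go_terms(lines):
    """Group tabular Blast2GO rows by query, keeping first-seen query order.

    Assumes exactly three tab-separated columns per row:
    query ID and description, GO term, GO description.
    A row with a different number of columns raises ValueError.
    """
    groups = OrderedDict()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate blank lines
        query, go_term, go_desc = line.split("\t")
        groups.setdefault(query, []).append((go_term, go_desc))
    return groups


if __name__ == "__main__":
    # Made-up example rows for illustration only
    example = [
        "contig1 putative tail protein\tGO:0016020\tmembrane\n",
        "contig1 putative tail protein\tGO:0005198\tstructural molecule activity\n",
        "contig7 hypothetical protein\tGO:0016020\tmembrane\n",
    ]
    for query, terms in group_go_terms(example).items():
        print("%s: %i GO term(s)" % (query, len(terms)))
```

Because queries without GO terms never appear, finding unannotated sequences means comparing these keys against the query list from the input BLAST XML.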
diff -r e4419efbefad -r 872cf247c899 tools/blast2go/repository_dependencies.xml
--- a/tools/blast2go/repository_dependencies.xml Fri Feb 22 08:47:27 2013 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,5 +0,0 @@
-
-
-
-
-