# HG changeset patch
# User peterjc
# Date 1402657655 14400
# Node ID 98f8431dab44174d0c33765caff1af3e27ea4043
# Parent 09a68a90d552671a80608953205c2f311e00e367
Uploaded v0.1.0, now also handles extended tabular BLAST output.
diff -r 09a68a90d552 -r 98f8431dab44 blastxml_to_top_descr/README.rst
--- a/blastxml_to_top_descr/README.rst Wed Sep 18 06:07:53 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,121 +0,0 @@
-Galaxy tool to extract top BLAST hit descriptions from BLAST XML
-================================================================
-
-This tool is copyright 2012-2013 by Peter Cock, The James Hutton Institute
-(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
-See the licence text below.
-
-This tool is a short Python script to parse a BLAST XML file, and extract the
-identifiers with description for the top matches (by default the top 3), and
-output these as a simple tabular file along with the query identifiers.
-
-It is available from the Galaxy Tool Shed at:
-http://toolshed.g2.bx.psu.edu/view/peterjc/blastxml_to_top_descr
-
-This requires the 'blast_datatypes' repository from the Galaxy Tool Shed
-to provide the 'blastxml' file format definition.
-
-
-Automated Installation
-======================
-
-This should be straightforward, Galaxy should automatically install the
-'blast_datatypes' dependency.
-
-
-Manual Installation
-===================
-
-If you haven't done so before, first install the 'blast_datatypes' repository.
-
-There are just two files to install (if doing this manually):
-
-* blastxml_to_top_descr.py (the Python script)
-* blastxml_to_top_descr.xml (the Galaxy tool definition)
-
-The suggested location is in the Galaxy folder tools/ncbi_blast_plus next to
-the NCBI BLAST+ tool wrappers.
-
-You will also need to modify the tools_conf.xml file to tell Galaxy to offer
-the tool. e.g. next to the NCBI BLAST+ tools. Simply add the line::
-
-
-
-To run the tool's tests, also add this line to tools_conf.xml.sample then::
-
- $ sh run_functional_tests.sh -id blastxml_to_top_descr
-
-
-History
-=======
-
-======= ======================================================================
-Version Changes
-------- ----------------------------------------------------------------------
-v0.0.1  - Initial version.
-v0.0.2  - Since BLAST+ was moved out of the Galaxy core, now have a dependency
-          on the 'blast_datatypes' repository in the Tool Shed.
-v0.0.3  - Include the test files required to run the unit tests
-v0.0.4  - Quote filenames in case they contain spaces (internal change)
-v0.0.5  - Include number of queries with BLAST matches in stdout (peek text)
-v0.0.6  - Check for errors via the script's return code (internal change)
-v0.0.7  - Link to Tool Shed added to help text and this documentation.
-        - Tweak dependency on blast_datatypes to also work on Test Tool Shed
-        - Adopt standard MIT License.
-v0.0.8  - Development moved to GitHub, https://github.com/peterjc/galaxy_blast
-v0.0.9  - Updated citation information (Cock et al. 2013).
-======= ======================================================================
-
-
-Bug Reports
-===========
-
-You can file an issue here https://github.com/peterjc/galaxy_blast/issues or ask
-us on the Galaxy development list http://lists.bx.psu.edu/listinfo/galaxy-dev
-
-
-Developers
-==========
-
-This script and related tools were originally developed on the 'tools' branch of
-the following Mercurial repository: https://bitbucket.org/peterjc/galaxy-central/
-
-As of July 2013, development is continuing on a dedicated GitHub repository:
-https://github.com/peterjc/galaxy_blast
-
-For making the "Galaxy Tool Shed" http://toolshed.g2.bx.psu.edu/ tarball use
-the following command from the GitHub repository root folder::
-
- $ tar -czf blastxml_to_top_descr.tar.gz blastxml_to_top_descr/README.rst blastxml_to_top_descr/blastxml_to_top_descr.* blastxml_to_top_descr/repository_dependencies.xml test-data/blastp_four_human_vs_rhodopsin.xml test-data/blastp_four_human_vs_rhodopsin_top3.tabular
-
-Check this worked::
-
- $ tar -tzf blastxml_to_top_descr.tar.gz
- blastxml_to_top_descr/README.rst
- blastxml_to_top_descr/blastxml_to_top_descr.py
- blastxml_to_top_descr/blastxml_to_top_descr.xml
- blastxml_to_top_descr/repository_dependencies.xml
- test-data/blastp_four_human_vs_rhodopsin.xml
- test-data/blastp_four_human_vs_rhodopsin_top3.tabular
-
-
-Licence (MIT)
-=============
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in
-all copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
-THE SOFTWARE.
diff -r 09a68a90d552 -r 98f8431dab44 blastxml_to_top_descr/blastxml_to_top_descr.py
--- a/blastxml_to_top_descr/blastxml_to_top_descr.py Wed Sep 18 06:07:53 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,122 +0,0 @@
-#!/usr/bin/env python
-"""Convert a BLAST XML file to a top hits description table.
-
-Takes three command line options, input BLAST XML filename, output tabular
-BLAST filename, number of hits to collect the descriptions of.
-"""
-import sys
-import re
-
-if "-v" in sys.argv or "--version" in sys.argv:
-    print "v0.0.5"
-    sys.exit(0)
-
-if sys.version_info[:2] >= ( 2, 5 ):
-    import xml.etree.cElementTree as ElementTree
-else:
-    from galaxy import eggs
-    import pkg_resources; pkg_resources.require( "elementtree" )
-    from elementtree import ElementTree
-
-def stop_err( msg ):
-    sys.stderr.write("%s\n" % msg)
-    sys.exit(1)
-
-#Parse Command Line
-try:
-    in_file, out_file, topN = sys.argv[1:]
-except:
-    stop_err("Expect 3 arguments: input BLAST XML file, output tabular file, number of hits")
-
-
-try:
-    topN = int(topN)
-except ValueError:
-    stop_err("Number of hits argument should be an integer (at least 1)")
-if topN < 1:
-    stop_err("Number of hits argument should be an integer (at least 1)")
-
-# get an iterable
-try:
-    context = ElementTree.iterparse(in_file, events=("start", "end"))
-except:
-    stop_err("Invalid data format.")
-# turn it into an iterator
-context = iter(context)
-# get the root element
-try:
-    event, root = context.next()
-except:
-    stop_err( "Invalid data format." )
-
-
-re_default_query_id = re.compile("^Query_\d+$")
-assert re_default_query_id.match("Query_101")
-assert not re_default_query_id.match("Query_101a")
-assert not re_default_query_id.match("MyQuery_101")
-re_default_subject_id = re.compile("^Subject_\d+$")
-assert re_default_subject_id.match("Subject_1")
-assert not re_default_subject_id.match("Subject_")
-assert not re_default_subject_id.match("Subject_12a")
-assert not re_default_subject_id.match("TheSubject_1")
-
-
-count = 0
-pos_count = 0
-outfile = open(out_file, 'w')
-outfile.write("#Query\t%s\n" % "\t".join("BLAST hit %i" % (i+1) for i in range(topN)))
-for event, elem in context:
-    # for every tag
-    if event == "end" and elem.tag == "Iteration":
-        #Expecting either this, from BLAST 2.2.25+ using FASTA vs FASTA
-        # sp|Q9BS26|ERP44_HUMAN
-        # Endoplasmic reticulum resident protein 44 OS=Homo sapiens GN=ERP44 PE=1 SV=1
-        # 406
-        #
-        #
-        #Or, from BLAST 2.2.24+ run online
-        # Query_1
-        # Sample
-        # 516
-        # ...
-        qseqid = elem.findtext("Iteration_query-ID")
-        if qseqid is None:
-            stop_err("Missing (could be really old BLAST XML data?)")
-        if re_default_query_id.match(qseqid):
-            #Place holder ID, take the first word of the query definition
-            qseqid = elem.findtext("Iteration_query-def").split(None,1)[0]
-        # for every within
-        hit_descrs = []
-        for hit in elem.findall("Iteration_hits/Hit"):
-            #Expecting either this,
-            # gi|3024260|sp|P56514.1|OPSD_BUFBU
-            # RecName: Full=Rhodopsin
-            # P56514
-            #or,
-            # Subject_1
-            # gi|57163783|ref|NP_001009242.1| rhodopsin [Felis catus]
-            # Subject_1
-            #
-            #apparently depending on the parse_deflines switch
-            sseqid = hit.findtext("Hit_id").split(None,1)[0]
-            hit_def = sseqid + " " + hit.findtext("Hit_def")
-            if re_default_subject_id.match(sseqid) \
-            and sseqid == hit.findtext("Hit_accession"):
-                #Place holder ID, take the first word of the subject definition
-                hit_def = hit.findtext("Hit_def")
-                sseqid = hit_def.split(None,1)[0]
-            assert hit_def not in hit_descrs
-            hit_descrs.append(hit_def)
-        #print "%r has %i hits" % (qseqid, len(hit_descrs))
-        if hit_descrs:
-            pos_count += 1
-        hit_descrs = hit_descrs[:topN]
-        while len(hit_descrs) < topN:
-            hit_descrs.append("")
-        outfile.write("%s\t%s\n" % (qseqid, "\t".join(hit_descrs)))
-        count += 1
-        # prevents ElementTree from growing large datastructure
-        root.clear()
-        elem.clear()
-outfile.close()
-print "Of %i queries, %i had BLAST results" % (count, pos_count)
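The removed script streams the BLAST XML with `iterparse`. It replaces BLAST+ placeholder IDs (`Query_N`, `Subject_N`) with the first word of the definition line and pads each query's row to the requested number of hits. The following is a minimal Python 3 sketch of that approach, offered as an illustration only. It is not the v0.1.0 code, which this changeset does not show. The sketch assumes plain `xml.etree.ElementTree`, since `cElementTree` was removed in Python 3.9. The names `top_hit_rows`, `write_table` and the toy `SAMPLE` record are hypothetical.

```python
# Hypothetical Python 3 sketch of the removed script's approach; not the
# v0.1.0 code. Names top_hit_rows, write_table and SAMPLE are illustrative.
import io
import re
import xml.etree.ElementTree as ElementTree

# BLAST+ falls back to placeholder IDs (Query_1, Subject_1, ...) when it does
# not parse the FASTA deflines; the real ID is then the first word of the
# matching definition line.
RE_QUERY = re.compile(r"^Query_\d+$")
RE_SUBJECT = re.compile(r"^Subject_\d+$")


def top_hit_rows(handle, top_n=3):
    """Yield (query_id, descriptions) pairs, padded to exactly top_n entries."""
    context = ElementTree.iterparse(handle, events=("start", "end"))
    _, root = next(context)  # first event: start of <BlastOutput>
    for event, elem in context:
        if event != "end" or elem.tag != "Iteration":
            continue
        qseqid = elem.findtext("Iteration_query-ID")
        if qseqid is None:
            raise ValueError("Missing <Iteration_query-ID>")
        if RE_QUERY.match(qseqid):
            qseqid = elem.findtext("Iteration_query-def").split(None, 1)[0]
        descrs = []
        for hit in elem.findall("Iteration_hits/Hit"):
            sseqid = hit.findtext("Hit_id").split(None, 1)[0]
            if RE_SUBJECT.match(sseqid) and sseqid == hit.findtext("Hit_accession"):
                # Placeholder subject: Hit_def already starts with the real ID
                descrs.append(hit.findtext("Hit_def"))
            else:
                descrs.append(sseqid + " " + hit.findtext("Hit_def"))
        # Keep the first top_n hits, then pad with blanks if there were fewer
        descrs = descrs[:top_n] + [""] * max(0, top_n - len(descrs))
        yield qseqid, descrs
        # Discard parsed elements so memory stays flat on large files
        elem.clear()
        root.clear()


def write_table(rows, out_handle, top_n=3):
    """Write a '#Query' header plus one tab-separated row per query.

    Returns (number of queries, number of queries with at least one hit).
    """
    header = ["#Query"] + ["BLAST hit %i" % (i + 1) for i in range(top_n)]
    out_handle.write("\t".join(header) + "\n")
    count = with_hits = 0
    for qseqid, descrs in rows:
        out_handle.write("\t".join([qseqid] + descrs) + "\n")
        count += 1
        if descrs and descrs[0]:
            with_hits += 1
    return count, with_hits


# Toy two-query record modelled on the rhodopsin test data (hypothetical).
SAMPLE = b"""<BlastOutput><BlastOutput_iterations>
<Iteration>
  <Iteration_query-ID>Query_1</Iteration_query-ID>
  <Iteration_query-def>sp|P08100|OPSD_HUMAN Rhodopsin OS=Homo sapiens</Iteration_query-def>
  <Iteration_hits>
    <Hit>
      <Hit_id>gi|57163783|ref|NP_001009242.1|</Hit_id>
      <Hit_def>rhodopsin [Felis catus]</Hit_def>
      <Hit_accession>NP_001009242</Hit_accession>
    </Hit>
    <Hit>
      <Hit_id>Subject_2</Hit_id>
      <Hit_def>gi|3024260|sp|P56514.1|OPSD_BUFBU RecName: Full=Rhodopsin</Hit_def>
      <Hit_accession>Subject_2</Hit_accession>
    </Hit>
  </Iteration_hits>
</Iteration>
<Iteration>
  <Iteration_query-ID>sp|Q9BS26|ERP44_HUMAN</Iteration_query-ID>
  <Iteration_query-def>Endoplasmic reticulum resident protein 44</Iteration_query-def>
  <Iteration_hits></Iteration_hits>
</Iteration>
</BlastOutput_iterations></BlastOutput>"""

if __name__ == "__main__":
    out = io.StringIO()
    counts = write_table(top_hit_rows(io.BytesIO(SAMPLE), top_n=3), out, top_n=3)
    print(out.getvalue(), end="")
    print("Of %i queries, %i had BLAST results" % counts)
```

Unlike the original, this sketch raises `ValueError` instead of calling `stop_err`. It also drops the duplicate-description `assert` and leaves error reporting to the caller.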
diff -r 09a68a90d552 -r 98f8431dab44 blastxml_to_top_descr/blastxml_to_top_descr.xml
--- a/blastxml_to_top_descr/blastxml_to_top_descr.xml Wed Sep 18 06:07:53 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,66 +0,0 @@
-
- Make a table from BLAST XML
- blastxml_to_top_descr.py --version
-
- blastxml_to_top_descr.py "${blastxml_file}" "${tabular_file}" ${topN}
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-**What it does**
-
-NCBI BLAST+ (and the older NCBI 'legacy' BLAST) can output in a range of
-formats including text, tabular and a more detailed XML format. You can
-do a lot of things with tabular files in Galaxy (sorting, filtering, joins,
-etc) however currently the BLAST tabular output omits the hit descriptions
-found in the other output formats.
-
-This tool turns a BLAST XML file into a simple tabular file containing
-one row per query sequence, containing the query identifier and then
-the three (by default) top hit descriptions. If a query doesn't have
-that many hits, then these entries are left blank.
-
-**Example Usage**
-
-One simple usage would be to take a transcriptome assembly or set of
-gene predictions, run a BLAST search against the NCBI NR database, and
-then use this tool to make a table of the top three BLAST hits. This
-can give you a 'quick and dirty' crude annotation, potentially enough
-to spot some problems (e.g. bacterial contaimination could be very
-obvious).
-
-**References**
-
-If you use this Galaxy tool in work leading to a scientific publication please
-cite:
-
-Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013).
-Galaxy tools and workflows for sequence analysis with applications
-in molecular plant pathology. PeerJ 1:e167
-http://dx.doi.org/10.7717/peerj.167
-
-This wrapper is available to install into other Galaxy Instances via the Galaxy
-Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/blastxml_to_top_descr
-
-
-
diff -r 09a68a90d552 -r 98f8431dab44 blastxml_to_top_descr/repository_dependencies.xml
--- a/blastxml_to_top_descr/repository_dependencies.xml Wed Sep 18 06:07:53 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,4 +0,0 @@
-
-
-
-
diff -r 09a68a90d552 -r 98f8431dab44 test-data/blastp_four_human_vs_rhodopsin.xml
--- a/test-data/blastp_four_human_vs_rhodopsin.xml Wed Sep 18 06:07:53 2013 -0400
+++ b/test-data/blastp_four_human_vs_rhodopsin.xml Fri Jun 13 07:07:35 2014 -0400
@@ -2,7 +2,7 @@
blastp
- BLASTP 2.2.26+
+ BLASTP 2.2.29+
Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.
sp|Q9BS26|ERP44_HUMAN
@@ -17,630 +17,649 @@
F
-
-
- 1
- sp|Q9BS26|ERP44_HUMAN
- Endoplasmic reticulum resident protein 44 OS=Homo sapiens GN=ERP44 PE=1 SV=1
- 406
-
-
-
- 0
- 0
- 30
- 119568
- 0.041
- 0.267
- 0.14
-
-
- No hits found
-
-
- 2
- sp|Q9BS26|ERP44_HUMAN
- Endoplasmic reticulum resident protein 44 OS=Homo sapiens GN=ERP44 PE=1 SV=1
- 406
-
-
-
- 0
- 0
- 30
- 119568
- 0.041
- 0.267
- 0.14
-
-
- No hits found
-
-
- 3
- sp|Q9BS26|ERP44_HUMAN
- Endoplasmic reticulum resident protein 44 OS=Homo sapiens GN=ERP44 PE=1 SV=1
- 406
-
-
-
- 0
- 0
- 30
- 119568
- 0.041
- 0.267
- 0.14
-
-
- No hits found
-
-
- 4
- sp|Q9BS26|ERP44_HUMAN
- Endoplasmic reticulum resident protein 44 OS=Homo sapiens GN=ERP44 PE=1 SV=1
- 406
-
-
-
- 0
- 0
- 30
- 119568
- 0.041
- 0.267
- 0.14
-
-
- No hits found
-
-
- 5
- sp|Q9BS26|ERP44_HUMAN
- Endoplasmic reticulum resident protein 44 OS=Homo sapiens GN=ERP44 PE=1 SV=1
- 406
-
-
-
- 0
- 0
- 30
- 119568
- 0.041
- 0.267
- 0.14
-
-
- No hits found
-
-
- 6
- sp|Q9BS26|ERP44_HUMAN
- Endoplasmic reticulum resident protein 44 OS=Homo sapiens GN=ERP44 PE=1 SV=1
- 406
-
-
-
- 0
- 0
- 30
- 119568
- 0.041
- 0.267
- 0.14
-
-
- No hits found
-
-
- 7
- sp|Q9NSY1|BMP2K_HUMAN
- BMP-2-inducible protein kinase OS=Homo sapiens GN=BMP2K PE=1 SV=2
- 1161
-
-
-
- 0
- 0
- 38
- 348130
- 0.041
- 0.267
- 0.14
-
-
- No hits found
-
-
- 8
- sp|Q9NSY1|BMP2K_HUMAN
- BMP-2-inducible protein kinase OS=Homo sapiens GN=BMP2K PE=1 SV=2
- 1161
-
-
-
- 0
- 0
- 38
- 348130
- 0.041
- 0.267
- 0.14
-
-
- No hits found
-
-
- 9
- sp|Q9NSY1|BMP2K_HUMAN
- BMP-2-inducible protein kinase OS=Homo sapiens GN=BMP2K PE=1 SV=2
- 1161
-
-
-
- 0
- 0
- 38
- 348130
- 0.041
- 0.267
- 0.14
-
-
- No hits found
-
-
- 10
- sp|Q9NSY1|BMP2K_HUMAN
- BMP-2-inducible protein kinase OS=Homo sapiens GN=BMP2K PE=1 SV=2
- 1161
-
-
-
- 0
- 0
- 38
- 348130
- 0.041
- 0.267
- 0.14
-
-
- No hits found
-
-
- 11
- sp|Q9NSY1|BMP2K_HUMAN
- BMP-2-inducible protein kinase OS=Homo sapiens GN=BMP2K PE=1 SV=2
- 1161
-
-
-
- 0
- 0
- 38
- 348130
- 0.041
- 0.267
- 0.14
-
-
- No hits found
-
-
- 12
- sp|Q9NSY1|BMP2K_HUMAN
- BMP-2-inducible protein kinase OS=Homo sapiens GN=BMP2K PE=1 SV=2
- 1161
-
-
-
- 0
- 0
- 38
- 348130
- 0.041
- 0.267
- 0.14
-
-
- No hits found
-
-
- 13
- sp|P06213|INSR_HUMAN
- Insulin receptor OS=Homo sapiens GN=INSR PE=1 SV=4
- 1382
-
-
-
- 0
- 0
- 39
- 414987
- 0.041
- 0.267
- 0.14
-
-
- No hits found
-
-
- 14
- sp|P06213|INSR_HUMAN
- Insulin receptor OS=Homo sapiens GN=INSR PE=1 SV=4
- 1382
-
-
-
- 0
- 0
- 39
- 414987
- 0.041
- 0.267
- 0.14
-
-
- No hits found
-
-
- 15
- sp|P06213|INSR_HUMAN
- Insulin receptor OS=Homo sapiens GN=INSR PE=1 SV=4
- 1382
-
-
-
- 0
- 0
- 39
- 414987
- 0.041
- 0.267
- 0.14
-
-
- No hits found
-
-
- 16
- sp|P06213|INSR_HUMAN
- Insulin receptor OS=Homo sapiens GN=INSR PE=1 SV=4
- 1382
-
-
-
- 0
- 0
- 39
- 414987
- 0.041
- 0.267
- 0.14
-
-
- No hits found
-
-
- 17
- sp|P06213|INSR_HUMAN
- Insulin receptor OS=Homo sapiens GN=INSR PE=1 SV=4
- 1382
-
-
-
- 0
- 0
- 39
- 414987
- 0.041
- 0.267
- 0.14
-
-
- No hits found
-
-
- 18
- sp|P06213|INSR_HUMAN
- Insulin receptor OS=Homo sapiens GN=INSR PE=1 SV=4
- 1382
-
-
-
- 0
- 0
- 39
- 414987
- 0.041
- 0.267
- 0.14
-
-
- No hits found
-
-
- 19
- sp|P08100|OPSD_HUMAN
- Rhodopsin OS=Homo sapiens GN=RHO PE=1 SV=1
- 348
-
-
- 1
- gi|57163783|ref|NP_001009242.1|
- rhodopsin [Felis catus]
- NP_001009242
- 348
-
-
- 1
- 701.049
- 1808
- 0
- 1
- 348
- 1
- 348
- 0
- 0
- 336
- 343
- 0
- 348
- MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSAAIYNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVSKTETSQVAPA
- MNGTEGPNFYVPFSNKTGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTLPAFFAKSSSIYNPVIYIMMNKQFRNCMLTTLCCGKNPLGDDEASTTGSKTETSQVAPA
- MNGTEGPNFYVPFSN TGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMV GGFT+TLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPL GWSRYIPEG+QCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMI+IFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMT+PAFFAKS++IYNPVIYIMMNKQFRNCMLTT+CCGKNPLGDDEAS T SKTETSQVAPA
-
-
-
-
-
-
- 0
- 0
- 29
- 101761
- 0.041
- 0.267
- 0.14
-
-
-
-
- 20
- sp|P08100|OPSD_HUMAN
- Rhodopsin OS=Homo sapiens GN=RHO PE=1 SV=1
- 348
-
-
- 1
- gi|3024260|sp|P56514.1|OPSD_BUFBU
- RecName: Full=Rhodopsin
- P56514
- 354
-
-
- 1
- 619.002
- 1595
- 0
- 1
- 341
- 1
- 342
- 0
- 0
- 290
- 322
- 1
- 342
- MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSAAIYNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEA-SATVSKTE
- MNGTEGPNFYIPMSNKTGVVRSPFEYPQYYLAEPWQYSILCAYMFLLILLGFPINFMTLYVTIQHKKLRTPLNYILLNLAFANHFMVLCGFTVTMYSSMNGYFILGATGCYVEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFSENHAVMGVAFTWIMALSCAVPPLLGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFVVHFTIPLIIIFFCYGRLVCTVKEAAAQQQESATTQKAEKEVTRMVIIMVVFFLICWVPYASVAFFIFSNQGSEFGPIFMTVPAFFAKSSSIYNPVIYIMLNKQFRNCMITTLCCGKNPFGEDDASSAATSKTE
- MNGTEGPNFY+P SN TGVVRSPFEYPQYYLAEPWQ+S+L AYMFLLI+LGFPINF+TLYVT+QHKKLRTPLNYILLNLA A+ FMVL GFT T+Y+S++GYF+ G TGC +EGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRF ENHA+MGVAFTW+MAL+CA PPL GWSRYIPEG+QCSCG+DYYTLKPEVNNESFVIYMFVVHFTIP+IIIFFCYG+LV TVKEAAAQQQESATTQKAEKEVTRMVIIMV+ FLICWVPYASVAF+IF++QGS FGPIFMT+PAFFAKS++IYNPVIYIM+NKQFRNCM+TT+CCGKNP G+D+A SA SKTE
-
-
-
-
-
-
- 0
- 0
- 29
- 101761
- 0.041
- 0.267
- 0.14
-
-
-
-
- 21
- sp|P08100|OPSD_HUMAN
- Rhodopsin OS=Homo sapiens GN=RHO PE=1 SV=1
- 348
-
-
- 1
- gi|283855846|gb|ADB45242.1|
- rhodopsin [Cynopterus brachyotis]
- ADB45242
- 328
-
-
- 1
- 653.284
- 1684
- 0
- 11
- 338
- 1
- 328
- 0
- 0
- 311
- 321
- 0
- 328
- VPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSAAIYNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVS
- VPFSNKTGVVRSPFEHPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGLALTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQGSNFGPIFMTLPAFFAKSSSIYNPVIYIMMNKQFRNCMLTTLCCGKNPLGDDEASTTAS
- VPFSN TGVVRSPFE+PQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMV GGFT+TLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMG+A TWVMALACAAPPL GWSRYIPEG+QCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMI+IFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICW+PYA VAFYIFTHQGSNFGPIFMT+PAFFAKS++IYNPVIYIMMNKQFRNCMLTT+CCGKNPLGDDEAS T S
-
-
-
-
-
-
- 0
- 0
- 29
- 101761
- 0.041
- 0.267
- 0.14
-
-
-
-
- 22
- sp|P08100|OPSD_HUMAN
- Rhodopsin OS=Homo sapiens GN=RHO PE=1 SV=1
- 348
-
-
- 1
- gi|283855823|gb|ADB45229.1|
- rhodopsin [Myotis pilosus]
- ADB45229
- 328
-
-
- 1
- 631.328
- 1627
- 0
- 11
- 338
- 1
- 328
- 0
- 0
- 311
- 323
- 0
- 328
- VPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSAAIYNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVS
- VPFSNKTGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVANLFMVFGGFTTTLYTSMHGYFVFGATGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGLAFTWVMALACAAPPLAGWSRYIPEGMQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVVAFLICWLPYASVAFYIFTHQGSNFGPVFMTIPAFFAKSSSIYNPVIYIMMNKQFRNCMLTTLCCGKNPLGDDEASTTAS
- VPFSN TGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVA+LFMV GGFT+TLYTS+HGYFVFG TGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMG+AFTWVMALACAAPPLAGWSRYIPEG+QCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMI+IFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMV+AFLICW+PYASVAFYIFTHQGSNFGP+FMTIPAFFAKS++IYNPVIYIMMNKQFRNCMLTT+CCGKNPLGDDEAS T S
-
-
-
-
-
-
- 0
- 0
- 29
- 101761
- 0.041
- 0.267
- 0.14
-
-
-
-
- 23
- sp|P08100|OPSD_HUMAN
- Rhodopsin OS=Homo sapiens GN=RHO PE=1 SV=1
- 348
-
-
- 1
- gi|223523|prf||0811197A
- rhodopsin [Bos taurus]
- 0811197A
- 347
-
-
- 1
- 673.315
- 1736
- 0
- 1
- 348
- 1
- 347
- 0
- 0
- 324
- 336
- 1
- 348
- MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSAAIYNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVSKTETSQVAPA
- MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIMLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPEGMQCSCGID-YTPHEETNNESFVIYMFVVHFIIPLIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQGSDFGPIFMTIPAFFAKTSAVYNPVIYIMMNKQFRNCMVTTLCCGKNPLGDDEASTTVSKTETSQVAPA
- MNGTEGPNFYVPFSN TGVVRSPFE PQYYLAEPWQFSMLAAYMFLLI+LGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMV GGFT+TLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPL GWSRYIPEG+QCSCGID YT E NNESFVIYMFVVHF IP+I+IFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICW+PYA VAFYIFTHQGS+FGPIFMTIPAFFAK++A+YNPVIYIMMNKQFRNCM+TT+CCGKNPLGDDEAS TVSKTETSQVAPA
-
-
-
-
-
-
- 0
- 0
- 29
- 101761
- 0.041
- 0.267
- 0.14
-
-
-
-
- 24
- sp|P08100|OPSD_HUMAN
- Rhodopsin OS=Homo sapiens GN=RHO PE=1 SV=1
- 348
-
-
- 1
- gi|12583665|dbj|BAB21486.1|
- fresh water form rod opsin [Conger myriaster]
- BAB21486
- 354
-
-
- 1
- 599.356
- 1544
- 0
- 1
- 341
- 1
- 342
- 0
- 0
- 281
- 314
- 1
- 342
- MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSAAIYNPVIYIMMNKQFRNCMLTTICCGKNPL-GDDEASATVSKTE
- MNGTEGPNFYIPMSNATGVVRSPFEYPQYYLAEPWAFSALSAYMFFLIIAGFPINFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSMHGYFVFGPTGCNIEGFFATLGGEIALWCLVVLAIERWMVVCKPVTNFRFGESHAIMGVMVTWTMALACALPPLFGWSRYIPEGLQCSCGIDYYTRAPGINNESFVIYMFTCHFSIPLAVISFCYGRLVCTVKEAAAQQQESETTQRAEREVTRMVVIMVISFLVCWVPYASVAWYIFTHQGSTFGPIFMTIPSFFAKSSALYNPMIYICMNKQFRHCMITTLCCGKNPFEEEDGASATSSKTE
- MNGTEGPNFY+P SNATGVVRSPFEYPQYYLAEPW FS L+AYMF LI+ GFPINFLTLYVT++HKKLRTPLNYILLNLAVADLFMV GGFT+T+YTS+HGYFVFGPTGCN+EGFFATLGGEIALW LVVLAIER++VVCKP++NFRFGE+HAIMGV TW MALACA PPL GWSRYIPEGLQCSCGIDYYT P +NNESFVIYMF HF+IP+ +I FCYG+LV TVKEAAAQQQES TTQ+AE+EVTRMV+IMVI+FL+CWVPYASVA+YIFTHQGS FGPIFMTIP+FFAKS+A+YNP+IYI MNKQFR+CM+TT+CCGKNP +D ASAT SKTE
-
-
-
-
-
-
- 0
- 0
- 29
- 101761
- 0.041
- 0.267
- 0.14
-
-
-
-
-
\ No newline at end of file
+
+
+ 1
+ sp|Q9BS26|ERP44_HUMAN
+ Endoplasmic reticulum resident protein 44 OS=Homo sapiens GN=ERP44 PE=1 SV=1
+ 406
+
+
+
+
+ 0
+ 0
+ 30
+ 119568
+ 0.041
+ 0.267
+ 0.14
+
+
+ No hits found
+
+
+ 2
+ sp|Q9BS26|ERP44_HUMAN
+ Endoplasmic reticulum resident protein 44 OS=Homo sapiens GN=ERP44 PE=1 SV=1
+ 406
+
+
+
+
+ 0
+ 0
+ 30
+ 119568
+ 0.041
+ 0.267
+ 0.14
+
+
+ No hits found
+
+
+ 3
+ sp|Q9BS26|ERP44_HUMAN
+ Endoplasmic reticulum resident protein 44 OS=Homo sapiens GN=ERP44 PE=1 SV=1
+ 406
+
+
+
+
+ 0
+ 0
+ 30
+ 119568
+ 0.041
+ 0.267
+ 0.14
+
+
+ No hits found
+
+
+ 4
+ sp|Q9BS26|ERP44_HUMAN
+ Endoplasmic reticulum resident protein 44 OS=Homo sapiens GN=ERP44 PE=1 SV=1
+ 406
+
+
+
+
+ 0
+ 0
+ 30
+ 119568
+ 0.041
+ 0.267
+ 0.14
+
+
+ No hits found
+
+
+ 5
+ sp|Q9BS26|ERP44_HUMAN
+ Endoplasmic reticulum resident protein 44 OS=Homo sapiens GN=ERP44 PE=1 SV=1
+ 406
+
+
+
+
+ 0
+ 0
+ 30
+ 119568
+ 0.041
+ 0.267
+ 0.14
+
+
+ No hits found
+
+
+ 6
+ sp|Q9BS26|ERP44_HUMAN
+ Endoplasmic reticulum resident protein 44 OS=Homo sapiens GN=ERP44 PE=1 SV=1
+ 406
+
+
+
+
+ 0
+ 0
+ 30
+ 119568
+ 0.041
+ 0.267
+ 0.14
+
+
+ No hits found
+
+
+ 7
+ sp|Q9NSY1|BMP2K_HUMAN
+ BMP-2-inducible protein kinase OS=Homo sapiens GN=BMP2K PE=1 SV=2
+ 1161
+
+
+
+
+ 0
+ 0
+ 38
+ 348130
+ 0.041
+ 0.267
+ 0.14
+
+
+ No hits found
+
+
+ 8
+ sp|Q9NSY1|BMP2K_HUMAN
+ BMP-2-inducible protein kinase OS=Homo sapiens GN=BMP2K PE=1 SV=2
+ 1161
+
+
+
+
+ 0
+ 0
+ 38
+ 348130
+ 0.041
+ 0.267
+ 0.14
+
+
+ No hits found
+
+
+ 9
+ sp|Q9NSY1|BMP2K_HUMAN
+ BMP-2-inducible protein kinase OS=Homo sapiens GN=BMP2K PE=1 SV=2
+ 1161
+
+
+
+
+ 0
+ 0
+ 38
+ 348130
+ 0.041
+ 0.267
+ 0.14
+
+
+ No hits found
+
+
+ 10
+ sp|Q9NSY1|BMP2K_HUMAN
+ BMP-2-inducible protein kinase OS=Homo sapiens GN=BMP2K PE=1 SV=2
+ 1161
+
+
+
+
+ 0
+ 0
+ 38
+ 348130
+ 0.041
+ 0.267
+ 0.14
+
+
+ No hits found
+
+
+ 11
+ sp|Q9NSY1|BMP2K_HUMAN
+ BMP-2-inducible protein kinase OS=Homo sapiens GN=BMP2K PE=1 SV=2
+ 1161
+
+
+
+
+ 0
+ 0
+ 38
+ 348130
+ 0.041
+ 0.267
+ 0.14
+
+
+ No hits found
+
+
+ 12
+ sp|Q9NSY1|BMP2K_HUMAN
+ BMP-2-inducible protein kinase OS=Homo sapiens GN=BMP2K PE=1 SV=2
+ 1161
+
+
+
+
+ 0
+ 0
+ 38
+ 348130
+ 0.041
+ 0.267
+ 0.14
+
+
+ No hits found
+
+
+ 13
+ sp|P06213|INSR_HUMAN
+ Insulin receptor OS=Homo sapiens GN=INSR PE=1 SV=4
+ 1382
+
+
+
+
+ 0
+ 0
+ 39
+ 414987
+ 0.041
+ 0.267
+ 0.14
+
+
+ No hits found
+
+
+ 14
+ sp|P06213|INSR_HUMAN
+ Insulin receptor OS=Homo sapiens GN=INSR PE=1 SV=4
+ 1382
+
+
+
+
+ 0
+ 0
+ 39
+ 414987
+ 0.041
+ 0.267
+ 0.14
+
+
+ No hits found
+
+
+ 15
+ sp|P06213|INSR_HUMAN
+ Insulin receptor OS=Homo sapiens GN=INSR PE=1 SV=4
+ 1382
+
+
+
+
+ 0
+ 0
+ 39
+ 414987
+ 0.041
+ 0.267
+ 0.14
+
+
+ No hits found
+
+
+ 16
+ sp|P06213|INSR_HUMAN
+ Insulin receptor OS=Homo sapiens GN=INSR PE=1 SV=4
+ 1382
+
+
+
+
+ 0
+ 0
+ 39
+ 414987
+ 0.041
+ 0.267
+ 0.14
+
+
+ No hits found
+
+
+ 17
+ sp|P06213|INSR_HUMAN
+ Insulin receptor OS=Homo sapiens GN=INSR PE=1 SV=4
+ 1382
+
+
+
+
+ 0
+ 0
+ 39
+ 414987
+ 0.041
+ 0.267
+ 0.14
+
+
+ No hits found
+
+
+ 18
+ sp|P06213|INSR_HUMAN
+ Insulin receptor OS=Homo sapiens GN=INSR PE=1 SV=4
+ 1382
+
+
+
+
+ 0
+ 0
+ 39
+ 414987
+ 0.041
+ 0.267
+ 0.14
+
+
+ No hits found
+
+
+ 19
+ sp|P08100|OPSD_HUMAN
+ Rhodopsin OS=Homo sapiens GN=RHO PE=1 SV=1
+ 348
+
+
+ 1
+ gi|57163783|ref|NP_001009242.1|
+ rhodopsin [Felis catus]
+ NP_001009242
+ 348
+
+
+ 1
+ 701.049
+ 1808
+ 0
+ 1
+ 348
+ 1
+ 348
+ 0
+ 0
+ 336
+ 343
+ 0
+ 348
+ MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSAAIYNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVSKTETSQVAPA
+ MNGTEGPNFYVPFSNKTGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTLPAFFAKSSSIYNPVIYIMMNKQFRNCMLTTLCCGKNPLGDDEASTTGSKTETSQVAPA
+ MNGTEGPNFYVPFSN TGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMV GGFT+TLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPL GWSRYIPEG+QCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMI+IFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMT+PAFFAKS++IYNPVIYIMMNKQFRNCMLTT+CCGKNPLGDDEAS T SKTETSQVAPA
+
+
+
+
+
+
+ 0
+ 0
+ 29
+ 101761
+ 0.041
+ 0.267
+ 0.14
+
+
+
+
+ 20
+ sp|P08100|OPSD_HUMAN
+ Rhodopsin OS=Homo sapiens GN=RHO PE=1 SV=1
+ 348
+
+
+ 1
+ gi|3024260|sp|P56514.1|OPSD_BUFBU
+ RecName: Full=Rhodopsin
+ P56514
+ 354
+
+
+ 1
+ 619.002
+ 1595
+ 0
+ 1
+ 341
+ 1
+ 342
+ 0
+ 0
+ 290
+ 322
+ 1
+ 342
+ MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSAAIYNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEA-SATVSKTE
+ MNGTEGPNFYIPMSNKTGVVRSPFEYPQYYLAEPWQYSILCAYMFLLILLGFPINFMTLYVTIQHKKLRTPLNYILLNLAFANHFMVLCGFTVTMYSSMNGYFILGATGCYVEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFSENHAVMGVAFTWIMALSCAVPPLLGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFVVHFTIPLIIIFFCYGRLVCTVKEAAAQQQESATTQKAEKEVTRMVIIMVVFFLICWVPYASVAFFIFSNQGSEFGPIFMTVPAFFAKSSSIYNPVIYIMLNKQFRNCMITTLCCGKNPFGEDDASSAATSKTE
+ MNGTEGPNFY+P SN TGVVRSPFEYPQYYLAEPWQ+S+L AYMFLLI+LGFPINF+TLYVT+QHKKLRTPLNYILLNLA A+ FMVL GFT T+Y+S++GYF+ G TGC +EGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRF ENHA+MGVAFTW+MAL+CA PPL GWSRYIPEG+QCSCG+DYYTLKPEVNNESFVIYMFVVHFTIP+IIIFFCYG+LV TVKEAAAQQQESATTQKAEKEVTRMVIIMV+ FLICWVPYASVAF+IF++QGS FGPIFMT+PAFFAKS++IYNPVIYIM+NKQFRNCM+TT+CCGKNP G+D+A SA SKTE
+
+
+
+
+
+
+ 0
+ 0
+ 29
+ 101761
+ 0.041
+ 0.267
+ 0.14
+
+
+
+
+ 21
+ sp|P08100|OPSD_HUMAN
+ Rhodopsin OS=Homo sapiens GN=RHO PE=1 SV=1
+ 348
+
+
+ 1
+ gi|283855846|gb|ADB45242.1|
+ rhodopsin [Cynopterus brachyotis]
+ ADB45242
+ 328
+
+
+ 1
+ 653.284
+ 1684
+ 0
+ 11
+ 338
+ 1
+ 328
+ 0
+ 0
+ 311
+ 321
+ 0
+ 328
+ VPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSAAIYNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVS
+ VPFSNKTGVVRSPFEHPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGLALTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQGSNFGPIFMTLPAFFAKSSSIYNPVIYIMMNKQFRNCMLTTLCCGKNPLGDDEASTTAS
+ VPFSN TGVVRSPFE+PQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMV GGFT+TLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMG+A TWVMALACAAPPL GWSRYIPEG+QCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMI+IFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICW+PYA VAFYIFTHQGSNFGPIFMT+PAFFAKS++IYNPVIYIMMNKQFRNCMLTT+CCGKNPLGDDEAS T S
+
+
+
+
+
+
+ 0
+ 0
+ 29
+ 101761
+ 0.041
+ 0.267
+ 0.14
+
+
+
+
+ 22
+ sp|P08100|OPSD_HUMAN
+ Rhodopsin OS=Homo sapiens GN=RHO PE=1 SV=1
+ 348
+
+
+ 1
+ gi|283855823|gb|ADB45229.1|
+ rhodopsin [Myotis pilosus]
+ ADB45229
+ 328
+
+
+ 1
+ 631.328
+ 1627
+ 0
+ 11
+ 338
+ 1
+ 328
+ 0
+ 0
+ 311
+ 323
+ 0
+ 328
+ VPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSAAIYNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVS
+ VPFSNKTGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVANLFMVFGGFTTTLYTSMHGYFVFGATGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGLAFTWVMALACAAPPLAGWSRYIPEGMQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVVAFLICWLPYASVAFYIFTHQGSNFGPVFMTIPAFFAKSSSIYNPVIYIMMNKQFRNCMLTTLCCGKNPLGDDEASTTAS
+ VPFSN TGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVA+LFMV GGFT+TLYTS+HGYFVFG TGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMG+AFTWVMALACAAPPLAGWSRYIPEG+QCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMI+IFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMV+AFLICW+PYASVAFYIFTHQGSNFGP+FMTIPAFFAKS++IYNPVIYIMMNKQFRNCMLTT+CCGKNPLGDDEAS T S
+
+
+
+
+
+
+ 0
+ 0
+ 29
+ 101761
+ 0.041
+ 0.267
+ 0.14
+
+
+
+
+ 23
+ sp|P08100|OPSD_HUMAN
+ Rhodopsin OS=Homo sapiens GN=RHO PE=1 SV=1
+ 348
+
+
+ 1
+ gi|223523|prf||0811197A
+ rhodopsin [Bos taurus]
+ 0811197A
+ 347
+
+
+ 1
+ 673.315
+ 1736
+ 0
+ 1
+ 348
+ 1
+ 347
+ 0
+ 0
+ 324
+ 336
+ 1
+ 348
+ MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSAAIYNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVSKTETSQVAPA
+ MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIMLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPEGMQCSCGID-YTPHEETNNESFVIYMFVVHFIIPLIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQGSDFGPIFMTIPAFFAKTSAVYNPVIYIMMNKQFRNCMVTTLCCGKNPLGDDEASTTVSKTETSQVAPA
+ MNGTEGPNFYVPFSN TGVVRSPFE PQYYLAEPWQFSMLAAYMFLLI+LGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMV GGFT+TLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPL GWSRYIPEG+QCSCGID YT E NNESFVIYMFVVHF IP+I+IFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICW+PYA VAFYIFTHQGS+FGPIFMTIPAFFAK++A+YNPVIYIMMNKQFRNCM+TT+CCGKNPLGDDEAS TVSKTETSQVAPA
+
+
+
+
+
+
+ 0
+ 0
+ 29
+ 101761
+ 0.041
+ 0.267
+ 0.14
+
+
+
+
+ 24
+ sp|P08100|OPSD_HUMAN
+ Rhodopsin OS=Homo sapiens GN=RHO PE=1 SV=1
+ 348
+
+
+ 1
+ gi|12583665|dbj|BAB21486.1|
+ fresh water form rod opsin [Conger myriaster]
+ BAB21486
+ 354
+
+
+ 1
+ 599.356
+ 1544
+ 0
+ 1
+ 341
+ 1
+ 342
+ 0
+ 0
+ 281
+ 314
+ 1
+ 342
+ MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSAAIYNPVIYIMMNKQFRNCMLTTICCGKNPL-GDDEASATVSKTE
+ MNGTEGPNFYIPMSNATGVVRSPFEYPQYYLAEPWAFSALSAYMFFLIIAGFPINFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSMHGYFVFGPTGCNIEGFFATLGGEIALWCLVVLAIERWMVVCKPVTNFRFGESHAIMGVMVTWTMALACALPPLFGWSRYIPEGLQCSCGIDYYTRAPGINNESFVIYMFTCHFSIPLAVISFCYGRLVCTVKEAAAQQQESETTQRAEREVTRMVVIMVISFLVCWVPYASVAWYIFTHQGSTFGPIFMTIPSFFAKSSALYNPMIYICMNKQFRHCMITTLCCGKNPFEEEDGASATSSKTE
+ MNGTEGPNFY+P SNATGVVRSPFEYPQYYLAEPW FS L+AYMF LI+ GFPINFLTLYVT++HKKLRTPLNYILLNLAVADLFMV GGFT+T+YTS+HGYFVFGPTGCN+EGFFATLGGEIALW LVVLAIER++VVCKP++NFRFGE+HAIMGV TW MALACA PPL GWSRYIPEGLQCSCGIDYYT P +NNESFVIYMF HF+IP+ +I FCYG+LV TVKEAAAQQQES TTQ+AE+EVTRMV+IMVI+FL+CWVPYASVA+YIFTHQGS FGPIFMTIP+FFAKS+A+YNP+IYI MNKQFR+CM+TT+CCGKNP +D ASAT SKTE
+
+
+
+
+
+
+ 0
+ 0
+ 29
+ 101761
+ 0.041
+ 0.267
+ 0.14
+
+
+
+
+
+
diff -r 09a68a90d552 -r 98f8431dab44 test-data/blastp_four_human_vs_rhodopsin_converted_ext.tabular
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/blastp_four_human_vs_rhodopsin_converted_ext.tabular Fri Jun 13 07:07:35 2014 -0400
@@ -0,0 +1,6 @@
+sp|P08100|OPSD_HUMAN gi|57163783|ref|NP_001009242.1| 96.55 348 12 0 1 348 1 348 0.0 701 gi|57163783|ref|NP_001009242.1| 1808 336 343 0 98.56 1 1 MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSAAIYNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVSKTETSQVAPA MNGTEGPNFYVPFSNKTGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTLPAFFAKSSSIYNPVIYIMMNKQFRNCMLTTLCCGKNPLGDDEASTTGSKTETSQVAPA 348 348 rhodopsin [Felis catus]
+sp|P08100|OPSD_HUMAN gi|3024260|sp|P56514.1|OPSD_BUFBU 84.80 342 51 1 1 341 1 342 0.0 619 gi|3024260|sp|P56514.1|OPSD_BUFBU 1595 290 322 1 94.15 1 1 MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSAAIYNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEA-SATVSKTE MNGTEGPNFYIPMSNKTGVVRSPFEYPQYYLAEPWQYSILCAYMFLLILLGFPINFMTLYVTIQHKKLRTPLNYILLNLAFANHFMVLCGFTVTMYSSMNGYFILGATGCYVEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFSENHAVMGVAFTWIMALSCAVPPLLGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFVVHFTIPLIIIFFCYGRLVCTVKEAAAQQQESATTQKAEKEVTRMVIIMVVFFLICWVPYASVAFFIFSNQGSEFGPIFMTVPAFFAKSSSIYNPVIYIMLNKQFRNCMITTLCCGKNPFGEDDASSAATSKTE 348 354 RecName: Full=Rhodopsin
+sp|P08100|OPSD_HUMAN gi|283855846|gb|ADB45242.1| 94.82 328 17 0 11 338 1 328 0.0 653 gi|283855846|gb|ADB45242.1| 1684 311 321 0 97.87 1 1 VPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSAAIYNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVS VPFSNKTGVVRSPFEHPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGLALTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQGSNFGPIFMTLPAFFAKSSSIYNPVIYIMMNKQFRNCMLTTLCCGKNPLGDDEASTTAS 348 328 rhodopsin [Cynopterus brachyotis]
+sp|P08100|OPSD_HUMAN gi|283855823|gb|ADB45229.1| 94.82 328 17 0 11 338 1 328 0.0 631 gi|283855823|gb|ADB45229.1| 1627 311 323 0 98.48 1 1 VPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSAAIYNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVS VPFSNKTGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVANLFMVFGGFTTTLYTSMHGYFVFGATGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGLAFTWVMALACAAPPLAGWSRYIPEGMQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVVAFLICWLPYASVAFYIFTHQGSNFGPVFMTIPAFFAKSSSIYNPVIYIMMNKQFRNCMLTTLCCGKNPLGDDEASTTAS 348 328 rhodopsin [Myotis pilosus]
+sp|P08100|OPSD_HUMAN gi|223523|prf||0811197A 93.10 348 23 1 1 348 1 347 0.0 673 gi|223523|prf||0811197A 1736 324 336 1 96.55 1 1 MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSAAIYNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVSKTETSQVAPA MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIMLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPEGMQCSCGID-YTPHEETNNESFVIYMFVVHFIIPLIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQGSDFGPIFMTIPAFFAKTSAVYNPVIYIMMNKQFRNCMVTTLCCGKNPLGDDEASTTVSKTETSQVAPA 348 347 rhodopsin [Bos taurus]
+sp|P08100|OPSD_HUMAN gi|12583665|dbj|BAB21486.1| 82.16 342 60 1 1 341 1 342 0.0 599 gi|12583665|dbj|BAB21486.1| 1544 281 314 1 91.81 1 1 MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSAAIYNPVIYIMMNKQFRNCMLTTICCGKNPL-GDDEASATVSKTE MNGTEGPNFYIPMSNATGVVRSPFEYPQYYLAEPWAFSALSAYMFFLIIAGFPINFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSMHGYFVFGPTGCNIEGFFATLGGEIALWCLVVLAIERWMVVCKPVTNFRFGESHAIMGVMVTWTMALACALPPLFGWSRYIPEGLQCSCGIDYYTRAPGINNESFVIYMFTCHFSIPLAVISFCYGRLVCTVKEAAAQQQESETTQRAEREVTRMVVIMVISFLVCWVPYASVAWYIFTHQGSTFGPIFMTIPSFFAKSSALYNPMIYICMNKQFRHCMITTLCCGKNPFEEEDGASATSSKTE 348 354 fresh water form rod opsin [Conger myriaster]
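The `converted_ext` test file above holds the default 25-column extended tabular output. In it, column 1 is `qseqid`, column 2 is `sseqid` and column 25 is `salltitles`, matching the script's `--qseqid`, `--sseqid` and `--salltitles` defaults. Below is a minimal sketch of how one row becomes a hit description. It assumes tab-separated columns, as the script's `split("\t")` expects. `row_to_hit` is a hypothetical helper, not part of the tool, and unlike the script it checks the column count up front.

```python
# Sketch (hypothetical helper): turn one row of 25-column extended tabular
# BLAST output into (query_id, "sseqid salltitles"), using the tool's
# default 1-based column numbers.

QSEQID_COL = 1       # default for --qseqid
SSEQID_COL = 2       # default for --sseqid
SALLTITLES_COL = 25  # default for --salltitles


def row_to_hit(line, qseqid=QSEQID_COL, sseqid=SSEQID_COL, salltitles=SALLTITLES_COL):
    """Return (query_id, description) for one tab-separated BLAST row."""
    parts = line.rstrip("\n").split("\t")
    needed = max(qseqid, sseqid, salltitles)
    if len(parts) < needed:
        raise ValueError("Expected at least %i columns, got %i" % (needed, len(parts)))
    # Convert 1-based column numbers to 0-based Python indices.
    query = parts[qseqid - 1]
    descr = "%s %s" % (parts[sseqid - 1], parts[salltitles - 1])
    return query, descr


# Synthetic row: real values in columns 1, 2 and 25, with 22 placeholders
# standing in for the scores and sequences in columns 3 to 24.
example = "\t".join(["sp|P08100|OPSD_HUMAN", "gi|57163783|ref|NP_001009242.1|"]
                    + ["x"] * 22 + ["rhodopsin [Felis catus]"])
print(row_to_hit(example))
```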
diff -r 09a68a90d552 -r 98f8431dab44 test-data/blastp_four_human_vs_rhodopsin_top3.tabular
--- a/test-data/blastp_four_human_vs_rhodopsin_top3.tabular Wed Sep 18 06:07:53 2013 -0400
+++ b/test-data/blastp_four_human_vs_rhodopsin_top3.tabular Fri Jun 13 07:07:35 2014 -0400
@@ -1,25 +1,5 @@
#Query BLAST hit 1 BLAST hit 2 BLAST hit 3
sp|Q9BS26|ERP44_HUMAN
-sp|Q9BS26|ERP44_HUMAN
-sp|Q9BS26|ERP44_HUMAN
-sp|Q9BS26|ERP44_HUMAN
-sp|Q9BS26|ERP44_HUMAN
-sp|Q9BS26|ERP44_HUMAN
sp|Q9NSY1|BMP2K_HUMAN
-sp|Q9NSY1|BMP2K_HUMAN
-sp|Q9NSY1|BMP2K_HUMAN
-sp|Q9NSY1|BMP2K_HUMAN
-sp|Q9NSY1|BMP2K_HUMAN
-sp|Q9NSY1|BMP2K_HUMAN
sp|P06213|INSR_HUMAN
-sp|P06213|INSR_HUMAN
-sp|P06213|INSR_HUMAN
-sp|P06213|INSR_HUMAN
-sp|P06213|INSR_HUMAN
-sp|P06213|INSR_HUMAN
-sp|P08100|OPSD_HUMAN gi|57163783|ref|NP_001009242.1| rhodopsin [Felis catus]
-sp|P08100|OPSD_HUMAN gi|3024260|sp|P56514.1|OPSD_BUFBU RecName: Full=Rhodopsin
-sp|P08100|OPSD_HUMAN gi|283855846|gb|ADB45242.1| rhodopsin [Cynopterus brachyotis]
-sp|P08100|OPSD_HUMAN gi|283855823|gb|ADB45229.1| rhodopsin [Myotis pilosus]
-sp|P08100|OPSD_HUMAN gi|223523|prf||0811197A rhodopsin [Bos taurus]
-sp|P08100|OPSD_HUMAN gi|12583665|dbj|BAB21486.1| fresh water form rod opsin [Conger myriaster]
+sp|P08100|OPSD_HUMAN gi|57163783|ref|NP_001009242.1| rhodopsin [Felis catus] gi|3024260|sp|P56514.1|OPSD_BUFBU RecName: Full=Rhodopsin gi|283855846|gb|ADB45242.1| rhodopsin [Cynopterus brachyotis]
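The updated `top3` file has one row per query. Queries without hits get blank columns, and queries with more hits are cut down to the first three. The sketch below shows how each row might be assembled. It mirrors the script's `best_hits()` and its header line. `format_row` and `format_header` are hypothetical names used only for illustration.

```python
# Sketch: build the header and one data line of the top-N table.
# best_hits() mirrors the script's helper; the other names are hypothetical.


def best_hits(descriptions, top_n):
    """Return exactly top_n entries, padding with "" when there are fewer hits."""
    if len(descriptions) < top_n:
        return descriptions + [""] * (top_n - len(descriptions))
    return descriptions[:top_n]


def format_header(top_n=3):
    """Header line: '#Query' then 'BLAST hit 1' ... 'BLAST hit N'."""
    return "#Query\t%s\n" % "\t".join("BLAST hit %i" % (i + 1) for i in range(top_n))


def format_row(query, descriptions, top_n=3):
    """One tab-separated line: query id, then the top_n hit descriptions."""
    return "%s\t%s\n" % (query, "\t".join(best_hits(descriptions, top_n)))


# A query with no hits yields a row of empty columns.
print(repr(format_row("sp|Q9BS26|ERP44_HUMAN", [])))
```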
diff -r 09a68a90d552 -r 98f8431dab44 test-data/blastp_four_human_vs_rhodopsin_top3_positive.tabular
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/blastp_four_human_vs_rhodopsin_top3_positive.tabular Fri Jun 13 07:07:35 2014 -0400
@@ -0,0 +1,2 @@
+#Query BLAST hit 1 BLAST hit 2 BLAST hit 3
+sp|P08100|OPSD_HUMAN gi|57163783|ref|NP_001009242.1| rhodopsin [Felis catus] gi|3024260|sp|P56514.1|OPSD_BUFBU RecName: Full=Rhodopsin gi|283855846|gb|ADB45242.1| rhodopsin [Cynopterus brachyotis]
diff -r 09a68a90d552 -r 98f8431dab44 tools/blastxml_to_top_descr/README.rst
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/blastxml_to_top_descr/README.rst Fri Jun 13 07:07:35 2014 -0400
@@ -0,0 +1,128 @@
+Galaxy tool to extract top BLAST hit descriptions from BLAST XML
+================================================================
+
+This tool is copyright 2012-2013 by Peter Cock, The James Hutton Institute
+(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
+See the licence text below.
+
+This tool is a short Python script to parse a BLAST XML file (or, since v0.1.0,
+extended tabular BLAST output) and extract the identifiers and descriptions of
+the top matches (by default the top 3), writing these to a simple tabular file along with the query identifiers.
+
+It is available from the Galaxy Tool Shed at:
+http://toolshed.g2.bx.psu.edu/view/peterjc/blastxml_to_top_descr
+
+This requires the 'blast_datatypes' repository from the Galaxy Tool Shed
+to provide the 'blastxml' file format definition.
+
+
+Automated Installation
+======================
+
+This should be straightforward: Galaxy should automatically install the
+'blast_datatypes' dependency.
+
+
+Manual Installation
+===================
+
+If you haven't done so before, first install the 'blast_datatypes' repository.
+
+There are just two files to install (if doing this manually):
+
+* blastxml_to_top_descr.py (the Python script)
+* blastxml_to_top_descr.xml (the Galaxy tool definition)
+
+The suggested location is in the Galaxy folder tools/ncbi_blast_plus next to
+the NCBI BLAST+ tool wrappers.
+
+You will also need to modify the tools_conf.xml file to tell Galaxy to offer
+the tool, e.g. next to the NCBI BLAST+ tools. Simply add the line::
+
+
+
+To run the tool's tests, also add this line to tools_conf.xml.sample and then run::
+
+ $ sh run_functional_tests.sh -id blastxml_to_top_descr
+
+
+History
+=======
+
+======= ======================================================================
+Version Changes
+------- ----------------------------------------------------------------------
+v0.0.1 - Initial version.
+v0.0.2 - Since BLAST+ was moved out of the Galaxy core, now have a dependency
+ on the 'blast_datatypes' repository in the Tool Shed.
+v0.0.3 - Include the test files required to run the unit tests.
+v0.0.4 - Quote filenames in case they contain spaces (internal change)
+v0.0.5 - Include number of queries with BLAST matches in stdout (peek text)
+v0.0.6 - Check for errors via the script's return code (internal change)
+v0.0.7 - Link to Tool Shed added to help text and this documentation.
+ - Tweak dependency on blast_datatypes to also work on Test Tool Shed
+ - Adopt standard MIT License.
+v0.0.8 - Development moved to GitHub, https://github.com/peterjc/galaxy_blast
+v0.0.9 - Updated citation information (Cock et al. 2013).
+v0.0.10 - Update help text to mention BLAST+ 2.2.28 can produce tabular files
+ including the description/title (via the salltitles field).
+v0.1.0 - Switch to using an optparse based API for Python script internally.
+ - Support BLAST XML with multiple ``<Iteration>`` blocks per query.
+ - Support the default 25 column extended tabular BLAST output.
+======= ======================================================================
+
+
+Bug Reports
+===========
+
+You can file an issue at https://github.com/peterjc/galaxy_blast/issues or ask
+us on the Galaxy development list http://lists.bx.psu.edu/listinfo/galaxy-dev.
+
+
+Developers
+==========
+
+This script and related tools were originally developed on the 'tools' branch of
+the following Mercurial repository: https://bitbucket.org/peterjc/galaxy-central/
+
+As of July 2013, development is continuing on a dedicated GitHub repository:
+https://github.com/peterjc/galaxy_blast
+
+To make the tarball for the Galaxy Tool Shed (http://toolshed.g2.bx.psu.edu/), run
+the following command from the GitHub repository root folder::
+
+ $ tar -czf blastxml_to_top_descr.tar.gz tools/blastxml_to_top_descr/README.rst tools/blastxml_to_top_descr/blastxml_to_top_descr.* tools/blastxml_to_top_descr/repository_dependencies.xml test-data/blastp_four_human_vs_rhodopsin.xml test-data/blastp_four_human_vs_rhodopsin_top3.tabular test-data/blastp_four_human_vs_rhodopsin_converted_ext.tabular test-data/blastp_four_human_vs_rhodopsin_top3_positive.tabular
+
+Check this worked::
+
+ $ tar -tzf blastxml_to_top_descr.tar.gz
+ tools/blastxml_to_top_descr/README.rst
+ tools/blastxml_to_top_descr/blastxml_to_top_descr.py
+ tools/blastxml_to_top_descr/blastxml_to_top_descr.xml
+ tools/blastxml_to_top_descr/repository_dependencies.xml
+ test-data/blastp_four_human_vs_rhodopsin.xml
+ test-data/blastp_four_human_vs_rhodopsin_top3.tabular
+ test-data/blastp_four_human_vs_rhodopsin_converted_ext.tabular
+ test-data/blastp_four_human_vs_rhodopsin_top3_positive.tabular
+
+
+Licence (MIT)
+=============
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
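Version 0.1.0 adds support for BLAST XML with multiple `<Iteration>` blocks per query. The sketch below shows one way to do this: stream the file with `xml.etree.ElementTree.iterparse` and merge consecutive blocks that share a query ID. The XML snippet is a made-up, heavily trimmed example, and `iter_query_hits` is a hypothetical name. Unlike the script's `blastxml_hits()`, this simplified version does not handle the `Query_N`/`Subject_N` placeholder IDs.

```python
import io
import xml.etree.ElementTree as ElementTree

# Made-up, heavily trimmed BLAST XML: two <Iteration> blocks share query q1.
EXAMPLE_XML = b"""<?xml version="1.0"?>
<BlastOutput><BlastOutput_iterations>
<Iteration><Iteration_query-ID>q1</Iteration_query-ID>
  <Iteration_hits><Hit><Hit_id>s1</Hit_id><Hit_def>first</Hit_def></Hit></Iteration_hits>
</Iteration>
<Iteration><Iteration_query-ID>q1</Iteration_query-ID>
  <Iteration_hits><Hit><Hit_id>s2</Hit_id><Hit_def>second</Hit_def></Hit></Iteration_hits>
</Iteration>
<Iteration><Iteration_query-ID>q2</Iteration_query-ID>
  <Iteration_hits></Iteration_hits>
</Iteration>
</BlastOutput_iterations></BlastOutput>"""


def iter_query_hits(handle):
    """Yield (query_id, [descriptions]), merging consecutive <Iteration>
    blocks that share a query ID."""
    current_query, hits = None, []
    for event, elem in ElementTree.iterparse(handle, events=("end",)):
        if elem.tag != "Iteration":
            continue
        qseqid = elem.findtext("Iteration_query-ID")
        if qseqid != current_query:
            if current_query is not None:
                yield current_query, hits
            current_query, hits = qseqid, []
        for hit in elem.findall("Iteration_hits/Hit"):
            hits.append("%s %s" % (hit.findtext("Hit_id"), hit.findtext("Hit_def")))
        elem.clear()  # discard this block's children to limit memory use
    if current_query is not None:
        yield current_query, hits


print(list(iter_query_hits(io.BytesIO(EXAMPLE_XML))))
```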
diff -r 09a68a90d552 -r 98f8431dab44 tools/blastxml_to_top_descr/blastxml_to_top_descr.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/blastxml_to_top_descr/blastxml_to_top_descr.py Fri Jun 13 07:07:35 2014 -0400
@@ -0,0 +1,255 @@
+#!/usr/bin/env python
+"""Convert a BLAST XML file to a top hits description table.
+
+Takes one positional argument, the input BLAST XML (or tabular BLAST) filename,
+plus options for the output tabular filename and the number of hits to collect.
+
+Assumes the hits are pre-sorted, so the "best" 3 hits are the first 3 hits.
+"""
+import os
+import sys
+import re
+from optparse import OptionParser
+
+if "-v" in sys.argv or "--version" in sys.argv:
+ print "v0.1.0"
+ sys.exit(0)
+
+if sys.version_info[:2] >= ( 2, 5 ):
+ import xml.etree.cElementTree as ElementTree
+else:
+ from galaxy import eggs
+ import pkg_resources; pkg_resources.require( "elementtree" )
+ from elementtree import ElementTree
+
+def stop_err( msg ):
+ sys.stderr.write("%s\n" % msg)
+ sys.exit(1)
+
+usage = """Use as follows:
+
+$ blastxml_to_top_descr.py [-t 3] -o example.tabular input.xml
+
+Or,
+
+$ blastxml_to_top_descr.py [--topN 3] --output example.tabular input.xml
+
+This will take the top 3 BLAST descriptions from the input BLAST XML file,
+writing them to the specified output file in tabular format.
+"""
+
+parser = OptionParser(usage=usage)
+parser.add_option("-t", "--topN", dest="topN", default=3,
+ help="Number of descriptions to collect (in order from file)")
+parser.add_option("-o", "--output", dest="out_file", default=None,
+ help="Output filename for tabular file",
+ metavar="FILE")
+parser.add_option("-f", "--format", dest="format", default="blastxml",
+ help="Input format (blastxml or tabular)")
+parser.add_option("-q", "--qseqid", dest="qseqid", default="1",
+ help="Column for query 'qseqid' (for tabular input; default 1)")
+parser.add_option("-s", "--sseqid", dest="sseqid", default="2",
+ help="Column for subject 'sseqid' (for tabular input; default 2)")
+parser.add_option("-d", "--salltitles", dest="salltitles", default="25",
+ help="Column for descriptions 'salltitles' (for tabular input; default 25)")
+(options, args) = parser.parse_args()
+
+if len(sys.argv) == 4 and len(args) == 3 and not options.out_file:
+ stop_err("""The API has changed, replace this:
+
+$ python blastxml_to_top_descr.py input.xml output.tab 3
+
+with:
+
+$ python blastxml_to_top_descr.py -o output.tab -t 3 input.xml
+
+Sorry.
+""")
+
+if not args:
+ stop_err("Input filename missing, try -h")
+if len(args) > 1:
+ stop_err("Expects a single argument, one input filename")
+in_file = args[0]
+out_file = options.out_file
+topN = options.topN
+
+try:
+ topN = int(topN)
+except ValueError:
+ stop_err("Number of hits argument should be an integer (at least 1)")
+if topN < 1:
+ stop_err("Number of hits argument should be an integer (at least 1)")
+
+if not os.path.isfile(in_file):
+ stop_err("Missing input file: %r" % in_file)
+
+
+def get_column(value):
+ """Convert column number on command line to Python index."""
+ if value.startswith("c"):
+ # Ignore c prefix, e.g. "c1" for "1"
+ value = value[1:]
+ try:
+ col = int(value)
+ except ValueError:
+ stop_err("Expected an integer column number, not %r" % value)
+ if col < 1:
+ stop_err("Expect column numbers to be at least one, not %r" % value)
+ return col - 1 # Python counting!
+
+def tabular_hits(in_file, qseqid, sseqid, salltitles):
+ """Parse key data from tabular BLAST output.
+
+ Iterator returning tuples (qseqid, list_of_subject_description)
+ """
+ current_query = None
+ current_hits = []
+ with open(in_file) as input:
+ for line in input:
+ parts = line.rstrip("\n").split("\t")
+ query = parts[qseqid]
+ descr = "%s %s" % (parts[sseqid], parts[salltitles])
+ if current_query is None:
+ # First hit
+ current_query = query
+ current_hits = [descr]
+ elif current_query == query:
+ # Another hit
+ current_hits.append(descr)
+ else:
+ # New query
+ yield current_query, current_hits
+ current_query = query
+ current_hits = [descr]
+ if current_query is not None:
+ # Final query
+ yield current_query, current_hits
+
+def blastxml_hits(in_file):
+ """Parse key data from BLAST XML output.
+
+ Iterator returning tuples (qseqid, list_of_subject_description)
+ """
+ try:
+ context = ElementTree.iterparse(in_file, events=("start", "end"))
+ except:
+ with open(in_file) as handle:
+ header = handle.read(100)
+ stop_err("Invalid data format in XML file %r which starts: %r" % (in_file, header))
+ # turn it into an iterator
+ context = iter(context)
+ # get the root element
+ try:
+ event, root = context.next()
+ except:
+ with open(in_file) as handle:
+ header = handle.read(100)
+ stop_err("Unable to get root element from XML file %r which starts: %r" % (in_file, header))
+
+ re_default_query_id = re.compile(r"^Query_\d+$")
+ assert re_default_query_id.match("Query_101")
+ assert not re_default_query_id.match("Query_101a")
+ assert not re_default_query_id.match("MyQuery_101")
+ re_default_subject_id = re.compile(r"^Subject_\d+$")
+ assert re_default_subject_id.match("Subject_1")
+ assert not re_default_subject_id.match("Subject_")
+ assert not re_default_subject_id.match("Subject_12a")
+ assert not re_default_subject_id.match("TheSubject_1")
+
+ count = 0
+ pos_count = 0
+ current_query = None
+ hit_descrs = []
+ for event, elem in context:
+ # for every tag
+ if event == "end" and elem.tag == "Iteration":
+ # Expecting either this, from BLAST 2.2.25+ using FASTA vs FASTA
+ # sp|Q9BS26|ERP44_HUMAN
+ # Endoplasmic reticulum resident protein 44 OS=Homo sapiens GN=ERP44 PE=1 SV=1
+ # 406
+ #
+ #
+ # Or, from BLAST 2.2.24+ run online
+ # Query_1
+ # Sample
+ # 516
+ # ...
+ qseqid = elem.findtext("Iteration_query-ID")
+ if qseqid is None:
+ stop_err("Missing <Iteration_query-ID> (could be really old BLAST XML data?)")
+ if re_default_query_id.match(qseqid):
+ # Placeholder ID, take the first word of the query definition
+ qseqid = elem.findtext("Iteration_query-def").split(None,1)[0]
+ if current_query is None:
+ # First hit
+ current_query = qseqid
+ hit_descrs = []
+ elif current_query != qseqid:
+ # New query
+ yield current_query, hit_descrs
+ current_query = qseqid
+ hit_descrs = []
+ else:
+ # Continuation of previous query
+ # i.e. this BLAST XML did not use one <Iteration> per query
+ # sys.stderr.write("Multiple blocks for %s\n" % qseqid)
+ pass
+ # for every <Hit> within <Iteration_hits>
+ for hit in elem.findall("Iteration_hits/Hit"):
+ # Expecting either this,
+ # gi|3024260|sp|P56514.1|OPSD_BUFBU
+ # RecName: Full=Rhodopsin
+ # P56514
+ # or,
+ # Subject_1
+ # gi|57163783|ref|NP_001009242.1| rhodopsin [Felis catus]
+ # Subject_1
+ #
+ #apparently depending on the parse_deflines switch
+ sseqid = hit.findtext("Hit_id").split(None,1)[0]
+ hit_def = sseqid + " " + hit.findtext("Hit_def")
+ if re_default_subject_id.match(sseqid) \
+ and sseqid == hit.findtext("Hit_accession"):
+ # Placeholder ID, take the first word of the subject definition
+ hit_def = hit.findtext("Hit_def")
+ sseqid = hit_def.split(None,1)[0]
+ assert hit_def not in hit_descrs
+ hit_descrs.append(hit_def)
+ # prevents ElementTree from growing large datastructure
+ root.clear()
+ elem.clear()
+ if current_query is not None:
+ # Final query
+ yield current_query, hit_descrs
+
+if options.format == "blastxml":
+ hits = blastxml_hits(in_file)
+elif options.format == "tabular":
+ qseqid = get_column(options.qseqid)
+ sseqid = get_column(options.sseqid)
+ salltitles = get_column(options.salltitles)
+ hits = tabular_hits(in_file, qseqid, sseqid, salltitles)
+else:
+ stop_err("Unsupported format: %r" % options.format)
+
+
+def best_hits(descriptions, topN):
+ if len(descriptions) < topN:
+ return descriptions + [""] * (topN - len(descriptions))
+ else:
+ return descriptions[:topN]
+
+count = 0
+if out_file is None:
+ outfile = sys.stdout
+else:
+ outfile = open(out_file, 'w')
+outfile.write("#Query\t%s\n" % "\t".join("BLAST hit %i" % (i+1) for i in range(topN)))
+for query, descrs in hits:
+ count += 1
+ outfile.write("%s\t%s\n" % (query, "\t".join(best_hits(descrs, topN))))
+if out_file is not None:
+ outfile.close()
+# Queries with no hits are not present in tabular BLAST output
+print("%i queries with BLAST results" % count)
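`tabular_hits()` above groups consecutive rows by hand. The same grouping can be written with `itertools.groupby`. Like the original loop, `groupby` only merges adjacent rows, so all rows for a query must be contiguous, as BLAST writes them. In the sketch below, `tabular_hits_groupby` is a hypothetical name. Skipping blank and `#` comment lines is an addition that the script itself does not do.

```python
import itertools


def tabular_hits_groupby(lines, qseqid=0, sseqid=1, salltitles=24):
    """Yield (query_id, [descriptions]) from tab-separated BLAST rows.

    Column arguments are 0-based indices (as returned by get_column).
    groupby only merges adjacent rows with the same key, so rows for
    each query must be consecutive, as tabular_hits() also assumes.
    """
    # Unlike the original, skip blank lines and '#' comment lines.
    rows = (line.rstrip("\n").split("\t") for line in lines
            if line.strip() and not line.startswith("#"))
    for query, group in itertools.groupby(rows, key=lambda parts: parts[qseqid]):
        # Build the list now: each group is consumed before groupby advances.
        yield query, ["%s %s" % (p[sseqid], p[salltitles]) for p in group]


example = ["q1\ts1\tdescr one\n", "q1\ts2\tdescr two\n", "\n", "q2\ts3\tdescr three\n"]
print(list(tabular_hits_groupby(example, 0, 1, 2)))
```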
diff -r 09a68a90d552 -r 98f8431dab44 tools/blastxml_to_top_descr/blastxml_to_top_descr.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/blastxml_to_top_descr/blastxml_to_top_descr.xml Fri Jun 13 07:07:35 2014 -0400
@@ -0,0 +1,111 @@
+
+ Make a table from BLAST output
+ blastxml_to_top_descr.py --version
+
+blastxml_to_top_descr.py
+-f "$input.in_format"
+#if $input.in_format == "tabular":
+ --qseqid $input.qseqid
+ --sseqid $input.sseqid
+ --salltitles $input.salltitles
+#end if
+-o "${tabular_file}"
+-t ${topN}
+"${in_file}"
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+**What it does**
+
+NCBI BLAST+ (and the older NCBI 'legacy' BLAST) can output in a range of
+formats including text, tabular and a more detailed XML format. You can
+do a lot of things with tabular files in Galaxy (sorting, filtering, joins,
+etc.). However, until BLAST+ 2.2.28 the tabular output did not include the
+hit descriptions (titles) found in the other output formats.
+
+This tool turns a BLAST XML file into a simple tabular file with one row per
+query sequence, giving the query identifier followed by the top three hit
+descriptions by default (i.e. the first three, as the hits are assumed to be
+pre-sorted). If a query has fewer hits, the remaining entries are left blank.
+
+This tool can also be used with tabular output from BLAST+ instead, provided
+you specify the relevant columns. The default settings will work with the
+default 25 column extended output from the BLAST+ tools wrapped in Galaxy.
+Note that if a query has *no* hits, it does not appear in the BLAST tabular
+output, and so will not appear in this tool's output either.
+
+**Example Usage**
+
+One simple usage would be to take a transcriptome assembly or set of
+gene predictions, run a BLAST search against the NCBI NR database, and
+then use this tool to make a table of the top three BLAST hits. This
+can give you a crude 'quick and dirty' annotation, potentially enough
+to spot some problems (e.g. bacterial contamination could be very
+obvious).
+
+**References**
+
+If you use this Galaxy tool in work leading to a scientific publication please
+cite:
+
+Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013).
+Galaxy tools and workflows for sequence analysis with applications
+in molecular plant pathology. PeerJ 1:e167
+http://dx.doi.org/10.7717/peerj.167
+
+This wrapper is available to install into other Galaxy Instances via the Galaxy
+Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/blastxml_to_top_descr
+
+
+
diff -r 09a68a90d552 -r 98f8431dab44 tools/blastxml_to_top_descr/repository_dependencies.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/blastxml_to_top_descr/repository_dependencies.xml Fri Jun 13 07:07:35 2014 -0400
@@ -0,0 +1,4 @@
+
+
+
+