annotate glimmerHMM/BCBio/GFF/GFFParser.py @ 0:c9699375fcf6 draft

planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
author bgruening
date Wed, 13 Jul 2016 10:55:48 -0400
"""Parse GFF files into features attached to Biopython SeqRecord objects.

This deals with GFF3 formatted files, a tab delimited format for storing
sequence features and annotations:

http://www.sequenceontology.org/gff3.shtml

It will also deal with older GFF versions (GTF/GFF2):

http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml
http://mblab.wustl.edu/GTF22.html

The implementation utilizes map/reduce parsing of GFF using Disco. Disco
(http://discoproject.org) is a Map-Reduce framework for Python utilizing
Erlang for parallelization. The code works on a single processor without
Disco using the same architecture.
"""
import os
import copy
import re
import collections
import urllib
import itertools

# Make defaultdict available on versions of Python older than 2.5, which lack it
try:
    collections.defaultdict
except AttributeError:
    import _utils
    collections.defaultdict = _utils.defaultdict

from Bio.Seq import Seq, UnknownSeq
from Bio.SeqRecord import SeqRecord
from Bio.SeqFeature import SeqFeature, FeatureLocation
from Bio import SeqIO
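
# Illustrative sketch, not part of the original BCBio module. The module
# docstring says parsing follows a Map-Reduce architecture that also runs on
# a single processor without Disco. Under a deliberately simplified model,
# the serial version looks roughly like this: a map step turns each line into
# (key, value) pairs, and a reduce step groups the values by key. The names
# _example_map_line, _example_reduce and _example_map_reduce are hypothetical.
# The real mapper is _gff_line_map, which emits richer records and applies
# filtering limits.


def _example_map_line(line):
    """Map one GFF line to a list of (key, value) pairs (sketch only)."""
    line = line.strip()
    if not line:
        return []
    # "##" lines are directives, mirroring how _gff_line_map tags them
    if line.startswith("##"):
        return [("directive", line[2:])]
    # other "#" lines are plain comments and produce nothing
    if line.startswith("#"):
        return []
    parts = line.split("\t")
    # in this sketch, features are keyed by their sequence ID (column 1)
    return [(parts[0], parts)]


def _example_reduce(pairs):
    """Reduce step: collect every mapped value under its key."""
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    return grouped


def _example_map_reduce(lines):
    """Run the map step over all lines, then a single reduce, serially."""
    mapped = []
    for line in lines:
        mapped.extend(_example_map_line(line))
    return _example_reduce(mapped)
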
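
# Illustrative sketch, not part of the original BCBio module. The nested
# _split_keyvals helper in _gff_line_map tells GFF3 attributes
# (key=value;key=value) apart from GFF2/GTF ones (key "value" ; key value).
# This standalone approximation follows the same rules. It tries the " ; ",
# "; " and ";" separators in turn. It treats the column as GFF3 if the first
# part starts with "word=". It strips double quotes, splits values on commas,
# and turns bare keys into "true" flags. The name _example_split_attributes
# is hypothetical. Unlike the original, it does not URL-unquote values or drop
# stray leading semicolons, and it splits GFF3 pairs only on the first "=".
import re

_EXAMPLE_GFF3_KW = re.compile(r"\w+=")


def _example_split_attributes(attr_str):
    """Return (dict mapping key -> list of values, is_gff2) for column 9."""
    quals = {}
    # Ensembl GTF can end with a stray semicolon
    if attr_str.endswith(";"):
        attr_str = attr_str[:-1]
    parts = attr_str.split(" ; ")
    if len(parts) == 1:
        parts = attr_str.split("; ")
        if len(parts) == 1:
            parts = attr_str.split(";")
    is_gff2 = not _EXAMPLE_GFF3_KW.match(parts[0])
    if is_gff2:
        # GFF2/GTF: the first space-separated word is the key
        key_vals = []
        for part in parts:
            words = part.strip().split(" ")
            key_vals.append((words[0], " ".join(words[1:])))
    else:
        key_vals = [part.split("=", 1) for part in parts]
    for item in key_vals:
        key = item[0]
        val = item[1] if len(item) == 2 else ""
        # GFF2 values may be wrapped in double quotes
        if len(val) > 1 and val[0] == '"' and val[-1] == '"':
            val = val[1:-1]
        if val:
            quals.setdefault(key, []).extend(v for v in val.split(",") if v)
        else:
            # a key with no value becomes a flag attribute
            quals.setdefault(key, []).append("true")
    return quals, is_gff2
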
def _gff_line_map(line, params):
    """Map part of Map-Reduce; parses a line of GFF into a dictionary.

    Given an input line from a GFF file, this:
    - decides if the line passes our filtering limits
    - if so:
        - breaks it into component elements
        - determines the type of attribute (flat, parent, child or annotation)
        - generates a dictionary of GFF info which can be serialized as JSON
    """
    gff3_kw_pat = re.compile(r"\w+=")
48 def _split_keyvals(keyval_str):
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
49 """Split key-value pairs in a GFF2, GTF and GFF3 compatible way.
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
50
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
51 GFF3 has key value pairs like:
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
52 count=9;gene=amx-2;sequence=SAGE:aacggagccg
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
53 GFF2 and GTF have:
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
54 Sequence "Y74C9A" ; Note "Clone Y74C9A; Genbank AC024206"
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
55 name "fgenesh1_pg.C_chr_1000003"; transcriptId 869
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
56 """
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
57 quals = collections.defaultdict(list)
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
58 if keyval_str is None:
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
59 return quals
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
60 # ensembl GTF has a stray semi-colon at the end
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
61 if keyval_str[-1] == ';':
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
62 keyval_str = keyval_str[:-1]
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
63 # GFF2/GTF has a semi-colon with at least one space after it.
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
64 # It can have spaces on both sides; wormbase does this.
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
65 # GFF3 works with no spaces.
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
66 # Split at the first one we can recognize as working
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
67 parts = keyval_str.split(" ; ")
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
68 if len(parts) == 1:
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
69 parts = keyval_str.split("; ")
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
70 if len(parts) == 1:
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
71 parts = keyval_str.split(";")
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
72 # check if we have GFF3 style key-vals (with =)
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
73 is_gff2 = True
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
74 if gff3_kw_pat.match(parts[0]):
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
75 is_gff2 = False
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
76 key_vals = [p.split('=') for p in parts]
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
77 # otherwise, we are separated by a space with a key as the first item
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
78 else:
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
79 pieces = []
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
80 for p in parts:
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
81 # fix misplaced semi-colons in keys in some GFF2 files
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
82 if p and p[0] == ';':
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
83 p = p[1:]
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
84 pieces.append(p.strip().split(" "))
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
85 key_vals = [(p[0], " ".join(p[1:])) for p in pieces]
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
86 for item in key_vals:
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
87 # standard in-spec items are key=value
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
88 if len(item) == 2:
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
89 key, val = item
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
90 # out-of-spec files can have just key values. We set an empty value
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
91 # which will be changed to true later to standardize.
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
92 else:
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
93 assert len(item) == 1, item
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
94 key = item[0]
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
95 val = ''
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
96 # remove quotes in GFF2 files
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
97 if (len(val) > 0 and val[0] == '"' and val[-1] == '"'):
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
98 val = val[1:-1]
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
99 if val:
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
100 quals[key].extend([v for v in val.split(',') if v])
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
101 # if we don't have a value, make this a key=True/False style
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
102 # attribute
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
103 else:
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
104 quals[key].append('true')
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
105 for key, vals in quals.items():
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
106 quals[key] = [urllib.unquote(v) for v in vals]
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
107 return quals, is_gff2
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
108
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
109 def _nest_gff2_features(gff_parts):
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
110 """Provide nesting of GFF2 transcript parts with transcript IDs.
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
111
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
112 exons and coding sequences are mapped to a parent with a transcript_id
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
113 in GFF2. This is implemented differently at different genome centers
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
114 and this function attempts to resolve that and map things to the GFF3
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
115 way of doing them.
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
116 """
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
117 # map protein or transcript ids to a parent
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
118 for transcript_id in ["transcript_id", "transcriptId", "proteinId"]:
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
119 try:
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
120 gff_parts["quals"]["Parent"] = \
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
121 gff_parts["quals"][transcript_id]
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
122 break
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
123 except KeyError:
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
124 pass
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
125 # case for WormBase GFF -- everything labelled as Transcript or CDS
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
126 for flat_name in ["Transcript", "CDS"]:
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
127 if gff_parts["quals"].has_key(flat_name):
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
128 # parent types
            if gff_parts["type"] in [flat_name]:
                if not gff_parts["id"]:
                    gff_parts["id"] = gff_parts["quals"][flat_name][0]
                    gff_parts["quals"]["ID"] = [gff_parts["id"]]
            # children types
            elif gff_parts["type"] in ["intron", "exon", "three_prime_UTR",
                    "coding_exon", "five_prime_UTR", "CDS", "stop_codon",
                    "start_codon"]:
                gff_parts["quals"]["Parent"] = gff_parts["quals"][flat_name]
            break

    return gff_parts

    strand_map = {'+' : 1, '-' : -1, '?' : None, None: None}
    line = line.strip()
    if line[:2] == "##":
        return [('directive', line[2:])]
    elif line and line[0] != "#":
        parts = line.split('\t')
        should_do = True
        if params.limit_info:
            for limit_name, limit_values in params.limit_info.items():
                cur_id = tuple([parts[i] for i in
                    params.filter_info[limit_name]])
                if cur_id not in limit_values:
                    should_do = False
                    break
        if should_do:
            assert len(parts) >= 8, line
            # not python2.4 compatible but easier to understand
            #gff_parts = [(None if p == '.' else p) for p in parts]
            gff_parts = []
            for p in parts:
                if p == ".":
                    gff_parts.append(None)
                else:
                    gff_parts.append(p)
            gff_info = dict()
            # collect all of the base qualifiers for this item
            if len(parts) > 8:
                quals, is_gff2 = _split_keyvals(gff_parts[8])
            else:
                # use defaultdict(list) so the source/score/phase appends
                # below cannot raise KeyError when column 9 is absent
                quals, is_gff2 = collections.defaultdict(list), False
            gff_info["is_gff2"] = is_gff2
            if gff_parts[1]:
                quals["source"].append(gff_parts[1])
            if gff_parts[5]:
                quals["score"].append(gff_parts[5])
            if gff_parts[7]:
                quals["phase"].append(gff_parts[7])
            gff_info['quals'] = dict(quals)
            gff_info['rec_id'] = gff_parts[0]
            # if we are describing a location, then we are a feature
            if gff_parts[3] and gff_parts[4]:
                # GFF start is 1-based and inclusive; store a 0-based,
                # half-open [start, end) interval as Biopython does
                gff_info['location'] = [int(gff_parts[3]) - 1,
                        int(gff_parts[4])]
                gff_info['type'] = gff_parts[2]
                gff_info['id'] = quals.get('ID', [''])[0]
                gff_info['strand'] = strand_map.get(gff_parts[6], None)
                if is_gff2:
                    gff_info = _nest_gff2_features(gff_info)
                # features that have parents need to link so we can pick up
                # the relationship
                if 'Parent' in gff_info['quals']:
                    # check for self-referential parent/child relationships;
                    # if found, remove the ID, which is not useful
                    for p in gff_info['quals']['Parent']:
                        if p == gff_info['id']:
                            gff_info['id'] = ''
                            del gff_info['quals']['ID']
                            break
                    final_key = 'child'
                elif gff_info['id']:
                    final_key = 'parent'
                # Handle flat features
                else:
                    final_key = 'feature'
            # otherwise, associate these annotations with the full record
            else:
                final_key = 'annotation'
            if params.jsonify:
                return [(final_key, simplejson.dumps(gff_info))]
            else:
                return [(final_key, gff_info)]
    return []
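The map step above classifies every line before any SeqRecord is built. The sketch below is a simplified, self-contained version of that classification, assuming plain GFF3 input. It skips the `limit_info` filtering, GFF2 nesting, the self-referential Parent check, and the attribute-parsing details handled by `_split_keyvals`; the name `classify_gff_line` is hypothetical and not part of BCBio.

```python
from collections import defaultdict

# Same strand table as the parser: "?" (unknown) and "." (None) carry no strand
STRAND_MAP = {"+": 1, "-": -1, "?": None, None: None}


def classify_gff_line(line):
    """Return (kind, value) for one GFF3 line, or None for blanks/comments.

    kind is one of "directive", "annotation", "parent", "child", "feature".
    """
    line = line.strip()
    if line.startswith("##"):
        return ("directive", line[2:])
    if not line or line.startswith("#"):
        return None
    # "." marks an empty column in GFF; normalise it to None
    parts = [None if p == "." else p for p in line.split("\t")]
    if len(parts) < 8:
        raise ValueError("expected at least 8 tab-separated columns: %r" % line)
    quals = defaultdict(list)
    # naive column-9 parsing: key=value pairs split on ";", values on ","
    if len(parts) > 8 and parts[8]:
        for pair in parts[8].split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                quals[key.strip()].extend(value.split(","))
    for index, name in ((1, "source"), (5, "score"), (7, "phase")):
        if parts[index]:
            quals[name].append(parts[index])
    info = {"rec_id": parts[0], "quals": dict(quals)}
    if not (parts[3] and parts[4]):
        # no coordinates: the line annotates the whole record
        return ("annotation", info)
    # 1-based inclusive start -> 0-based half-open [start, end)
    info["location"] = [int(parts[3]) - 1, int(parts[4])]
    info["type"] = parts[2]
    info["id"] = quals.get("ID", [""])[0]
    info["strand"] = STRAND_MAP.get(parts[6])
    if "Parent" in info["quals"]:
        kind = "child"
    elif info["id"]:
        kind = "parent"
    else:
        kind = "feature"
    return (kind, info)


if __name__ == "__main__":
    example = "chr1\tglimmerHMM\tCDS\t100\t200\t.\t+\t0\tParent=mRNA1"
    kind, info = classify_gff_line(example)
    print(kind, info["location"], info["strand"])  # child [99, 200] 1
```

As in the parser, a line with a `Parent` attribute is always a child, even if it also carries an `ID`; only lines without coordinates become record-level annotations.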

def _gff_line_reduce(map_results, out, params):
    """Reduce part of Map-Reduce; combines results of parsed features.
    """
    final_items = dict()
    for gff_type, final_val in map_results:
        if params.jsonify and gff_type not in ['directive']:
            final_val = simplejson.loads(final_val)
        try:
            final_items[gff_type].append(final_val)
        except KeyError:
            final_items[gff_type] = [final_val]
    for key, vals in final_items.items():
        if params.jsonify:
            vals = simplejson.dumps(vals)
        out.add(key, vals)
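The reduce step groups the map step's `(key, value)` pairs by key and hands one list per key to `out.add`. A minimal sketch of that grouping, assuming only what this section shows about `out` (that it has an `add(key, vals)` method): `ListCollector` and `reduce_map_results` are hypothetical names, and the standard-library `json` module stands in for `simplejson`, whose `loads`/`dumps` are used the same way here.

```python
import json
from collections import defaultdict


class ListCollector:
    """Hypothetical stand-in for the `out` object; records each add() call."""

    def __init__(self):
        self.added = []

    def add(self, key, vals):
        self.added.append((key, vals))


def reduce_map_results(map_results, out, jsonify=False):
    """Group (key, value) pairs from the map step and emit one list per key."""
    grouped = defaultdict(list)
    for key, value in map_results:
        # the map step never JSON-encodes directives, so leave them as-is
        if jsonify and key != "directive":
            value = json.loads(value)
        grouped[key].append(value)
    for key, values in grouped.items():
        # with jsonify on, every grouped list (directives included) is re-encoded
        out.add(key, json.dumps(values) if jsonify else values)


if __name__ == "__main__":
    out = ListCollector()
    reduce_map_results([("parent", {"id": "g1"}), ("child", {"id": "e1"}),
                        ("parent", {"id": "g2"})], out)
    print(out.added)
    # [('parent', [{'id': 'g1'}, {'id': 'g2'}]), ('child', [{'id': 'e1'}])]
```

Using `defaultdict(list)` replaces the `try`/`except KeyError` idiom in the original; the resulting groups are the same.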

class _MultiIDRemapper:
    """Provide an ID remapping for cases where a parent has a non-unique ID.

    Real-world GFF3 files can have non-unique ID attributes, which we fix
    here by using the unique sequence region to assign children to the right
    parent.
    """
    def __init__(self, base_id, all_parents):
        self._base_id = base_id
        self._parents = all_parents

    def remap_id(self, feature_dict):
        rstart, rend = feature_dict['location']
        for index, parent in enumerate(self._parents):
            pstart, pend = parent['location']
            if rstart >= pstart and rend <= pend:
                if index > 0:
                    return ("%s_%s" % (self._base_id, index + 1))
                else:
                    return self._base_id
        raise ValueError("Did not find remapped ID location: %s, %s, %s" % (
                self._base_id, [p['location'] for p in self._parents],
                feature_dict['location']))
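The containment rule in `remap_id` is small enough to check by hand. The naming scheme implies that duplicate parents are known as `base_id`, `base_id_2`, `base_id_3`, ... in list order; the sketch below assumes that ordering and 0-based half-open locations as produced by the map step. The function name `remap_parent_id` is hypothetical.

```python
def remap_parent_id(base_id, parents, child):
    """Return the ID of the first same-ID parent whose span contains child.

    Mirrors the rule in _MultiIDRemapper.remap_id: the first parent keeps
    base_id, later ones become base_id_2, base_id_3, ...
    """
    rstart, rend = child["location"]
    for index, parent in enumerate(parents):
        pstart, pend = parent["location"]
        if pstart <= rstart and rend <= pend:
            return base_id if index == 0 else "%s_%d" % (base_id, index + 1)
    raise ValueError("no parent %r contains location %r"
                     % (base_id, child["location"]))


if __name__ == "__main__":
    parents = [{"location": [0, 1000]}, {"location": [5000, 6000]}]
    print(remap_parent_id("gene1", parents, {"location": [100, 200]}))    # gene1
    print(remap_parent_id("gene1", parents, {"location": [5100, 5200]}))  # gene1_2
```

Because the first containing parent wins, overlapping duplicate parents resolve to the earliest one in the list, and a child that straddles two parents raises `ValueError` just as the class does.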

class _AbstractMapReduceGFF:
    """Base class providing general GFF parsing for local and remote classes.

    This class should be subclassed to provide a concrete class to parse
    GFF under specific conditions. These classes need to implement
    the _gff_process method, which returns a dictionary of SeqRecord
    information.
    """
    def __init__(self, create_missing=True):
        """Initialize the GFF parser.

        create_missing - If True, create blank records for GFF ids not in
        the base_dict. If False, an error will be raised.
        """
        self._create_missing = create_missing
        self._map_fn = _gff_line_map
        self._reduce_fn = _gff_line_reduce
        self._examiner = GFFExaminer()

    def _gff_process(self, gff_files, limit_info, target_lines=None):
        raise NotImplementedError("Derived class must define")
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
276
    def parse(self, gff_files, base_dict=None, limit_info=None):
        """Parse a GFF file, returning an iterator of SeqRecords.

        limit_info - A dictionary specifying the regions of the GFF file
        which should be extracted. This allows only relevant portions of a file
        to be parsed.

        base_dict - A base dictionary of SeqRecord objects which may be
        pre-populated with sequences and other features. The new features from
        the GFF file are added to a deep copy of this dictionary; base_dict
        itself is not modified.
        """
        for rec in self.parse_in_parts(gff_files, base_dict, limit_info):
            yield rec

    def parse_in_parts(self, gff_files, base_dict=None, limit_info=None,
                       target_lines=None):
        """Parse a GFF file in parts, yielding records as each part is generated.

        target_lines -- The number of lines to read for each partial parse.
        Choose this based on available memory.
        """
        for results in self.parse_simple(gff_files, limit_info, target_lines):
            if base_dict is None:
                cur_dict = dict()
            else:
                cur_dict = copy.deepcopy(base_dict)
            cur_dict = self._results_to_features(cur_dict, results)
            # sorted() works on dict keys in both Python 2 and Python 3
            for cur_id in sorted(cur_dict.keys()):
                yield cur_dict[cur_id]

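`parse_in_parts` starts every chunk from a fresh deep copy of `base_dict`, so features from one chunk never leak into the caller's dictionary or into the next chunk, and it yields records in sorted ID order. A minimal sketch of that loop, with plain dictionaries standing in for SeqRecords and a hypothetical `merge_chunk` function in place of `_results_to_features`:

```python
import copy


def merge_chunk(records, chunk):
    """Hypothetical stand-in for _results_to_features: append features by record id."""
    for rec_id, feature in chunk:
        records.setdefault(rec_id, {"id": rec_id, "features": []})
        records[rec_id]["features"].append(feature)
    return records


def parse_in_parts(chunks, base_dict=None):
    for chunk in chunks:
        # Start each chunk from an independent copy so base_dict is never mutated.
        cur_dict = {} if base_dict is None else copy.deepcopy(base_dict)
        cur_dict = merge_chunk(cur_dict, chunk)
        # Yield records in sorted id order, as the method above does.
        for cur_id in sorted(cur_dict):
            yield cur_dict[cur_id]
```

One consequence visible in the sketch: a record present in `base_dict`, or one whose features span two chunks, is yielded once per chunk, so a caller wanting one record per ID has to merge them.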
    def parse_simple(self, gff_files, limit_info=None, target_lines=1):
        """Simple parse which does not build or nest features.

        This yields a simple dictionary representation of each line in the
        GFF file.
        """
        # gracefully handle a single file being passed
        if not isinstance(gff_files, (list, tuple)):
            gff_files = [gff_files]
        limit_info = self._normalize_limit_info(limit_info)
        for results in self._gff_process(gff_files, limit_info, target_lines):
            yield results

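The single-file check in `parse_simple` lets callers pass either one handle or a list of handles. The sketch below pairs that check with a hypothetical `batch_lines` helper that groups lines into `target_lines`-sized chunks. This is one plausible way a derived `_gff_process` could honor `target_lines`; it is an assumption for illustration, not this module's implementation, and it simply skips every `#` line:

```python
import itertools


def as_file_list(gff_files):
    # Same check as parse_simple: wrap a lone input so callers may pass one or many.
    if not isinstance(gff_files, (list, tuple)):
        gff_files = [gff_files]
    return gff_files


def batch_lines(gff_files, target_lines=1):
    """Hypothetical helper: yield lists of at most target_lines data lines."""
    lines = (
        line.rstrip("\n")
        for handle in as_file_list(gff_files)
        for line in handle
        # Blank lines and comment/directive lines are skipped in this sketch.
        if line.strip() and not line.startswith("#")
    )
    while True:
        # islice(lines, None) takes everything, so target_lines=None means one batch.
        batch = list(itertools.islice(lines, target_lines))
        if not batch:
            return
        yield batch
```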
    def _normalize_limit_info(self, limit_info):
        """Turn all limit information into tuples for uniform comparisons.
        """
        final_limit_info = {}
        if limit_info:
            for key, values in limit_info.items():
                final_limit_info[key] = []
                for v in values:
                    if isinstance(v, str):
                        final_limit_info[key].append((v,))
                    else:
                        final_limit_info[key].append(tuple(v))
        return final_limit_info

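Normalizing every value to a tuple means later code can compare a line's fields against the limits with a single equality or membership test, whether the caller wrote a bare string such as `"chr1"` or a list such as `["Coding_transcript", "gene"]`. A standalone copy of the method, with example limit keys (`gff_id`, `gff_source_type`) used only for illustration:

```python
def normalize_limit_info(limit_info):
    """Standalone copy of _normalize_limit_info: every limit value becomes a tuple."""
    final_limit_info = {}
    if limit_info:
        for key, values in limit_info.items():
            final_limit_info[key] = []
            for v in values:
                if isinstance(v, str):
                    # A bare string such as "chr1" becomes the 1-tuple ("chr1",).
                    final_limit_info[key].append((v,))
                else:
                    # A list such as ["Coding_transcript", "gene"] becomes a tuple.
                    final_limit_info[key].append(tuple(v))
    return final_limit_info
```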
    def _results_to_features(self, base, results):
        """Add parsed result dictionaries to the SeqRecords in base.

        Lines become SeqFeatures; FASTA sequences and directives are attached
        to the matching records.
        """
        base = self._add_annotations(base, results.get('annotation', []))
        for feature in results.get('feature', []):
            (_, base) = self._add_toplevel_feature(base, feature)
        base = self._add_parent_child_features(base, results.get('parent', []),
                                               results.get('child', []))
        base = self._add_seqs(base, results.get('fasta', []))
        base = self._add_directives(base, results.get('directive', []))
        return base

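The order of these passes matters: `_add_directives` runs last and copies each directive onto every record already in `base`, so records created by the earlier feature and FASTA passes also receive them. A minimal sketch of that ordering, with hypothetical handlers operating on plain dictionaries rather than the module's own methods:

```python
def add_seqs(base, seqs):
    # Hypothetical: create or update a record for each (id, sequence) pair.
    for rec_id, seq in seqs:
        base.setdefault(rec_id, {"annotations": {}})["seq"] = seq
    return base


def add_directives(base, directives):
    # Hypothetical: copy every directive onto every record already in base.
    for rec in base.values():
        for key, val in directives:
            rec["annotations"].setdefault(key, []).append(val)
    return base


# Directives run last so they reach records created by earlier passes.
PASSES = [("fasta", add_seqs), ("directive", add_directives)]


def results_to_features(base, results):
    for key, handler in PASSES:
        base = handler(base, results.get(key, []))
    return base
```

Running the passes in the opposite order would leave a record created from FASTA data without any directive annotations.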
    def _add_directives(self, base, directives):
        """Handle any directives or meta-data in the GFF file.

        Relevant items are added as annotation meta-data to each record.
        """
        dir_keyvals = collections.defaultdict(list)
        for directive in directives:
            parts = directive.split()
            if len(parts) > 1:
                key = parts[0]
                if len(parts) == 2:
                    val = parts[1]
                else:
                    val = tuple(parts[1:])
                dir_keyvals[key].append(val)
        for key, vals in dir_keyvals.items():
            for rec in base.values():
                self._add_ann_to_rec(rec, key, vals)
        return base

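The grouping step above keys each directive on its first word and keeps a single value as a string but several values as a tuple; one-word directives are dropped. A standalone sketch of just that step, assuming the leading `##` has already been stripped from each directive line (this excerpt does not show where that happens):

```python
import collections


def parse_directives(directives):
    """Group directive values by their first word, mirroring _add_directives."""
    dir_keyvals = collections.defaultdict(list)
    for directive in directives:
        parts = directive.split()
        # A directive with no value (a single word) is ignored.
        if len(parts) > 1:
            key = parts[0]
            # One value stays a string; several values become a tuple.
            val = parts[1] if len(parts) == 2 else tuple(parts[1:])
            dir_keyvals[key].append(val)
    return dict(dir_keyvals)
```

In the method, every grouped key is then attached to every record in `base`, so, for example, all `sequence-region` values end up on each record rather than only on the sequence they describe.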
    def _add_seqs(self, base, recs):
        """Add sequence information contained in the GFF3 to records.
        """
        for rec in recs:
            if rec.id in base:
                base[rec.id].seq = rec.seq
            else:
                base[rec.id] = rec
        return base

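When a FASTA record matches an existing ID, only its sequence is copied in, so features already attached to that record survive; unmatched FASTA records are added whole. A minimal sketch, with a hypothetical `Record` class standing in for Biopython's `SeqRecord`:

```python
class Record:
    """Hypothetical minimal stand-in for a Biopython SeqRecord."""

    def __init__(self, rec_id, seq=None, features=None):
        self.id = rec_id
        self.seq = seq
        self.features = features or []


def add_seqs(base, recs):
    for rec in recs:
        if rec.id in base:
            # Keep the existing record (and its features); only swap in the sequence.
            base[rec.id].seq = rec.seq
        else:
            # No record yet for this id: add the FASTA record as-is.
            base[rec.id] = rec
    return base
```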
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
379 def _add_parent_child_features(self, base, parents, children):
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
380 """Add nested features with parent child relationships.
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
381 """
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
382 multi_remap = self._identify_dup_ids(parents)
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
383 # add children features
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
384 children_prep = collections.defaultdict(list)
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
385 for child_dict in children:
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
386 child_feature = self._get_feature(child_dict)
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
387 for pindex, pid in enumerate(child_feature.qualifiers['Parent']):
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
388 if multi_remap.has_key(pid):
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
389 pid = multi_remap[pid].remap_id(child_dict)
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
390 child_feature.qualifiers['Parent'][pindex] = pid
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
391 children_prep[pid].append((child_dict['rec_id'],
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
392 child_feature))
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
393 children = dict(children_prep)
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
394 # add children to parents that exist
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
395 for cur_parent_dict in parents:
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
396 cur_id = cur_parent_dict['id']
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
397 if multi_remap.has_key(cur_id):
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
398 cur_parent_dict['id'] = multi_remap[cur_id].remap_id(
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
399 cur_parent_dict)
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
400 cur_parent, base = self._add_toplevel_feature(base, cur_parent_dict)
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
401 cur_parent, children = self._add_children_to_parent(cur_parent,
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
402 children)
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
403 # create parents for children without them (GFF2 or split/bad files)
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
404 while len(children) > 0:
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
405 parent_id, cur_children = itertools.islice(children.items(),
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
406 1).next()
            # one child, do not nest it
            if len(cur_children) == 1:
                rec_id, child = cur_children[0]
                loc = (child.location.nofuzzy_start, child.location.nofuzzy_end)
                rec, base = self._get_rec(
                    base, dict(rec_id=rec_id, location=loc))
                rec.features.append(child)
                del children[parent_id]
            else:
                cur_parent, base = self._add_missing_parent(
                    base, parent_id, cur_children)
                cur_parent, children = self._add_children_to_parent(
                    cur_parent, children)
        return base

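The branch above handles children whose `Parent` was never defined in the file. A lone child is placed on its record as a top-level feature. Several children that share the same missing parent are grouped under an inferred one. The sketch below uses only the standard library and covers just that split. It assumes children are kept in a dict mapping a parent ID to `(rec_id, (start, end))` tuples rather than to Biopython features. `split_orphans` is a hypothetical helper, not part of BCBio.

```python
from typing import Dict, List, Tuple

# (rec_id, (start, end)); an illustrative stand-in for (rec_id, SeqFeature)
Child = Tuple[str, Tuple[int, int]]


def split_orphans(children: Dict[str, List[Child]]):
    """Separate single orphan children from groups that need an inferred parent."""
    toplevel: List[Child] = []
    needs_parent: Dict[str, List[Child]] = {}
    # list(...) takes a snapshot, so deleting from `children` below is safe
    for parent_id, cur_children in list(children.items()):
        if len(cur_children) == 1:
            # one child: do not nest it, attach it directly to its record
            toplevel.append(cur_children[0])
            del children[parent_id]
        else:
            needs_parent[parent_id] = cur_children
    return toplevel, needs_parent


kids = {
    "gene1": [("chr1", (10, 50))],
    "mRNA2": [("chr1", (100, 150)), ("chr1", (200, 260))],
}
top, grouped = split_orphans(kids)
print(top)              # [('chr1', (10, 50))]
print(sorted(grouped))  # ['mRNA2']
print(sorted(kids))     # ['mRNA2']
```

Iterating over `list(children.items())` takes a snapshot, so deleting entries inside the loop is also safe under Python 3. Mutating a dict while iterating over it directly raises `RuntimeError` there.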
    def _identify_dup_ids(self, parents):
        """Identify duplicated ID attributes in potential nested parents.

        According to the GFF3 spec, ID attributes are supposed to be unique
        within a file, but this is not always true in practice. This looks
        for duplicates and provides unique IDs sorted by location.
        """
        multi_ids = collections.defaultdict(list)
        for parent in parents:
            multi_ids[parent['id']].append(parent)
        multi_ids = [(mid, dup_parents)
                     for (mid, dup_parents) in multi_ids.items()
                     if len(dup_parents) > 1]
        multi_remap = dict()
        for mid, dup_parents in multi_ids:
            multi_remap[mid] = _MultiIDRemapper(mid, dup_parents)
        return multi_remap

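GFF3 requires IDs to be unique within a file, with one exception. A single feature written over several lines, such as a CDS split across exons, repeats the same ID on each line. The grouping step above can be sketched with the standard library alone. The sketch assumes parents are plain dicts with an `'id'` key, as in the method. `find_duplicate_ids` is a hypothetical name, and the sketch stops at detection rather than reproducing `_MultiIDRemapper`.

```python
import collections
from typing import Dict, List


def find_duplicate_ids(parents: List[dict]) -> Dict[str, List[dict]]:
    """Group parent feature dicts by 'id' and keep only IDs seen more than once."""
    by_id = collections.defaultdict(list)
    for parent in parents:
        by_id[parent["id"]].append(parent)
    # drop IDs that occur once; those need no remapping
    return {fid: group for fid, group in by_id.items() if len(group) > 1}


parents = [
    {"id": "cds1", "location": (0, 90)},
    {"id": "gene1", "location": (0, 300)},
    {"id": "cds1", "location": (120, 210)},
]
dups = find_duplicate_ids(parents)
print(list(dups))                             # ['cds1']
print([p["location"] for p in dups["cds1"]])  # [(0, 90), (120, 210)]
```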
    def _add_children_to_parent(self, cur_parent, children):
        """Recursively add children to parent features.
        """
        if cur_parent.id in children:
            cur_children = children[cur_parent.id]
            for rec_id, cur_child in cur_children:
                cur_child, children = self._add_children_to_parent(
                    cur_child, children)
                cur_parent.location_operator = "join"
                cur_parent.sub_features.append(cur_child)
            del children[cur_parent.id]
        return cur_parent, children

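The recursion above walks down the `Parent` hierarchy (gene, then mRNA, then exon) and removes entries from `children` as it attaches them. The sketch below uses a hypothetical `Feature` dataclass in place of Biopython's `SeqFeature`. It keeps only `id` and `sub_features` and leaves out the `location_operator = "join"` step.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Feature:
    """Hypothetical stand-in for SeqFeature: only an id and nested sub-features."""
    id: str
    sub_features: List["Feature"] = field(default_factory=list)


def add_children(parent: Feature,
                 children: Dict[str, List[Tuple[str, Feature]]]) -> Feature:
    """Recursively attach children (keyed by parent ID) to `parent`, consuming the dict."""
    if parent.id in children:
        # pop before recursing, so each parent's entry is consumed exactly once
        for _rec_id, child in children.pop(parent.id):
            add_children(child, children)  # nest grandchildren first
            parent.sub_features.append(child)
    return parent


gene = Feature("gene1")
mrna = Feature("mRNA1")
exons = [Feature("exon1"), Feature("exon2")]
kids = {"gene1": [("chr1", mrna)], "mRNA1": [("chr1", e) for e in exons]}
add_children(gene, kids)
print([f.id for f in gene.sub_features])  # ['mRNA1']
print([f.id for f in mrna.sub_features])  # ['exon1', 'exon2']
print(kids)                               # {}
```

The method above deletes a parent's entry after recursing, while this sketch pops it before recursing. The result is the same for well-formed input. With the sketch's ordering, a `Parent` cycle terminates instead of recursing forever.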
    def _add_annotations(self, base, anns):
        """Add annotation data from the GFF file to records.
        """
        # add these as a list of annotations, checking not to overwrite
        # current values
        for ann in anns:
            rec, base = self._get_rec(base, ann)
            for key, vals in ann['quals'].items():
                self._add_ann_to_rec(rec, key, vals)
        return base

    def _add_ann_to_rec(self, rec, key, vals):
        """Add a key/value annotation to the given SeqRecord.
        """
        if key in rec.annotations:
            try:
                rec.annotations[key].extend(vals)
            except AttributeError:
                rec.annotations[key] = [rec.annotations[key]] + vals
        else:
            rec.annotations[key] = vals

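The merge rule above never overwrites an existing annotation. List values are extended, and a scalar value (for example a string, which has no `.extend`) is first promoted to a list. The sketch below shows the same rule on a plain dict standing in for `SeqRecord.annotations`. `add_annotation` is a hypothetical name.

```python
from typing import Any, Dict, List


def add_annotation(annotations: Dict[str, Any], key: str, vals: List[Any]) -> None:
    """Merge `vals` into annotations[key] without overwriting what is already there."""
    if key in annotations:
        try:
            annotations[key].extend(vals)  # existing value is already a list
        except AttributeError:
            # existing scalar (e.g. a str): promote it to a list first
            annotations[key] = [annotations[key]] + vals
    else:
        annotations[key] = list(vals)  # copy so later merges don't alias the caller's list


ann = {"source": "predictor"}
add_annotation(ann, "source", ["manual"])
add_annotation(ann, "gff-version", ["3"])
add_annotation(ann, "gff-version", ["3.1"])
print(ann)  # {'source': ['predictor', 'manual'], 'gff-version': ['3', '3.1']}
```

The sketch stores `list(vals)` rather than `vals` itself, so later merges cannot mutate the caller's list. The method above stores `vals` directly.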
    def _get_rec(self, base, info_dict):
        """Retrieve a record to add features to.
        """
        max_loc = info_dict.get('location', (0, 1))[1]
        try:
            cur_rec = base[info_dict['rec_id']]
            # update generated unknown sequences with the expected maximum length
            if isinstance(cur_rec.seq, UnknownSeq):
                cur_rec.seq._length = max([max_loc, cur_rec.seq._length])
            return cur_rec, base
        except KeyError:
            if self._create_missing:
                new_rec = SeqRecord(UnknownSeq(max_loc), info_dict['rec_id'])
                base[info_dict['rec_id']] = new_rec
                return new_rec, base
            else:
                raise

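The lookup above works as a get-or-create. An existing record is reused, and a placeholder sequence is lengthened to cover the feature's end coordinate. A missing record is created only when `_create_missing` is set; otherwise the `KeyError` propagates. The sketch below assumes a hypothetical `PlaceholderRecord` with a bare `length` in place of a `SeqRecord` wrapping `UnknownSeq`. Unlike the method, it treats every record as a placeholder.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class PlaceholderRecord:
    """Hypothetical stand-in for a SeqRecord whose sequence is unknown (length only)."""
    id: str
    length: int


def get_record(base: Dict[str, PlaceholderRecord], rec_id: str,
               location: Tuple[int, int] = (0, 1),
               create_missing: bool = True) -> PlaceholderRecord:
    """Return base[rec_id], creating or growing a placeholder so it covers `location`."""
    max_loc = location[1]
    try:
        rec = base[rec_id]
    except KeyError:
        if not create_missing:
            raise  # re-raise the original KeyError
        rec = base[rec_id] = PlaceholderRecord(rec_id, max_loc)
        return rec
    # grow the placeholder to the furthest end coordinate seen so far
    rec.length = max(max_loc, rec.length)
    return rec


records: Dict[str, PlaceholderRecord] = {}
get_record(records, "chr1", (100, 250))
get_record(records, "chr1", (10, 90))     # shorter: length unchanged
get_record(records, "chr1", (300, 1200))  # longer: placeholder grows
print(records["chr1"].length)  # 1200
```

Newer Biopython releases deprecate `UnknownSeq` in favour of `Seq(None, length)`, so the method above may need updating to run against them.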
    def _add_missing_parent(self, base, parent_id, cur_children):
        """Add a new feature that is missing from the GFF file.
        """
        base_rec_id = list(set(c[0] for c in cur_children))
        assert len(base_rec_id) == 1
        feature_dict = dict(id=parent_id, strand=None,
                            type="inferred_parent", quals=dict(ID=[parent_id]),
                            rec_id=base_rec_id[0])
        coords = [(c.location.nofuzzy_start, c.location.nofuzzy_end)
                  for r, c in cur_children]
        feature_dict["location"] = (min([c[0] for c in coords]),
                                    max([c[1] for c in coords]))
        return self._add_toplevel_feature(base, feature_dict)

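When several children point at a parent that never appears, the method above synthesizes an `inferred_parent` feature on their shared record. Its span runs from the smallest child start to the largest child end. The sketch below does the same computation on plain tuples, assuming children of the form `(rec_id, (start, end))`. `infer_parent` is a hypothetical name.

```python
from typing import List, Tuple

# (rec_id, (start, end)); illustrative stand-in for (rec_id, SeqFeature)
Child = Tuple[str, Tuple[int, int]]


def infer_parent(parent_id: str, cur_children: List[Child]) -> dict:
    """Build a feature dict for a parent that is referenced but never defined."""
    rec_ids = {rec_id for rec_id, _ in cur_children}
    # all children of one parent must sit on the same sequence record
    if len(rec_ids) != 1:
        raise ValueError(f"children of {parent_id} span records: {sorted(rec_ids)}")
    starts = [start for _, (start, _end) in cur_children]
    ends = [end for _, (_start, end) in cur_children]
    return dict(id=parent_id, strand=None, type="inferred_parent",
                quals=dict(ID=[parent_id]), rec_id=rec_ids.pop(),
                location=(min(starts), max(ends)))


parent = infer_parent("mRNA2", [("chr1", (200, 260)), ("chr1", (100, 150))])
print(parent["location"], parent["rec_id"])  # (100, 260) chr1
```

The sketch raises `ValueError` instead of using `assert`, because assertions are stripped when Python runs with `-O`.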
    def _add_toplevel_feature(self, base, feature_dict):
        """Add a toplevel non-nested feature to the appropriate record.
        """
        new_feature = self._get_feature(feature_dict)
        rec, base = self._get_rec(base, feature_dict)
        rec.features.append(new_feature)
        return new_feature, base

    def _get_feature(self, feature_dict):
        """Retrieve a Biopython feature from our dictionary representation.
        """
        location = FeatureLocation(*feature_dict['location'])
        new_feature = SeqFeature(location, feature_dict['type'],
                                 id=feature_dict['id'],
                                 strand=feature_dict['strand'])
        new_feature.qualifiers = feature_dict['quals']
        return new_feature

    def _parse_fasta(self, in_handle):
        """Parse FASTA sequence information contained in the GFF3 file.
        """
        return list(SeqIO.parse(in_handle, "fasta"))

class _GFFParserLocalOut:
    """Provide a collector for local GFF MapReduce file parsing.
    """
    def __init__(self, smart_breaks=False):
        self._items = dict()
        self._smart_breaks = smart_breaks
        self._missing_keys = collections.defaultdict(int)
        self._last_parent = None
        self.can_break = True
        self.num_lines = 0

    def add(self, key, vals):
        if self._smart_breaks:
            # if this is not GFF2, we expect parents and only break
            # when none of them are missing
            if key == 'directive':
                if vals[0] == '#':
                    self.can_break = True
                self._last_parent = None
            elif not vals[0].get("is_gff2", False):
                self._update_missing_parents(key, vals)
                self.can_break = (len(self._missing_keys) == 0)
            # break when we are done with stretches of child features
            elif key != 'child':
                self.can_break = True
                self._last_parent = None
            # break when we have lots of child features in a row
            # and change between parents
            else:
                cur_parent = vals[0]["quals"]["Parent"][0]
                if self._last_parent:
                    self.can_break = (cur_parent != self._last_parent)
                self._last_parent = cur_parent
        self.num_lines += 1
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
562 try:
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
563 self._items[key].extend(vals)
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
564 except KeyError:
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
565 self._items[key] = vals
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
566
    def _update_missing_parents(self, key, vals):
        # Track parent IDs referenced by child features but not yet seen,
        # so we can tell when it is safe to break. If this proves too
        # costly, fall back to never breaking in the middle of children.
        if key in ["child"]:
            for val in vals:
                for p_id in val["quals"]["Parent"]:
                    self._missing_keys[p_id] += 1
        for val in vals:
            try:
                del self._missing_keys[val["quals"]["ID"][0]]
            except KeyError:
                pass

    def has_items(self):
        return len(self._items) > 0

    def get_results(self):
        self._last_parent = None
        return self._items

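The smart-break bookkeeping in `_update_missing_parents` can be illustrated on its own. The sketch below is a hypothetical stand-alone class (`MissingParentTracker` is not part of BCBio). Like the code above, it assumes each feature is a dict whose `"quals"` entry maps `"ID"` and `"Parent"` to lists of strings.

Each parent referenced by a `child` record is counted as outstanding until a later record carrying that ID is seen. A batch may only be cut while nothing is outstanding. The sketch uses `dict.get`/`dict.pop` in place of the original try/except.

```python
import collections


class MissingParentTracker:
    """Hypothetical stand-alone mirror of the missing-parent bookkeeping."""

    def __init__(self):
        # parent ID -> number of child references still waiting for it
        self._missing_keys = collections.defaultdict(int)

    def update(self, key, vals):
        # Every Parent referenced by a child feature becomes outstanding.
        if key == "child":
            for val in vals:
                for p_id in val["quals"]["Parent"]:
                    self._missing_keys[p_id] += 1
        # Any record whose ID is outstanding resolves that reference.
        for val in vals:
            ids = val["quals"].get("ID")
            if ids:
                self._missing_keys.pop(ids[0], None)

    @property
    def can_break(self):
        return len(self._missing_keys) == 0


tracker = MissingParentTracker()
tracker.update("child", [{"quals": {"ID": ["exon1"], "Parent": ["mrna1"]}}])
print(tracker.can_break)  # False: mrna1 has not been seen yet
tracker.update("parent", [{"quals": {"ID": ["mrna1"]}}])
print(tracker.can_break)  # True: the reference is resolved
```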
class GFFParser(_AbstractMapReduceGFF):
    """Local GFF parser providing standardized parsing of GFF3 and GFF2 files.
    """
    def __init__(self, line_adjust_fn=None, create_missing=True):
        _AbstractMapReduceGFF.__init__(self, create_missing=create_missing)
        self._line_adjust_fn = line_adjust_fn

    def _gff_process(self, gff_files, limit_info, target_lines):
        """Process GFF addition without any parallelization.

        In addition to limit filtering, this accepts a target_lines argument
        giving the minimum number of lines to parse before yielding a batch
        of results; a batch is only yielded at a point where it is safe to
        break. This allows partial parsing of a file to prevent memory issues.
        """
        line_gen = self._file_line_generator(gff_files)
        for out in self._lines_to_out_info(line_gen, limit_info, target_lines):
            yield out

    def _file_line_generator(self, gff_files):
        """Generate single lines from a set of GFF files.
        """
        for gff_file in gff_files:
            if hasattr(gff_file, "read"):
                need_close = False
                in_handle = gff_file
            else:
                need_close = True
                in_handle = open(gff_file)
            while True:
                line = in_handle.readline()
                if not line:
                    break
                yield line
            if need_close:
                in_handle.close()

    def _lines_to_out_info(self, line_iter, limit_info=None,
                           target_lines=None):
        """Generate SeqRecord and SeqFeatures from GFF file lines.
        """
        params = self._examiner._get_local_params(limit_info)
        out_info = _GFFParserLocalOut((target_lines is not None and
                                       target_lines > 1))
        found_seqs = False
        for line in line_iter:
            results = self._map_fn(line, params)
            if self._line_adjust_fn and results:
                if results[0][0] not in ['directive']:
                    results = [(results[0][0],
                                self._line_adjust_fn(results[0][1]))]
            self._reduce_fn(results, out_info, params)
            if (target_lines and out_info.num_lines >= target_lines and
                    out_info.can_break):
                yield out_info.get_results()
                out_info = _GFFParserLocalOut((target_lines is not None and
                                               target_lines > 1))
            if (results and results[0][0] == 'directive' and
                    results[0][1] == 'FASTA'):
                found_seqs = True
                break

        # Minimal file-like wrapper so the FASTA parser can consume the
        # lines remaining after the ##FASTA directive.
        class FakeHandle:
            def __init__(self, line_iter):
                self._iter = line_iter

            def read(self):
                return "".join(self._iter)

            def readline(self):
                try:
                    # next() works on Python 2.6+ and 3; .next() is Python 2 only
                    return next(self._iter)
                except StopIteration:
                    return ""

        if found_seqs:
            fasta_recs = self._parse_fasta(FakeHandle(line_iter))
            out_info.add('fasta', fasta_recs)
        if out_info.has_items():
            yield out_info.get_results()

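The `target_lines` batching in `_lines_to_out_info` follows a small, reusable pattern. Lines accumulate until at least `target_lines` have been read and a safe break point has been reached. The batch is then yielded and a fresh collector started, and whatever is left is flushed at the end.

Below is a minimal sketch that replaces the stateful `_GFFParserLocalOut.can_break` flag with a caller-supplied `can_break(line)` predicate. Both `batch_lines` and the predicate are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


def batch_lines(lines: Iterable[str], target_lines: int,
                can_break: Callable[[str], bool]) -> Iterator[List[str]]:
    """Yield lists of lines, cutting only after at least `target_lines`
    lines and only where `can_break` marks the current line as safe."""
    batch: List[str] = []
    for line in lines:
        batch.append(line)
        if target_lines and len(batch) >= target_lines and can_break(line):
            yield batch
            batch = []  # start a fresh collector, as the parser does
    if batch:  # flush the remainder, like the final has_items() check
        yield batch


gff_like = ["gene1", "exon1", "###", "gene2", "exon2", "###", "gene3"]
for chunk in batch_lines(gff_like, 2, lambda line: line.startswith("###")):
    print(chunk)
```

Here the predicate treats the GFF3 `###` directive as the only safe boundary. That directive signals that all forward references seen so far have been resolved. A falsy `target_lines` disables cutting entirely, so everything comes back as a single batch.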
class DiscoGFFParser(_AbstractMapReduceGFF):
    """GFF Parser with parallelization through Disco (http://discoproject.org).
    """
    def __init__(self, disco_host, create_missing=True):
        """Initialize parser.

        disco_host - URL of the Disco host used to parallelize the
        GFF reading job.
        """
        _AbstractMapReduceGFF.__init__(self, create_missing=create_missing)
        self._disco_host = disco_host

    def _gff_process(self, gff_files, limit_info, target_lines=None):
        """Process GFF addition, using Disco to parallelize the process.
        """
        assert target_lines is None, "Cannot split parallelized jobs"
        # Import locally: these modules are only needed when using Disco.
        import simplejson
        import disco
        # Use absolute paths, except for special "disco:" files.
        full_files = []
        for f in gff_files:
            if f.split(":")[0] != "disco":
                full_files.append(os.path.abspath(f))
            else:
                full_files.append(f)
        results = disco.job(self._disco_host, name="gff_reader",
                            input=full_files,
                            params=disco.Params(limit_info=limit_info, jsonify=True,
                                                filter_info=self._examiner._filter_info),
                            required_modules=["simplejson", "collections", "re"],
                            map=self._map_fn, reduce=self._reduce_fn)
        processed = dict()
        for out_key, out_val in disco.result_iterator(results):
            processed[out_key] = simplejson.loads(out_val)
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
702 yield processed
c9699375fcf6 planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/glimmer_hmm commit 0dc67759bcbdf5a8a285ded9ba751340d741fe63
bgruening
parents:
diff changeset
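The Disco job above has a map/reduce shape. Inputs are normalized to absolute paths unless they use the disco: scheme, each line is mapped to a (line type, JSON payload) pair, and the reduced values are decoded into a single processed dictionary. The sketch below reproduces that flow with the standard library only, so it runs without a Disco cluster. It swaps simplejson for the built-in json module. normalize_inputs, map_line and reduce_pairs are hypothetical helpers, not the module's _map_fn and _reduce_fn, and the child/parent/feature classification is deliberately simplified.

```python
import json
import os
from collections import defaultdict


def normalize_inputs(paths):
    """Return absolute paths, leaving special 'disco:' inputs untouched."""
    return [p if p.split(":")[0] == "disco" else os.path.abspath(p)
            for p in paths]


def _attributes(column):
    """Split a GFF3 attribute column such as 'ID=m1;Parent=g1' into a dict."""
    quals = {}
    for field in column.split(";"):
        key, _, value = field.strip().partition("=")
        if key:
            quals[key] = value.split(",")
    return quals


def map_line(line):
    """Hypothetical map step: emit zero or one (line_type, json_payload) pair."""
    if not line.strip() or line.startswith("#"):
        return []
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 9:
        return []
    quals = _attributes(parts[8])
    # Simplified classification: linked children, ID-bearing parents, flat features.
    if "Parent" in quals:
        line_type = "child"
    elif "ID" in quals:
        line_type = "parent"
    else:
        line_type = "feature"
    payload = {"type": parts[2], "start": int(parts[3]), "end": int(parts[4])}
    return [(line_type, json.dumps(payload))]


def reduce_pairs(pairs):
    """Reduce step: group payloads by line type, decoding the JSON strings."""
    processed = defaultdict(list)
    for key, value in pairs:
        processed[key].append(json.loads(value))
    return dict(processed)


if __name__ == "__main__":
    lines = [
        "##gff-version 3\n",
        "chr1\tsrc\tgene\t1\t90\t.\t+\t.\tID=g1\n",
        "chr1\tsrc\tmRNA\t1\t90\t.\t+\t.\tID=m1;Parent=g1\n",
    ]
    pairs = [pair for line in lines for pair in map_line(line)]
    print(reduce_pairs(pairs))
```

As in the method above, one dictionary is built at the end and handed back in a single step, rather than one record at a time.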

def parse(gff_files, base_dict=None, limit_info=None, target_lines=None):
    """High level interface to parse GFF files into SeqRecords and SeqFeatures.
    """
    parser = GFFParser()
    for rec in parser.parse_in_parts(gff_files, base_dict, limit_info,
            target_lines):
        yield rec
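parse() is a thin generator over GFFParser.parse_in_parts, where target_lines controls roughly how many GFF lines feed each partial parse, so very large files need not be held in memory at once. The following is a minimal sketch of that batching idea only, under the assumption that grouping raw feature lines by sequence ID is enough to illustrate it. iter_chunks and this parse are hypothetical stand-ins; the real parser builds SeqRecord and SeqFeature objects rather than lists of strings.

```python
import io


def iter_chunks(handle, target_lines=None):
    """Yield {seqid: [feature lines]} batches of at most target_lines lines.

    With target_lines=None everything is returned as a single batch.
    """
    chunk, count = {}, 0
    for line in handle:
        # Annotations stop where an embedded FASTA section begins.
        if line.startswith("##FASTA"):
            break
        # Skip blank lines, comments and ## directives.
        if not line.strip() or line.startswith("#"):
            continue
        seqid = line.split("\t", 1)[0]
        chunk.setdefault(seqid, []).append(line.rstrip("\n"))
        count += 1
        if target_lines is not None and count >= target_lines:
            yield chunk
            chunk, count = {}, 0
    if chunk:
        yield chunk


def parse(handle, target_lines=None):
    """Thin generator wrapper with the same shape as the module-level parse()."""
    for batch in iter_chunks(handle, target_lines):
        yield batch


if __name__ == "__main__":
    gff = ("##gff-version 3\n"
           "chr1\ts\tgene\t1\t90\t.\t+\t.\tID=g1\n"
           "chr1\ts\tmRNA\t1\t90\t.\t+\t.\tID=m1;Parent=g1\n"
           "chr2\ts\tgene\t5\t20\t.\t-\t.\tID=g2\n")
    for batch in parse(io.StringIO(gff), target_lines=2):
        print({seqid: len(lines) for seqid, lines in batch.items()})
```

This naive batching can separate a gene from its children across batches; a partial parser that builds nested features has to take care to keep related lines together.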

def parse_simple(gff_files, limit_info=None):
    """Parse GFF files into a line-by-line dictionary of parts.
    """
    parser = GFFParser()
    for rec in parser.parse_simple(gff_files, limit_info=limit_info):
        yield rec["child"][0]
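parse_simple() skips record and feature building and yields one dictionary per GFF line. The sketch below shows what such a flat, per-line dictionary can look like for a GFF3 feature line. The key names (rec_id, type, location, strand, id, quals) are an assumption chosen for illustration, not a guaranteed description of this module's output. Storing the source column under quals["source"] follows how parent_child_map reads it. Attribute values are split on commas and then percent-decoded, as the GFF3 specification describes, and the start is converted to a 0-based coordinate.

```python
from urllib.parse import unquote

# GFF3 strand column -> Biopython-style strand value
STRAND = {"+": 1, "-": -1, ".": None, "?": None}


def parse_line(line):
    """Turn one GFF3 feature line into a flat dictionary (hypothetical layout)."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 9:
        raise ValueError("expected 9 tab-separated columns: %r" % line)
    seqid, source, ftype, start, end, score, strand, phase, attrs = parts
    quals = {}
    for field in attrs.split(";"):
        key, _, value = field.strip().partition("=")
        if key:
            # Split multi-valued attributes first, then undo %XX escapes.
            quals[key] = [unquote(v) for v in value.split(",")]
    quals["source"] = [source]
    return {
        "rec_id": seqid,
        "type": ftype,
        # 0-based start with the end unchanged: a half-open interval
        "location": [int(start) - 1, int(end)],
        "strand": STRAND.get(strand),
        "id": quals.get("ID", [None])[0],
        "quals": quals,
    }


if __name__ == "__main__":
    print(parse_line("chr1\tglimmerHMM\tCDS\t100\t200\t.\t-\t0\t"
                     "ID=cds1;Parent=mrna1,mrna2;Note=a%20b\n"))
```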

def _file_or_handle(fn):
    """Decorator letting a method accept either an open handle or a file name.

    The file argument must come right after self; files opened here are
    closed again once the method returns.
    """
    def _file_or_handle_inside(*args, **kwargs):
        in_file = args[1]
        if hasattr(in_file, "read"):
            need_close = False
            in_handle = in_file
        else:
            need_close = True
            in_handle = open(in_file)
        args = (args[0], in_handle) + args[2:]
        try:
            return fn(*args, **kwargs)
        finally:
            # only close handles that this decorator opened itself
            if need_close:
                in_handle.close()
    return _file_or_handle_inside
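The decorator above lets a method accept either an open handle or a file name, and closes the file only when it opened it. A minimal alternative sketch built on contextlib.contextmanager and functools.wraps follows; file_or_handle, file_or_handle_method and LineCounter are hypothetical names. The with block closes the file even if the wrapped method raises, and functools.wraps keeps the method's name and docstring.

```python
import contextlib
import functools
import io
import os
import tempfile


@contextlib.contextmanager
def file_or_handle(in_file):
    """Yield a readable handle, closing it afterwards only if we opened it."""
    if hasattr(in_file, "read"):
        # Caller owns the handle: hand it over unchanged and never close it.
        yield in_file
    else:
        # We own the handle: the with block closes it, even on errors.
        with open(in_file) as handle:
            yield handle


def file_or_handle_method(fn):
    """Decorate methods whose first argument after self is a file or handle."""
    @functools.wraps(fn)
    def wrapper(self, in_file, *args, **kwargs):
        with file_or_handle(in_file) as handle:
            return fn(self, handle, *args, **kwargs)
    return wrapper


class LineCounter:
    """Hypothetical example class using the decorator."""

    @file_or_handle_method
    def count(self, handle):
        """Count the non-empty lines in a file."""
        return sum(1 for line in handle if line.strip())


if __name__ == "__main__":
    buffer = io.StringIO("a\n\nb\n")
    print(LineCounter().count(buffer), buffer.closed)
    with tempfile.NamedTemporaryFile("w", suffix=".gff", delete=False) as tmp:
        tmp.write("x\ny\nz\n")
    try:
        print(LineCounter().count(tmp.name))
    finally:
        os.remove(tmp.name)
```

Like the original, this closes the file as soon as the method returns, so it does not suit generator methods that read the handle lazily.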

class GFFExaminer:
    """Provide high level details about a GFF file to refine parsing.

    GFF is a specification implemented by many different centers, so real-life
    files present the same information in slightly different ways. Becoming
    familiar with the file you are dealing with is the best way to extract the
    information you need; this class provides high level summary details to
    help with that.
    """
    def __init__(self):
        # 0-based column indexes that make up each filter key
        self._filter_info = dict(gff_id=[0], gff_source_type=[1, 2],
                gff_source=[1], gff_type=[2])

    def _get_local_params(self, limit_info=None):
        """Build a minimal params object for calling _gff_line_map directly.

        It carries the same settings the Disco job passes (limit_info,
        filter_info, jsonify), with JSON output turned off.
        """
        class _LocalParams:
            def __init__(self):
                self.jsonify = False
        params = _LocalParams()
        params.limit_info = limit_info
        params.filter_info = self._filter_info
        return params

    @_file_or_handle
    def available_limits(self, gff_handle):
        """Return dictionary information on possible limits for this file.

        This returns a nested dictionary with the following structure:

        keys -- names of items to filter by
        values -- dictionary with:
            keys -- filter choice, as a tuple of column values
            values -- counts of that filter in this file

        Not a parallelized map-reduce implementation.
        """
        cur_limits = dict()
        for filter_key in self._filter_info.keys():
            cur_limits[filter_key] = collections.defaultdict(int)
        for line in gff_handle:
            # when we hit FASTA sequences, we are done with annotations
            if line.startswith("##FASTA"):
                break
            # ignore empty and comment lines
            if line.strip() and line.strip()[0] != "#":
                parts = [p.strip() for p in line.split('\t')]
                assert len(parts) == 9, line
                for filter_key, cur_indexes in self._filter_info.items():
                    cur_id = tuple([parts[i] for i in cur_indexes])
                    cur_limits[filter_key][cur_id] += 1
        # get rid of the default dicts
        final_dict = dict()
        for key, value_dict in cur_limits.items():
            if len(key) == 1:
                key = key[0]
            final_dict[key] = dict(value_dict)
        gff_handle.close()
        return final_dict

    @_file_or_handle
    def parent_child_map(self, gff_handle):
        """Provide a mapping of parent to child relationships in the file.

        Returns a dictionary of parent child relationships:

        keys -- tuple of (source, type) for each parent
        values -- sorted list of unique (source, type) tuples for the
                  children of that parent

        Not a parallelized map-reduce implementation.
        """
        # collect all of the parent and child types mapped to IDs
        parent_sts = dict()
        child_sts = collections.defaultdict(list)
        for line in gff_handle:
            # when we hit FASTA sequences, we are done with annotations
            if line.startswith("##FASTA"):
                break
            # ignore empty and comment lines, as available_limits does
            if line.strip() and line.strip()[0] != "#":
                line_type, line_info = _gff_line_map(line,
                        self._get_local_params())[0]
                if (line_type == 'parent' or (line_type == 'child' and
                        line_info['id'])):
                    parent_sts[line_info['id']] = (
                            line_info['quals']['source'][0], line_info['type'])
                if line_type == 'child':
                    for parent_id in line_info['quals']['Parent']:
                        child_sts[parent_id].append((
                                line_info['quals']['source'][0],
                                line_info['type']))
        # generate a dictionary of the unique final type relationships
        pc_map = collections.defaultdict(list)
        for parent_id, parent_type in parent_sts.items():
            for child_type in child_sts[parent_id]:
                pc_map[parent_type].append(child_type)
        pc_final_map = dict()
        for ptype, ctypes in pc_map.items():
            pc_final_map[ptype] = sorted(set(ctypes))
        return pc_final_map
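GFFExaminer's two summaries can be reproduced on raw lines to see what they report. The sketch below is a standalone approximation, not the class itself. available_limits counts column combinations using the same filter_info indexes as GFFExaminer.__init__. parent_child_map links ID and Parent attributes directly instead of going through _gff_line_map. All function names here are hypothetical, and attribute parsing is simplified (no percent-decoding).

```python
import collections

# Column indexes per filter key, copied from GFFExaminer.__init__ above.
FILTER_INFO = dict(gff_id=[0], gff_source_type=[1, 2],
                   gff_source=[1], gff_type=[2])


def _feature_parts(lines):
    """Yield the 9 stripped columns of each feature line."""
    for line in lines:
        if line.startswith("##FASTA"):
            break
        if line.strip() and not line.strip().startswith("#"):
            parts = [p.strip() for p in line.split("\t")]
            if len(parts) != 9:
                raise ValueError("expected 9 columns: %r" % line)
            yield parts


def available_limits(lines):
    """Count each combination of filter columns, keyed by filter name."""
    counts = {key: collections.Counter() for key in FILTER_INFO}
    for parts in _feature_parts(lines):
        for key, indexes in FILTER_INFO.items():
            counts[key][tuple(parts[i] for i in indexes)] += 1
    return {key: dict(counter) for key, counter in counts.items()}


def _attributes(column):
    """Split 'ID=m1;Parent=g1' into {'ID': ['m1'], 'Parent': ['g1']}."""
    quals = {}
    for field in column.split(";"):
        key, _, value = field.strip().partition("=")
        if key:
            quals[key] = value.split(",")
    return quals


def parent_child_map(lines):
    """Map each parent (source, type) to its sorted unique child (source, type)s."""
    parent_sts = {}
    child_sts = collections.defaultdict(list)
    for parts in _feature_parts(lines):
        quals = _attributes(parts[8])
        source_type = (parts[1], parts[2])
        # Anything with an ID can be referenced as a parent.
        if "ID" in quals:
            parent_sts[quals["ID"][0]] = source_type
        for parent_id in quals.get("Parent", []):
            child_sts[parent_id].append(source_type)
    pc_map = collections.defaultdict(set)
    for parent_id, parent_type in parent_sts.items():
        # Parents without children are left out, as in the method above.
        for child_type in child_sts.get(parent_id, []):
            pc_map[parent_type].add(child_type)
    return {ptype: sorted(ctypes) for ptype, ctypes in pc_map.items()}
```

Inner keys stay tuples even for single-column filters such as gff_type. This matches the method above: its len(key) == 1 check looks at the filter name string, so it never unwraps them.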