annotate fml_gff_groomer/scripts/gff_id_mapper.py @ 0:79726c328621 default tip
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
author | vipints |
---|---|
date | Tue, 07 Jun 2011 17:29:24 -0400 |
#!/usr/bin/env python
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# Written (W) 2010 Vipin T Sreedharan, Friedrich Miescher Laboratory of the Max Planck Society
# Copyright (C) 2010 Max Planck Society
#

# Description : Provides feature to sub feature identifier mapping in a given GFF3 file.

import re, sys
import collections
import urllib
import time

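As a rough illustration of the feature to sub-feature mapping named in the description comment, the sketch below groups child feature IDs under their `Parent` IDs for a few hand-written GFF3 lines. It is written for Python 3 and is independent of the script; the helper name `map_children_to_parents` and the sample records are assumptions made for this example only.

```python
import collections


def map_children_to_parents(gff3_lines):
    """Hypothetical helper: map each Parent ID to the IDs of its children."""
    children = collections.defaultdict(list)
    for line in gff3_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and directive/comment lines
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a nine-column feature line
        # column 9 holds GFF3 attributes as key=value pairs joined by ';'
        attrs = dict(
            pair.split("=", 1) for pair in columns[8].split(";") if "=" in pair
        )
        if "Parent" in attrs and "ID" in attrs:
            # a feature may list several parents separated by commas
            for parent_id in attrs["Parent"].split(","):
                children[parent_id].append(attrs["ID"])
    return dict(children)


# Made-up sample records, for illustration only
sample = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=gene1",
    "chr1\tsrc\tmRNA\t1\t100\t.\t+\t.\tID=mrna1;Parent=gene1",
    "chr1\tsrc\texon\t1\t50\t.\t+\t.\tID=exon1;Parent=mrna1",
]
mapping = map_children_to_parents(sample)
```

The script itself maps (source, type) pairs rather than raw IDs, so this sketch only shows the underlying ID/Parent linkage.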
def _gff_line_map(line):
    """Parses a line of GFF into a dictionary.
    Given an input line from a GFF file, this:
        - breaks it into component elements
        - determines the type of attribute (flat, parent, child or annotation)
        - generates a dictionary of GFF info
    """
    gff3_kw_pat = re.compile(r"\w+=")
    def _split_keyvals(keyval_str):
        """Split key-value pairs in a GFF2, GTF and GFF3 compatible way.

        GFF3 has key value pairs like:
          count=9;gene=amx-2;sequence=SAGE:aacggagccg
        GFF2 and GTF have:
          Sequence "Y74C9A" ; Note "Clone Y74C9A; Genbank AC024206"
          name "fgenesh1_pg.C_chr_1000003"; transcriptId 869
        """
        quals = collections.defaultdict(list)
        if keyval_str is None:
            # the caller unpacks two values, so return a flag here as well;
            # with no attributes the format cannot be inferred, so GFF3
            # (the format this script targets) is assumed
            return quals, False
        # ensembl GTF has a stray semi-colon at the end
        if keyval_str[-1] == ';':
            keyval_str = keyval_str[:-1]
        # GFF2/GTF has a semi-colon with at least one space after it.
        # It can have spaces on both sides; wormbase does this.
        # GFF3 works with no spaces.
        # Split at the first one we can recognize as working
        parts = keyval_str.split(" ; ")
        if len(parts) == 1:
            parts = keyval_str.split("; ")
            if len(parts) == 1:
                parts = keyval_str.split(";")
        # check if we have GFF3 style key-vals (with =)
        is_gff2 = True
        if gff3_kw_pat.match(parts[0]):
            is_gff2 = False
            key_vals = [p.split('=') for p in parts]
        # otherwise, we are separated by a space with a key as the first item
        else:
            pieces = []
            for p in parts:
                # fix misplaced semi-colons in keys in some GFF2 files
                if p and p[0] == ';':
                    p = p[1:]
                pieces.append(p.strip().split(" "))
            key_vals = [(p[0], " ".join(p[1:])) for p in pieces]
        for key, val in key_vals:
            # remove quotes in GFF2 files
            if (len(val) > 0 and val[0] == '"' and val[-1] == '"'):
                val = val[1:-1]
            if val:
                quals[key].extend(val.split(','))
            # if we don't have a value, make this a key=True/False style
            # attribute
            else:
                quals[key].append('true')
        for key, vals in quals.items():
            quals[key] = [urllib.unquote(v) for v in vals]
        return quals, is_gff2

    def _nest_gff2_features(gff_parts):
        """Provide nesting of GFF2 transcript parts with transcript IDs.

        exons and coding sequences are mapped to a parent with a transcript_id
        in GFF2. This is implemented differently at different genome centers
        and this function attempts to resolve that and map things to the GFF3
        way of doing them.
        """
        # map protein or transcript ids to a parent
        for transcript_id in ["transcript_id", "transcriptId", "proteinId"]:
            try:
                gff_parts["quals"]["Parent"] = \
                        gff_parts["quals"][transcript_id]
                break
            except KeyError:
                pass
        # case for WormBase GFF -- everything labelled as Transcript or CDS
        for flat_name in ["Transcript", "CDS"]:
            if flat_name in gff_parts["quals"]:
                # parent types
                if gff_parts["type"] in [flat_name]:
                    if not gff_parts["id"]:
                        gff_parts["id"] = gff_parts["quals"][flat_name][0]
                        gff_parts["quals"]["ID"] = [gff_parts["id"]]
                # children types
                elif gff_parts["type"] in ["intron", "exon", "three_prime_UTR",
                        "coding_exon", "five_prime_UTR", "CDS", "stop_codon",
                        "start_codon"]:
                    gff_parts["quals"]["Parent"] = gff_parts["quals"][flat_name]
                break

        return gff_parts

    line = line.strip()
    # blank lines can occur in the file
    if line == '':
        return [('directive', line)]
    # FASTA header line
    if line[0] == '>':
        return [('directive', '')]
    if line[0] == "#":
        return [('directive', line[2:])]
    elif line:
        parts = line.split('\t')
        # GFF files that carry FASTA sequence lines alongside the features
        if len(parts) == 1 and re.search(r'\w+', parts[0]):
            return [('directive', '')]
        assert len(parts) == 9, line
        gff_parts = [(None if p == '.' else p) for p in parts]
        gff_info = dict()

        # collect all of the base qualifiers for this item
        quals, is_gff2 = _split_keyvals(gff_parts[8])

        gff_info["is_gff2"] = is_gff2

        if gff_parts[1]:
            quals["source"].append(gff_parts[1])
        gff_info['quals'] = dict(quals)

        # if we are describing a location, then we are a feature
        if gff_parts[3] and gff_parts[4]:
            gff_info['type'] = gff_parts[2]
            gff_info['id'] = quals.get('ID', [''])[0]

            if is_gff2:
                gff_info = _nest_gff2_features(gff_info)
            # features that have parents need to link so we can pick up
            # the relationship
            if 'Parent' in gff_info['quals']:
                final_key = 'child'
            elif gff_info['id']:
                final_key = 'parent'
            # Handle flat features
            else:
                final_key = 'feature'
        # otherwise, associate these annotations with the full record
        else:
            final_key = 'annotation'
        return [(final_key, gff_info)]

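The attribute handling inside `_gff_line_map` can be hard to follow in its nested form. The standalone sketch below, written for Python 3 (hence `urllib.parse.unquote` rather than `urllib.unquote`), shows the same idea in simplified form: treat the column as GFF3 when the first item looks like `key=value`, otherwise fall back to GFF2/GTF `key "value"` pairs. The name `split_attributes` is hypothetical, and the sketch deliberately skips cases the script handles, such as quoted GFF2 values that themselves contain semicolons.

```python
import collections
import re
from urllib.parse import unquote

GFF3_KEY = re.compile(r"\w+=")


def split_attributes(attr_str):
    """Hypothetical simplified splitter; returns (quals, is_gff2)."""
    quals = collections.defaultdict(list)
    # drop a trailing semicolon, then split into non-empty items
    attr_str = attr_str.strip().rstrip(";")
    parts = [p.strip() for p in attr_str.split(";") if p.strip()]
    # GFF3 items start with "key="; anything else is treated as GFF2/GTF
    is_gff2 = not (parts and GFF3_KEY.match(parts[0]))
    for part in parts:
        if is_gff2:
            key, _, val = part.partition(" ")
        else:
            key, _, val = part.partition("=")
        val = val.strip()
        # remove the surrounding quotes used by GFF2/GTF values
        if len(val) > 1 and val[0] == '"' and val[-1] == '"':
            val = val[1:-1]
        # a key without a value becomes a 'true' flag, as in the script
        values = val.split(",") if val else ["true"]
        quals[key].extend(unquote(v) for v in values)
    return dict(quals), is_gff2


gff3_quals, gff3_flag = split_attributes("ID=exon1;Parent=mrna1,mrna2;Note=a%20b")
gff2_quals, gff2_flag = split_attributes('gene_id "g1"; transcript_id "t1";')
```

Returning the format flag together with the qualifiers mirrors how the script decides whether to run the GFF2 nesting step afterwards.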
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
152 def parent_child_id_map(gff_handle): |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
153 """Provide a mapping of parent to child relationships in the file. |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
154 |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
155 Gives a dictionary of parent child relationships: |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
156 |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
157 keys -- tuple of (source, type) for each parent |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
158 values -- tuple of (source, type) as children of that parent""" |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
159 |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
160 # collect all of the parent and child types mapped to IDs |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
161 parent_sts = dict() |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
162 child_sts = collections.defaultdict(list) |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
163 |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
164 for line in gff_handle: |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
165 line_type, line_info = _gff_line_map(line)[0] |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
166 if (line_type == 'parent' or (line_type == 'child' and line_info['id'])): |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
167 parent_sts[line_info['id']] = (line_info['quals']['source'][0], line_info['type']) |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
168 if line_type == 'child': |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
169 for parent_id in line_info['quals']['Parent']: |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
170 child_sts[parent_id].append((line_info['quals']['source'][0], line_info['type'])) |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
171 gff_handle.close() |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
172 |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
173 # generate a dictionary of the unique final type relationships |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
174 pc_map = collections.defaultdict(list) |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
175 for parent_id, parent_type in parent_sts.items(): |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
176 for child_type in child_sts[parent_id]: |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
177 pc_map[parent_type].append(child_type) |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
178 pc_final_map = dict() |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
179 for ptype, ctypes in pc_map.items(): |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
180 unique_ctypes = list(set(ctypes)) |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
181 unique_ctypes.sort() |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
182 pc_final_map[ptype] = unique_ctypes |
79726c328621
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
vipints
parents:
diff
changeset
|
183 |
    # Check for parent-child relations
    level1, level2, level3, sec_level_mis = {}, {}, {}, {}
    for etype, fchild in pc_final_map.items():
        level2_flag = 0
        for kp, vp in pc_final_map.items():
            if etype in vp:  # etype is itself a child, so it is a second-level feature
                level2_flag = 1
                level2[etype] = 1
        if level2_flag == 0:  # first-level features
            level1[etype] = 1
            # Every child of a first-level feature should itself be a parent
            # (a second-level key); record the children that are not.
            for eachfch in fchild:
                if eachfch not in pc_final_map:
                    sec_level_mis.setdefault(etype, []).append(eachfch)
        if level2_flag == 1:  # children of second-level features form the third level
            level3[str(fchild)] = 1
    # Display the result
    if level1 == level2 == level3 == {} and sec_level_mis == {}:
        # No parent-child relations were recorded, so list the (source, type) pairs directly
        print
        print 'ONLY FIRST level feature(s):'
        source_type = dict()
        gff_handle = open(gff_handle.name, 'rU')
        for line in gff_handle:
            line = line.strip('\n\r')
            if not line or line[0] == '#':  # skip blank lines and comments
                continue
            parts = line.split('\t')
            if parts[-1] == '':
                parts.pop()
            assert len(parts) == 9, line
            source_type[(parts[1], parts[2])] = 1
        gff_handle.close()
        for ele in source_type:
            print '\t' + str(ele)
        print
    else:
        print
        print '===Report on different level features from GFF file==='
        print
        print 'FIRST level feature(s):'
        for ele in level1:
            print '\t' + str(ele)
        print
        print 'SECOND level feature(s):'
        for ele in level2:
            print '\t' + str(ele)
        print
        print 'THIRD level feature(s):'
        for ele in level3:
            print '\t' + str(ele[1:-1])
        print
        # Describe features that are mapped to the wrong level
        for wf, wfv in sec_level_mis.items():
            if wf[1] == 'gene':
                print 'GFF parsing modules from publicly available packages like Biopython, BioPerl etc. are heavily dependent on feature identifier mapping.'
                print 'Here a few features seem to be wrongly mapped to their children, which in turn causes problems while extracting the annotation based on feature identifier.\n'
                for ehv in wfv:
                    if ehv[1] in ('exon', 'intron', 'CDS', 'three_prime_UTR', 'five_prime_UTR'):
                        print 'Error in ID mapping: Level1 feature ' + str(wf) + ' maps to Level3 feature ' + str(ehv)

if __name__ == '__main__':

    stime = time.asctime(time.localtime(time.time()))
    print '-------------------------------------------------------'
    print 'GFFExamine started on ' + stime
    print '-------------------------------------------------------'

    try:
        gff_handle = open(sys.argv[1], 'rU')
    except (IndexError, IOError):  # missing argument or unreadable file
        sys.stderr.write("Can't open the GFF3 file, cannot continue...\n")
        sys.stderr.write("USAGE: gff_id_mapper.py <gff3 file>\n")
        sys.exit(-1)
    parent_child_id_map(gff_handle)

    stime = time.asctime(time.localtime(time.time()))
    print '-------------------------------------------------------'
    print 'GFFExamine finished at ' + stime
    print '-------------------------------------------------------'
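
The level classification performed in the script can be tried in isolation with the sketch below. It is not part of the original tool: the function name `classify_levels`, the `example` data, and the list-based return values are hypothetical, and it assumes that the parent-child map is keyed by `(source, type)` tuples, as the `wf[1]` lookups in the script suggest. It is written to run under Python 3 as well.

```python
def classify_levels(pc_map):
    """Split parent types into first and second level and collect third-level child lists.

    pc_map maps a parent (source, type) tuple to a list of child (source, type) tuples.
    This layout is an assumption made for illustration.
    """
    # Deduplicate and sort each parent's child types
    final_map = {ptype: sorted(set(ctypes)) for ptype, ctypes in pc_map.items()}

    # Every type that occurs as a child of some parent
    all_children = set()
    for children in final_map.values():
        all_children.update(children)

    level1, level2, level3, missing = [], [], [], {}
    for ptype in sorted(final_map):
        children = final_map[ptype]
        if ptype in all_children:
            # A parent that is also a child is a second-level feature;
            # its children make up the third level.
            level2.append(ptype)
            level3.append(children)
        else:
            level1.append(ptype)
            # Children of a first-level feature that are never parents
            # themselves skip the second level.
            skipped = [c for c in children if c not in final_map]
            if skipped:
                missing[ptype] = skipped
    return level1, level2, level3, missing


# Hypothetical input: the gene maps to mRNA (expected) and directly to exon (suspicious)
example = {
    ('src', 'gene'): [('src', 'mRNA'), ('src', 'mRNA'), ('src', 'exon')],
    ('src', 'mRNA'): [('src', 'exon'), ('src', 'CDS')],
}
level1, level2, level3, missing = classify_levels(example)
```

In this example the gene maps directly to an exon, so `missing` holds that pair. This is the kind of relation the script reports as an ID mapping error.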