Mercurial > repos > george-weingart > lefse
annotate format_input.py @ 2:a31c10fe09c8 draft default tip
Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
author | george-weingart |
---|---|
date | Tue, 07 Jul 2015 13:52:29 -0400 |
parents | |
children |
rev | line source |
---|---|
2
a31c10fe09c8
Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents:
diff
changeset
|
#!/usr/bin/env python

import sys,os,argparse,pickle,re,numpy

#***************************************************************************************************************
#*   Log of change
#*   January 16, 2014 - George Weingart - george.weingart@gmail.com
#*
#*   biom Support
#*   Modified the program to enable it to accept biom files as input
#*
#*   Added two optional input parameters:
#*   1. biom_c is the name of the biom metadata to be used as class
#*   2. biom_s is the name of the biom metadata to be used as subclass
#*   class and subclass are used in the same context as the original
#*   parameters class and subclass
#*   These parameters are optional; by default the program chooses
#*   as class the first metadata received from the conversion
#*   of the biom file into a sequential (pcl) file as generated by
#*   breadcrumbs, and similarly, the second metadata is selected as
#*   subclass.
#*   The syntax or logic for the original non-biom case was NOT changed.
#*
#*   <******************* IMPORTANT NOTE *************************>
#*   The biom case requires breadcrumbs and therefore there is
#*   a conditional import of the breadcrumbs modules
#*   If the User uses a biom input and breadcrumbs is not detected,
#*   the run is abnormally ended
#*   breadcrumbs itself needs a biom environment, so if the import
#*   of biom in breadcrumbs fails, the run is also abnormally
#*   ended (Only if the input file was biom)
#*
#*   USAGE EXAMPLES
#*   --------------
#*   Case #1: Using a sequential file as input (Old version - did not change)
#*   ./format_input.py hmp_aerobiosis_small.txt hmp_aerobiosis_small.in -c 1 -s 2 -u 3 -o 1000000
#*   Case #2: Using a biom file as input
#*   ./format_input.py hmp_aerobiosis_small.biom hmp_aerobiosis_small.in -o 1000000
#*   Case #3: Using a biom file as input and overriding the class and subclass
#*   ./format_input.py lefse.biom hmp_aerobiosis_small.in -biom_c oxygen_availability -biom_s body_site -o 1000000
#*
#***************************************************************************************************************

def read_input_file(inp_file, CommonArea):

    if inp_file.endswith('.biom'):                #* If the file format is biom:
        CommonArea = biom_processing(inp_file)    #* Process in biom format
        return CommonArea                         #* And return the CommonArea

    with open(inp_file) as inp:
        CommonArea['ReturnedData'] = [[v.strip() for v in line.strip().split("\t")] for line in inp.readlines()]
        return CommonArea

def transpose(data):
    # list() so the result can be measured, indexed and iterated more than once
    # (on Python 3, zip() returns a one-shot iterator).
    return list(zip(*data))

def read_params(args):
    # NOTE: `args` is not used; parse_args() below reads sys.argv[1:] directly.
    parser = argparse.ArgumentParser(description='LEfSe formatting modules')
    parser.add_argument('input_file', metavar='INPUT_FILE', type=str, help="the input file, feature hierarchical level can be specified with | or . and those symbols must not be present for other reasons in the input file.")
    parser.add_argument('output_file', metavar='OUTPUT_FILE', type=str,
        help="the output file containing the data for LEfSe")
    parser.add_argument('--output_table', type=str, required=False, default="",
        help="the formatted table in txt format")
    parser.add_argument('-f',dest="feats_dir", choices=["c","r"], type=str, default="r",
        help="set whether the features are on rows (default) or on columns")
    parser.add_argument('-c',dest="class", metavar="[1..n_feats]", type=int, default=1,
        help="set which feature to use as class (default 1)")
    parser.add_argument('-s',dest="subclass", metavar="[1..n_feats]", type=int, default=None,
        help="set which feature to use as subclass (default None, meaning no subclass)")
    parser.add_argument('-o',dest="norm_v", metavar="float", type=float, default=-1.0,
        help="set the normalization value (default -1.0 meaning no normalization)")
    parser.add_argument('-u',dest="subject", metavar="[1..n_feats]", type=int, default=None,
        help="set which feature to use as subject (default None, meaning no subject)")
    # NOTE: the default "d" is neither of the two choices, although the help text says the default is f.
    parser.add_argument('-m',dest="missing_p", choices=["f","s"], type=str, default="d",
        help="set the policy to adopt with missing values: f removes the features with missing values, s removes samples with missing values (default f)")
    parser.add_argument('-n',dest="subcl_min_card", metavar="int", type=int, default=10,
        help="set the minimum cardinality of each subclass (subclasses with low cardinalities will be grouped together; if the cardinality is still low, no pairwise comparison will be performed with them)")

    parser.add_argument('-biom_c',dest="biom_class", type=str,
        help="For biom input files: set which feature to use as class")
    parser.add_argument('-biom_s',dest="biom_subclass", type=str,
        help="For biom input files: set which feature to use as subclass")

    args = parser.parse_args()

    return vars(args)

def remove_missing(data,roc):
    # Drop every row (or column, when roc == "c") whose count of non-empty
    # values is below the length of the longest row.
    if roc == "c": data = list(transpose(data))
    max_len = max([len(r) for r in data])
    to_rem = []
    for i,r in enumerate(data):
        if len([v for v in r if not( v == "" or v.isspace())]) < max_len: to_rem.append(i)
    if len(to_rem):
        # Pop from the highest index down so the remaining indices stay valid.
        # list.reverse() returns None, so reversed() is used for the iteration.
        for i in reversed(to_rem):
            data.pop(i)
    if roc == "c": return transpose(data)
    return data


def sort_by_cl(data,n,c,s,u):
    # c, s and u index the class, subclass and subject fields of each row;
    # n selects how many of those fields take part in the sort.
    from functools import cmp_to_key    # list.sort() accepts no cmp function on Python 3
    def sort_lines1(a,b):
        return int(a[c] > b[c])*2-1
    def sort_lines2u(a,b):
        if a[c] != b[c]: return int(a[c] > b[c])*2-1
        return int(a[u] > b[u])*2-1
    def sort_lines2s(a,b):
        if a[c] != b[c]: return int(a[c] > b[c])*2-1
        return int(a[s] > b[s])*2-1
    def sort_lines3(a,b):
        if a[c] != b[c]: return int(a[c] > b[c])*2-1
        if a[s] != b[s]: return int(a[s] > b[s])*2-1
        return int(a[u] > b[u])*2-1
    if n == 3: data.sort(key=cmp_to_key(sort_lines3))
    if n == 2:
        if s is None:
            data.sort(key=cmp_to_key(sort_lines2u))
        else:
            data.sort(key=cmp_to_key(sort_lines2s))
    if n == 1: data.sort(key=cmp_to_key(sort_lines1))
    return data

def group_small_subclasses(cls,min_subcl):
    last = ""
    n = 0
    repl = []
    # NOTE: dd holds two lists (all classes, all subclasses), so d[1] below is
    # the second entry of each list, not a (class, subclass) pair.
    dd = [list(cls['class']),list(cls['subclass'])]
    for d in dd:
        if d[1] != last:
            if n < min_subcl and last != "":
                repl.append(d[1])
            last = d[1]
        n = 1
    for i,d in enumerate(dd):
        if d[1] in repl: dd[i][1] = "other"
        dd[i][1] = str(dd[i][0])+"_"+str(dd[i][1])
    cls['class'] = dd[0]
    cls['subclass'] = dd[1]
    return cls

def get_class_slices(data):
    previous_class = data[0][0]
    previous_subclass = data[0][1]
    subclass_slices = []
    class_slices = []
    last_cl = 0
    last_subcl = 0
    class_hierarchy = []
    subcls = []
    for i,d in enumerate(data):
        if d[1] != previous_subclass:
            subclass_slices.append((previous_subclass,(last_subcl,i)))
            last_subcl = i
            subcls.append(previous_subclass)
        if d[0] != previous_class:
            class_slices.append((previous_class,(last_cl,i)))
            class_hierarchy.append((previous_class,subcls))
            subcls = []
            last_cl = i
        previous_subclass = d[1]
        previous_class = d[0]
    subclass_slices.append((previous_subclass,(last_subcl,i+1)))
    subcls.append(previous_subclass)
    class_slices.append((previous_class,(last_cl,i+1)))
    class_hierarchy.append((previous_class,subcls))
    return dict(class_slices), dict(subclass_slices), dict(class_hierarchy)

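Note on get_class_slices: a minimal sketch of what it returns, assuming Python 3 and rows that are already sorted so that equal class and subclass labels are contiguous, with subclass names unique across classes. The name class_slices_sketch and the sample labels are made up for illustration; this is a condensed restatement, not the repository's code.

```python
def class_slices_sketch(data):
    """Map each label to a half-open (start, end) range of row indices.

    `data` is a list of (class, subclass) pairs with contiguous labels.
    Returns (class_slices, subclass_slices, class_hierarchy) as dicts.
    """
    class_slices, subclass_slices, hierarchy = {}, {}, {}
    for i, (cl, subcl) in enumerate(data):
        # Keep the first index seen for a label and extend the end to i + 1.
        class_slices[cl] = (class_slices.get(cl, (i, i))[0], i + 1)
        subclass_slices[subcl] = (subclass_slices.get(subcl, (i, i))[0], i + 1)
        # Record each subclass once under its class, in order of appearance.
        if subcl not in hierarchy.setdefault(cl, []):
            hierarchy[cl].append(subcl)
    return class_slices, subclass_slices, hierarchy


rows = [("ctrl", "ctrl_a"), ("ctrl", "ctrl_a"), ("ctrl", "ctrl_b"), ("case", "case_a")]
cls_sl, subcls_sl, hier = class_slices_sketch(rows)
print(cls_sl)     # {'ctrl': (0, 3), 'case': (3, 4)}
print(subcls_sl)  # {'ctrl_a': (0, 2), 'ctrl_b': (2, 3), 'case_a': (3, 4)}
print(hier)       # {'ctrl': ['ctrl_a', 'ctrl_b'], 'case': ['case_a']}
```

The slices are half-open, so rows[start:end] selects exactly the samples of that class or subclass.
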
def numerical_values(feats,norm):
    mm = []
    for k,v in feats.items():
        feats[k] = [float(val) for val in v]
    if norm < 0.0: return feats
    tr = zip(*(feats.values()))
    mul = []
    fk = feats.keys()
    hie = True if sum([k.count(".") for k in fk]) > len(fk) else False
    for i in range(len(feats.values()[0])):
        if hie: mul.append(sum([t for j,t in enumerate(tr[i]) if fk[j].count(".") < 1 ]))
        else: mul.append(sum(tr[i]))
    if hie and sum(mul) == 0:
        mul = []
        for i in range(len(feats.values()[0])):
            mul.append(sum(tr[i]))
    for i,m in enumerate(mul):
        if m == 0: mul[i] = 0.0
        else: mul[i] = float(norm) / m
    for k,v in feats.items():
        feats[k] = [val*mul[i] for i,val in enumerate(v)]
        if numpy.mean(feats[k]) and (numpy.std(feats[k])/numpy.mean(feats[k])) < 1e-10:
            feats[k] = [ float(round(kv*1e6)/1e6) for kv in feats[k]]
    return feats

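Note on numerical_values: the listing relies on Python 2 behaviour (indexing the results of zip(), dict.values() and dict.keys()). The sketch below restates the same per-sample scaling under Python 3. The name normalize_sketch, the use of the statistics module in place of numpy, and returning a new dict instead of mutating the input are all assumptions made for illustration.

```python
import statistics


def normalize_sketch(feats, norm):
    """Return a per-sample normalized copy of `feats` (name -> list of values)."""
    feats = {k: [float(x) for x in v] for k, v in feats.items()}
    if norm < 0.0:
        return feats  # a negative norm means "leave the values unscaled"
    names = list(feats)
    columns = list(zip(*feats.values()))  # one tuple of values per sample
    # Names are treated as hierarchical when there are more dots than features.
    hierarchical = sum(n.count(".") for n in names) > len(names)
    totals = [sum(col) for col in columns]
    if hierarchical:
        # Root-level clades (no dot) already carry each sample's total.
        roots = [sum(x for n, x in zip(names, col) if "." not in n) for col in columns]
        if sum(roots) != 0:
            totals = roots
    scale = [float(norm) / t if t != 0 else 0.0 for t in totals]
    out = {}
    for name, values in feats.items():
        scaled = [x * s for x, s in zip(values, scale)]
        mean = statistics.mean(scaled)
        # Near-constant rows (typically a root clade) are rounded to six
        # decimals so floating-point noise does not look like variation.
        if mean and statistics.pstdev(scaled) / mean < 1e-10:
            scaled = [round(x * 1e6) / 1e6 for x in scaled]
        out[name] = scaled
    return out


table = {
    "Bacteria": ["2", "6"],
    "Bacteria.Firmicutes": ["2", "6"],
    "Bacteria.Firmicutes.Clostridia": ["1", "4"],
    "Bacteria.Firmicutes.Bacilli": ["1", "2"],
}
print(normalize_sketch(table, 100.0)["Bacteria"])  # [100.0, 100.0]
```

The rounding branch corresponds to the last two statements of the loop in the listing, which is the step the changeset description refers to for root-level clades such as "Bacteria".
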
def add_missing_levels2(ff):

    if sum( [f.count(".") for f in ff] ) < 1: return ff

    dn = {}

    added = True
    while added:
        added = False
        for f in ff:
            lev = f.count(".")
            if lev == 0: continue
            if lev not in dn: dn[lev] = [f]
            else: dn[lev].append(f)
        for fn in sorted(dn,reverse=True):
            for f in dn[fn]:
                fc = ".".join(f.split('.')[:-1])
                if fc not in ff:
                    ab_all = [ff[fg] for fg in ff if (fg.count(".") == 0 and fg == fc) or (fg.count(".") > 0 and fc == ".".join(fg.split('.')[:-1]))]
                    ab =[]
                    for l in [f for f in zip(*ab_all)]:
                        ab.append(sum([float(ll) for ll in l]))
                    ff[fc] = ab
                    added = True
            if added:
                break

    return ff


def add_missing_levels(ff):
    if sum( [f.count(".") for f in ff] ) < 1: return ff

    clades2leaves = {}
    for f in ff:
        fs = f.split(".")
        if len(fs) < 2:
            continue
        for l in range(len(fs)):
            n = ".".join( fs[:l] )
            if n in clades2leaves:
                clades2leaves[n].append( f )
            else:
                clades2leaves[n] = [f]
    for k,v in clades2leaves.items():
        if k and k not in ff:
            ff[k] = [sum(a) for a in zip(*[[float(fn) for fn in ff[vv]] for vv in v])]
    return ff



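Note on add_missing_levels: a minimal sketch of the fill-in step, assuming Python 3 and an input that contains leaf features only. The name fill_missing_clades and the sample clade names are hypothetical. As in the listing, every dotted feature under a clade is summed, so the result is a true clade total only when intermediate levels are not also present in the input.

```python
def fill_missing_clades(ff):
    """Add an entry for every missing ancestor clade, summing its members.

    `ff` maps dot-separated feature names to per-sample value lists and is
    modified in place, mirroring the listing.
    """
    if not any("." in name for name in ff):
        return ff  # flat feature names: nothing to add
    members = {}
    for name in ff:
        parts = name.split(".")
        # Register the feature under each of its proper prefixes.
        for depth in range(1, len(parts)):
            members.setdefault(".".join(parts[:depth]), []).append(name)
    for clade, names in members.items():
        if clade not in ff:
            rows = [[float(x) for x in ff[n]] for n in names]
            ff[clade] = [sum(col) for col in zip(*rows)]
    return ff


table = fill_missing_clades({"A.B.C": [1, 2], "A.B.D": ["3", "4"]})
print(table["A.B"])  # [4.0, 6.0]
print(table["A"])    # [4.0, 6.0]
```
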
def modify_feature_names(fn):
    ret = fn

    for v in [' ',r'\$',r'\@',r'#',r'%',r'\^',r'\&',r'\*',r'\"',r'\'']:
        ret = [re.sub(v,"",f) for f in ret]
    for v in ["/",r'\(',r'\)',r'-',r'\+',r'=',r'{',r'}',r'\[',r'\]',
            r',',r'\.',r';',r':',r'\?',r'\<',r'\>',r'\.',r'\,']:
        ret = [re.sub(v,"_",f) for f in ret]
    for v in ["\|"]:
        ret = [re.sub(v,".",f) for f in ret]

    ret2 = []
    for r in ret:
        if r[0] in ['0','1','2','3','4','5','6','7','8','9']:
            ret2.append("f_"+r)
        else: ret2.append(r)

    return ret2


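Note on modify_feature_names: the three substitution passes run in order, so existing dots become underscores before pipes are turned into dots. The sketch below restates the passes with character classes, assuming Python 3; the name clean_feature_names and the sample feature names are made up for illustration.

```python
import re


def clean_feature_names(names):
    """Sanitize feature names using the same three passes as the listing."""
    cleaned = []
    for name in names:
        name = re.sub(r"[ $@#%^&*\"']", "", name)            # pass 1: drop these characters
        name = re.sub(r"[/()\-+={}\[\].,;:?<>]", "_", name)  # pass 2: replace with "_"
        name = name.replace("|", ".")                        # pass 3: "|" becomes the clade separator
        if name[0] in "0123456789":
            name = "f_" + name  # names must not start with a digit
        cleaned.append(name)
    return cleaned


print(clean_feature_names(["k__Bacteria|p__Firmicutes", "16S rRNA (V4)", "Genus sp. 1"]))
# ['k__Bacteria.p__Firmicutes', 'f_16SrRNA_V4_', 'Genussp_1']
```
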
def rename_same_subcl(cl,subcl):
    toc = []
    for sc in set(subcl):
        if len(set([cl[i] for i in range(len(subcl)) if sc == subcl[i]])) > 1:
            toc.append(sc)
    new_subcl = []
    for i,sc in enumerate(subcl):
        if sc in toc: new_subcl.append(cl[i]+"_"+sc)
        else: new_subcl.append(sc)
    return new_subcl


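Note on rename_same_subcl: a subclass label that appears under more than one class is prefixed with its class, and other labels are left unchanged. A compact sketch of that rule, assuming Python 3; the name disambiguate_subclasses and the sample labels are hypothetical.

```python
def disambiguate_subclasses(cl, subcl):
    """Prefix subclass labels shared by several classes with the class name."""
    # Subclass labels that occur together with more than one distinct class.
    shared = {
        sc for sc in set(subcl)
        if len({c for c, s in zip(cl, subcl) if s == sc}) > 1
    }
    return [c + "_" + s if s in shared else s for c, s in zip(cl, subcl)]


classes = ["ctrl", "ctrl", "case"]
subclasses = ["male", "female", "male"]
print(disambiguate_subclasses(classes, subclasses))
# ['ctrl_male', 'female', 'case_male']
```
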
#*************************************************************************************
#* Modifications by George Weingart, Jan 15, 2014 *
#* If the input file is biom: *
#* a. Load an AbundanceTable (Using breadcrumbs) *
#* b. Create a sequential file from the AbundanceTable (de-facto - pcl) *
#* c. Use that file as input to the rest of the program *
#* d. Calculate the c,s,and u parameters, either from the values the User entered *
#*    from the meta data values in the biom file or set up defaults *
#* <<<------------- I M P O R T A N T N O T E ------------------->> *
#* breadcrumbs src directory must be included in the PYTHONPATH *
#* <<<------------- I M P O R T A N T N O T E ------------------->> *
#*************************************************************************************
def biom_processing(inp_file):
    CommonArea = dict() #* Set up a dictionary to return
    CommonArea['abndData'] = AbundanceTable.funcMakeFromFile(inp_file, #* Create AbundanceTable from input biom file
        cDelimiter = None,
        sMetadataID = None,
        sLastMetadataRow = None,
        sLastMetadata = None,
        strFormat = None)

    #****************************************************************
    #* Building the data element here *
    #****************************************************************
    ResolvedData = list() #This is the Resolved data that will be returned
    IDMetadataName = CommonArea['abndData'].funcGetIDMetadataName() #* ID Metadataname
a31c10fe09c8
Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents:
diff
changeset
|
305 IDMetadata = [CommonArea['abndData'].funcGetIDMetadataName()] #* The first Row |
a31c10fe09c8
Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents:
diff
changeset
|
306 for IDMetadataEntry in CommonArea['abndData'].funcGetMetadataCopy()[IDMetadataName]: #* Loop on all the metadata values |
a31c10fe09c8
Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents:
diff
changeset
|
307 IDMetadata.append(IDMetadataEntry) |
a31c10fe09c8
Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents:
diff
changeset
|
308 ResolvedData.append(IDMetadata) #Add the IDMetadata with all its values to the resolved area |
a31c10fe09c8
Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents:
diff
changeset
|
309 for key, value in CommonArea['abndData'].funcGetMetadataCopy().iteritems(): |
a31c10fe09c8
Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents:
diff
changeset
|
310 if key != IDMetadataName: |
a31c10fe09c8
Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents:
diff
changeset
|
311 MetadataEntry = list() #* Set it up |
a31c10fe09c8
Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents:
diff
changeset
|
312 MetadataEntry.append(key) #* And post it to the area |
a31c10fe09c8
Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents:
diff
changeset
|
313 for x in value: |
a31c10fe09c8
Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents:
diff
changeset
|
314 MetadataEntry.append(x) #* Append the metadata value name |
a31c10fe09c8
Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents:
diff
changeset
|
315 ResolvedData.append(MetadataEntry) |
a31c10fe09c8
Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents:
diff
changeset
|
316 for AbundanceDataEntry in CommonArea['abndData'].funcGetAbundanceCopy(): #* The Abundance Data |
a31c10fe09c8
Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents:
diff
changeset
|
317 lstAbundanceDataEntry = list(AbundanceDataEntry) #Convert tuple to list |
a31c10fe09c8
Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents:
diff
changeset
|
318 ResolvedData.append(lstAbundanceDataEntry) #Append the list to the metadata list |
a31c10fe09c8
Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents:
diff
changeset
|
319 CommonArea['ReturnedData'] = ResolvedData #Post the results |
a31c10fe09c8
Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents:
diff
changeset
|
320 return CommonArea |
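
The row layout that `biom_processing` builds can be illustrated without the breadcrumbs library. The following is a minimal sketch written for Python 3, not the author's code: `build_rows` is a hypothetical helper, and the plain `dict` and list of tuples are assumed stand-ins for what `funcGetMetadataCopy()` and `funcGetAbundanceCopy()` return. It shows the ordering only: the ID row first, then the remaining metadata rows, then one row per abundance entry.

```python
def build_rows(id_name, metadata, abundance):
    """Assemble rows: ID row first, other metadata rows next, abundance rows last.

    id_name, metadata and abundance are hypothetical stand-ins for the values
    the AbundanceTable accessors would provide.
    """
    # First row: the ID metadata name followed by all sample IDs
    rows = [[id_name] + list(metadata[id_name])]
    # One row per remaining metadata entry: name followed by its values
    for key, values in metadata.items():
        if key != id_name:
            rows.append([key] + list(values))
    # One row per feature: convert each tuple to a list
    for entry in abundance:
        rows.append(list(entry))
    return rows


metadata = {"ID": ["s1", "s2"], "Group": ["a", "b"]}
abundance = [("Bacteria", 0.7, 0.4), ("Archaea", 0.3, 0.6)]
rows = build_rows("ID", metadata, abundance)
```

Each row starts with its name, which is why `check_params_for_biom_case` can later read the metadata names from the first element of the leading rows.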


#*******************************************************************************
#* Check the params and override in the case of biom *
#*******************************************************************************
def check_params_for_biom_case(params, CommonArea):
    CommonArea['MetadataNames'] = list()    #Metadata names
    params['original_class'] = params['class']    #Save the original class
    params['original_subclass'] = params['subclass']    #Save the original subclass
    params['original_subject'] = params['subject']    #Save the original subject


    TotalMetadataEntriesAndIDInBiomFile = len(CommonArea['abndData'].funcGetMetadataCopy())    # The number of metadata entries
    for i in range(0,TotalMetadataEntriesAndIDInBiomFile):    #* Populate the meta data names table
        CommonArea['MetadataNames'].append(CommonArea['ReturnedData'][i][0])    #Add the metadata name


    #****************************************************
    #* Setting the params here *
    #****************************************************

    if TotalMetadataEntriesAndIDInBiomFile > 0:    #If there is at least one entry - has to be the subject
        params['subject'] = 1
    if TotalMetadataEntriesAndIDInBiomFile == 2:    #If there are 2 - The first is the subject and the second has to be the metadata, and that is the class
        params['class'] = 2
    if TotalMetadataEntriesAndIDInBiomFile == 3:    #If there are 3: Set up default that the second entry is the class and the third is the subclass
        params['class'] = 2
        params['subclass'] = 3
    FlagError = False    #Set up error flag

    if params['biom_class'] is not None and params['biom_subclass'] is not None:    #Check if the User passed a valid class and subclass
        if params['biom_class'] in CommonArea['MetadataNames']:
            params['class'] = CommonArea['MetadataNames'].index(params['biom_class']) +1    #* Set up the index for that metadata
        else:
            FlagError = True
        if params['biom_subclass'] in CommonArea['MetadataNames']:
            params['subclass'] = CommonArea['MetadataNames'].index(params['biom_subclass']) +1    #* Set up the index for that metadata
        else:
            FlagError = True
    if FlagError:    #* If the User passed an invalid class or subclass
        print "**Invalid biom class or subclass passed - Using defaults: First metadata=class, Second Metadata=subclass\n"
        params['class'] = 2
        params['subclass'] = 3
    return params
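
The name-to-index lookup with fallback used above can be sketched in isolation. This is an illustrative Python 3 sketch under stated assumptions, not the function itself: `resolve_indices` is a hypothetical helper, it assumes both a class name and a subclass name were supplied, and it mirrors only the rule that an unknown name for either one resets both to the defaults (row 2 for class, row 3 for subclass, counted from 1).

```python
def resolve_indices(names, wanted_class, wanted_subclass, defaults=(2, 3)):
    """Return 1-based (class, subclass) row indices.

    names is the list of metadata names in row order. If either requested
    name is missing, both indices fall back to the defaults.
    """
    if wanted_class in names and wanted_subclass in names:
        return names.index(wanted_class) + 1, names.index(wanted_subclass) + 1
    return defaults


names = ["ID", "Group", "Site"]
ok = resolve_indices(names, "Site", "Group")          # both names are known
fallback = resolve_indices(names, "Site", "Missing")  # one name is unknown
```

The `+ 1` converts Python's zero-based list position into the one-based numbering the parameters use, which the main block later converts back with `params['class']-1`.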



if __name__ == '__main__':
    CommonArea = dict()    #Build a Common Area to pass variables in the biom case
    params = read_params(sys.argv)

    #*************************************************************
    #* Conditionally import breadcrumbs if file is a biom file *
    #* If it is and no breadcrumbs found - abnormally exit *
    #*************************************************************
    if params['input_file'].endswith('.biom'):
        try:
            from lefsebiom.ConstantsBreadCrumbs import *
            from lefsebiom.AbundanceTable import *
        except ImportError:
            sys.stderr.write("************************************************************************************************************ \n")
            sys.stderr.write("* Error: Breadcrumbs libraries not detected - required to process biom files - run abnormally terminated * \n")
            sys.stderr.write("************************************************************************************************************ \n")
            sys.exit(1)


    if type(params['subclass']) is int and int(params['subclass']) < 1:
        params['subclass'] = None
    if type(params['subject']) is int and int(params['subject']) < 1:
        params['subject'] = None


    CommonArea = read_input_file(sys.argv[1], CommonArea)    #Pass The CommonArea to the Read
    data = CommonArea['ReturnedData']    #Select the data

    if sys.argv[1].endswith('biom'):    #* Check if biom:
        params = check_params_for_biom_case(params, CommonArea)    #Check the params for the biom case

    if params['feats_dir'] == "c":
        data = transpose(data)

    ncl = 1
    if params['subclass'] is not None: ncl += 1
    if params['subject'] is not None: ncl += 1

    first_line = zip(*data)[0]

    first_line = modify_feature_names(list(first_line))

    data = zip( first_line,
                *sort_by_cl(zip(*data)[1:],
                            ncl,
                            params['class']-1,
                            params['subclass']-1 if params['subclass'] is not None else None,
                            params['subject']-1 if params['subject'] is not None else None))
    # data.insert(0,first_line)
    # data = remove_missing(data,params['missing_p'])
    cls = {}

    cls_i = [('class',params['class']-1)]
    if params['subclass'] > 0: cls_i.append(('subclass',params['subclass']-1))
    if params['subject'] > 0: cls_i.append(('subject',params['subject']-1))
    cls_i.sort(lambda x, y: -cmp(x[1],y[1]))
    for v in cls_i: cls[v[0]] = data.pop(v[1])[1:]
    if not params['subclass'] > 0: cls['subclass'] = [str(cl)+"_subcl" for cl in cls['class']]

    cls['subclass'] = rename_same_subcl(cls['class'],cls['subclass'])
    # if 'subclass' in cls.keys(): cls = group_small_subclasses(cls,params['subcl_min_card'])
    class_sl,subclass_sl,class_hierarchy = get_class_slices(zip(*cls.values()))
a31c10fe09c8
Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents:
diff
changeset
|
430 |
a31c10fe09c8
Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents:
diff
changeset
|
431 feats = dict([(d[0],d[1:]) for d in data]) |
a31c10fe09c8
Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents:
diff
changeset
|
432 |
a31c10fe09c8
Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents:
diff
changeset
|
433 feats = add_missing_levels(feats) |
a31c10fe09c8
Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents:
diff
changeset
|
434 |
a31c10fe09c8
Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents:
diff
changeset
|
435 feats = numerical_values(feats,params['norm_v']) |
a31c10fe09c8
Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents:
diff
changeset
|
436 out = {} |
a31c10fe09c8
Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents:
diff
changeset
|
437 out['feats'] = feats |
438 out['norm'] = params['norm_v'] |
439 out['cls'] = cls |
440 out['class_sl'] = class_sl |
441 out['subclass_sl'] = subclass_sl |
442 out['class_hierarchy'] = class_hierarchy |
443 |
444 if params['output_table']: |
445 with open( params['output_table'], "w") as outf: |
446 if 'class' in cls: outf.write( "\t".join(list(["class"])+list(cls['class'])) + "\n" ) |
447 if 'subclass' in cls: outf.write( "\t".join(list(["subclass"])+list(cls['subclass'])) + "\n" ) |
448 if 'subject' in cls: outf.write( "\t".join(list(["subject"])+list(cls['subject'])) + "\n" ) |
449 for k,v in out['feats'].items(): outf.write( "\t".join([k]+[str(vv) for vv in v]) + "\n" ) |
450 |
451 with open(params['output_file'], 'wb') as back_file: |
452 pickle.dump(out,back_file) |
453 |
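The closing lines of the script serialize the `out` dictionary with `pickle`. As a rough illustration of how a downstream step might read that file back, here is a minimal sketch. The dictionary values, the temporary path, and the `example.in` file name are hypothetical stand-ins, not real LEfSe output; only the key names are taken from the code above.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the `out` dictionary assembled by the script.
# The key names match the script; the values are invented for illustration
# and do not reflect the real structure returned by get_class_slices.
out = {
    "feats": {"Bacteria": [0.6, 0.4], "Bacteria.Firmicutes": [0.1, 0.3]},
    "norm": 1000000.0,
    "cls": {"class": ["a", "b"]},
    "class_sl": {"a": (0, 1), "b": (1, 2)},
    "subclass_sl": {},
    "class_hierarchy": {},
}

# Write to a temporary location (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "example.in")

# Binary mode is required for pickle, as in the script's open(..., 'wb').
with open(path, "wb") as back_file:
    pickle.dump(out, back_file)

# Reading the file back restores an equal dictionary.
with open(path, "rb") as back_file:
    loaded = pickle.load(back_file)

print(sorted(loaded.keys()))
```

Because the file is a pickle, it should only be loaded from sources you trust, and it must be opened in binary mode on both the write and the read side.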