annotate format_input.py @ 2:a31c10fe09c8 draft default tip

Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
author george-weingart
date Tue, 07 Jul 2015 13:52:29 -0400
#!/usr/bin/env python

import sys,os,argparse,pickle,re,numpy

#***************************************************************************************************************
#*   Log of change
#*   January 16, 2014  - George Weingart - george.weingart@gmail.com
#*
#*   biom Support
#*   Modified the program to enable it to accept biom files as input
#*
#*   Added two optional input parameters:
#*   1. biom_c is the name of the biom metadata to be used as class
#*   2. biom_s is the name of the biom metadata to be used as subclass
#*   class and subclass are used in the same context as the original
#*   parameters class and subclass
#*   These parameters are totally optional; by default the program
#*   chooses as class the first metadata received from the conversion
#*   of the biom file into a sequential (pcl) file as generated by
#*   breadcrumbs, and similarly, the second metadata is selected as
#*   subclass.
#*   The syntax or logic for the original non-biom case was NOT changed.
#*
#*   <*******************  IMPORTANT NOTE  *************************>
#*   The biom case requires breadcrumbs and therefore there is
#*   a conditional import of the breadcrumbs modules.
#*   If the user supplies a biom input and breadcrumbs is not detected,
#*   the run is abnormally ended.
#*   breadcrumbs itself needs a biom environment, so if the import
#*   of biom in breadcrumbs fails, the run is also abnormally
#*   ended (only if the input file was biom).
#*
#*   USAGE EXAMPLES
#*   --------------
#*   Case #1: Using a sequential file as input (old version - did not change)
#*   ./format_input.py hmp_aerobiosis_small.txt hmp_aerobiosis_small.in -c 1 -s 2 -u 3 -o 1000000
#*   Case #2: Using a biom file as input
#*   ./format_input.py hmp_aerobiosis_small.biom hmp_aerobiosis_small.in -o 1000000
#*   Case #3: Using a biom file as input and overriding the class and subclass
#*   ./format_input.py lefse.biom hmp_aerobiosis_small.in -biom_c oxygen_availability -biom_s body_site -o 1000000
#*
#***************************************************************************************************************

def read_input_file(inp_file, CommonArea):

    if inp_file.endswith('.biom'):                   #*  If the file format is biom:
        CommonArea = biom_processing(inp_file)       #*  Process in biom format
        return CommonArea                            #*  And return the CommonArea

    with open(inp_file) as inp:
        CommonArea['ReturnedData'] = [[v.strip() for v in line.strip().split("\t")] for line in inp.readlines()]
        return CommonArea

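The non-biom branch of read_input_file() turns a tab-separated text file into a list of rows, each row a list of stripped cells. The sketch below isolates that list comprehension so its behaviour can be checked on an in-memory sample. The helper name `parse_table` and the sample text are assumptions made for illustration and are not part of format_input.py.

```python
import io

def parse_table(handle):
    """Hypothetical helper mirroring the list comprehension in read_input_file()."""
    # One inner list per line; every cell is stripped of surrounding whitespace.
    return [[v.strip() for v in line.strip().split("\t")] for line in handle.readlines()]

# Small in-memory sample standing in for a real input file (assumption).
sample = io.StringIO("class\ta \tb\nfeat1\t 1\t2\n")
cells = parse_table(sample)

# line.strip() also removes trailing tabs, so empty cells at the end
# of a line are dropped and that row comes out shorter than the others.
short_row = parse_table(io.StringIO("feat2\t1\t\n"))
```

Because of the outer `line.strip()`, a row whose last cells are empty ends up with fewer entries than a complete row, which is worth keeping in mind when reading the missing-value handling further down.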
def transpose(data):
    return zip(*data)

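transpose() relies on `zip(*data)`, which returns a list of tuples under Python 2 (the interpreter this script targets) but a lazy iterator under Python 3. Callers such as remove_missing() use `len()` and `pop()` on the result, so a Python 3 port would probably need to materialize it. A minimal sketch, assuming a hypothetical name `transpose_py3` that does not exist in the original file:

```python
def transpose_py3(data):
    """Hypothetical Python 3 variant of transpose(): rows become columns."""
    # zip(*data) is lazy in Python 3, so build a list of lists to keep
    # len(), indexing and pop() working on the result.
    return [list(col) for col in zip(*data)]

table = [["a", "b", "c"],
         ["1", "2", "3"]]
flipped = transpose_py3(table)
```

Note that `zip` stops at the shortest row, so ragged input is silently truncated in both versions.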
def read_params(args):
    parser = argparse.ArgumentParser(description='LEfSe formatting modules')
    parser.add_argument('input_file', metavar='INPUT_FILE', type=str, help="the input file, feature hierarchical level can be specified with | or . and those symbols must not be present for other reasons in the input file.")
    parser.add_argument('output_file', metavar='OUTPUT_FILE', type=str,
        help="the output file containing the data for LEfSe")
    parser.add_argument('--output_table', type=str, required=False, default="",
        help="the formatted table in txt format")
    parser.add_argument('-f',dest="feats_dir", choices=["c","r"], type=str, default="r",
        help="set whether the features are on rows (default) or on columns")
    parser.add_argument('-c',dest="class", metavar="[1..n_feats]", type=int, default=1,
        help="set which feature to use as class (default 1)")
    parser.add_argument('-s',dest="subclass", metavar="[1..n_feats]", type=int, default=None,
        help="set which feature to use as subclass (default None, meaning no subclass)")
    parser.add_argument('-o',dest="norm_v", metavar="float", type=float, default=-1.0,
        help="set the normalization value (default -1.0 meaning no normalization)")
    parser.add_argument('-u',dest="subject", metavar="[1..n_feats]", type=int, default=None,
        help="set which feature to use as subject (default None, meaning no subject)")
    # NOTE: the default "d" is not one of the listed choices, although the help text reports f as the default.
    parser.add_argument('-m',dest="missing_p", choices=["f","s"], type=str, default="d",
        help="set the policy to adopt with missing values: f removes the features with missing values, s removes samples with missing values (default f)")
    parser.add_argument('-n',dest="subcl_min_card", metavar="int", type=int, default=10,
        help="set the minimum cardinality of each subclass (subclasses with low cardinalities will be grouped together, if the cardinality is still low, no pairwise comparison will be performed with them)")

    parser.add_argument('-biom_c',dest="biom_class", type=str,
        help="For biom input files: set which feature to use as class")
    parser.add_argument('-biom_s',dest="biom_subclass", type=str,
        help="For biom input files: set which feature to use as subclass")

    args = parser.parse_args()

    return vars(args)

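To see what read_params() hands back for usage Case #1 from the header comment, the sketch below rebuilds a reduced version of the parser and parses an explicit argument list instead of `sys.argv`. The helper name `build_parser` is an assumption for illustration; the option strings, `dest` names, types and defaults are copied from the function above, with help texts and the remaining options left out.

```python
import argparse

def build_parser():
    # Hypothetical helper; reduced option set copied from read_params().
    parser = argparse.ArgumentParser(description='LEfSe formatting modules')
    parser.add_argument('input_file', metavar='INPUT_FILE', type=str)
    parser.add_argument('output_file', metavar='OUTPUT_FILE', type=str)
    parser.add_argument('-c', dest="class", type=int, default=1)
    parser.add_argument('-s', dest="subclass", type=int, default=None)
    parser.add_argument('-u', dest="subject", type=int, default=None)
    parser.add_argument('-o', dest="norm_v", type=float, default=-1.0)
    parser.add_argument('-biom_c', dest="biom_class", type=str)
    parser.add_argument('-biom_s', dest="biom_subclass", type=str)
    return parser

# Arguments of usage Case #1, passed explicitly rather than via sys.argv.
argv = ["hmp_aerobiosis_small.txt", "hmp_aerobiosis_small.in",
        "-c", "1", "-s", "2", "-u", "3", "-o", "1000000"]
params = vars(build_parser().parse_args(argv))
```

Returning `vars(args)` rather than the namespace is convenient here: `class` is a Python keyword, so `args.class` would be a syntax error, whereas `params["class"]` works.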
def remove_missing(data,roc):
    if roc == "c": data = transpose(data)
    max_len = max([len(r) for r in data])
    to_rem = []
    for i,r in enumerate(data):
        if len([v for v in r if not( v == "" or v.isspace())]) < max_len: to_rem.append(i)
    if len(to_rem):
        # list.reverse() reverses in place and returns None, so iterate over reversed(to_rem)
        # instead; removing from the highest index down keeps the remaining indices valid.
        for i in reversed(to_rem):
            data.pop(i)
    if roc == "c": return transpose(data)
    return data


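The row filter in remove_missing() collects the indices of incomplete rows and then deletes them. Deleting from the highest index downwards matters, because each `pop` shifts every later row by one position. The sketch below isolates that step. The function name `drop_incomplete_rows` and the sample rows are assumptions for illustration, and the transposition handled by the `roc` argument is left out.

```python
def drop_incomplete_rows(rows):
    """Hypothetical stand-alone version of the row filter in remove_missing()."""
    max_len = max(len(r) for r in rows)
    # A row is incomplete when it has fewer non-blank cells than the longest row.
    to_rem = [i for i, r in enumerate(rows)
              if len([v for v in r if not (v == "" or v.isspace())]) < max_len]
    # Delete from the highest index down so earlier pops do not shift
    # the positions of rows that are still waiting to be removed.
    for i in reversed(to_rem):
        rows.pop(i)
    return rows

rows = [["f1", "1", "2"], ["f2", "", "3"], ["f3", "4", "5"], ["f4", " ", "6"]]
kept = drop_incomplete_rows(rows)
```

Iterating over the return value of `to_rem.reverse()` would fail instead, since `list.reverse()` returns `None`.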
def sort_by_cl(data,n,c,s,u):
    def sort_lines1(a,b):
        return int(a[c] > b[c])*2-1
    def sort_lines2u(a,b):
        if a[c] != b[c]: return int(a[c] > b[c])*2-1
        return int(a[u] > b[u])*2-1
    def sort_lines2s(a,b):
        if a[c] != b[c]: return int(a[c] > b[c])*2-1
        return int(a[s] > b[s])*2-1
    def sort_lines3(a,b):
        if a[c] != b[c]: return int(a[c] > b[c])*2-1
        if a[s] != b[s]: return int(a[s] > b[s])*2-1
        return int(a[u] > b[u])*2-1
    if n == 3: data.sort(sort_lines3)
    if n == 2:
        if s is None:
            data.sort(sort_lines2u)
        else:
            data.sort(sort_lines2s)
    if n == 1: data.sort(sort_lines1)
    return data

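sort_by_cl() passes comparison functions positionally to `list.sort`, which only works under Python 2. The same ordering (class first, then subclass and/or subject) can be expressed with tuple sort keys, which Python 3 supports. This is a sketch under that assumption rather than a drop-in replacement: the name `sort_by_cl_py3` is hypothetical, and unlike the comparators above, which never return 0, a key-based sort keeps rows with equal keys in their input order.

```python
def sort_by_cl_py3(data, n, c, s, u):
    """Hypothetical Python 3 counterpart of sort_by_cl() using sort keys."""
    if n == 3:
        # class, then subclass, then subject
        key = lambda row: (row[c], row[s], row[u])
    elif n == 2:
        # class, then whichever of subclass/subject is present
        second = u if s is None else s
        key = lambda row: (row[c], row[second])
    elif n == 1:
        key = lambda row: row[c]
    else:
        # Any other n leaves the data untouched, as in the original.
        return data
    data.sort(key=key)
    return data

rows = [["b", "x", "2"], ["a", "y", "1"], ["a", "x", "3"]]
sorted_rows = sort_by_cl_py3(rows, 3, 0, 1, 2)
```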
def group_small_subclasses(cls,min_subcl):
    last = ""
    n = 0
    repl = []
    dd = [list(cls['class']),list(cls['subclass'])]
    for d in dd:
        if d[1] != last:
            if n < min_subcl and last != "":
                repl.append(d[1])
            last = d[1]
        n = 1
    for i,d in enumerate(dd):
        if d[1] in repl: dd[i][1] = "other"
        dd[i][1] = str(dd[i][0])+"_"+str(dd[i][1])
    cls['class'] = dd[0]
    cls['subclass'] = dd[1]
    return cls

def get_class_slices(data):
    # Expects rows already sorted by class (d[0]) and subclass (d[1]).
    # Returns half-open (start, end) row ranges for every class and subclass,
    # plus the list of subclasses found under each class.
    previous_class = data[0][0]
    previous_subclass = data[0][1]
    subclass_slices = []
    class_slices = []
    last_cl = 0
    last_subcl = 0
    class_hierarchy = []
    subcls = []
    for i,d in enumerate(data):
        if d[1] != previous_subclass:
            subclass_slices.append((previous_subclass,(last_subcl,i)))
            last_subcl = i
            subcls.append(previous_subclass)
        if d[0] != previous_class:
            class_slices.append((previous_class,(last_cl,i)))
            class_hierarchy.append((previous_class,subcls))
            subcls = []
            last_cl = i
        previous_subclass = d[1]
        previous_class = d[0]
    # Close the ranges that are still open after the last row.
    subclass_slices.append((previous_subclass,(last_subcl,i+1)))
    subcls.append(previous_subclass)
    class_slices.append((previous_class,(last_cl,i+1)))
    class_hierarchy.append((previous_class,subcls))
    return dict(class_slices), dict(subclass_slices), dict(class_hierarchy)

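The slicing above can be illustrated on a tiny table. The sketch below is not the script's code: `class_slices_sketch` and the sample rows are hypothetical, and it assumes, as get_class_slices does, that rows are already sorted so each class and subclass occupies one contiguous run.

```python
def class_slices_sketch(rows):
    """Hypothetical restatement of the slicing idea for sorted rows."""
    class_slices, subclass_slices, hierarchy = {}, {}, {}
    for i, row in enumerate(rows):
        cl, subcl = row[0], row[1]
        # Open a new half-open range [i, i+1) or extend the existing one.
        start = class_slices.get(cl, (i, i))[0]
        class_slices[cl] = (start, i + 1)
        start = subclass_slices.get(subcl, (i, i))[0]
        subclass_slices[subcl] = (start, i + 1)
        # Record each subclass once under its class, in order of appearance.
        if subcl not in hierarchy.setdefault(cl, []):
            hierarchy[cl].append(subcl)
    return class_slices, subclass_slices, hierarchy


# Made-up rows: (class, subclass, sample id), sorted by class then subclass.
rows = [
    ("healthy", "healthy_a", "s1"),
    ("healthy", "healthy_a", "s2"),
    ("healthy", "healthy_b", "s3"),
    ("sick", "sick_a", "s4"),
]
cls_sl, sub_sl, hier = class_slices_sketch(rows)
print(cls_sl)
print(sub_sl)
print(hier)
```

Each range is half-open, so `rows[start:end]` selects exactly the samples of that class or subclass.
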
def numerical_values(feats,norm):
    # Convert every value to float; a negative norm disables normalization.
    for k,v in feats.items():
        feats[k] = [float(val) for val in v]
    if norm < 0.0: return feats
    # list() keeps indexing valid where zip() and keys() return iterators/views.
    tr = list(zip(*(feats.values())))
    mul = []
    fk = list(feats.keys())
    n_samples = len(feats[fk[0]])
    # Treat the input as hierarchical when names average more than one "." each.
    hie = sum([k.count(".") for k in fk]) > len(fk)
    for i in range(n_samples):
        # Hierarchical input: the per-sample total is the sum of root-level clades only.
        if hie: mul.append(sum([t for j,t in enumerate(tr[i]) if fk[j].count(".") < 1 ]))
        else: mul.append(sum(tr[i]))
    if hie and sum(mul) == 0:
        # No root-level abundance at all: fall back to summing every feature.
        mul = []
        for i in range(n_samples):
            mul.append(sum(tr[i]))
    for i,m in enumerate(mul):
        if m == 0: mul[i] = 0.0
        else: mul[i] = float(norm) / m
    for k,v in feats.items():
        feats[k] = [val*mul[i] for i,val in enumerate(v)]
        # Features that are constant up to floating-point noise (e.g. root-level
        # clades after normalization) are rounded so all their values are identical.
        if numpy.mean(feats[k]) and (numpy.std(feats[k])/numpy.mean(feats[k])) < 1e-10:
            feats[k] = [ float(round(kv*1e6)/1e6) for kv in feats[k]]
    return feats

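A small worked example may help with the per-sample scaling and the rounding step. This is a sketch under stated assumptions, not the script's implementation: `normalize_sketch`, `snap_constant` and the abundance values are hypothetical, the all-zero fallback is left out, and the standard-library `statistics` module stands in for numpy.

```python
import statistics


def normalize_sketch(feats, norm):
    """Scale each sample (column) so that its total equals `norm`."""
    names = list(feats)
    n_samples = len(feats[names[0]])
    # Same heuristic as above: hierarchical when names average more than one ".".
    hierarchical = sum(n.count(".") for n in names) > len(names)
    scaled = {name: [] for name in names}
    for i in range(n_samples):
        if hierarchical:
            # Only root-level clades (no "." in the name) define the total.
            total = sum(feats[n][i] for n in names if "." not in n)
        else:
            total = sum(feats[n][i] for n in names)
        factor = float(norm) / total if total else 0.0
        for name in names:
            scaled[name].append(feats[name][i] * factor)
    return scaled


def snap_constant(values, rel_tol=1e-10):
    """Round values that differ only by floating-point noise to 6 decimals."""
    mean = statistics.mean(values)
    if mean and statistics.pstdev(values) / mean < rel_tol:
        return [round(v * 1e6) / 1e6 for v in values]
    return list(values)


# Made-up abundances for two samples.
feats = {
    "Bacteria": [8.0, 2.0],
    "Archaea": [2.0, 2.0],
    "Bacteria.Firmicutes.Clostridia": [6.0, 1.0],
    "Bacteria.Firmicutes.Clostridia.Clostridiales": [4.0, 1.0],
}
scaled = normalize_sketch(feats, 100.0)

# Two values that should be equal but differ in the last bits.
noisy = [1000000.0, 1000000.0 * (1 + 2 ** -52)]
snapped = snap_constant(noisy)
print(scaled["Bacteria"], snapped)
```

With a single root-level clade, every normalized value of that clade should equal `norm`; the rounding step removes the last-bit differences that would otherwise make such a feature look variable.
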
def add_missing_levels2(ff):

    if sum( [f.count(".") for f in ff] ) < 1: return ff

    dn = {}

    added = True
    while added:
        added = False
        for f in ff:
            lev = f.count(".")
            if lev == 0: continue
            if lev not in dn: dn[lev] = [f]
            else: dn[lev].append(f)
        for fn in sorted(dn,reverse=True):
            for f in dn[fn]:
                fc = ".".join(f.split('.')[:-1])
                if fc not in ff:
                    ab_all = [ff[fg] for fg in ff if (fg.count(".") == 0 and fg == fc) or (fg.count(".") > 0 and fc == ".".join(fg.split('.')[:-1]))]
                    ab =[]
                    for l in [f for f in zip(*ab_all)]:
                        ab.append(sum([float(ll) for ll in l]))
                    ff[fc] = ab
                    added = True
            if added:
                break

    return ff


def add_missing_levels(ff):
    if sum( [f.count(".") for f in ff] ) < 1: return ff

    clades2leaves = {}
    for f in ff:
        fs = f.split(".")
        if len(fs) < 2:
            continue
        for l in range(len(fs)):
            n = ".".join( fs[:l] )
            if n in clades2leaves:
                clades2leaves[n].append( f )
            else:
                clades2leaves[n] = [f]
    for k,v in clades2leaves.items():
        if k and k not in ff:
            ff[k] = [sum(a) for a in zip(*[[float(fn) for fn in ff[vv]] for vv in v])]
    return ff



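The effect of add_missing_levels can be seen on a toy hierarchy. The following is a minimal sketch rather than the script's code; `add_missing_levels_sketch` and the feature values are hypothetical, and it returns a new dictionary instead of modifying its argument.

```python
def add_missing_levels_sketch(feats):
    """Add every missing ancestor clade as the sum of its listed descendants."""
    members_by_clade = {}
    for name in feats:
        parts = name.split(".")
        # Every proper, non-empty prefix of the name is an ancestor clade.
        for depth in range(1, len(parts)):
            ancestor = ".".join(parts[:depth])
            members_by_clade.setdefault(ancestor, []).append(name)
    filled = dict(feats)
    for clade, members in members_by_clade.items():
        if clade not in feats:
            columns = zip(*([float(x) for x in feats[m]] for m in members))
            filled[clade] = [sum(col) for col in columns]
    return filled


# Made-up input: two leaves under Bacteria.Firmicutes, one root-level clade.
feats = {
    "Bacteria.Firmicutes.Clostridia": [1.0, 2.0],
    "Bacteria.Firmicutes.Bacilli": [3.0, 4.0],
    "Archaea": [5.0, 6.0],
}
filled = add_missing_levels_sketch(feats)
print(sorted(filled))
```

One consequence of summing every listed descendant: if an intermediate level and its children are both present in the input, a missing ancestor counts both, so this scheme suits inputs that list leaves only.
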
def modify_feature_names(fn):
    ret = fn

    # Characters that are simply dropped.
    for v in [' ',r'\$',r'\@',r'#',r'%',r'\^',r'\&',r'\*',r'\"',r'\'']:
        ret = [re.sub(v,"",f) for f in ret]
    # Characters replaced by "_". This includes ".", so only "|" ends up as a level separator.
    for v in ["/",r'\(',r'\)',r'-',r'\+',r'=',r'{',r'}',r'\[',r'\]',
              r',',r'\.',r';',r':',r'\?',r'\<',r'\>']:
        ret = [re.sub(v,"_",f) for f in ret]
    for v in [r'\|']:
        ret = [re.sub(v,".",f) for f in ret]

    # Names starting with a digit get an "f_" prefix.
    ret2 = []
    for r in ret:
        if r[0] in ['0','1','2','3','4','5','6','7','8','9']:
            ret2.append("f_"+r)
        else: ret2.append(r)

    return ret2


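The three substitution passes can be condensed into two character classes and a plain replace. This is a sketch of the same renaming rules, not the script's code: `sanitize_name`, `DROP`, `TO_UNDERSCORE` and the sample names are hypothetical.

```python
import re

# Characters removed outright.
DROP = re.compile(r"""[ $@#%^&*"']""")
# Characters turned into "_" (note that "." is among them).
TO_UNDERSCORE = re.compile(r"[/()\-+={}\[\],.;:?<>]")


def sanitize_name(name):
    name = DROP.sub("", name)
    name = TO_UNDERSCORE.sub("_", name)
    # Pipes become the hierarchy separator only after literal dots are gone.
    name = name.replace("|", ".")
    # Prefix names that start with an ASCII digit.
    if name and name[0] in "0123456789":
        name = "f_" + name
    return name


names = ["k__Bacteria|p__Firmicutes", "16S rRNA (copy #2)", "Clostridium sp. 7-2"]
clean = [sanitize_name(n) for n in names]
print(clean)
```

The order of the passes matters: dots in the original names are replaced before pipes are converted, so a "." in the output always marks a taxonomic level.
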
def rename_same_subcl(cl,subcl):
    # Collect subclass labels that occur under more than one class.
    toc = []
    for sc in set(subcl):
        if len(set([cl[i] for i in range(len(subcl)) if sc == subcl[i]])) > 1:
            toc.append(sc)
    # Prefix those labels with their class so they become unique.
    new_subcl = []
    for i,sc in enumerate(subcl):
        if sc in toc: new_subcl.append(cl[i]+"_"+sc)
        else: new_subcl.append(sc)
    return new_subcl


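As an illustration of the renaming rule, here is a minimal sketch with made-up labels. `disambiguate_subclasses` is a hypothetical name, and the dictionary-of-sets approach is swapped in for the nested scans used above.

```python
def disambiguate_subclasses(classes, subclasses):
    """Prefix a subclass with its class when the label is shared across classes."""
    owners = {}
    for cl, sc in zip(classes, subclasses):
        owners.setdefault(sc, set()).add(cl)
    return [
        cl + "_" + sc if len(owners[sc]) > 1 else sc
        for cl, sc in zip(classes, subclasses)
    ]


classes = ["case", "case", "control", "control"]
subclasses = ["male", "female", "male", "young"]
renamed = disambiguate_subclasses(classes, subclasses)
print(renamed)
```

Only "male" appears under both classes, so it is the only label that receives a class prefix.
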
#*************************************************************************************
#*  Modifications by George Weingart, Jan 15, 2014                                   *
#*  If the input file is biom:                                                       *
#*  a. Load an AbundanceTable (Using breadcrumbs)                                    *
#*  b. Create a sequential file from the AbundanceTable (de-facto - pcl)             *
#*  c. Use that file as input to the rest of the program                             *
#*  d. Calculate the c,s,and u parameters, either from the values the User entered   *
#*     from the meta data values in the biom file or set up defaults                 *
#*  <<<------------- I M P O R T A N T N O T E ------------------->>                 *
#*  breadcrumbs src directory must be included in the PYTHONPATH                     *
#*  <<<------------- I M P O R T A N T N O T E ------------------->>                 *
#*************************************************************************************
def biom_processing(inp_file):
    CommonArea = dict()    #* Set up a dictionary to return
    CommonArea['abndData'] = AbundanceTable.funcMakeFromFile(inp_file,    #* Create AbundanceTable from input biom file
        cDelimiter = None,
        sMetadataID = None,
        sLastMetadataRow = None,
        sLastMetadata = None,
        strFormat = None)

    #****************************************************************
    #*  Building the data element here                              *
    #****************************************************************
    ResolvedData = list()    #* The resolved data that will be returned
    IDMetadataName = CommonArea['abndData'].funcGetIDMetadataName()    #* Name of the ID metadata
    Metadata = CommonArea['abndData'].funcGetMetadataCopy()    #* Fetch the metadata copy once
    IDMetadata = [IDMetadataName]    #* The first row starts with the ID metadata name
    for IDMetadataEntry in Metadata[IDMetadataName]:    #* Loop over all the ID metadata values
        IDMetadata.append(IDMetadataEntry)
    ResolvedData.append(IDMetadata)    #* Add the ID metadata with all its values to the resolved area
    for key, value in Metadata.items():    #* The remaining metadata rows
        if key != IDMetadataName:
            MetadataEntry = list()    #* Set it up
            MetadataEntry.append(key)    #* The row starts with the metadata name
            for x in value:
                MetadataEntry.append(x)    #* Append each metadata value
            ResolvedData.append(MetadataEntry)
    for AbundanceDataEntry in CommonArea['abndData'].funcGetAbundanceCopy():    #* The abundance data
        lstAbundanceDataEntry = list(AbundanceDataEntry)    #* Convert tuple to list
        ResolvedData.append(lstAbundanceDataEntry)    #* Append the abundance row after the metadata rows
    CommonArea['ReturnedData'] = ResolvedData    #* Post the results
    return CommonArea
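
#* Illustrative sketch only, not part of the original script.
#* It shows the row layout that biom_processing() appears to build: the ID
#* metadata row first, then the other metadata rows, then the abundance rows.
#* The function name sketch_resolved_rows and its plain dict/tuple arguments
#* are hypothetical stand-ins for the breadcrumbs AbundanceTable accessors.

```python
def sketch_resolved_rows(id_name, metadata, abundance_rows):
    """Build rows in the assumed order: ID row, other metadata rows, abundance rows."""
    rows = [[id_name] + list(metadata[id_name])]    # ID metadata name followed by the sample IDs
    for key, values in metadata.items():
        if key != id_name:
            rows.append([key] + list(values))       # metadata name followed by its values
    for entry in abundance_rows:
        rows.append(list(entry))                    # each abundance tuple becomes a list row
    return rows
```

#* Example call, under the same assumptions:
#* sketch_resolved_rows("ID", {"ID": ["s1", "s2"], "Group": ["a", "b"]}, [("Bacteria", 0.6, 0.4)])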


#*******************************************************************************
#*  Check the params and override in the case of biom                          *
#*******************************************************************************
def check_params_for_biom_case(params, CommonArea):
    CommonArea['MetadataNames'] = list()    #* Metadata names
    params['original_class'] = params['class']    #* Save the original class
    params['original_subclass'] = params['subclass']    #* Save the original subclass
    params['original_subject'] = params['subject']    #* Save the original subject


    TotalMetadataEntriesAndIDInBiomFile = len(CommonArea['abndData'].funcGetMetadataCopy())    #* The number of metadata entries, including the ID
    for i in range(0,TotalMetadataEntriesAndIDInBiomFile):    #* Populate the metadata names table
        CommonArea['MetadataNames'].append(CommonArea['ReturnedData'][i][0])    #* Add the metadata name


    #****************************************************
    #*  Setting the params here                         *
    #****************************************************

    if TotalMetadataEntriesAndIDInBiomFile > 0:    #* If there is at least one entry, it has to be the subject
        params['subject'] = 1
    if TotalMetadataEntriesAndIDInBiomFile == 2:    #* If there are 2, the first is the subject and the second is the class
        params['class'] = 2
    if TotalMetadataEntriesAndIDInBiomFile == 3:    #* If there are 3, default to the second entry as the class and the third as the subclass
        params['class'] = 2
        params['subclass'] = 3
        FlagError = False    #* Set up error flag

        if params['biom_class'] is not None and params['biom_subclass'] is not None:    #* Check if the user passed a valid class and subclass
            if params['biom_class'] in CommonArea['MetadataNames']:
                params['class'] = CommonArea['MetadataNames'].index(params['biom_class']) +1    #* 1-based index of that metadata
            else:
                FlagError = True
            if params['biom_subclass'] in CommonArea['MetadataNames']:
                params['subclass'] = CommonArea['MetadataNames'].index(params['biom_subclass']) +1    #* 1-based index of that metadata
            else:
                FlagError = True
        if FlagError:    #* If the user passed an invalid class or subclass
            print("**Invalid biom class or subclass passed - Using defaults: First metadata=class, Second Metadata=subclass\n")
            params['class'] = 2
            params['subclass'] = 3
    return params
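
#* Illustrative sketch only, not part of the original script.
#* It shows the lookup idea used above: a metadata name is turned into a
#* 1-based row index, with a fallback default when the name is unknown.
#* The helper name sketch_resolve_index is hypothetical. Unlike the function
#* above, it handles a single name and does not reset both class and subclass
#* when only one of them is invalid.

```python
def sketch_resolve_index(metadata_names, wanted, default):
    """Return the 1-based position of `wanted` in metadata_names, or `default`."""
    if wanted in metadata_names:
        return metadata_names.index(wanted) + 1    # list.index is 0-based, the params are 1-based
    return default                                 # unknown name: fall back to the default index
```

#* Example call, under the same assumptions:
#* sketch_resolve_index(["ID", "Sex", "Group"], "Group", 2)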


if __name__ == '__main__':
    CommonArea = dict()    #* Build a Common Area to pass variables in the biom case
    params = read_params(sys.argv)

    #*************************************************************
    #*  Conditionally import breadcrumbs if file is a biom file  *
    #*  If it is and no breadcrumbs found - abnormally exit      *
    #*************************************************************
    if params['input_file'].endswith('.biom'):
        try:
            from lefsebiom.ConstantsBreadCrumbs import *
            from lefsebiom.AbundanceTable import *
        except ImportError:
            sys.stderr.write("************************************************************************************************************ \n")
            sys.stderr.write("* Error: Breadcrumbs libraries not detected - required to process biom files - run abnormally terminated * \n")
            sys.stderr.write("************************************************************************************************************ \n")
            sys.exit(1)


    #* A subclass or subject index below 1 means "not provided"
    if type(params['subclass']) is int and int(params['subclass']) < 1:
        params['subclass'] = None
    if type(params['subject']) is int and int(params['subject']) < 1:
        params['subject'] = None


    CommonArea = read_input_file(params['input_file'], CommonArea)    #* Pass the CommonArea to the read
    data = CommonArea['ReturnedData']    #* Select the data

a31c10fe09c8 Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. &#34;Bacteria&#34; or &#34;Archaea&#34;)
george-weingart
parents:
diff changeset
396 if sys.argv[1].endswith('biom'): #* Check if biom:
a31c10fe09c8 Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. &#34;Bacteria&#34; or &#34;Archaea&#34;)
george-weingart
parents:
diff changeset
397 params = check_params_for_biom_case(params, CommonArea) #Check the params for the biom case
a31c10fe09c8 Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. &#34;Bacteria&#34; or &#34;Archaea&#34;)
george-weingart
parents:
diff changeset
398
a31c10fe09c8 Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. &#34;Bacteria&#34; or &#34;Archaea&#34;)
george-weingart
parents:
diff changeset
399 if params['feats_dir'] == "c":
a31c10fe09c8 Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. &#34;Bacteria&#34; or &#34;Archaea&#34;)
george-weingart
parents:
diff changeset
400 data = transpose(data)
a31c10fe09c8 Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. &#34;Bacteria&#34; or &#34;Archaea&#34;)
george-weingart
parents:
diff changeset
401
a31c10fe09c8 Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. &#34;Bacteria&#34; or &#34;Archaea&#34;)
george-weingart
parents:
diff changeset
402 ncl = 1
a31c10fe09c8 Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. &#34;Bacteria&#34; or &#34;Archaea&#34;)
george-weingart
parents:
diff changeset
403 if not params['subclass'] is None: ncl += 1
a31c10fe09c8 Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. &#34;Bacteria&#34; or &#34;Archaea&#34;)
george-weingart
parents:
diff changeset
404 if not params['subject'] is None: ncl += 1
a31c10fe09c8 Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. &#34;Bacteria&#34; or &#34;Archaea&#34;)
george-weingart
parents:
diff changeset
405
a31c10fe09c8 Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. &#34;Bacteria&#34; or &#34;Archaea&#34;)
george-weingart
parents:
diff changeset
406 first_line = zip(*data)[0]
a31c10fe09c8 Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. &#34;Bacteria&#34; or &#34;Archaea&#34;)
george-weingart
parents:
diff changeset
407
a31c10fe09c8 Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. &#34;Bacteria&#34; or &#34;Archaea&#34;)
george-weingart
parents:
diff changeset
408 first_line = modify_feature_names(list(first_line))
a31c10fe09c8 Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. &#34;Bacteria&#34; or &#34;Archaea&#34;)
george-weingart
parents:
diff changeset
409
a31c10fe09c8 Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. &#34;Bacteria&#34; or &#34;Archaea&#34;)
george-weingart
parents:
diff changeset
410 data = zip( first_line,
a31c10fe09c8 Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. &#34;Bacteria&#34; or &#34;Archaea&#34;)
george-weingart
parents:
diff changeset
411 *sort_by_cl(zip(*data)[1:],
a31c10fe09c8 Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. &#34;Bacteria&#34; or &#34;Archaea&#34;)
george-weingart
parents:
diff changeset
412 ncl,
a31c10fe09c8 Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. &#34;Bacteria&#34; or &#34;Archaea&#34;)
george-weingart
parents:
diff changeset
413 params['class']-1,
a31c10fe09c8 Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. &#34;Bacteria&#34; or &#34;Archaea&#34;)
george-weingart
parents:
diff changeset
414 params['subclass']-1 if not params['subclass'] is None else None,
a31c10fe09c8 Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. &#34;Bacteria&#34; or &#34;Archaea&#34;)
george-weingart
parents:
diff changeset
415 params['subject']-1 if not params['subject'] is None else None))
a31c10fe09c8 Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. &#34;Bacteria&#34; or &#34;Archaea&#34;)
george-weingart
parents:
diff changeset
    # data.insert(0,first_line)
    # data = remove_missing(data,params['missing_p'])
    cls = {}

    cls_i = [('class',params['class']-1)]
    if params['subclass'] > 0: cls_i.append(('subclass',params['subclass']-1))
    if params['subject'] > 0: cls_i.append(('subject',params['subject']-1))
    cls_i.sort(lambda x, y: -cmp(x[1],y[1]))
    for v in cls_i: cls[v[0]] = data.pop(v[1])[1:]
    if not params['subclass'] > 0: cls['subclass'] = [str(cl)+"_subcl" for cl in cls['class']]

    cls['subclass'] = rename_same_subcl(cls['class'],cls['subclass'])
    # if 'subclass' in cls.keys(): cls = group_small_subclasses(cls,params['subcl_min_card'])
    class_sl,subclass_sl,class_hierarchy = get_class_slices(zip(*cls.values()))

    feats = dict([(d[0],d[1:]) for d in data])

    feats = add_missing_levels(feats)

    feats = numerical_values(feats,params['norm_v'])
    out = {}
    out['feats'] = feats
    out['norm'] = params['norm_v']
    out['cls'] = cls
    out['class_sl'] = class_sl
    out['subclass_sl'] = subclass_sl
    out['class_hierarchy'] = class_hierarchy

    if params['output_table']:
        with open( params['output_table'], "w") as outf:
            if 'class' in cls: outf.write( "\t".join(list(["class"])+list(cls['class'])) + "\n" )
            if 'subclass' in cls: outf.write( "\t".join(list(["subclass"])+list(cls['subclass'])) + "\n" )
            if 'subject' in cls: outf.write( "\t".join(list(["subject"])+list(cls['subject'])) + "\n" )
            for k,v in out['feats'].items(): outf.write( "\t".join([k]+[str(vv) for vv in v]) + "\n" )

    with open(params['output_file'], 'wb') as back_file:
        pickle.dump(out,back_file)

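The listing above sorts the metadata row indices in descending order before popping them, so that removing one row does not shift the positions of the rows still to be removed, and then pickles the result. Below is a minimal sketch of those two steps that runs on Python 2 and Python 3. The helper name `pop_metadata_rows`, the toy table, and the in-memory buffer are assumptions made for illustration and are not part of format_input.py.

```python
import io
import pickle


def pop_metadata_rows(rows, named_indices):
    """Remove metadata rows from `rows` and return them keyed by name.

    `rows` is a list of sequences whose first element is the row label.
    `named_indices` is a list of (name, zero-based row index) pairs.
    Hypothetical helper for illustration; not part of format_input.py.
    """
    extracted = {}
    # Pop the highest index first so earlier pops do not shift later ones.
    for name, idx in sorted(named_indices, key=lambda p: p[1], reverse=True):
        extracted[name] = list(rows.pop(idx)[1:])  # drop the row label
    return extracted


# Toy table: row 0 is the class, row 1 the subject, the rest are features.
rows = [
    ("class", "A", "A", "B"),
    ("subject", "s1", "s2", "s3"),
    ("Bacteria", "0.7", "0.6", "0.9"),
    ("Archaea", "0.3", "0.4", "0.1"),
]
cls = pop_metadata_rows(rows, [("class", 0), ("subject", 1)])
feats = dict((r[0], list(r[1:])) for r in rows)

# Round-trip through pickle, using an in-memory buffer instead of a file.
buf = io.BytesIO()
pickle.dump({"feats": feats, "cls": cls}, buf)
buf.seek(0)
loaded = pickle.load(buf)
```

Using `sorted(..., key=..., reverse=True)` gives the same descending order as the `cmp`-based sort in the listing, but it also works on Python 3, where `cmp` and the positional comparison argument to `list.sort` no longer exist.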