annotate hexagram-6ae12361157c/hexagram/hexagram.py @ 0:1407e3634bcf draft default tip

Uploaded r11 from test tool shed.
author adam-novak
date Tue, 22 Oct 2013 14:17:59 -0400
#!/usr/bin/env python2.7
"""
hexagram.py: Given a matrix of similarities, produce a hexagram visualization.

This script takes in the filename of a tab-separated value file containing a
sparse similarity matrix (with string labels) and several matrices of
layer/score data. It produces an HTML file (and several support files) that
provide an interactive visualization of the items clustered on a hexagonal grid.

This script depends on the DrL graph layout package, binaries for which must be
present in your PATH.

Re-uses sample code and documentation from
<http://users.soe.ucsc.edu/~karplus/bme205/f12/Scaffold.html>
"""

import argparse, sys, os, itertools, math, numpy, subprocess, shutil, tempfile
import collections, multiprocessing, traceback
import scipy.stats, scipy.linalg
import os.path
import tsv

# Global variable to hold opened matrix files
matrices = []


def parse_args(args):
    """
    Takes in the command-line arguments list (args), and returns a nice argparse
    result with fields for all the options.
    Borrows heavily from the argparse documentation examples:
    <http://docs.python.org/library/argparse.html>
    """

    # The command line arguments start with the program name, which we don't
    # want to treat as an argument for argparse. So we remove it.
    args = args[1:]

    # Construct the parser (which is stored in parser)
    # Module docstring lives in __doc__
    # See http://python-forum.com/pythonforum/viewtopic.php?f=3&t=36847
    # And a formatter class so our examples in the docstring look good. Isn't it
    # convenient how we already wrapped it to 80 characters?
    # See http://docs.python.org/library/argparse.html#formatter-class
    parser = argparse.ArgumentParser(description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter)

    # Now add all the options to it
    # Options match the ctdHeatmap tool options as much as possible.
    parser.add_argument("similarity", type=str, nargs='+',
        help="the unopened files of similarity matrices")
    parser.add_argument("--names", type=str, action="append", default=[],
        help="the unopened files of similarity matrices")
    parser.add_argument("--scores", type=str,
        action="append", default=[],
        help="a TSV to read scores for each signature from")
    parser.add_argument("--colormaps", type=argparse.FileType("r"),
        default=None,
        help="a TSV defining coloring and value names for discrete scores")
    parser.add_argument("--html", "-H", type=str,
        default="index.html",
        help="where to write HTML report")
    parser.add_argument("--directory", "-d", type=str, default=".",
        help="directory in which to create other output files")
    parser.add_argument("--query", type=str, default=None,
        help="Galaxy-escaped name of the query signature")
    parser.add_argument("--window_size", type=int, default=20,
        help="size of the window to use when looking for clusters")
    parser.add_argument("--truncation_edges", type=int, default=10,
        help="number of edges for DrL truncate to pass per node")
    parser.add_argument("--no-stats", dest="stats", action="store_false",
        default=True,
        help="disable cluster-finding statistics")
    parser.add_argument("--include-singletons", dest="singletons",
        action="store_true", default=False,
        help="add self-edges to retain unconnected points")

    return parser.parse_args(args)

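# A minimal sketch of how the option styles used in parse_args behave, assuming
# a cut-down parser with only four of the options. The helper name
# build_sketch_parser and the file names are hypothetical and exist only for
# illustration; they are not part of hexagram.py.

```python
import argparse

def build_sketch_parser():
    # Hypothetical helper: a reduced version of the parser built in parse_args.
    parser = argparse.ArgumentParser(description="parse_args sketch")
    # nargs='+' collects one or more positional values into a list.
    parser.add_argument("similarity", type=str, nargs='+')
    # action="append" collects every repetition of the flag into a list.
    parser.add_argument("--scores", type=str, action="append", default=[])
    parser.add_argument("--window_size", type=int, default=20)
    # store_false turns the default of True into False when the flag is given.
    parser.add_argument("--no-stats", dest="stats", action="store_false",
        default=True)
    return parser

# Mimic sys.argv: the first element is the program name, which is dropped
# before parsing, just as parse_args does with args[1:].
argv = ["hexagram.py", "a.tsv", "b.tsv", "--scores", "s1.tsv",
    "--scores", "s2.tsv", "--no-stats"]
options = build_sketch_parser().parse_args(argv[1:])
```

# In this sketch options.similarity and options.scores are both lists, and
# options.window_size keeps its default because the flag was not given.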
def hexagon_center(x, y, scale=1.0):
    """
    Given a coordinate on a grid of hexagons (using wiggly rows in x), what is
    the 2D Euclidean coordinate of its center?

    x and y are integer column and row coordinates of the hexagon in the grid.

    scale is a float specifying hexagon side length.

    The origin in coordinate space is defined as the upper left corner of the
    bounding box of the hexagon with indices x=0 and y=0.

    Returns a tuple of floats.
    """
    # The grid looks like this:
    #
    #   /-\ /-\ /-\ /-\
    # /-\-/-\-/-\-/-\-/-\
    # \-/-\-/-\-/-\-/-\-/
    # /-\-/-\-/-\-/-\-/-\
    # \-/-\-/-\-/-\-/-\-/
    # /-\-/-\-/-\-/-\-/-\
    # \-/ \-/ \-/ \-/ \-/
    #
    # Say a hexagon side has length 1
    # It's 2 across corner to corner (x), and sqrt(3) across side to side (y)
    # X coordinates are 1.5 per column
    # Y coordinates (down from top) are sqrt(3) per row, -1/2 sqrt(3) if you're
    # in an odd column.

    center_y = math.sqrt(3) * y
    if x % 2 == 1:
        # Odd column: shift up
        center_y -= 0.5 * math.sqrt(3)

    return (1.5 * x * scale + scale, center_y * scale + math.sqrt(3.0) / 2.0 *
        scale)

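# A short worked example of the geometry described in hexagon_center. The
# function body is restated compactly so the sketch runs on its own, and
# center_distance is a hypothetical helper added only for this illustration.
# With side length 1, every pair of touching hexagons should have centers
# sqrt(3) apart.

```python
import math

def hexagon_center(x, y, scale=1.0):
    # Compact restatement of hexagon_center: odd columns are shifted up by
    # half a hexagon height.
    center_y = math.sqrt(3) * y
    if x % 2 == 1:
        center_y -= 0.5 * math.sqrt(3)
    return (1.5 * x * scale + scale,
        center_y * scale + math.sqrt(3.0) / 2.0 * scale)

def center_distance(a, b, scale=1.0):
    # Hypothetical helper: Euclidean distance between two hexagon centers,
    # where a and b are (column, row) index tuples.
    ax, ay = hexagon_center(a[0], a[1], scale)
    bx, by = hexagon_center(b[0], b[1], scale)
    return math.hypot(ax - bx, ay - by)

# The hexagon at (0, 0) is centered one side length in from the left edge of
# its bounding box and half a hexagon height down from the top.
origin_center = hexagon_center(0, 0)

# (0, 0) is in an even column, so it touches (1, 0), (1, 1) and (0, 1).
neighbor_distances = [center_distance((0, 0), n)
    for n in [(1, 0), (1, 1), (0, 1)]]
```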
def hexagon_pick(x, y, scale=1.0):
    """
    Given floats x and y specifying coordinates in the plane, determine which
    hexagon grid cell that point is in.

    scale is a float specifying hexagon side length.

    See http://blog.ruslans.com/2011/02/hexagonal-grid-math.html
    But we flip the direction of the wiggle. Odd columns are up (-y)

    Returns a tuple of integer (column, row) indices.
    """

    # How high is a hex?
    hex_height = math.sqrt(3) * scale

    # First we pick a rectangular tile, from the point of one side-triangle to
    # the base of the other in width, and the whole hexagon height in height.

    # How wide are these tiles? Corner to line-between-far-corners distance
    tile_width = (3.0 / 2.0 * scale)

    # Tile X index is floor(x / tile width)
    tile_x = int(math.floor(x / tile_width))

    # We need this intermediate value for the Y index and for tile-internal
    # picking
    corrected_y = y + (tile_x % 2) * hex_height / 2.0

    # Tile Y index is floor((y + (x index mod 2) * hex height/2) / hex height)
    tile_y = int(math.floor(corrected_y / hex_height))

    # Find coordinates within the tile
    internal_x = x - tile_x * tile_width
    internal_y = corrected_y - tile_y * hex_height

    # Do tile-scale picking
    # Are we in the one corner, the other corner, or the bulk of the tile?
    if internal_x > scale * abs(0.5 - internal_y / hex_height):
        # We're in the bulk of the tile
        # This is the column (x) of the picked hexagon
        hexagon_x = tile_x

        # This is the row (y) of the picked hexagon
        hexagon_y = tile_y
    else:
        # We're in a corner.
        # In an even column, the lower left is part of the next row, and the
        # upper left is part of the same row. In an odd column, the lower left
        # is part of the same row, and the upper left is part of the previous
        # row.
        if internal_y > hex_height / 2.0:
            # It's the lower left corner
            # This is the offset in row (y) that being in this corner gives us
            # The lower left corner is always 1 row below the upper left corner.
            corner_y_offset = 1
        else:
            corner_y_offset = 0

        # TODO: verify this for correctness. It seems to be right, but I want a
        # unit test to be sure.
        # This is the row (y) of the picked hexagon
        hexagon_y = tile_y - tile_x % 2 + corner_y_offset

        # This is the column (x) of the picked hexagon
        hexagon_x = tile_x - 1

    # Now we've picked the hexagon
    return (hexagon_x, hexagon_y)

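# The TODO in hexagon_pick asks for a unit test. What follows is one possible
# sketch of such a check, not the author's test. Both functions are restated
# compactly so the block runs on its own. round_trip_ok and pick_is_nearest are
# hypothetical helpers: the first checks that picking the center of a hexagon
# returns that hexagon, the second compares the pick against a brute-force
# search for the nearest center, which also exercises the corner branches.

```python
import math

def hexagon_center(x, y, scale=1.0):
    # Compact restatement of hexagon_center.
    center_y = math.sqrt(3) * y
    if x % 2 == 1:
        center_y -= 0.5 * math.sqrt(3)
    return (1.5 * x * scale + scale,
        center_y * scale + math.sqrt(3.0) / 2.0 * scale)

def hexagon_pick(x, y, scale=1.0):
    # Compact restatement of hexagon_pick.
    hex_height = math.sqrt(3) * scale
    tile_width = 3.0 / 2.0 * scale
    tile_x = int(math.floor(x / tile_width))
    corrected_y = y + (tile_x % 2) * hex_height / 2.0
    tile_y = int(math.floor(corrected_y / hex_height))
    internal_x = x - tile_x * tile_width
    internal_y = corrected_y - tile_y * hex_height
    if internal_x > scale * abs(0.5 - internal_y / hex_height):
        # Bulk of the tile: the tile indices are the hexagon indices.
        return (tile_x, tile_y)
    # Corner of the tile: the point belongs to the column on the left.
    corner_y_offset = 1 if internal_y > hex_height / 2.0 else 0
    return (tile_x - 1, tile_y - tile_x % 2 + corner_y_offset)

def round_trip_ok(scale, extent=6):
    # Every hexagon center must pick back to its own (column, row) indices.
    for col in range(-extent, extent + 1):
        for row in range(-extent, extent + 1):
            cx, cy = hexagon_center(col, row, scale)
            if hexagon_pick(cx, cy, scale) != (col, row):
                return False
    return True

def pick_is_nearest(px, py, scale=1.0):
    # The picked hexagon should be at least as close as any nearby hexagon.
    col, row = hexagon_pick(px, py, scale)
    def dist(c, r):
        cx, cy = hexagon_center(c, r, scale)
        return math.hypot(px - cx, py - cy)
    best = min(dist(col + dc, row + dr)
        for dc in range(-2, 3) for dr in range(-2, 3))
    # The small tolerance keeps points on a shared edge from failing.
    return dist(col, row) <= best + 1e-9

# A fixed lattice of sample points, offset so that it also lands in corners.
samples_ok = all(pick_is_nearest(0.137 + 0.31 * i, 0.071 + 0.29 * j)
    for i in range(-10, 11) for j in range(-10, 11))
```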
def radial_search(center_x, center_y):
    """
    An iterator that yields coordinate tuples (x, y) in order of increasing
    hex-grid distance from the specified center position.
    """

    # A hexagon has neighbors at the following relative coordinates:
    # (-1, 0), (1, 0), (0, -1), (0, 1)
    # and ((-1, 1) and (1, 1) if in an even column)
    # or ((-1, -1) and (1, -1) if in an odd column)

    # We're going to go outwards using breadth-first search, so we need a queue
    # of hexes to visit and a set of already visited hexes.

    # This holds a queue (really a deque) of hexes waiting to be visited.
    # A list has O(n) pop/insert at left.
    queue = collections.deque()
    # This holds a set of the (x, y) coordinate tuples of already-seen hexes,
    # so we don't enqueue them again.
    seen = set()

    # First place to visit is the center.
    queue.append((center_x, center_y))
    # Mark the center as seen so that its neighbors never enqueue it again.
    seen.add((center_x, center_y))

    while len(queue) > 0:
        # We should in theory never run out of items in the queue.
        # Get the current x and y to visit.
        x, y = queue.popleft()

        # Yield the location we're visiting
        yield (x, y)

        # This holds a list of all relative neighbor positions as (x, y) tuples.
        neighbor_offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
        # The diagonal neighbors depend on the parity of the column (x), since
        # hexagon_center shifts odd columns up.
        if x % 2 == 0:
            # An even-column hex also has these neighbors
            neighbor_offsets += [(-1, 1), (1, 1)]
        else:
            # An odd-column hex also has these neighbors
            neighbor_offsets += [(-1, -1), (1, -1)]

        for x_offset, y_offset in neighbor_offsets:
            # First calculate the absolute position of the neighbor in x
            neighbor_x = x + x_offset
            # And in y
            neighbor_y = y + y_offset

            if (neighbor_x, neighbor_y) not in seen:
                # This is a hex that has never been in the queue. Add it.
                queue.append((neighbor_x, neighbor_y))

                # Record that it has ever been enqueued
                seen.add((neighbor_x, neighbor_y))




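# radial_search never terminates on its own, so callers have to stop it. The
# sketch below shows one way to take a bounded prefix with itertools.islice.
# The generator is restated compactly so the block runs on its own. This
# restatement assumes that the diagonal neighbors are chosen by the parity of
# the column x (matching hexagon_center) and that the center is marked as seen
# up front so it is yielded only once.

```python
import collections
import itertools

def radial_search(center_x, center_y):
    # Breadth-first search outwards from the center over hexagon neighbors.
    queue = collections.deque([(center_x, center_y)])
    seen = set([(center_x, center_y)])
    while len(queue) > 0:
        x, y = queue.popleft()
        yield (x, y)
        neighbor_offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
        if x % 2 == 0:
            # Even column: the extra neighbors are one row down.
            neighbor_offsets += [(-1, 1), (1, 1)]
        else:
            # Odd column: the extra neighbors are one row up.
            neighbor_offsets += [(-1, -1), (1, -1)]
        for x_offset, y_offset in neighbor_offsets:
            neighbor = (x + x_offset, y + y_offset)
            if neighbor not in seen:
                queue.append(neighbor)
                seen.add(neighbor)

# The center plus its six neighbors, starting from an even column.
first_seven = list(itertools.islice(radial_search(0, 0), 7))

# The center plus two full rings (1 + 6 + 12 hexes), starting from an odd
# column.
first_nineteen = list(itertools.islice(radial_search(1, 0), 19))
```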
def assign_hexagon(hexagons, node_x, node_y, node, scale=1.0):
    """
    This function assigns the given node to a hexagon in hexagons. hexagons is a
    defaultdict from tuples of hexagon (x, y) integer indices to assigned nodes,
    or None if a hexagon is free. node_x and node_y are the x and y coordinates
    of the node, adapted so that the seed node lands in the 0, 0 hexagon, and
    re-scaled to reduce hexagon conflicts. node is the node to be assigned.
    scale, if specified, is the hexagon side length in node space units.

    This function assigns nodes to their closest hexagon, reprobing outwards if
    already occupied.

    When the function completes, node is stored in hexagons under some (x, y)
    tuple.

    Returns the distance, in grid index units, between the hexagon the node was
    assigned to and the hexagon its coordinates fall in.
    """

    # These hold the hexagon that the point falls in, which may be taken.
    best_x, best_y = hexagon_pick(node_x, node_y, scale=scale)

    for x, y in radial_search(best_x, best_y):
        # These hexes are enumerated in order of increasing distance from the
        # best one, starting with the best hex itself.

        if hexagons[(x, y)] is None:
            # This is the closest free hex. Break out of the loop, leaving x and
            # y pointing here.
            break

    # Assign the node to the hexagon
    hexagons[(x, y)] = node

    return math.sqrt((x - best_x) ** 2 + (y - best_y) ** 2)



1407e3634bcf Uploaded r11 from test tool shed.
adam-novak
parents:
diff changeset
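The nearest-free-hexagon strategy above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the script's own code: `nearest_first` is a hypothetical stand-in for `radial_search` (whose implementation is not shown in this part of the file), and it treats the hexagon indices as a plain square lattice, which matches the index-space distance that the function returns.

```python
import heapq
import math
from collections import defaultdict


def nearest_first(center_x, center_y):
    """Hypothetical stand-in for radial_search: yield integer grid cells in
    order of increasing Euclidean distance from the center, center first."""
    heap = [(0.0, center_x, center_y)]
    seen = {(center_x, center_y)}
    while heap:
        _, x, y = heapq.heappop(heap)
        yield x, y
        # Every cell has a 4-neighbor that is closer to the center, so
        # expanding popped cells visits cells in nondecreasing distance.
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if (nx, ny) not in seen:
                seen.add((nx, ny))
                distance = math.hypot(nx - center_x, ny - center_y)
                heapq.heappush(heap, (distance, nx, ny))


def assign_nearest_free(hexagons, best_x, best_y, node):
    """Store node in the free cell closest to (best_x, best_y) and return
    how far (in index units) it ended up from that ideal cell."""
    for x, y in nearest_first(best_x, best_y):
        if hexagons[(x, y)] is None:
            break
    hexagons[(x, y)] = node
    return math.hypot(x - best_x, y - best_y)


# Free cells read as None, as in the docstring above.
hexagons = defaultdict(lambda: None)
first = assign_nearest_free(hexagons, 0, 0, "a")   # ideal cell is free
second = assign_nearest_free(hexagons, 0, 0, "b")  # must reprobe outwards
```

The second node wants the same cell as the first, so it is displaced to an adjacent cell and the returned distance grows from 0 to 1.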
def assign_hexagon_local_radial(hexagons, node_x, node_y, node, scale=1.0):
    """
    This function assigns the given node to a hexagon in hexagons. hexagons is a
    defaultdict from tuples of hexagon (x, y) integer indices to assigned nodes,
    or None if a hexagon is free. node_x and node_y are the x and y coordinates
    of the node, adapted so that the seed node lands in the 0, 0 hexagon, and
    re-scaled to reduce hexagon conflicts. node is the node to be assigned.
    scale, if specified, is the hexagon side length in node space units.

    This function assigns nodes to their closest hexagon. If that hexagon is
    full, it re-probes in the direction that the node is from the closest
    hexagon's center.

    When the function completes, node is stored in hexagons under some (x, y)
    tuple.

    Returns the distance, in hexagon index units, between the hexagon the node
    was assigned to and its ideal hexagon.
    """

    # These hold the hexagon that the point falls in, which may be taken.
    best_x, best_y = hexagon_pick(node_x, node_y, scale=scale)

    # These hold the center of that hexagon in float space
    center_x, center_y = hexagon_center(best_x, best_y, scale=scale)

    # This holds the distance from this point to the center of that hexagon
    node_distance = math.sqrt((node_x - center_x) ** 2 +
        (node_y - center_y) ** 2)

    # These hold the normalized direction of this point, relative to the center
    # of its best hexagon
    if node_distance > 0:
        direction_x = (node_x - center_x) / node_distance
        direction_y = (node_y - center_y) / node_distance
    else:
        # The node sits exactly on the hexagon center, so it has no direction.
        # Probe along +x instead (an arbitrary choice) to avoid dividing by 0.
        direction_x, direction_y = 1.0, 0.0

    # Do a search in that direction, starting at the best hex.

    # These are the hexagon indices we're considering
    x, y = best_x, best_y

    # These are the Cartesian coordinates we're probing. Must be in the x, y hex
    # as a loop invariant.
    test_x, test_y = center_x, center_y

    while hexagons[(x, y)] is not None:
        # Re-probe outwards from the best hex in scale-sized steps
        # TODO: is that the right step size? Scale-sized steps seemed slightly
        # large; scale/2 may work better.
        test_x += direction_x * scale
        test_y += direction_y * scale

        # Re-pick x and y for the hex containing our test point
        x, y = hexagon_pick(test_x, test_y, scale=scale)

    # We've finally reached the edge of the cluster.
    # Drop our hexagon
    hexagons[(x, y)] = node

    return math.sqrt((x - best_x) ** 2 + (y - best_y) ** 2)

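The directional re-probing loop above can be illustrated on its own. The sketch below is an assumption-laden simplification rather than the script's code: `square_pick` is a hypothetical stand-in for `hexagon_pick` that bins points into square cells of side `scale`, and `probe_along_direction` is an illustrative name for the loop body.

```python
import math
from collections import defaultdict


def square_pick(x, y, scale=1.0):
    """Hypothetical stand-in for hexagon_pick: indices of the square cell of
    side `scale` that contains the point (x, y)."""
    return int(math.floor(x / scale)), int(math.floor(y / scale))


def probe_along_direction(cells, start_x, start_y, direction_x, direction_y,
                          node, scale=1.0):
    """Walk from the start point along a unit direction in scale-sized steps
    until a free cell is found, then store node there."""
    test_x, test_y = start_x, start_y
    x, y = square_pick(test_x, test_y, scale=scale)
    while cells[(x, y)] is not None:
        test_x += direction_x * scale
        test_y += direction_y * scale
        x, y = square_pick(test_x, test_y, scale=scale)
    cells[(x, y)] = node
    return x, y


cells = defaultdict(lambda: None)
# Three nodes that all start in the same cell and probe along +x.
placed = [probe_along_direction(cells, 0.5, 0.5, 1.0, 0.0, name)
          for name in "abc"]
```

Each node that shares a starting cell and direction has to walk past every node placed before it, so `placed` ends up as a line of cells extending along +x.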
def assign_hexagon_radial(hexagons, node_x, node_y, node, scale=1.0):
    """
    This function assigns the given node to a hexagon in hexagons. hexagons is a
    defaultdict from tuples of hexagon (x, y) integer indices to assigned nodes,
    or None if a hexagon is free. node_x and node_y are the x and y coordinates
    of the node, adapted so that the seed node lands in the 0, 0 hexagon, and
    re-scaled to reduce hexagon conflicts. node is the node to be assigned.
    scale, if specified, is the hexagon side length in node space units.

    This function assigns nodes to hexagons based on radial distance from 0, 0.
    This makes hexagon assignment much more dense, but can lose spatial
    structure.

    When the function completes, node is stored in hexagons under some (x, y)
    tuple.

    Returns the distance this hexagon is from its ideal location. Unfortunately,
    this doesn't really make sense for this assignment scheme, so it is always
    0.
    """

    # Compute node's distance from the origin
    node_distance = math.sqrt(node_x ** 2 + node_y ** 2)

    # Compute normalized direction from the origin for this node
    if node_distance > 0:
        direction_x = node_x / node_distance
        direction_y = node_y / node_distance
    else:
        # A node exactly at the origin has no direction. Probe along +x instead
        # (an arbitrary choice) to avoid dividing by 0.
        direction_x, direction_y = 1.0, 0.0

    # These are the coordinates we are testing
    test_x = 0
    test_y = 0

    # These are the hexagon indices that correspond to that point
    x, y = hexagon_pick(test_x, test_y, scale=scale)

    while hexagons[(x, y)] is not None:
        # Re-probe outwards from the origin in scale-sized steps
        # TODO: is that the right step size?
        test_x += direction_x * scale
        test_y += direction_y * scale

        # Re-pick
        x, y = hexagon_pick(test_x, test_y, scale=scale)

    # We've finally reached the edge of the cluster.
    # Drop our hexagon
    # TODO: this has to be N^2 if we line them all up in a line
    hexagons[(x, y)] = node

    return 0

def hexagons_in_window(hexagons, x, y, width, height):
    """
    Given a dict from (x, y) position to signature names, return the list of all
    signatures in the window starting at hexagon x, y and extending width in the
    x direction and height in the y direction on the hexagon grid.
    """

    # This holds the list of hexagons we've found
    found = []

    for i in xrange(x, x + width):
        for j in xrange(y, y + height):
            if (i, j) in hexagons:
                # This position in the window has a hex.
                found.append(hexagons[(i, j)])

    return found

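A short usage example may help pin down the window semantics. The function is restated here with `range` so the snippet also runs on Python 3; the signature names in the sample dict are made up for illustration.

```python
def hexagons_in_window(hexagons, x, y, width, height):
    """Return the signatures stored at grid positions inside the window
    [x, x + width) by [y, y + height)."""
    found = []
    for i in range(x, x + width):
        for j in range(y, y + height):
            if (i, j) in hexagons:
                found.append(hexagons[(i, j)])
    return found


# Hypothetical assignments: three signatures on a sparse grid.
hexagons = {(0, 0): "sig_a", (1, 0): "sig_b", (2, 2): "sig_c"}

# A 2 x 2 window anchored at (0, 0) covers x in {0, 1} and y in {0, 1}.
window = hexagons_in_window(hexagons, 0, 0, 2, 2)
```

The window is half-open: position (2, 2) lies just outside a 2 x 2 window anchored at (0, 0), so `sig_c` is not returned, and empty positions are skipped.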
class ClusterFinder(object):
    """
    A class that can be invoked to find the p value of the best cluster in its
    layer. Instances are pickleable.
    """

    def __init__(self, hexagons, layer, window_size=5):
        """
        Keep the given hexagons dict (from (x, y) to signature name) and the
        given layer (a dict from signature name to a value), and the given
        window size, in a ClusterFinder object.
        """

        # TODO: This should probably all operate on numpy arrays that we can
        # slice efficiently.

        # Store the hexagon assignments
        self.hexagons = hexagons
        # Store the layer
        self.layer = layer

        # Store the window size
        self.window_size = window_size

    @staticmethod
    def continuous_p(in_values, out_values):
        """
        Get the p value for in_values and out_values being distinct continuous
        distributions.

        in_values and out_values are both Numpy arrays. Returns the p value, or
        raises a ValueError if the statistical test cannot be run for some
        reason.

        Uses the Mann-Whitney U test.
        """

        # Do a Mann-Whitney U test to see how different the data
        # sets are.
        u_statistic, p_value = scipy.stats.mannwhitneyu(in_values,
            out_values)

        return p_value

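For readers unfamiliar with the test, the U statistic that underlies it can be computed by brute force. This is a pure-Python sketch for intuition only; it does not compute a p value, and `mann_whitney_u` is an illustrative name rather than a function from this script or from scipy.

```python
def mann_whitney_u(in_values, out_values):
    """Count the (in, out) pairs where the in value is larger, with ties
    counted as half a pair. This is the U statistic for in_values."""
    u = 0.0
    for a in in_values:
        for b in out_values:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Fully separated samples: every one of the 3 * 4 pairs favors in_values.
u_separated = mann_whitney_u([5, 6, 7], [1, 2, 3, 4])

# Overlapping samples with one tie.
u_tied = mann_whitney_u([1, 2], [1, 3])
```

A U near 0 or near `len(in_values) * len(out_values)` indicates that one sample tends to sit entirely above the other, which is what drives the p value down.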
    @staticmethod
    def dichotomous_p(in_values, out_values):
        """
        Given two one-dimensional Numpy arrays of 0s and 1s, compute a p value
        for the in_values having a different probability of being 1 than the
        frequency of 1s in the out_values.

        This test uses the scipy.stats.binom_test function, which does not claim
        to use the normal approximation. Therefore, this test should be valid
        for arbitrarily small frequencies of either 0s or 1s in in_values.

        TODO: What if out_values is shorter than in_values?
        """

        if len(out_values) == 0:
            raise ValueError("Background group is empty!")

        # This holds the observed frequency of 1s in out_values. Convert to
        # float so that integer arrays do not trigger integer division.
        frequency = numpy.sum(out_values) / float(len(out_values))

        # This holds the number of 1s in in_values
        successes = numpy.sum(in_values)

        # This holds the number of "trials" we got that many successes in
        trials = len(in_values)

        # Return how significantly the frequency inside differs from that
        # outside.
        return scipy.stats.binom_test(successes, trials, frequency)

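To make the test concrete, here is a pure-Python sketch of one common definition of the exact two-sided binomial test: sum the probabilities of every outcome that is no more likely than the observed one. It is meant for intuition and is an assumption about the general approach, not a statement of how scipy implements `binom_test`; `binomial_pmf` and `exact_binomial_p` are illustrative names.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    combinations = math.factorial(n) // (math.factorial(k) *
                                         math.factorial(n - k))
    return combinations * p ** k * (1 - p) ** (n - k)


def exact_binomial_p(successes, trials, frequency):
    """Two-sided exact p value: total probability of all outcomes that are
    at most as likely as the observed number of successes."""
    observed = binomial_pmf(successes, trials, frequency)
    # Small relative tolerance so float noise does not exclude exact ties.
    cutoff = observed * (1 + 1e-7)
    total = 0.0
    for k in range(trials + 1):
        probability = binomial_pmf(k, trials, frequency)
        if probability <= cutoff:
            total += probability
    return min(1.0, total)


in_values = [1, 1, 1, 1]
out_values = [0, 1, 0, 1]

# Background frequency of 1s, then the test on the window's values.
frequency = sum(out_values) / float(len(out_values))
p_value = exact_binomial_p(sum(in_values), len(in_values), frequency)
```

With a background frequency of 0.5, four successes in four trials is as unlikely as zero successes, so both tails contribute 1/16 and the p value is 0.125.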
    @staticmethod
    def categorical_p(in_values, out_values):
        """
        Given two one-dimensional Numpy arrays of integers (which may be stored
        as floats), which represent items being assigned to different
        categories, return a p value for the distribution of categories observed
        in in_values differing from that observed in out_values.

        The normal way to do this is with a chi-squared goodness of fit test.
        However, that test's assumptions do not hold unless every category has
        at least 5 expected and 5 observed observations.
        See http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html

        However, we will use it anyway, because the tests that don't break down
        are prohibitively slow.
        """

        # Convert our inputs to integer arrays
        in_values = in_values.astype(int)
        out_values = out_values.astype(int)

        # How many categories are there (count 0 to the maximum value)
        num_categories = max(numpy.max(in_values), numpy.max(out_values)) + 1

        # Count the number of in_values and out_values in each category
        in_counts = numpy.array([len(in_values[in_values == i]) for i in
            xrange(num_categories)])
        out_counts = numpy.array([len(out_values[out_values == i]) for i in
            xrange(num_categories)])

        # Scale the background counts so that they sum to the number of
        # observations in the window. The chi-squared test expects observed and
        # expected frequencies to have the same total.
        expected = out_counts * (float(numpy.sum(in_counts)) /
            numpy.sum(out_counts))

        # Get the p value for the window being from the estimated distribution
        # None of the distribution parameters count as "estimated from data"
        # because they aren't estimated from the data under test.
        _, p_value = scipy.stats.chisquare(in_counts, expected)

        return p_value

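The statistic behind the goodness-of-fit test can be sketched in pure Python. Rescaling the background counts to the window's total is an assumption of this sketch (the test compares observed and expected counts on the same scale), and `chi_square_statistic` is an illustrative name. Turning the statistic into a p value needs the chi-squared distribution, which is left to scipy.

```python
def chi_square_statistic(in_counts, out_counts):
    """Sum of (observed - expected)^2 / expected over categories, with the
    background counts rescaled to the same total as the observed counts.

    Assumes every category has a nonzero background count."""
    in_total = float(sum(in_counts))
    out_total = float(sum(out_counts))
    statistic = 0.0
    for observed, background in zip(in_counts, out_counts):
        expected = background * in_total / out_total
        statistic += (observed - expected) ** 2 / expected
    return statistic


# A window of 10 items against an evenly split background of 100 items.
window_counts = [8, 2]
background_counts = [50, 50]
statistic = chi_square_statistic(window_counts, background_counts)
```

Here the expected window counts are 5 and 5, so the statistic is 9/5 + 9/5 = 3.6. Without the rescaling, the window would be compared against raw counts of 50 and 50 and the statistic would mostly reflect the difference in sample size.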
    def __call__(self):
        """
        Find the best p value for any window of size window_size. Return it.
        """

        # Calculate the bounding box where we want to look for windows.
        # TODO: This would just be all of a numpy array
        min_x = min(coords[0] for coords in self.hexagons.iterkeys())
        min_y = min(coords[1] for coords in self.hexagons.iterkeys())
        max_x = max(coords[0] for coords in self.hexagons.iterkeys())
        max_y = max(coords[1] for coords in self.hexagons.iterkeys())

        # This holds a Numpy array of all the data by x, y
        layer_data = numpy.empty((max_x - min_x + 1, max_y - min_y + 1))

        # Fill it with NaN so we can mask those out later
        layer_data[:] = numpy.nan

        for (hex_x, hex_y), name in self.hexagons.iteritems():
            # Copy the layer values into the Numpy array
            if name in self.layer:
                layer_data[hex_x - min_x, hex_y - min_y] = self.layer[name]

        # This holds a masked version of the layer data
        layer_data_masked = numpy.ma.masked_invalid(layer_data, copy=False)

        # This holds the smallest p value we have found for this layer
        best_p = float("+inf")

        # This holds the statistical test to use (a function from two Numpy
        # arrays to a p value)
adam-novak
parents:
diff changeset
551 # The most specific test is the dichotomous test (0 or 1)
1407e3634bcf Uploaded r11 from test tool shed.
adam-novak
parents:
diff changeset
552 statistical_test = self.dichotomous_p
1407e3634bcf Uploaded r11 from test tool shed.
adam-novak
parents:
diff changeset
553
1407e3634bcf Uploaded r11 from test tool shed.
adam-novak
parents:
diff changeset
554 if numpy.sum(~layer_data_masked.mask) == 0:
1407e3634bcf Uploaded r11 from test tool shed.
adam-novak
parents:
diff changeset
555 # There is actually no data in this layer at all.
1407e3634bcf Uploaded r11 from test tool shed.
adam-novak
parents:
diff changeset
556 # nditer complains if we try to iterate over an empty thing.
1407e3634bcf Uploaded r11 from test tool shed.
adam-novak
parents:
diff changeset
557 # So quit early and say we couldn't find anything.
1407e3634bcf Uploaded r11 from test tool shed.
adam-novak
parents:
diff changeset
558 return best_p
1407e3634bcf Uploaded r11 from test tool shed.
adam-novak
parents:
diff changeset
559
        for value in numpy.nditer(layer_data_masked[~layer_data_masked.mask]):
            # Check all the values in the layer.
            # If this value is out of the domain of the current statistical
            # test, upgrade to a more general test.

            if statistical_test == self.dichotomous_p and (value > 1 or
                                                           value < 0):

                # We can't use a dichotomous test on things outside 0 to 1
                # But we haven't yet detected any non-integers
                # Use categorical
                statistical_test = self.categorical_p

            if value % 1 != 0:
                # This is not an integer value
                # So, we must use a continuous statistical test
                statistical_test = self.continuous_p

                # This is the least specific test, so we can stop now
                break


        # Window origins index into layer_data, which is already offset by
        # (min_x, min_y), so they start at 0 rather than at min_x and min_y.
        # The ranges cover every window that fits entirely inside the array.
        for i in xrange(layer_data.shape[0] - self.window_size + 1):
            for j in xrange(layer_data.shape[1] - self.window_size + 1):

                # Get the layer values for hexes in the window, as a Numpy
                # masked array.
                in_region = layer_data_masked[i:i + self.window_size,
                                              j:j + self.window_size]

                # And as a 1d Numpy array
                in_values = numpy.reshape(in_region[~in_region.mask], -1).data

                # And out of the window (all the other hexes) as a masked array
                out_region = numpy.ma.copy(layer_data_masked)
                # We get this by masking out everything in the region
                out_region.mask[i:i + self.window_size,
                                j:j + self.window_size] = True

                # And as a 1d Numpy array
                out_values = numpy.reshape(out_region[~out_region.mask],
                                           -1).data


                if len(in_values) == 0 or len(out_values) == 0:
                    # Can't do any stats on this window
                    continue

                if len(in_values) < 0.5 * self.window_size ** 2:
                    # The window is less than half full. Skip it.
                    # TODO: Make this threshold configurable.
                    continue

                try:

                    # Get the p value for this window under the selected
                    # statistical test
                    p_value = statistical_test(in_values, out_values)

                    # If this is the best p value so far, record it
                    best_p = min(best_p, p_value)
                except ValueError:
                    # Probably an all-zero layer, or something else the test
                    # can't handle.
                    # But let's try all the other windows to be safe.
                    # Maybe one will work.
                    pass



        # We have now found the best p for any window for this layer.
        print "Best p found: {}".format(best_p)
        sys.stdout.flush()

        return best_p

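The two decisions made in `__call__` above (which statistical test to use, and which values fall inside or outside a window) can be illustrated without NumPy. The following is a simplified sketch, not the script's own code: `pick_test_kind` and `split_window` are hypothetical helper names, the layer is assumed to be a plain dict from (x, y) to value, and the code is written to run on Python 3 as well as 2.7.

```python
def pick_test_kind(values):
    """Return the most specific test kind whose domain covers all values."""
    kind = "dichotomous"
    for value in values:
        if kind == "dichotomous" and (value > 1 or value < 0):
            # Values outside 0..1 rule out the dichotomous test.
            kind = "categorical"
        if value % 1 != 0:
            # Any non-integer forces the continuous test, the most general one.
            return "continuous"
    return kind


def split_window(grid, i, j, size):
    """Split a {(x, y): value} dict into values inside and outside a window.

    The window is the square covering x in [i, i + size) and y in [j, j + size).
    """
    inside, outside = [], []
    for (x, y), value in sorted(grid.items()):
        if i <= x < i + size and j <= y < j + size:
            inside.append(value)
        else:
            outside.append(value)
    return inside, outside


# Hypothetical layer: three hexes near the origin and one far away.
grid = {(0, 0): 1, (1, 0): 0, (1, 1): 1, (5, 5): 0}
inside, outside = split_window(grid, 0, 0, 2)
```

The two lists play the role of `in_values` and `out_values`, which are then handed to whichever test was selected.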
def run_functor(functor):
    """
    Given a no-argument functor (like a ClusterFinder), run it and return its
    result. We can use this with multiprocessing.Pool.map and map it over a
    list of job functors to do them.

    Handles getting more than multiprocessing's pitiful exception output.
    """

    try:
        return functor()
    except Exception:
        # Put all exception text into an exception and raise that
        raise Exception(traceback.format_exc())

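The effect of `run_functor` can be seen without starting a process pool. The sketch below repeats the wrapper and calls it directly on a hypothetical `Divider` functor, which is invented for illustration and is not part of the script. With a pool, the same wrapper would be passed as `pool.map(run_functor, jobs)`.

```python
import traceback


def run_functor(functor):
    """Run a no-argument callable, folding any traceback into the message."""
    try:
        return functor()
    except Exception:
        # The formatted traceback becomes the text of the new exception, so
        # it survives being passed back from a worker process.
        raise Exception(traceback.format_exc())


class Divider(object):
    """Hypothetical job functor: divides two numbers when called."""

    def __init__(self, numerator, denominator):
        self.numerator = numerator
        self.denominator = denominator

    def __call__(self):
        return self.numerator / self.denominator


# A job that succeeds simply returns its result.
ok_result = run_functor(Divider(6, 3))

# A job that fails raises an exception whose message is the full traceback.
try:
    run_functor(Divider(1, 0))
    error_text = None
except Exception as error:
    error_text = str(error)
```

The message of the re-raised exception names both the original error type and the frame where it happened, which is the information that would otherwise be lost.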
def open_matrices(names):
    """
    The argument parser now takes multiple similarity matrices as input and
    saves their file names as strings. We want to store these names for
    display later in hexagram.js, so that the user can navigate and know what
    type of visualization map they are looking at (gene expression, copy
    number, etc.).

    Since the parser no longer opens the files automatically, we must do it
    in this function.
    """

    # For each file name, open the file and add it to the matrices list
    # 'r' is the argument stating that the file will be read-only
    # Note: matrices is not defined in this function, so it must be a list at
    # module scope.
    for similarity_filename in names:
        print "Opening Matrices..."
        matrix_file = tsv.TsvReader(open(similarity_filename, "r"))
        matrices.append(matrix_file)

def compute_beta(coords, matrix, axis, index, options):
    """
    Compute and return a beta matrix from coords * matrix.
    Writing the matrix to a file to be read on the client side is not
    implemented yet.
    """
    beta = coords * matrix
    # TODO: Must add writing function
    return beta

def drl_similarity_functions(matrix, index, options):
    """
    Performs all the functions needed to format a similarity matrix into a
    tsv format that DrL can read. Then all of the DrL functions are performed
    on the similarity matrix.

    Options is passed to access options.singletons and other required aspects
    of the parsed args.
    """

    # Work in a temporary directory
    # mkdtemp always creates a fresh directory for us.
    drl_directory = tempfile.mkdtemp()

    # This is the base name for all the files that DrL uses to do the layout
    # We're going to put it in a temporary directory.
    # index added to the base name in order to keep track of
    # respective layouts
    drl_basename = os.path.join(drl_directory, "layout" + str(index))

    # We can just pass our similarity matrix to DrL's truncate
    # But we want to run it through our tsv parser to strip comments and ensure
    # it's valid

    # This holds a reader for the similarity matrix
    sim_reader = matrix

    # This holds a writer for the sim file
    sim_writer = tsv.TsvWriter(open(drl_basename + ".sim", "w"))

    print "Regularizing similarity matrix..."
    sys.stdout.flush()

    # This holds a set of all unique signature names in the similarity matrix.
    # We can use it to add edges to keep singletons.
    signatures = set()

    print "Reach for parts in sim_reader"
    for parts in sim_reader:
        # Keep the signature names used
        signatures.add(parts[0])
        signatures.add(parts[1])

        # Save the line to the regularized file
        sim_writer.list_line(parts)

    if options.singletons:
        # Now add a self-edge on every node, so we don't drop nodes with no
        # other strictly positive edges
        for signature in signatures:
            sim_writer.line(signature, signature, 1)

    sim_reader.close()
    sim_writer.close()

    # Now our input for DrL is prepared!

    # Do DrL truncate.
    # TODO: pass a truncation level
    print "DrL: Truncating..."
    sys.stdout.flush()
    subprocess.check_call(["truncate", "-t", str(options.truncation_edges),
                           drl_basename])

    # Run the DrL layout engine.
    print "DrL: Doing layout..."
    sys.stdout.flush()
    subprocess.check_call(["layout", drl_basename])

    # Put the string names back
    print "DrL: Restoring names..."
    sys.stdout.flush()
    subprocess.check_call(["recoord", drl_basename])

    # Now DrL has saved its coordinates as <signature name>\t<x>\t<y> rows in
    # <basename>.coord

    # We want to read that.
    # This holds a reader for the DrL output
    coord_reader = tsv.TsvReader(open(drl_basename + ".coord", "r"))

    # This holds a dict from signature name string to (x, y) float tuple. It is
    # also our official collection of node names that made it through DrL, and
    # therefore need their score data sent to the client.
    nodes = {}

    print "Reading DrL output..."
    sys.stdout.flush()
    for parts in coord_reader:
        nodes[parts[0]] = (float(parts[1]), float(parts[2]))

    coord_reader.close()

    # Save the DrL coordinates in our bundle, to be displayed client-side for
    # debugging.

    # index added to the drl.tab file name in order to keep track of
    # respective drl.tabs
    coord_writer = tsv.TsvWriter(open(
        os.path.join(options.directory, "drl" + str(index) + ".tab"), "w"))

    for signature_name, (x, y) in nodes.iteritems():
        # Write a tsv with names instead of numbers, like what DrL recoord would
        # have written. This is what the Javascript on the client side wants.
        coord_writer.line(signature_name, x, y)

    coord_writer.close()

    # Delete our temporary directory.
    shutil.rmtree(drl_directory)

    # Return the nodes dict to the main method for further processing
    return nodes

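The two data transformations in `drl_similarity_functions` (adding self-edges so singletons survive, and reading name, x, y rows back into a dict) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the script's code: it uses `csv` in place of the script's `tsv` module, `add_self_edges` and `read_coords` are hypothetical names, the sample rows are made up, and it targets Python 3.

```python
import csv
import io


def add_self_edges(rows):
    """Return the rows plus one (name, name, "1") self-edge per distinct name."""
    names = set()
    for row in rows:
        names.add(row[0])
        names.add(row[1])
    # Sorting only makes the output order predictable for this example.
    return list(rows) + [[name, name, "1"] for name in sorted(names)]


def read_coords(stream):
    """Parse tab-separated name, x, y rows into a dict of name -> (x, y)."""
    nodes = {}
    for parts in csv.reader(stream, delimiter="\t"):
        if not parts:
            # Skip blank lines.
            continue
        nodes[parts[0]] = (float(parts[1]), float(parts[2]))
    return nodes


# One similarity edge between two made-up signatures.
edges = add_self_edges([["a", "b", "0.9"]])

# A made-up coordinate file in the layout described for <basename>.coord.
nodes = read_coords(io.StringIO(u"a\t1.5\t-2\nb\t0\t3.25\n"))
```

The resulting `nodes` dict has the same shape as the one returned by `drl_similarity_functions`: signature name to an (x, y) tuple of floats.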
1407e3634bcf Uploaded r11 from test tool shed.
adam-novak
parents:
diff changeset
793 def compute_hexagram_assignments (nodes, index, options):
1407e3634bcf Uploaded r11 from test tool shed.
adam-novak
parents:
diff changeset
794 """
1407e3634bcf Uploaded r11 from test tool shed.
adam-novak
parents:
diff changeset
795 Now that we are taking multiple similarity matrices as inputs, we must
1407e3634bcf Uploaded r11 from test tool shed.
adam-novak
parents:
diff changeset
796 compute hexagram assignments for each similarity matrix. These assignments
1407e3634bcf Uploaded r11 from test tool shed.
adam-novak
parents:
diff changeset
797 are based up on the nodes ouput provided by the DrL function.
1407e3634bcf Uploaded r11 from test tool shed.
adam-novak
parents:
diff changeset
798
1407e3634bcf Uploaded r11 from test tool shed.
adam-novak
parents:
diff changeset
799 Index relates each matrix name with its drl output, nodes, assignments, etc.
1407e3634bcf Uploaded r11 from test tool shed.
adam-novak
parents:
diff changeset
800 Options contains the parsed arguments that are present in the main method.
1407e3634bcf Uploaded r11 from test tool shed.
adam-novak
parents:
diff changeset
801 """
1407e3634bcf Uploaded r11 from test tool shed.
adam-novak
parents:
diff changeset
    # Do the hexagon layout
    # We do the squiggly rows setup, so express everything as integer x, y

    # This is a defaultdict from (x, y) integer tuple to id that goes there, or
    # None if it's free.
    global hexagons
    hexagons = collections.defaultdict(lambda: None)

    # This holds the side length that we use
    side_length = 1.0

    # This holds what will be a layer of how badly placed each hexagon is
    # A dict from node name to layer value
    placement_badnesses = {}

    for node, (node_x, node_y) in nodes.iteritems():
        # Assign each node to a hexagon
        # This holds the resulting placement badness for that hexagon (i.e.
        # distance from ideal location)
        badness = assign_hexagon(hexagons, node_x, node_y, node,
            scale=side_length)

        # Put the badness in the layer
        placement_badnesses[node] = float(badness)

    # Normalize the placement badness layer
    # This holds the max placement badness
    max_placement_badness = max(placement_badnesses.itervalues())
    print "Max placement badness: {}".format(max_placement_badness)

    if max_placement_badness != 0:
        # Normalize by the max if possible.
        placement_badnesses = {node: value / max_placement_badness for node,
            value in placement_badnesses.iteritems()}

    # The hexagons have been assigned. Make hexagons be a dict instead of a
    # defaultdict, so it pickles.
    # TODO: I should change it so I don't need to do this.
    hexagons = dict(hexagons)

    # Now dump the hexagon assignments as an id, x, y tsv. This will be read by
    # the JavaScript on the static page and be used to produce the
    # visualization.
    hexagon_writer = tsv.TsvWriter(open(os.path.join(options.directory,
        "assignments" + str(index) + ".tab"), "w"))

    # First find the x and y offsets needed to make all hexagon positions
    # non-negative
    min_x = min(coords[0] for coords in hexagons.iterkeys())
    min_y = min(coords[1] for coords in hexagons.iterkeys())

    for coords, name in hexagons.iteritems():
        # Write this hexagon assignment, shifted so the smallest x and y are 0.
        hexagon_writer.line(name, coords[0] - min_x, coords[1] - min_y)
    hexagon_writer.close()

    # Hand the placement_badnesses dict back to the main method so that it can
    # be used elsewhere.
    return placement_badnesses

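# A minimal standalone sketch of the two numeric steps used in
# compute_hexagram_assignments above: dividing every placement badness by the
# maximum, and shifting grid coordinates so the smallest x and y become 0.
# The helper names normalize_badness and shift_to_origin are hypothetical and
# are not part of hexagram.py; the empty-input guards are an added assumption
# that the original function does not make.

```python
def normalize_badness(badnesses):
    """Scale badness values into [0, 1] by dividing by the maximum."""
    if not badnesses:
        # Assumption: an empty layer stays empty instead of raising.
        return {}
    peak = max(badnesses.values())
    if peak == 0:
        # Nothing to scale; every node sits exactly on its ideal hexagon.
        return dict(badnesses)
    return {node: value / float(peak) for node, value in badnesses.items()}


def shift_to_origin(hexagons):
    """Translate (x, y) keys so the smallest x and the smallest y become 0."""
    if not hexagons:
        return {}
    min_x = min(x for x, _ in hexagons)
    min_y = min(y for _, y in hexagons)
    return {(x - min_x, y - min_y): name for (x, y), name in hexagons.items()}


example_badness = normalize_badness({"a": 0.5, "b": 2.0, "c": 0.0})
example_grid = shift_to_origin({(-2, 3): "a", (0, -1): "b"})
```

# The sketch uses .items() so that it runs under both Python 2.7 and Python 3;
# the script itself uses the Python 2 iteritems()/itervalues() forms.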
def write_matrix_names(options):
    """
    Write the names of the similarity matrices so that hexagram.js can
    process the names and create the toggle layout GUI.
    We pass options to access the parsed args and thus the matrix names.
    """
    name_writer = tsv.TsvWriter(open(os.path.join(options.directory,
        "matrixnames.tab"), "w"))
    for i in options.names:
        name_writer.line(i)

    name_writer.close()

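# A hedged sketch of the same "one matrix name per line" output as
# write_matrix_names above, using only the standard library csv module in
# place of the script's tsv.TsvWriter. The function name write_names and the
# demo values are hypothetical, and whether the output matches tsv.TsvWriter
# byte for byte is not checked here. The "with" block closes the file even if
# a write fails partway through.

```python
import csv
import os
import tempfile


def write_names(directory, names, filename="matrixnames.tab"):
    """Write each name on its own line of a tab-separated file."""
    path = os.path.join(directory, filename)
    with open(path, "w") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for name in names:
            # One single-column row per similarity matrix name.
            writer.writerow([name])
    return path


# Demo: write two made-up matrix names to a scratch directory and read back.
demo_dir = tempfile.mkdtemp()
demo_path = write_names(demo_dir, ["mRNA", "methylation"])
with open(demo_path) as handle:
    demo_lines = handle.read().splitlines()
```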
def main(args):
    """
    Parses command line arguments, and makes visualization.
    "args" specifies the program arguments, with args[0] being the executable
    name. The return value should be used as the program's exit code.
    """

    options = parse_args(args) # This holds the nicely-parsed options object

    print "Created Options"

    # Sanity-check our picking: the center of hexagon (0, 0) must pick back to
    # hexagon (0, 0).
    x, y = hexagon_center(0, 0)
    if hexagon_pick(x, y) != (0, 0):
        raise Exception("Picking is broken!")

    # First bit of stdout becomes annotation in Galaxy
    # Make sure our output directory exists.
    if not os.path.exists(options.directory):
        # makedirs is the right thing to use here: recursive
        os.makedirs(options.directory)

    print "Writing matrix names..."
    # We must write the file names for hexagram.js to access.
    write_matrix_names(options)

    print "About to open matrices..."

    # We have file names stored in options.similarity
    # We must open the files and store them in the matrices list for access
    open_matrices(options.similarity)

    print "Opened matrices..."

    # The nodes list stores the list of nodes for each matrix
    # We must keep track of each set of nodes
    nodes_multiple = []

    print "Created nodes_multiple list..."

    # Index for drl.tab and drl.layout file naming. With indexes we can match
    # file names to matrices to drl output files.
    for index, i in enumerate(matrices):
        nodes_multiple.append(drl_similarity_functions(i, index, options))

    # Compute hexagram assignments for each similarity matrix's drl output,
    # which is found in nodes_multiple.

    # placement_badnesses_multiple list is required to store the placement
    # badness dicts that are returned by the compute_hexagram_assignments
    # function.
    placement_badnesses_multiple = []
    for index, i in enumerate(nodes_multiple):
        placement_badnesses_multiple.append(
            compute_hexagram_assignments(i, index, options))

    # Now that we have hex assignments, compute layers.

    # In addition to making per-layer files, we're going to copy all the score
    # matrices to our output directory. That way, the client can download
    # layers in big chunks when it wants all layer data for statistics. We need
    # to write a list of matrices that the client can read, which is written by
    # this TSV writer.
    matrix_index_writer = tsv.TsvWriter(open(os.path.join(options.directory,
        "matrices.tab"), "w"))

    # Read in all the layer data at once
    # TODO: Don't read in all the layer data at once

    # This holds a dict from layer name to a dict from signature name to
    # score.
    layers = {}

    # This holds the names of all layers
    layer_names = []

    for matrix_number, score_filename in enumerate(options.scores):
        # First, copy the whole matrix into our output. This holds its filename.
        output_filename = "matrix_{}.tab".format(matrix_number)
        shutil.copy2(score_filename, os.path.join(options.directory,
            output_filename))

        # Record where we put it
        matrix_index_writer.line(output_filename)

        # This holds a reader for the scores TSV
        scores_reader = tsv.TsvReader(open(score_filename, "r"))

        # This holds an iterator over lines in that file
        # TODO: Write a proper header/data API
        scores_iterator = iter(scores_reader)

        try:
            # This holds the names of the columns (except the first, which is
            # labels). They also happen to be layer names.
            file_layer_names = scores_iterator.next()[1:]

            # Add all the layers in this file to the complete list of layers.
            layer_names += file_layer_names

            # Ensure that we have a dict for every layer mentioned in the file
            # (even the ones that have no data below). Doing it this way means
            # all score matrices need disjoint columns, or the last one takes
            # precedence.
            for name in file_layer_names:
                layers[name] = {}

            for parts in scores_iterator:
                # This is the signature that this line is about
                signature_name = parts[0]

                if signature_name not in nodes_multiple[0]:
                    # This signature wasn't in our DrL output. Don't bother
                    # putting its layer data in our visualization. This saves
                    # space and makes the client-side layer counts accurate for
                    # the data actually displayable.
                    continue

                # These are the scores for all the layers for this signature
                layer_scores = parts[1:]

                for (layer_name, score) in itertools.izip(file_layer_names,
                    layer_scores):

                    # Store all the layer scores in the appropriate
                    # dictionaries.
                    try:
                        layers[layer_name][signature_name] = float(score)
                    except ValueError:
                        # This is not a float.
                        # Don't set that entry for this layer.
                        # TODO: possibly ought to complain to the user? But then
                        # things like "N/A" won't be handled properly.
                        continue

        except StopIteration:
            # We don't have any real data here. Couldn't read the header line.
            # Skip to the next file
            pass

        # We're done with this score file now
        scores_reader.close()

    # We're done with all the input score matrices, so our index is done too.
    matrix_index_writer.close()

    # We have now loaded all layer data into memory as Python objects. What
    # could possibly go wrong?

    # Stick the placement badness layer on the end. Only the badnesses for the
    # first similarity matrix are used.
    layer_names.append("Placement Badness")
    layers["Placement Badness"] = placement_badnesses_multiple[0]

    # Now we need to write layer files.

    # Generate some filenames for layers that we can look up by layer name.
    # We do this because layer names may not be valid filenames.
    layer_files = {name: os.path.join(options.directory,
        "layer_{}.tab".format(number)) for (name, number) in itertools.izip(
        layer_names, itertools.count())}

    for layer_name, layer in layers.iteritems():
        # Write out all the individual layer files
        # This holds the writer for this layer file
        scores_writer = tsv.TsvWriter(open(layer_files[layer_name], "w"))
        for signature_name, score in layer.iteritems():
            # Write the score for this signature in this layer
            scores_writer.line(signature_name, score)
        scores_writer.close()

    # We need something to sort layers by. We have "priority" (lower is
    # better)

    if len(layer_names) > 0 and options.stats:
        # We want to do this fancy parallel stats thing.
        # We skip it when there are no layers, so we don't try to join a
        # never-used pool, which seems to hang.

        print "Running statistics..."

        # This holds a list of ClusterFinders, one for each of our layers
        cluster_finders = [ClusterFinder(hexagons, layers[layer_name],
            window_size=options.window_size) for layer_name in layer_names]

        print "{} jobs to do.".format(len(cluster_finders))

        # This holds a multiprocessing pool for parallelization
        pool = multiprocessing.Pool()

        # This holds all the best p values in the same order
        best_p_values = pool.map(run_functor, cluster_finders)

        # Close down the pool so multiprocessing won't die sillily at the end
        pool.close()
        pool.join()

        # This holds a dict from layer name to priority (best p value).
        # pool.map returns its results in input order, so they line up with
        # layer_names.
        layer_priorities = {layer_name: best_p_value for layer_name,
            best_p_value in itertools.izip(layer_names, best_p_values)}
1074 else:
1407e3634bcf Uploaded r11 from test tool shed.
adam-novak
parents:
diff changeset
1075 # We aren't doing any stats.
1407e3634bcf Uploaded r11 from test tool shed.
adam-novak
parents:
diff changeset
1076
1407e3634bcf Uploaded r11 from test tool shed.
adam-novak
parents:
diff changeset
1077 print "Skipping statistics."
1407e3634bcf Uploaded r11 from test tool shed.
adam-novak
parents:
diff changeset
1078
1407e3634bcf Uploaded r11 from test tool shed.
adam-novak
parents:
diff changeset
1079 # Make up priorities.
1407e3634bcf Uploaded r11 from test tool shed.
adam-novak
parents:
diff changeset
1080 layer_priorities = {name: float("+inf") for name in layer_names}
1407e3634bcf Uploaded r11 from test tool shed.
adam-novak
parents:
diff changeset
1081
    # Count how many layer entries are equal to 1 for each binary layer, and
    # store that number in this dict by layer name. Things with the default
    # empty string instead of a number aren't binary layers, but they can use
    # the empty string as their TSV field value, so we can safely pull any layer
    # out of this by name.
    layer_positives = collections.defaultdict(str)

    for layer_name in layer_names:
        # Assume it's a binary layer until proven otherwise
        layer_positives[layer_name] = 0
        for value in layers[layer_name].itervalues():
            if value == 1:
                # Count up all the 1s in the layer
                layer_positives[layer_name] += 1
            elif value != 0:
                # It has something that isn't 1 or 0, so it can't be a binary
                # layer. Throw it out and try the next layer.
                layer_positives[layer_name] = ""
                break

    # Write an index of all the layers we have, in the form:
    # <layer>\t<file>\t<priority>\t<number of signatures with data>\t<number of
    # signatures that are 1 for binary layers, or empty>
    # This is the writer to use.
    index_writer = tsv.TsvWriter(open(os.path.join(options.directory,
        "layers.tab"), "w"))

    for layer_name, layer_file in layer_files.iteritems():
        # Write the index entry for this layer
        index_writer.line(layer_name, os.path.basename(layer_file),
            layer_priorities[layer_name], len(layers[layer_name]),
            layer_positives[layer_name])

    index_writer.close()

    # Sahil will implement linear regression code here

    # We must create an m * n matrix of samples * genes
    # In order to create this matrix we first must know the number of hexes
    # and maintain them in a certain order. The order is important so that
    # we populate the matrix with the data values in the proper row (sample).

    # Copy over the user-specified colormaps file, or make an empty TSV if it's
    # not specified.

    # This holds a writer for the colormaps file. Creating it creates the file.
    colormaps_writer = tsv.TsvWriter(open(os.path.join(options.directory,
        "colormaps.tab"), "w"))

    if options.colormaps is not None:
        # The user specified colormap data, so copy it over
        # This holds a reader for the colormaps file
        colormaps_reader = tsv.TsvReader(options.colormaps)

        print "Regularizing colormaps file..."
        sys.stdout.flush()

        for parts in colormaps_reader:
            colormaps_writer.list_line(parts)

        colormaps_reader.close()

    # Close the colormaps file we wrote. It may have gotten data, or it may
    # still be empty.
    colormaps_writer.close()

    # Now copy any static files from where they live next to this Python file
    # into the web page bundle.
    # This holds the directory where this script lives, which also contains
    # static files.
    tool_root = os.path.dirname(os.path.realpath(__file__))

    # Copy over all the static files we need for the web page
    # This holds a list of them
    static_files = [
        # Static images
        "drag.svg",
        "filter.svg",
        "statistics.svg",
        "right.svg",
        "set.svg",
        "save.svg",
        "inflate.svg",
        "throbber.svg",

        # jQuery itself is pulled from a CDN.
        # We can't take everything offline since Google Maps needs to be sourced
        # from Google, so we might as well use CDN jQuery.

        # Select2 scripts and resources:
        "select2.css",
        "select2.js",
        "select2.png",
        "select2-spinner.gif",
        "select2x2.png",

        # The jQuery.tsv plugin
        "jquery.tsv.js",
        # The color library
        "color-0.4.1.js",
        # The jStat statistics library
        "jstat-1.0.0.js",
        # The Google Maps MapLabel library
        "maplabel-compiled.js",
        # The main CSS file
        "hexagram.css",
        # The main JavaScript file that runs the page
        "hexagram.js",
        # Web Worker for statistics
        "statistics.js",
        # File with all the tool code
        "tools.js"
    ]

    # We'd just use a directory of static files, but Galaxy needs single-level
    # output.
    for filename in static_files:
        shutil.copy2(os.path.join(tool_root, filename), options.directory)

    # Copy the HTML file to our output file. It automatically knows to read
    # assignments.tab, and does its own TSV parsing
    shutil.copy2(os.path.join(tool_root, "hexagram.html"), options.html)

    print "Visualization generation complete!"

    return 0

if __name__ == "__main__":
    try:
        # Get the return code to return
        # Don't just exit with it because sys.exit works by exceptions.
        return_code = main(sys.argv)
    except:
        traceback.print_exc()
        # Return a definite number and not some unspecified error code.
        return_code = 1

    sys.exit(return_code)