annotate TO_GALAXY/README @ 1:2086dd919b31 draft

Uploaded
author mmaiensc
date Wed, 13 Nov 2013 16:28:55 -0500
ARTS: Automated Randomization of multiple Traits for Study design
Written by Mark Maienschein-Cline
mmaiensc@gmail.com
Center for Research Informatics
University of Illinois at Chicago

ARTS uses a genetic algorithm to optimize (minimize) a mutual information-based objective function, obtaining
an optimal randomization for studies of arbitrary size and design.

The publication for this code is in preparation; a citation will be added soon (hopefully!). When it is published,
a section of the supplementary information will give more details about usage (in addition to what's below).

Please contact me at the email above with questions.



There are two ways of using this code: from the command line (it's a Perl script), or through Galaxy.

You can learn about, and download, Galaxy at http://galaxyproject.org.

################
# INSTALLATION #
################

#
# Command line version:
#
No installation needed, as long as you have a Perl interpreter. It should work fine on a Mac or Linux system,
and probably on Windows too, but I haven't tested it there.

#
# Galaxy version:
#
Two options:
1) Install this tool from the Galaxy Tool Shed directly into your Galaxy installation.
2) Move the ARTS.pl and .xml files into tools/ in your Galaxy distribution, and edit the tool configuration
   file (tool_conf.xml) appropriately. If you don't know how to do this, you should probably use option 1.

###########
# RUNNING #
###########

#
# Galaxy version:
#
Once you get the tool installed in Galaxy, there is a help section in the tool description you can refer to.
Also refer to the instructions for the command-line version below.

#
# Command line version:
#

Run ARTS.pl without any inputs to see the usage. All inputs are specified using the usual [-flag] [value]
syntax (e.g., -i input.txt).

Sample command using the sample_data.txt file:
./ARTS.pl -i sample_data.txt -c "2,3,4,5;2;3;4;5" -b 10 -o batched_data.txt -cc 2,4 -cd 4


More information about the inputs (*'ed remarks refer to the values in the sample command above):

-i   Input trait table: a tab-delimited table with 1 header line. See sample_data.txt for an example.
     You can prepare this table in Excel and save it as tab-delimited text, write it directly in a text file,
     or copy-paste from Excel into a text file. The table can have more columns than the ones you actually
     care about randomizing.
     * You can use the file sample_data.txt as an example input; it has 5 columns: Sample ID, Age, Sex,
       Collection Date, and Disease.

-c   Trait columns to randomize. This is a comma- and semicolon-delimited list. Its syntax is important,
     so pay attention.
     Columns are numbered starting from 1. Traits that should be considered jointly are listed together,
     separated by commas. Sets of jointly considered traits are separated from each other by semicolons. Hence,
     * -c "2,3,4,5;2;3;4;5" says to consider all the traits (columns 2-5) jointly (that's the 2,3,4,5 part), AND
       to consider each trait individually (that's the ;2;3;4;5 part).
     You could opt to consider traits only individually (-c "2;3;4;5"), or only jointly (-c "2,3,4,5"), or only
     pair-wise (-c "2,3;2,4;2,5;3,4;3,5;4,5"), or whatever you want.
     OUR GENERAL-PURPOSE RECOMMENDATION is to consider all traits jointly, plus all individually, as in the sample
     command. This corresponds to the MMI statistic discussed in the publication.
     GALAXY USERS: you just get to select the columns to consider, and the script will use the MMI statistic
     automatically (you don't get a choice).
     FINAL NOTE: put quotes around the value here; otherwise the shell will interpret the semicolons
     as command separators.
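
To make the -c syntax concrete, here is a minimal Python sketch of how such a value splits into trait groups. It is not part of ARTS (which is a Perl script); the function name parse_trait_columns is hypothetical, and ARTS.pl may parse the flag differently.

```python
def parse_trait_columns(spec):
    """Split a -c style string into trait groups (lists of 1-based column numbers)."""
    groups = []
    for group in spec.split(";"):
        # Columns inside one group are comma-separated and considered jointly.
        columns = [int(col) for col in group.split(",")]
        if any(col < 1 for col in columns):
            raise ValueError("columns are numbered starting from 1")
        groups.append(columns)
    return groups


# The sample command: all four traits jointly, plus each trait on its own.
print(parse_trait_columns("2,3,4,5;2;3;4;5"))  # [[2, 3, 4, 5], [2], [3], [4], [5]]
# Pair-wise only: six groups of two columns each.
print(parse_trait_columns("2,3;2,4;2,5;3,4;3,5;4,5"))
```

Each inner list is one set of traits that would be considered jointly; a one-element list is a trait considered individually.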

-b   Batch size (number of samples that can be processed at the same time). You have two options:
     1) Enter a single number. This will fill as many complete batches as possible, and put the remainder into a
        smaller batch. This is probably convenient, but you should do a quick count to make sure you don't end up
        with a really small last batch (e.g., if you have 105 samples and use a batch size of 25, your last batch
        will only have 5 samples).
     2) Enter a comma-delimited list that adds up to the number of samples, which allows for uneven batch sizes.
        For example, -b 10,10,9,9 for 38 samples. If your math doesn't add up, the program will exit and let you know.
     * sample_data.txt has 30 samples, so "-b 10" makes 3 batches of 10 samples each.
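
If you want to do the "quick count" before running, the two -b options can be sketched in a few lines of Python. This only mirrors the behavior described above; the helper name batch_sizes is hypothetical and is not how ARTS.pl is written.

```python
def batch_sizes(spec, n_samples):
    """Return the batch sizes implied by a -b style value for n_samples samples."""
    sizes = [int(s) for s in spec.split(",")]
    if any(size < 1 for size in sizes):
        raise ValueError("batch sizes must be positive integers")
    if len(sizes) == 1:
        # Single number: as many full batches as possible, then one smaller remainder batch.
        full, remainder = divmod(n_samples, sizes[0])
        return [sizes[0]] * full + ([remainder] if remainder else [])
    # Explicit list: the sizes must add up to the number of samples.
    if sum(sizes) != n_samples:
        raise ValueError(f"batch sizes sum to {sum(sizes)}, but there are {n_samples} samples")
    return sizes


print(batch_sizes("10", 30))         # [10, 10, 10]
print(batch_sizes("25", 105))        # [25, 25, 25, 25, 5]
print(batch_sizes("10,10,9,9", 38))  # [10, 10, 9, 9]
```

The second call reproduces the 105-sample example: four full batches and a last batch of only 5 samples.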

-o   Output file. It looks like the input, with the batch assignments added as an extra column at the end.
     * batched_data.txt is our output file.

-p   (sort-of optional: you MUST use both -b and -o, OR just -p) Print (to STDOUT) the statistics of an existing
     batch assignment, reading the batches from this column. The result will look like the last part of the
     STDOUT from an ARTS run (see below). You can use this option for testing batch assignments from another
     algorithm, or ones you did by hand.

-cc  Indices of continuously-valued columns (comma-delimited list). ARTS uses discrete values for its statistics,
     so these columns must be discretized (binned). If ARTS encounters a column with more than 20 distinct values,
     it will generate a warning asking if you want it to be continuous.
     * In sample_data.txt, columns 2 (age) and 4 (date) could be considered continuous (that is, it's worth treating
       a 35 year-old similarly to a 36 year-old), so we set "-cc 2,4".

-cd  Date-valued columns. These columns should also be listed under -cc; this flag lets ARTS know to expect a date
     (format MUST be M/D/Y, where the month is a number (1 instead of January)) and to convert the date to a number
     before binning.
     * In sample_data.txt, column 4 is a date, so set "-cd 4".
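
As an illustration of what "convert the date to a number" can mean, here is a small Python sketch that turns an M/D/Y string into a day count. The exact numeric scale ARTS uses internally is not stated in this README, so treat the choice of origin here as an assumption; for binning, only differences between dates matter. The function name mdy_to_number is hypothetical.

```python
from datetime import date


def mdy_to_number(text):
    """Convert an M/D/Y date string (numeric month) to a day count."""
    month, day, year = (int(part) for part in text.split("/"))
    # toordinal() counts days from 0001-01-01; any fixed origin works for binning.
    return date(year, month, day).toordinal()


# Earliest and latest collection dates in sample_data.txt.
first = mdy_to_number("1/15/2010")
last = mdy_to_number("7/27/2013")
print(last - first)  # number of days spanned by the collection dates
```

A malformed date (for example "January 15, 2010") raises a ValueError in this sketch, which is one way to catch entries that do not follow the M/D/Y format.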

-cb  Number of bins to use for discretizing the continuous columns. Again, you can set a single value, or give a
     comma-delimited list, which will match the order of the list given in the -cc flag.
     * For the sample run, we left the default value of 5, but we could do, for example, "-cb 5,7", which would bin
       the ages into 5 bins and the dates into 7 bins (since we set "-cc 2,4", and column 2 was age, column 4 was date).
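
The age ranges printed for the sample run (19-27.2, 27.2-35.4, and so on up to 60) are consistent with equal-width bins between the smallest and largest value. Under that assumption, the binning can be sketched in Python as follows. How ARTS treats values that fall exactly on a bin edge is not documented here, so the edge handling below is a guess, and the function name equal_width_bins is hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width bins spanning [min, max]."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    edges = [low + i * width for i in range(n_bins + 1)]
    labels = []
    for v in values:
        if width == 0:
            labels.append(0)  # all values identical: a single bin
        else:
            # The maximum value would fall into bin n_bins, so clamp it into the last bin.
            labels.append(min(int((v - low) / width), n_bins - 1))
    return edges, labels


# A few ages taken from sample_data.txt, with the default of 5 bins.
ages = [19, 25, 28, 36, 44, 52, 60]
edges, labels = equal_width_bins(ages, 5)
print([round(e, 1) for e in edges])  # [19.0, 27.2, 35.4, 43.6, 51.8, 60.0]
print(labels)                        # [0, 0, 1, 2, 3, 4, 4]
```

With "-cb 5,7" the same idea would be applied with 5 bins to the first -cc column and 7 bins to the second.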

-bn  Name for the batch column added to the output. Default is "batch".

-s   Random number seed. Set it to a large negative integer. By default the code always uses the same seed, but if
     you want to rerun with a different seed you can use this option.

----------------------------------------------

When you run the sample command, the STDOUT looks like this (I added the N) line numbers):

"""""""""""""""""""
1) Using traits: Age Sex Collection date Disease
2) Using trait combinations: {Age,Sex,Collection date,Disease} {Age} {Sex} {Collection date} {Disease}
3) Generation 1 of 300, average fitness 0.1432
4) Generation 2 of 300, average fitness 0.1342
5) Generation 3 of 300, average fitness 0.1298
6) Generation 4 of 300, average fitness 0.1279
7) Generation 5 of 300, average fitness 0.1250
8) Generation 6 of 300, average fitness 0.1227
9) Generation 7 of 300, average fitness 0.1211
10) Generation 8 of 300, average fitness 0.1194
11) Generation 9 of 300, average fitness 0.1187
12) Generation 10 of 300, average fitness 0.1181
13) Generation 11 of 300, average fitness 0.1175
14) Generation 12 of 300, average fitness 0.1165
15) Generation 13 of 300, average fitness 0.1143
16) Generation 14 of 300, average fitness 0.1133
17) Generation 15 of 300, average fitness 0.1132
18) Generation 16 of 300, average fitness 0.1127
19) Generation 17 of 300, average fitness 0.1123
20) Generation 18 of 300, average fitness 0.1116
21) Generation 19 of 300, average fitness 0.1119
22) Generation 20 of 300, average fitness 0.1113
23) Generation 21 of 300, average fitness 0.1113
24) Generation 22 of 300, average fitness 0.1110
25) Generation 23 of 300, average fitness 0.1110
26) Final MI 0.1045 ; Individual trait MIs (mean 0.0091 ): 0.0155 0.0000 0.0209 0.0000
27) -----------------------------------------------------------------
28)               Age values                                             Sex values        Collection date values                                                                                        Disease values
29) Batch (size)  19-27.2    35.4-43.6  51.8-60    43.6-51.8  27.2-35.4  M        F        2/26/2012-11/11/2012  11/11/2012-7/27/2013  6/14/2011-2/26/2012   9/29/2010-6/14/2011   1/15/2010-9/29/2010   Y        N
30) -------       -------    -------    -------    -------    -------    -------  -------  -------               -------               -------               -------               -------               -------  -------
31) 1 (10)        2          2          2          1          3          5        5        3                     2                     2                     2                     1                     5        5
32) 2 (10)        2          2          1          2          3          5        5        2                     2                     4                     1                     1                     5        5
33) 3 (10)        3          2          1          1          3          5        5        3                     2                     2                     2                     1                     5        5
34) -------       -------    -------    -------    -------    -------    -------  -------  -------               -------               -------               -------               -------               -------  -------
35) Total         7          6          4          4          9          15       15       8                     6                     8                     5                     3                     15       15
"""""""""""""""""""

Here's what the lines mean:
1)     Tells you what traits you've selected.
2)     Tells you what trait combinations you've selected.
3-25)  Prints the progress for each generation of the GA. Converges when average fitness changes by less than 0.0001.
26)    Final objective function value. It is normalized between 0 and 1, and the ideal case is 0. Note that different
       choices of the objective function ARE NOT COMPARABLE: if you select fewer traits, or simpler combinations of
       traits (fewer joint traits) using different -c values, you will get lower MI values, but this does not
       necessarily indicate better overall randomization, because your choices may be overly simplistic. This is why
       we recommend sticking with the MMI definition (all joint + all individual) consistently. This line also gives
       the MI value for each individual trait, and their mean.
27-34) Individual trait counts per batch for each trait value. Continuously-valued columns are given as ranges
       (e.g., age 19-27.2).
35)    Total number of samples with each trait value, over all batches.
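
For intuition about the MI values on line 26, here is a minimal Python sketch of the plain mutual information between a batch assignment and one trait. It is only an illustration: the exact objective and the normalization ARTS applies are not spelled out in this README, so this computes the textbook quantity in bits, and the function name mutual_information is hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(batches, traits):
    """Mutual information (in bits) between batch labels and trait values."""
    n = len(batches)
    batch_counts = Counter(batches)
    trait_counts = Counter(traits)
    joint_counts = Counter(zip(batches, traits))
    mi = 0.0
    for (b, t), count in joint_counts.items():
        # p(b,t) * log2( p(b,t) / (p(b) * p(t)) ), written with raw counts.
        mi += (count / n) * log2(count * n / (batch_counts[b] * trait_counts[t]))
    return mi


# Perfectly balanced: each batch has one M and one F, so the batch says nothing about sex.
print(mutual_information([1, 1, 2, 2], ["M", "F", "M", "F"]))  # 0.0
# Fully confounded: batch 1 is all M, batch 2 is all F.
print(mutual_information([1, 1, 2, 2], ["M", "M", "F", "F"]))  # 1.0
```

This matches what the sample run shows for Sex: every batch has 5 M and 5 F, and the reported individual MI for that trait is 0.0000. For traits considered jointly, the same function can be given tuples of values (one tuple per sample) as the trait.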

----------------------------------------------

The output, batched_data.txt, will look like this:

"""""""""""""""""""
Sample ID  Age  Sex  Collection date  Disease  batch
sample1    25   M    3/28/2012        Y        3
sample2    37   F    4/27/2013        N        3
sample3    36   F    3/10/2013        N        1
sample4    52   M    7/1/2012         Y        1
sample5    48   M    8/13/2011        Y        3
sample6    60   M    9/21/2011        N        3
sample7    31   F    10/22/2010       Y        3
sample8    28   F    1/15/2010        N        2
sample9    26   M    1/7/2012         N        1
sample10   44   F    4/5/2012         Y        1
sample11   33   M    5/18/2012        N        3
sample12   25   F    7/27/2013        N        3
sample13   28   M    1/20/2013        Y        2
sample14   30   F    8/11/2012        Y        3
sample15   51   M    11/23/2011       N        2
sample16   22   M    12/21/2011       N        2
sample17   28   M    9/26/2010        Y        1
sample18   19   F    1/18/2010        Y        3
sample19   35   M    2/10/2012        N        1
sample20   38   F    2/17/2012        N        2
sample21   25   F    4/28/2012        Y        1
sample22   55   M    1/7/2013         Y        2
sample23   33   F    6/30/2013        N        1
sample24   24   M    7/1/2012         Y        2
sample25   42   M    2/15/2011        N        3
sample26   60   M    5/21/2011        N        1
sample27   34   F    10/23/2010       Y        2
sample28   37   F    12/18/2010       Y        1
sample29   41   F    11/7/2012        N        2
sample30   50   F    2/15/2012        Y        2
"""""""""""""""""""

It looks the same as the input file, with a sixth column titled "batch" added, saying which of the three
batches each sample should be processed in (of course, you can permute the order of batches if you want).
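
If you want to double-check an output file yourself, a short Python sketch like the one below tallies how many samples with each trait value ended up in each batch, similar in spirit to the per-batch counts that ARTS prints. It assumes the file is tab-delimited with a header line and a batch column named "batch" (the default for -bn); the helper name counts_per_batch is hypothetical. The inline rows are the first four samples from the listing above.

```python
import csv
import io
from collections import Counter

# A few rows in the format of batched_data.txt (tab-delimited, header line first).
text = (
    "Sample ID\tAge\tSex\tCollection date\tDisease\tbatch\n"
    "sample1\t25\tM\t3/28/2012\tY\t3\n"
    "sample2\t37\tF\t4/27/2013\tN\t3\n"
    "sample3\t36\tF\t3/10/2013\tN\t1\n"
    "sample4\t52\tM\t7/1/2012\tY\t1\n"
)


def counts_per_batch(handle, trait, batch_column="batch"):
    """Count how many samples in each batch have each value of the given trait."""
    counts = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        counts[(row[batch_column], row[trait])] += 1
    return counts


counts = counts_per_batch(io.StringIO(text), "Sex")
for (batch, value), n in sorted(counts.items()):
    print(f"batch {batch}: {value} x {n}")
```

To run it on a real file, pass an open file handle instead of the io.StringIO object, for example open("batched_data.txt", newline="").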

The included file batched_data.txt shows what the output should look like.