annotate ARTS/galaxy_arts.xml @ 0:3723b54935cb draft

Uploaded
author mmaiensc
date Wed, 13 Nov 2013 16:13:17 -0500
<tool id="ARTS" name="ARTS">
  <description>automated study randomization</description>
  <command interpreter="perl">ARTS.pl -i $input -o $out -b $batch -c "$column" -cc $conts -cd $dates -cb $bins -bn $bname -s $seed -mmi -v l </command>
  <inputs>
    <param name="input" type="data" format="tabular" label="Input traits per sample" help="Ensure input is formatted as tabular"/>
    <param name="batch" type="text" size="40" label="Batch size" optional="False" help="Set to a single number, or a comma-delimited list"/>
    <param name="column" type="data_column" data_ref="input" multiple="True" numerical="False" label="Trait columns to randomize" help="Multi-select list - hold the appropriate key while clicking to select multiple columns." />
    <param name="conts" type="data_column" data_ref="input" multiple="True" numerical="False" optional="True" label="Continuous- and date-valued columns for binning (if any)" help="Multi-select list. Values should be numbers." />
    <param name="dates" type="data_column" data_ref="input" multiple="True" numerical="False" optional="True" label="Date-valued columns for binning (if any)" help="Multi-select list. Dates should be M/D/Y, where M, D, and Y are all integers (e.g., 7/9/1985)." />
    <param name="bins" type="text" size="40" label="Bin sizes (for continuously-valued columns)" value="5" optional="False" help="Set to a single number, or a comma-delimited list. If given as a list, will be used in same order as continuous columns."/>
    <param name="bname" type="text" size="40" label="Batch name" value="batch" optional="False" help="Name given to the batch column in the output."/>
    <param name="seed" type="integer" size="40" label="Random number seed" optional="False" value="-123456789"/>
  </inputs>
  <outputs>
    <data format="tabular" name="out" />
  </outputs>
  <help>

**Purpose**

This tool completes automated study randomization for a selected number of traits over the samples in your data set by minimizing a mutual information-based objective function using a genetic algorithm.

NOTE: in the history output, click the i (view details) icon (between the save-to-disk and rerun icons), then click on stdout to see a summary of the run. This allows you to confirm which traits are being considered, and gives you a snapshot of how randomized individual traits are (it does not inform you about combinations of traits, which ARE ALSO being randomized).
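
To give a feel for what a mutual information-based objective measures, the sketch below computes the textbook mutual information between a batch assignment and a single trait. It is only an illustration: the function name and the two batch assignments are hypothetical, and the objective actually minimized by ARTS (which also covers combinations of traits) is not reproduced here.

```python
import math
from collections import Counter

def mutual_information(batches, traits):
    """Mutual information (in bits) between two equal-length lists of labels."""
    n = len(batches)
    batch_counts = Counter(batches)
    trait_counts = Counter(traits)
    joint_counts = Counter(zip(batches, traits))
    mi = 0.0
    for (b, t), n_bt in joint_counts.items():
        # p(b,t) * log2( p(b,t) / (p(b) * p(t)) ), written with raw counts
        mi += (n_bt / n) * math.log2(n_bt * n / (batch_counts[b] * trait_counts[t]))
    return mi

# Sex column of the example table, with two hypothetical batch assignments
sex = ["M", "M", "F", "F", "M", "F", "M"]
confounded = [1, 1, 2, 2, 1, 2, 1]  # batch follows Sex exactly
mixed = [1, 2, 1, 2, 1, 2, 2]       # both sexes appear in both batches

print(mutual_information(confounded, sex))  # about 0.985 bits
print(mutual_information(mixed, sex))       # about 0.02 bits
```

A value near zero means that knowing the batch tells you almost nothing about the trait, which is the goal of randomization. An empty string works as a label like any other, which matches treating missing values as an additional trait value.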

-----

**Input traits per sample**

- A list of traits associated with each sample, including a header line giving the name of each type of trait. For example::

    ID       Sex  Age  Sample date  Diseased
    Sample1  M    15   6/7/2011     Y
    Sample2  M    25   8/5/2012     Y
    Sample3  F    23   1/30/2012    N
    Sample4  F    45   4/1/2013     N
    Sample5  M    52   3/21/2011    Y
    Sample6  F    37   3/12/2013    N
    Sample7  M    31   7/17/2011    N

-----

**Batch size**

- The size of each batch. You can specify this with a single number (e.g., 50) or a list of numbers separated by commas (for example, 50,50,49,49).

- The first choice will fill up full batches as much as possible, and put all remaining samples in a smaller batch. Thus, the latter choice may be better if the batch size does not evenly divide the number of samples. For example, let's say you have 105 samples and can do a batch size of up to 30. Then::

    (First option)  Batch size = 30 --> batch sizes of 30, 30, 30, and 15
      -or-
    (Second option) Batch size = 27,26,26,26

The second option has a more evenly distributed batch size, and will give better results.
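
The arithmetic behind the two options can be sketched as follows. Both function names are hypothetical helpers for working out what to type into the Batch size field; they are not part of ARTS itself.

```python
def fill_up_batches(n_samples, batch_size):
    """First option: full batches first, remaining samples in a smaller last batch."""
    full, remainder = divmod(n_samples, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])

def even_batches(n_samples, max_batch_size):
    """Second option: the most even split that respects the maximum batch size."""
    n_batches = -(-n_samples // max_batch_size)  # ceiling division
    base, extra = divmod(n_samples, n_batches)
    # the first `extra` batches receive one additional sample
    return [base + 1] * extra + [base] * (n_batches - extra)

print(fill_up_batches(105, 30))                         # [30, 30, 30, 15]
print(",".join(str(s) for s in even_batches(105, 30)))  # 27,26,26,26
```

The comma-delimited string printed on the last line is in the form expected by the Batch size field.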

-----

**Traits to randomize**

- Select the traits that should be randomized. On Macs, hold command to multi-select. You do not need to select all columns (it would be silly, for example, to randomize over sample ID).

- Note that missing values for traits will be treated as an additional trait value (i.e., empty).

- For the example above, we would select c2, c3, c4, and c5 (Sex, Age, Sample date, and Diseased). Not all traits need be selected, just the relevant ones (we may not care about Sample date, for example).

-----

**Continuous- and date-valued columns (optional)**

- Use if you have columns with continuous values (e.g., age, blood pressure) or dates. They will be discretized prior to running.

- For the example above, we would select c3 and c4 (Age, Sample date).

-----

**Date-valued columns (optional)**

- Use if any of the columns selected as continuous are dates (MUST be formatted M/D/Y, where month is a number, for example 7/9/1985).

- For the example above, we would select c4 (Sample date).
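
If you want to check that a date column follows the M/D/Y format before uploading, a minimal sketch such as the one below may help. The function name is hypothetical, and converting dates to day counts is only one plausible way to make them numeric for binning; how ARTS handles dates internally is not described here.

```python
from datetime import date

def parse_mdy(text):
    """Parse an M/D/Y string such as 7/9/1985; raises ValueError if malformed."""
    month, day, year = (int(part) for part in text.split("/"))
    return date(year, month, day)

# Sample date column of the example table
sample_dates = ["6/7/2011", "8/5/2012", "1/30/2012", "4/1/2013",
                "3/21/2011", "3/12/2013", "7/17/2011"]

# Day counts sort and subtract like ordinary numbers
ordinals = [parse_mdy(d).toordinal() for d in sample_dates]
print(ordinals[0] - min(ordinals))  # 78 days between 3/21/2011 and 6/7/2011
```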

-----

**Bin sizes**

- This only relates to columns selected as continuous, and determines how many discrete bins the data will be split into.

- You can set it to a single number, and all columns will use that number of bins. Or you can set it to a list of numbers to specify a different number of bins for each column.

- For the example above, where we selected c3 and c4 as continuous, we could set::

    Bin sizes=5,6

- This would split the Age column (c3) into 5 bins, and the Sample date column (c4) into 6 bins.
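
The sketch below illustrates how a Bin sizes entry maps onto the continuous columns, and what discretizing a column into a given number of bins can look like. Equal-width bins are an assumption made for the illustration; the binning scheme used by ARTS may differ, and both function names are hypothetical.

```python
def bins_per_column(bins_field, n_columns):
    """Expand the Bin sizes field into one bin count per continuous column."""
    counts = [int(x) for x in bins_field.split(",")]
    if len(counts) == 1:
        counts = counts * n_columns  # a single number applies to every column
    return counts

def equal_width_bins(values, n_bins):
    """Assign each value a bin index in 0..n_bins-1 (equal-width bins assumed)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # the maximum would otherwise fall just past the last bin, so clamp it
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

ages = [15, 25, 23, 45, 52, 37, 31]  # Age column of the example table
print(bins_per_column("5,6", 2))     # [5, 6]
print(equal_width_bins(ages, 5))     # [0, 1, 1, 4, 4, 2, 2]
```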

-----

**Batch name**

- The output file will look exactly the same as the input, except that an additional column will be added indicating which batch each sample should belong to. You can specify the name of that column here.

-----

**Random number seed**

- This will not need to be changed in general, but if you want to force the use of a different seed, you can.

  </help>
</tool>