<!-- TO_GALAXY/tools/ARTS/galaxy_arts.xml @ 1:2086dd919b31 (draft), commit message "Uploaded", author mmaiensc, date Wed, 13 Nov 2013 16:28:55 -0500 -->
<tool id="ARTS" name="ARTS">
<description>automated study randomization</description>
<command interpreter="perl">ARTS.pl -i $input -o $out -b $batch -c "$column" -cc $conts -cd $dates -cb $bins -bn $bname -s $seed -mmi -v l </command>
<inputs>
<param name="input" type="data" format="tabular" label="Input traits per sample" help="Ensure input is formatted as tabular"/>
<param name="batch" type="text" size="40" label="Batch size" optional="False" help="Set to a single number, or a comma-delimited list"/>
<param name="column" type="data_column" data_ref="input" multiple="True" numerical="False" label="Trait columns to randomize" help="Multi-select list - hold the appropriate key while clicking to select multiple columns." />
<param name="conts" type="data_column" data_ref="input" multiple="True" numerical="False" optional="True" label="Continuous- and date-valued columns for binning (if any)" help="Multi-select list. Values should be numbers or M/D/Y dates." />
<param name="dates" type="data_column" data_ref="input" multiple="True" numerical="False" optional="True" label="Date-valued columns for binning (if any)" help="Multi-select list. Dates should be M/D/Y, where M, D, and Y are all integers (e.g., 7/9/1985)." />
<param name="bins" type="text" size="40" label="Bin sizes (for continuously-valued columns)" value="5" optional="False" help="Set to a single number, or a comma-delimited list. If given as a list, will be used in same order as continuous columns."/>
<param name="bname" type="text" size="40" label="Batch name" value="batch" optional="False" help="Name given to the batch column in the output."/>
<param name="seed" type="integer" size="40" label="Random number seed" optional="False" value="-123456789"/>
</inputs>
<outputs>
<data format="tabular" name="out" />
</outputs>
<help>

**Purpose**

This tool performs automated study randomization: it assigns the samples in your data set to batches so that batch membership carries as little information as possible about the selected traits. It does this by minimizing a mutual information-based objective function with a genetic algorithm.
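
The exact ARTS objective (including how it scores combinations of traits) is not spelled out here. As a rough illustration of the underlying quantity only, the Python sketch below computes the empirical mutual information between a batch assignment and a single trait. The helper name ``mutual_information`` is hypothetical and not part of ARTS. A value of 0 means the trait is perfectly balanced across batches; larger values mean batch and trait are confounded. Empty strings count as their own trait value.

```python
import math
from collections import Counter


def mutual_information(batches, traits):
    """Empirical mutual information, in bits, between two equal-length label lists.

    Hypothetical helper for illustration; not the ARTS objective function.
    """
    if len(batches) != len(traits):
        raise ValueError("batches and traits must have the same length")
    n = len(batches)
    joint = Counter(zip(batches, traits))  # counts of (batch, trait) pairs
    batch_counts = Counter(batches)
    trait_counts = Counter(traits)  # an empty string is just another trait value
    mi = 0.0
    for (b, t), c in joint.items():
        # p(b,t) * log2(p(b,t) / (p(b) * p(t))), written in terms of counts
        mi += (c / n) * math.log2(c * n / (batch_counts[b] * trait_counts[t]))
    return mi


# Sex balanced across two batches: no information shared.
print(mutual_information(["1", "1", "2", "2"], ["M", "F", "M", "F"]))  # 0.0
# Sex fully confounded with batch: 1 bit shared.
print(mutual_information(["1", "1", "2", "2"], ["M", "M", "F", "F"]))  # 1.0
```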

NOTE: In the history output, click the i (view details) icon (between the save-to-disk and rerun icons), then click stdout to see a summary of the run. The summary confirms which traits are being considered and gives a snapshot of how well each individual trait is randomized. It does not report on combinations of traits, which ARE ALSO being randomized.

-----

**Input traits per sample**

- A tab-delimited table of traits for each sample, with a header line giving the name of each trait. For example::

    ID       Sex  Age  Sample date  Diseased
    Sample1  M    15   6/7/2011     Y
    Sample2  M    25   8/5/2012     Y
    Sample3  F    23   1/30/2012    N
    Sample4  F    45   4/1/2013     N
    Sample5  M    52   3/21/2011    Y
    Sample6  F    37   3/12/2013    N
    Sample7  M    31   7/17/2011    N
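
A quick check before uploading, that every row has as many tab-separated fields as the header line, can catch formatting problems early. This Python sketch uses the standard ``csv`` module; ``check_traits_table`` is a hypothetical helper, and the rule it enforces is an assumption about a well-formed input rather than a documented ARTS requirement. Empty fields are kept, since missing values are treated as their own trait value.

```python
import csv
import io


def check_traits_table(text):
    """Parse a tab-delimited traits table and return (header, rows).

    Hypothetical pre-upload check, not part of ARTS: raises ValueError if the
    table is empty or any data row has a different number of fields than the
    header line. Blank lines are skipped; empty fields are kept as "".
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader, None)
    if header is None:
        raise ValueError("table is empty")
    rows = []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
        rows.append(row)
    return header, rows


example = "ID\tSex\tAge\tSample date\tDiseased\nSample1\tM\t15\t6/7/2011\tY\n"
header, rows = check_traits_table(example)
print(header)  # ['ID', 'Sex', 'Age', 'Sample date', 'Diseased']
```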

-----

**Batch size**

- The size of each batch. You can specify this as a single number (e.g., 50) or as a comma-delimited list of numbers (e.g., 50,50,49,49).

- A single number fills as many full batches as possible and puts all remaining samples in one smaller batch. A list is therefore the better choice when the batch size does not evenly divide the number of samples. For example, say you have 105 samples and can run batches of up to 30. Then::

    (First option) Batch size = 30 --> batch sizes of 30, 30, 30, and 15
    -or-
    (Second option) Batch size = 27,26,26,26

The second option distributes samples more evenly across batches and will give better results.
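
The evenly distributed list can be worked out from the number of samples and the largest batch you can run: use the fewest batches that respect the limit, then spread the samples so batch sizes differ by at most one. A small Python sketch of that arithmetic (``even_batch_sizes`` is a hypothetical helper, not part of ARTS); its output can be pasted into the Batch size field.

```python
def even_batch_sizes(n_samples, max_batch):
    """Return a comma-delimited list of batch sizes that differ by at most one.

    Hypothetical helper: uses the fewest batches whose size stays within max_batch.
    """
    if not (n_samples > 0 and max_batch > 0):
        raise ValueError("n_samples and max_batch must be positive")
    # Ceiling division: smallest number of batches that fits every sample.
    n_batches = (n_samples + max_batch - 1) // max_batch
    base, extra = divmod(n_samples, n_batches)
    # The first `extra` batches each take one additional sample.
    sizes = [base + 1] * extra + [base] * (n_batches - extra)
    return ",".join(str(s) for s in sizes)


print(even_batch_sizes(105, 30))  # 27,26,26,26
```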

-----

**Traits to randomize**

- Select the traits to randomize. To select multiple columns, hold Command (Mac) or Ctrl (Windows/Linux) while clicking. You do not need to select every column; randomizing over the sample ID, for example, would be pointless.

- Missing (empty) trait values are treated as an additional trait value.

- For the example above, we would select c2, c3, c4, and c5 (Sex, Age, Sample date, and Diseased). Only the relevant traits need to be selected; if we did not care about Sample date, for example, we could leave c4 out.

-----

**Continuous- and date-valued columns (optional)**

- Use this if you have columns with continuous values (e.g., age, blood pressure) or dates. These columns are discretized into bins before the randomization is run.

- For the example above, we would select c3 and c4 (Age, Sample date).

-----

**Date-valued columns (optional)**

- Use this if any of the columns selected as continuous contain dates. Dates MUST be formatted M/D/Y with a numeric month, for example 7/9/1985.

- For the example above, we would select c4 (Sample date).
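
One way to turn M/D/Y dates into numbers that can be binned is to convert each date to a day count. This Python sketch is an assumption about the approach, not a description of how ARTS converts dates internally; ``mdy_to_day_number`` is a hypothetical name. It also rejects text that is not in M/D/Y form, which can catch formatting problems before a run.

```python
from datetime import date


def mdy_to_day_number(text):
    """Convert an M/D/Y date string (e.g. '7/9/1985') to a day number.

    Hypothetical helper: returns the proleptic Gregorian ordinal and raises
    ValueError if the text is not M/D/Y with integer parts or is not a real date.
    """
    parts = text.strip().split("/")
    if len(parts) != 3:
        raise ValueError(f"not an M/D/Y date: {text!r}")
    month, day, year = (int(p) for p in parts)
    return date(year, month, day).toordinal()  # date() validates month/day ranges


# Days between two sample dates from the example table.
print(mdy_to_day_number("7/17/2011") - mdy_to_day_number("6/7/2011"))  # 40
```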

-----

**Bin sizes**

- This applies only to columns selected as continuous (including dates) and sets how many discrete bins each column is split into. Despite the name, it is the number of bins, not the width of each bin.

- Set it to a single number to use that many bins for every continuous column, or to a comma-delimited list to give each column its own number of bins, in the same order as the selected continuous columns.

- For the example above, where we selected c3 and c4 as continuous, we could set::

    Bin sizes=5,6

- This splits the Age column (c3) into 5 bins and the Sample date column (c4) into 6 bins.
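
How bin boundaries are chosen is not stated here. Assuming equal-width bins between each column's minimum and maximum (an assumption; ARTS may use a different rule, such as equal-count bins), the Age column from the example could be split into 5 bins as in this Python sketch. ``equal_width_bins`` is a hypothetical helper.

```python
def equal_width_bins(values, n_bins):
    """Assign each number to one of n_bins equal-width bins, labelled 0 .. n_bins - 1.

    Hypothetical helper; assumes the bins span the column's [min, max] range.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant column fits in a single bin
    width = (hi - lo) / n_bins
    # The maximum value would land at index n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [15, 25, 23, 45, 52, 37, 31]  # Age column (c3) from the example
print(equal_width_bins(ages, 5))  # [0, 1, 1, 4, 4, 2, 2]
```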

-----

**Batch name**

- The output file is identical to the input except for one additional column indicating the batch each sample is assigned to. Set the name of that column here.
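
To check how one trait ended up distributed across batches, you can cross-tabulate the output's batch column against that trait. A Python sketch, assuming the output has already been read into a header list and data rows (for example with the ``csv`` module) and that the batch column has the default name ``batch``; ``crosstab`` is a hypothetical helper and the rows below are made-up illustration data, not ARTS output.

```python
from collections import Counter, defaultdict


def crosstab(header, rows, trait, batch_column="batch"):
    """Count how many samples with each trait value fall in each batch.

    Hypothetical helper: header is the list of column names and rows are the
    data rows of the output table.
    """
    b_idx = header.index(batch_column)
    t_idx = header.index(trait)
    table = defaultdict(Counter)
    for row in rows:
        table[row[b_idx]][row[t_idx]] += 1
    return {b: dict(counts) for b, counts in sorted(table.items())}


header = ["ID", "Sex", "batch"]
rows = [["S1", "M", "1"], ["S2", "F", "1"], ["S3", "M", "2"], ["S4", "F", "2"]]
print(crosstab(header, rows, "Sex"))  # {'1': {'M': 1, 'F': 1}, '2': {'M': 1, 'F': 1}}
```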

-----

**Random number seed**

- In general this does not need to be changed, but you can set a different value to force the use of a different seed.

</help>
</tool>