<tool id="rgGLM1" name="Linear Models:" version="0.2">
  <description>for genotype data</description>
  <code file="rgGLM_code.py"/>
  <command interpreter="python">
    rgGLM.py '$i.extra_files_path/$i.metadata.base_name' '$phef.extra_files_path/$phef.metadata.base_name'
    "$title" '$predvar' '$covar' '$out_file1' '$logf' '$i.metadata.base_name'
    '$inter' '$cond' '$gender' '$mind' '$geno' '$maf' '$logistic' '$gffout'
  </command>
  <inputs>
    <page>
      <param name='title' label='Title for outputs' type='text' value='GLM' size="80" />
      <param name="i" type="data" format="pbed" label="Genotype file" size="80" />
      <param name="phef" type="data" format="pphe" label="Phenotype file" size="80"
        help="Dependent variable and covariates will be chosen from this file on the next page"/>
      <param name="logistic" type="text" value = "0" label="1=Use a logistic model (trait must be 1/2 coded like affection)"
        help="Please read the Plink documentation about this option" />
      <param name="gender" type="text" value = "0" label="1=Add a gender term to model" />
      <param name='inter' label='1=Build an interaction model - please read the docs carefully before using this'
        type='text' value='0' size="1" />
      <param name="cond" type="text" area='true' size='15x20' value = ""
        label="condition on this whitespace delimited rs (snp id) list" />
      <param name="mind" type="float" value = "0.1" label="Remove subjects with missing genotypes gt (eg 0.1)"
        help = "Set to 1 to include all subjects in the input file" />
      <param name="geno" type="float" value = "0.1" label="Remove markers with missing genotypes gt (eg 0.1)"
        help = "Set to 1 to include all markers in the input file" />
      <param name="maf" type="float" value = "0.01" label="Remove markers with MAF lt (eg 0.01) "
        help = "Set to 0 to include all markers in the input file"/>
    </page>
    <page>
      <param name="predvar" size="80" type="select" label="Dependent Trait"
        dynamic_options="get_phecols(phef=phef,selectOne=1)" display="radio" multiple="false"
        help="Model this characteristic in terms of subject snp genotypes - eg rare allele dosage for additive model" />
      <param name="covar" size="80" type="select" label="Covariates"
        dynamic_options="get_phecols(phef=phef,selectOne=0)" multiple="true" display="checkboxes"
        help="Use these phenotypes as covariates in models of snp dosage effects on the dependent trait"/>
    </page>
  </inputs>
  <outputs>
    <data format="tabular" name="out_file1" label="${title}_rgGLM.xls"/>
    <data format="txt" name="logf" label="${title}_rgGLMlog.txt" />
    <data format="gff" name="gffout" label="${title}_rgGLM.gff"/>
  </outputs>
  <tests>
    <test>
      <param name='i' value='tinywga' ftype='pbed' >
        <metadata name='base_name' value='tinywga' />
        <composite_data value='tinywga.bim' />
        <composite_data value='tinywga.bed' />
        <composite_data value='tinywga.fam' />
        <edit_attributes type='name' value='tinywga' />
      </param>
      <param name='phef' value='tinywga' ftype='pphe' >
        <metadata name='base_name' value='tinywga' />
        <composite_data value='tinywga.pphe' />
        <edit_attributes type='name' value='tinywga' />
      </param>
      <param name='title' value='rgGLMtest1' />
      <param name='predvar' value='c1' />
      <param name='covar' value='None' />
      <param name='inter' value='0' />
      <param name='cond' value='' />
      <param name='gender' value='0' />
      <param name='mind' value='1.0' />
      <param name='geno' value='1.0' />
      <param name='maf' value='0.0' />
      <param name='logistic' value='0' />
      <output name='out_file1' file='rgGLMtest1_GLM.xls' ftype='tabular' compare="diff" />
      <output name='logf' file='rgGLMtest1_GLM_log.txt' ftype='txt' compare="diff" lines_diff='36'/>
      <output name='gffout' file='rgGLMtest1_GLM_topTable.gff' compare="diff" ftype='gff' />
    </test>
  </tests>
  <help>

.. class:: infomark

**Syntax**

Note that this is a two-page form: you choose the dependent trait and covariates
on the second page, based on the phenotype file you choose on the first page.

- **Genotype file** is the input Plink format compressed genotype (pbed) file
- **Phenotype file** is the input Plink phenotype (pphe) file with FAMID IID followed by phenotypes
- **Dependent variable** is the term on the left of the model and is chosen from the pphe columns on the second page
- **Logistic** fits a logistic model, for example when disease status (case/control) is the outcome variable; otherwise the model is linear.
- **Covariates** are covariate terms on the right of the model, also chosen on the second page
- **Interactions** will add interactions - please be careful how you interpret these - see the Plink documentation.
- **Gender** will add gender as a model term - described in the Plink documentation
- **Condition** will condition the model on one or more specific SNP rs ids as a whitespace delimited sequence
- **Format** determines how your data will be returned to your Galaxy workspace

-----

.. class:: infomark

**Summary**

This tool tests GLM models in which SNP genotypes predict a dependent phenotype
variable, with adjustment for specified covariates.

If you don't see the genotype or phenotype data set you want here, it can be imported using
one of the methods available from the rg get data tool group.

Output format can be UCSC .bed if you want to see one column of your
results as a fully fledged UCSC genome browser track. A map file containing the chromosome and offset for each marker is
required for writing this kind of output.
Alternatively you can use .gg for the UCSC Genome Graphs tool which has all of the advantages
of the .bed track, plus a neat, visual front end that displays a lot of useful clues.
Either of these is a very useful way of quickly getting a look
at your data in full genomic context.

Finally, if you can't live without
spreadsheet data, choose the .xls tab delimited format. It's not a stupid binary Excel file, just a plain old tab
delimited one with a header. Fortunately Excel is dumb enough to open these without much protest.

-----

.. class:: infomark

**Attribution**

This Galaxy tool relies on Plink (see Plinksrc_) to test GLM models.

So, we rely on the author (Shaun Purcell) for the documentation you need specific to those settings - they are very nicely documented - see
DOC_

Tool and Galaxy datatypes originally designed and written for the Rgenetics
series of whole genome scale statistical genetics tools by Ross Lazarus (ross.lazarus@gmail.com)

Copyright Ross Lazarus March 2007
This Galaxy wrapper is released under the LGPL_ but is about as useful as a chocolate teapot without Plink which is GPL.

I'm no lawyer, but it looks like you get GPL if you use this software. Good luck.

.. _Plinksrc: http://pngu.mgh.harvard.edu/~purcell/plink/

.. _LGPL: http://www.gnu.org/copyleft/lesser.html

.. _DOC: http://pngu.mgh.harvard.edu/~purcell/plink/anal.shtml#glm

  </help>
</tool>