<tool id="rgGRR1" name="GRR:">
    <description>Pairwise Allele Sharing</description>
    <command interpreter="python">
        rgGRR.py $i.extra_files_path/$i.metadata.base_name "$i.metadata.base_name"
        '$out_file1' '$out_file1.files_path' "$title" '$n' '$Z'
    </command>
    <inputs>
        <param name="i" type="data" label="Genotype data file from your current history"
            format="ldindep" />
        <param name='title' type='text' size="80" value='rgGRR' label="Title for this job"/>
        <param name="n" type="integer" label="N snps to use (0=all)" value="5000" />
        <param name="Z" type="float" label="Z score cutoff for outliers (eg 2)" value="6"
            help="2 works but for very large numbers of pairs, you might want to see less than 5%" />
    </inputs>
    <outputs>
        <data format="html" name="out_file1" label="${title}_rgGRR.html"/>
    </outputs>
    <tests>
        <test>
            <param name='i' value='tinywga' ftype='ldindep' >
                <metadata name='base_name' value='tinywga' />
                <composite_data value='tinywga.bim' />
                <composite_data value='tinywga.bed' />
                <composite_data value='tinywga.fam' />
                <edit_attributes type='name' value='tinywga' />
            </param>
            <param name='title' value='rgGRRtest1' />
            <param name='n' value='100' />
            <param name='Z' value='6' />
            <param name='force' value='true' />
            <output name='out_file1' file='rgtestouts/rgGRR/rgGRRtest1.html' ftype='html' compare="diff" lines_diff='350'>
                <extra_files type="file" name='Log_rgGRRtest1.txt' value="rgtestouts/rgGRR/Log_rgGRRtest1.txt" compare="diff" lines_diff="170"/>
                <extra_files type="file" name='rgGRRtest1.svg' value="rgtestouts/rgGRR/rgGRRtest1.svg" compare="diff" lines_diff="1000" />
                <extra_files type="file" name='rgGRRtest1_table.xls' value="rgtestouts/rgGRR/rgGRRtest1_table.xls" compare="diff" lines_diff="100" />
            </output>
        </test>
    </tests>

<help>

.. class:: infomark

**Explanation**

This tool calculates allele sharing among all subjects, one pair at a time. For each pair of subjects it outputs
the mean number of alleles shared and a measure of its variability, and it creates an interactive image in which
each pair is plotted in this mean/variance space. It is based on the GRR Windows application available at
http://www.sph.umich.edu/csg/abecasis/GRR/
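As a rough illustration of the mean/variability measures described above, here is a minimal sketch. It is not the code in rgGRR.py: it assumes genotypes are coded as allele counts (0, 1 or 2) with ``None`` for a missing call, counts alleles shared identical-by-state as ``2 - abs(a - b)``, and the function names and toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def ibs_counts(g1, g2):
    """Alleles shared identical-by-state at each marker, skipping missing (None) calls."""
    return [2 - abs(a - b) for a, b in zip(g1, g2) if a is not None and b is not None]


def pairwise_sharing(genotypes):
    """Map each pair of subject ids to (mean, sd) of alleles shared across markers."""
    result = {}
    for s1, s2 in combinations(sorted(genotypes), 2):
        shared = ibs_counts(genotypes[s1], genotypes[s2])
        result[(s1, s2)] = (mean(shared), pstdev(shared))
    return result


# Hypothetical toy data: allele counts (0, 1 or 2) at five markers.
toy = {
    "mum": [0, 1, 2, 1, 0],
    "kid": [1, 1, 1, 1, 0],
    "stranger": [2, 0, 0, 1, 2],
}
stats = pairwise_sharing(toy)
for pair in sorted(stats):
    print(pair, "mean=%.2f sd=%.2f" % stats[pair])
```

Each pair becomes one point in the plot: related pairs tend to have a higher mean and a lower spread than unrelated pairs.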

The plot is interactive. For example, you can unselect one of the relationships in the legend to remove all of
those points from the plot. Details of outlier pairs pop up when the pointer is over them.
This relies on a working browser SVG plugin; try installing one for your browser if the interactivity is
broken.

-----

**Syntax**

- **Genotype file** is the input pedigree data chosen from the available library Plink binary files
- **Title** will be used to name the outputs, so make it mnemonic and useful
- **N** is the number of SNPs to use. Leave it at 0 to use all SNPs; otherwise a random sample is taken, which is much quicker with little loss of precision above 5000 SNPs
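The behaviour of **N** can be sketched as follows. This is a hedged illustration only: ``choose_markers`` and its ``seed`` argument are hypothetical names, and the real tool may select markers differently.

```python
import random


def choose_markers(n_available, n_wanted, seed=None):
    """Return sorted marker indices; n_wanted of 0 means use every marker."""
    if n_wanted == 0:
        n_wanted = n_available
    # Never ask for more markers than the data set holds.
    n_wanted = min(n_wanted, n_available)
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_available), n_wanted))


picked = choose_markers(n_available=20, n_wanted=5, seed=1)
print(picked)
print(len(choose_markers(20, 0)))
```

Fixing the seed in this sketch makes the sample reproducible, which is convenient when comparing runs.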

**Summary**

Warning: this tool works pairwise, so run time grows quadratically with sample size (n subjects give n(n-1)/2 pairs).
An LD-reduced dataset is strongly recommended as it will give good resolution with relatively few SNPs. Do not use
all million SNPs from a whole genome chip; that is overkill. 5k is good, and 10k is almost indistinguishable from 100k.

SNPs are sampled randomly from the autosomes only; otherwise parent/child pairs would be separated by gender.
This tool estimates mean pairwise allele sharing among all subjects. Based on the work of Abecasis, it has
been rewritten so that it can run with much larger data sets, produces cross-platform SVG, and runs
on a Galaxy server instead of being MS Windows only. It is written in Python and uses numpy, and the innermost loop
is inline C, so it can calculate about 50M SNP pairs/sec on a typical Opteron server.

Setting N to some fraction of the available markers will speed up the calculation; the difference is most noticeable
for large numbers of subjects. The real cost is that every subject must be compared with every other one over all
genotypes, so the work grows with the square of the number of subjects.
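To get a feel for the cost, the number of distinct pairs for n subjects is n(n-1)/2. The sketch below turns that into a back-of-envelope run time. It assumes the throughput figure quoted above means one SNP compared for one pair of subjects, and the function names are hypothetical.

```python
def n_pairs(n_subjects):
    """Number of distinct subject pairs: n * (n - 1) / 2."""
    return n_subjects * (n_subjects - 1) // 2


def estimated_seconds(n_subjects, n_snps, comparisons_per_sec=50e6):
    """Rough run time, assuming the 'about 50M SNP pairs/sec' figure quoted above."""
    return n_pairs(n_subjects) * n_snps / comparisons_per_sec


for n in (100, 1000, 10000):
    print(n, n_pairs(n), "%.1f s" % estimated_seconds(n, 5000))
```

A tenfold increase in subjects costs roughly a hundredfold more comparisons, whereas a tenfold increase in SNPs costs only tenfold more.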

If you don't see the genotype data set you want here, it can be imported using one of the methods available from
the Rgenetics Get Data tool.

-----

**Attribution**

Based on an idea from G. Abecasis implemented as GRR (Windows only) at http://www.sph.umich.edu/csg/abecasis/GRR/

Ross Lazarus wrote the original pdf writer Galaxy tool version.
John Ziniti added the C and created the slick svg representation.
Copyright Ross Lazarus 2007
Licensed under the terms of the LGPL as documented at http://www.gnu.org/licenses/lgpl.html
</help>
</tool>