<tool id="rgEigPCA1" name="Eigensoft:">
    <description>PCA Ancestry using SNP</description>

    <command interpreter="python">
    rgEigPCA.py "$i.extra_files_path/$i.metadata.base_name" "$title" "$out_file1"
    "$out_file1.files_path" "$k" "$m" "$t" "$s" "$pca"
    </command>

    <inputs>

        <param name="i" type="data" label="Input genotype data file"
            size="120" format="ldindep" />
        <param name="title" type="text" value="Ancestry PCA" label="Title for outputs from this run"
            size="80" />
        <param name="k" type="integer" value="4" label="Number of principal components to output"
            size="3" />
        <param name="m" type="integer" value="0" label="Max. outlier removal iterations"
            help="To turn on outlier removal, set m to 5 or so. Do this if you plan to adjust any analyses for ancestry."
            size="3" />
        <param name="t" type="integer" value="5" label="# principal components used for outlier removal"
            size="3" />
        <param name="s" type="integer" value="6" label="#SDs for outlier removal"
            help="Any individual more than s SDs along one of the top t principal components will be removed as an outlier."
            size="3" />

    </inputs>

    <outputs>
        <data name="out_file1" format="html" label="${title}_rgEig.html"/>
        <data name="pca" format="txt" label="${title}_rgEig.txt"/>
    </outputs>

    <tests>
        <test>
            <param name='i' value='tinywga' ftype='ldindep' >
                <metadata name='base_name' value='tinywga' />
                <composite_data value='tinywga.bim' />
                <composite_data value='tinywga.bed' />
                <composite_data value='tinywga.fam' />
                <edit_attributes type='name' value='tinywga' />
            </param>
            <param name='title' value='rgEigPCAtest1' />
            <param name="k" value="4" />
            <param name="m" value="2" />
            <param name="t" value="2" />
            <param name="s" value="2" />
            <output name='out_file1' file='rgtestouts/rgEigPCA/rgEigPCAtest1.html' ftype='html' compare='diff' lines_diff='195'>
                <extra_files type="file" name='rgEigPCAtest1_PCAPlot.pdf' value="rgtestouts/rgEigPCA/rgEigPCAtest1_PCAPlot.pdf" compare="sim_size" delta="3000"/>
            </output>
            <output name='pca' file='rgtestouts/rgEigPCA/rgEigPCAtest1.txt' compare='diff'/>
        </test>
    </tests>

    <help>


**Syntax**

- **Genotype data** is an input genotype dataset in Plink lped (http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml) format. See below for notes
- **Title** is used to name the output files so you can remember what the outputs are for
- **Tuning parameters** are documented in the Eigensoft (http://genepath.med.harvard.edu/~reich/Software.htm) documentation - see below
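
As a rough illustration of the outlier removal parameters, the sketch below applies the rule from the s parameter description once: an individual is flagged if its score lies more than s SDs from the mean along any of the first t principal components. This is not Eigensoft's code. The function name, the list-of-lists input layout and the use of the population SD are assumptions made for the example, and the m parameter (which caps how many times removal is repeated) is not modelled.

```python
from statistics import mean, pstdev


def flag_outliers(scores, t, s):
    """Return indices of individuals more than s SDs from the mean on any of the first t components.

    scores: one list of principal component scores per individual (hypothetical layout).
    """
    flagged = set()
    for component in range(t):
        column = [row[component] for row in scores]
        centre, spread = mean(column), pstdev(column)
        if spread == 0:
            continue  # no variation along this component, so nothing to flag
        for index, value in enumerate(column):
            if abs(value - centre) / spread > s:
                flagged.add(index)
    return sorted(flagged)


# Nine similar individuals and one extreme individual on the first component.
example_scores = [[0.0, 0.0] for _ in range(9)] + [[10.0, 0.0]]
print(flag_outliers(example_scores, t=2, s=2))
```

With a small s, as in the tool's functional test, the extreme individual is flagged. With the default s of 6 nothing in this toy data set would be removed.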


-----

**Summary**

Eigensoft requires LD-reduced genotype data.
Galaxy has an automatic converter for genotype data in Plink linkage pedigree (lped) format.
For details of this generic genotype format, please see the Plink documentation at
http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml

Reading that documentation, you'll see that the linkage pedigree format is really two related files with the same
file base name - a map and a ped file - e.g. 'mygeno.ped' and 'mygeno.map'.
The map file has one row per SNP giving the chromosome, SNP name, genetic offset and physical offset corresponding to each
genotype stored as separate alleles in the ped file. The ped file has one row per individual giving family id, individual id, father id (or 0), mother id
(or 0), gender (1=male, 2=female, 0=unknown) and affection (1=unaffected, 2=affected, 0=unknown),
then two separate allele columns for each genotype.
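
To make the ped layout concrete, here is a minimal sketch that splits one ped row into the six leading columns described above and pairs up the remaining allele columns. The helper name and the example row are made up for illustration and are not part of this tool or of Plink.

```python
# Hypothetical helper, not part of this tool: split one lped .ped row into its fields.
PED_FIXED_COLUMNS = (
    "family_id", "individual_id", "father_id", "mother_id", "gender", "affection",
)


def parse_ped_line(line):
    """Return (header, genotypes) for one whitespace-delimited ped row."""
    fields = line.split()
    n_fixed = len(PED_FIXED_COLUMNS)
    header = dict(zip(PED_FIXED_COLUMNS, fields[:n_fixed]))
    alleles = fields[n_fixed:]
    if len(alleles) % 2 != 0:
        raise ValueError("expected two allele columns per genotype")
    # Pair consecutive allele columns: one (allele1, allele2) tuple per SNP in the map file.
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return header, genotypes


# Made-up row: an affected male founder genotyped at two SNPs.
header, genotypes = parse_ped_line("fam1 ind1 0 0 1 2 A A G T")
print(header["individual_id"], genotypes)
```

The number of genotype pairs in every ped row should match the number of rows in the accompanying map file.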

Once you have your data in the right format, you can upload the files into your Galaxy history using the "upload" tool.

To upload your lped data in the upload tool, choose 'lped' as the 'file format'. The tool form will change to
allow you to navigate to and select each member of the pair of ped and map files stored on your local computer
(or available at a public URL for Galaxy to grab).
Give the dataset a meaningful name (replace rgeneticsData with something more useful!) and click execute.

When the upload is done, your new lped format dataset will appear in your history and then,
when you choose the ancestry tool, that history dataset will be available as input.

**Warning for the Impatient**

When you execute the tool, it will look like it has not started running for a while as the automatic converter
reduces the amount of LD - otherwise eigenstrat gives biased results.


**Attribution**

This tool runs and relies on the work of many others, including the
maintainers of the Eigensoft program, and the R and
Bioconductor projects. For full attribution, source code and documentation, please see
http://genepath.med.harvard.edu/~reich/Software.htm, http://cran.r-project.org/
and http://www.bioconductor.org/ respectively.

This implementation is a Galaxy tool wrapper around these third party applications.
It was originally designed and written for family based data from the CAMP Illumina run of 2007 by
Ross Lazarus (ross.lazarus@gmail.com) and incorporated into the rgenetics toolkit.

Copyright Ross Lazarus 2007
Licensed under the terms of the LGPL as documented at http://www.gnu.org/licenses/lgpl.html
but is about as useful as a sponge boat without EIGENSOFT pca code.

**README from eigensoft2 distribution at http://genepath.med.harvard.edu/~reich/Software.htm**

[rerla@beast eigensoft2]$ cat README
EIGENSOFT version 2.0, January 2008 (for Linux only)

This is the same as our EIGENSOFT 2.0 BETA release with a few recent changes
as described at http://genepath.med.harvard.edu/~reich/New_In_EIGENSOFT.htm.

Features of EIGENSOFT version 2.0 include:
-- Keeping track of ref/var alleles in all file formats: see CONVERTF/README
-- Handling data sets up to 8 billion genotypes: see CONVERTF/README
-- Output SNP weightings of each principal component: see POPGEN/README

The EIGENSOFT package implements methods from the following 2 papers:
Patterson N. et al. 2006 PLoS Genetics in press (population structure)
Price A.L. et al. 2006 NG 38:904-9 (EIGENSTRAT stratification correction)

See POPGEN/README for documentation of population structure programs.

See EIGENSTRAT/README for documentation of EIGENSTRAT programs.

See CONVERTF/README for documentation of programs for converting file formats.


Executables and source code:
----------------------------
All C executables are in the bin/ directory.

We have placed source code for all C executables in the src/ directory,
for users who wish to modify and recompile our programs. For example, to
recompile the eigenstrat program, type
"cd src"
"make eigenstrat"
"mv eigenstrat ../bin"

Note that some of our software will only compile if your system has the
lapack package installed. (This package is used to compute eigenvectors.)
Some users may need to change "blas-3" to "blas" in the Makefile,
depending on how blas and lapack are installed.

If cc is not available on your system, try "cp Makefile.alt Makefile"
and then recompile.

If you have trouble compiling and running our code, try compiling and
running the pcatoy program in the src directory:
"cd src"
"make pcatoy"
"./pcatoy"
If you are unable to run the pcatoy program successfully, please contact
your system administrator for help, as this is a systems issue which is
beyond our scope. Your system administrator will be able to troubleshoot
your systems issue using this trivial program. [You can also try running
the pcatoy program in the bin directory, which we have already compiled.]
    </help>
</tool>