comparison tools/rgenetics/rgEigPCA.xml @ 0:9071e359b9a3
Uploaded
author | xuebing |
---|---|
date | Fri, 09 Mar 2012 19:37:19 -0500 |
parents | |
children |
-1:000000000000 | 0:9071e359b9a3 |
---|---|
1 <tool id="rgEigPCA1" name="Eigensoft:"> | |
2 <description>PCA Ancestry using SNP</description> | |
3 | |
4 <command interpreter="python"> | |
5 rgEigPCA.py "$i.extra_files_path/$i.metadata.base_name" "$title" "$out_file1" | |
6 "$out_file1.files_path" "$k" "$m" "$t" "$s" "$pca" | |
7 </command> | |
8 | |
9 <inputs> | |
10 | |
11 <param name="i" type="data" label="Input genotype data file" | |
12 size="120" format="ldindep" /> | |
13 <param name="title" type="text" value="Ancestry PCA" label="Title for outputs from this run" | |
14 size="80" /> | |
15 <param name="k" type="integer" value="4" label="Number of principal components to output" | |
16 size="3" /> | |
17 <param name="m" type="integer" value="0" label="Max. outlier removal iterations" | |
18 help="To turn on outlier removal, set m=5 or so. Do this if you plan to use the principal components to adjust any downstream analyses" | |
19 size="3" /> | |
20 <param name="t" type="integer" value="5" label="# principal components used for outlier removal" | |
21 size="3" /> | |
22 <param name="s" type="integer" value="6" label="#SDs for outlier removal" | |
23 help="Any individual exceeding s SDs along one of the top t principal components will be removed as an outlier." | |
24 size="3" /> | |
25 | |
26 </inputs> | |
27 | |
28 <outputs> | |
29 <data name="out_file1" format="html" label="${title}_rgEig.html"/> | |
30 <data name="pca" format="txt" label="${title}_rgEig.txt"/> | |
31 </outputs> | |
32 | |
33 <tests> | |
34 <test> | |
35 <param name='i' value='tinywga' ftype='ldindep' > | |
36 <metadata name='base_name' value='tinywga' /> | |
37 <composite_data value='tinywga.bim' /> | |
38 <composite_data value='tinywga.bed' /> | |
39 <composite_data value='tinywga.fam' /> | |
40 <edit_attributes type='name' value='tinywga' /> | |
41 </param> | |
42 <param name='title' value='rgEigPCAtest1' /> | |
43 <param name="k" value="4" /> | |
44 <param name="m" value="2" /> | |
45 <param name="t" value="2" /> | |
46 <param name="s" value="2" /> | |
47 <output name='out_file1' file='rgtestouts/rgEigPCA/rgEigPCAtest1.html' ftype='html' compare='diff' lines_diff='195'> | |
48 <extra_files type="file" name='rgEigPCAtest1_PCAPlot.pdf' value="rgtestouts/rgEigPCA/rgEigPCAtest1_PCAPlot.pdf" compare="sim_size" delta="3000"/> | |
49 </output> | |
50 <output name='pca' file='rgtestouts/rgEigPCA/rgEigPCAtest1.txt' compare='diff'/> | |
51 </test> | |
52 </tests> | |
53 | |
54 <help> | |
55 | |
56 | |
57 **Syntax** | |
58 | |
59 - **Genotype data** is an input genotype dataset in Plink lped (http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml) format. See the notes below. | |
60 - **Title** is used to name the output files so you can remember what the outputs are for. | |
61 - **Tuning parameters** are documented in the Eigensoft (http://genepath.med.harvard.edu/~reich/Software.htm) documentation; see the README below. | |
62 | |
63 | |
64 ----- | |
65 | |
66 **Summary** | |
67 | |
68 Eigensoft requires ld-reduced genotype data. | |
69 Galaxy has an automatic converter for genotype data in Plink linkage pedigree (lped) format. | |
70 For details of this generic genotype format, please see the Plink documentation at | |
71 http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml | |
72 | |
73 Reading that documentation, you'll see that the linkage pedigree format is really two related files with the same | |
74 file base name - a map and ped file - eg 'mygeno.ped' and 'mygeno.map'. | |
75 The map file has one row per SNP giving the chromosome, snp name, genetic offset and physical offset. Each SNP corresponds to | |
76 a genotype stored as two separate alleles in the ped file. Each ped row has family id, individual id, father id (or 0), mother id | |
77 (or 0), gender (1=male, 2=female, 0=unknown) and affection (1=unaffected, 2=affected, 0=unknown), | |
78 then two separate allele columns for each genotype. | |
79 | |
80 Once you have your data in the right format, you can upload it into your Galaxy history using the "upload" tool. | |
81 | |
82 To upload your lped data in the upload tool, choose 'lped' as the 'file format'. The tool form will change to | |
83 allow you to navigate to and select each member of the pair of ped and map files stored on your local computer | |
84 (or available at a public URL for Galaxy to grab). | |
85 Give the dataset a meaningful name (replace rgeneticsData with something more useful!) and click execute. | |
86 | |
87 When the upload is done, your new lped format dataset will appear in your history and then, | |
88 when you choose the ancestry tool, that history dataset will be available as input. | |
89 | |
90 **Warning for the Impatient** | |
91 | |
92 When you execute the tool, it may look as if it has not started running for a while, because the automatic converter | |
93 first reduces the amount of LD. Without that step, eigenstrat gives biased results. | |
94 | |
95 | |
96 **Attribution** | |
97 | |
98 This tool runs and relies on the work of many others, including the | |
99 maintainers of the Eigensoft program, and the R and | |
100 Bioconductor projects. For full attribution, source code and documentation, please see | |
101 http://genepath.med.harvard.edu/~reich/Software.htm, http://cran.r-project.org/ | |
102 and http://www.bioconductor.org/ respectively. | |
103 | |
104 This implementation is a Galaxy tool wrapper around these third party applications. | |
105 It was originally designed and written for family based data from the CAMP Illumina run of 2007 by | |
106 Ross Lazarus (ross.lazarus@gmail.com) and incorporated into the rgenetics toolkit. | |
107 | |
108 Copyright Ross Lazarus 2007 | |
109 Licensed under the terms of the LGPL as documented at http://www.gnu.org/licenses/lgpl.html | |
110 but is about as useful as a sponge boat without EIGENSOFT pca code. | |
111 | |
112 **README from eigensoft2 distribution at http://genepath.med.harvard.edu/~reich/Software.htm** | |
113 | |
114 [rerla@beast eigensoft2]$ cat README | |
115 EIGENSOFT version 2.0, January 2008 (for Linux only) | |
116 | |
117 This is the same as our EIGENSOFT 2.0 BETA release with a few recent changes | |
118 as described at http://genepath.med.harvard.edu/~reich/New_In_EIGENSOFT.htm. | |
119 | |
120 Features of EIGENSOFT version 2.0 include: | |
121 -- Keeping track of ref/var alleles in all file formats: see CONVERTF/README | |
122 -- Handling data sets up to 8 billion genotypes: see CONVERTF/README | |
123 -- Output SNP weightings of each principal component: see POPGEN/README | |
124 | |
125 The EIGENSOFT package implements methods from the following 2 papers: | |
126 Patterson N. et al. 2006 PLoS Genetics in press (population structure) | |
127 Price A.L. et al. 2006 NG 38:904-9 (EIGENSTRAT stratification correction) | |
128 | |
129 See POPGEN/README for documentation of population structure programs. | |
130 | |
131 See EIGENSTRAT/README for documentation of EIGENSTRAT programs. | |
132 | |
133 See CONVERTF/README for documentation of programs for converting file formats. | |
134 | |
135 | |
136 Executables and source code: | |
137 ---------------------------- | |
138 All C executables are in the bin/ directory. | |
139 | |
140 We have placed source code for all C executables in the src/ directory, | |
141 for users who wish to modify and recompile our programs. For example, to | |
142 recompile the eigenstrat program, type | |
143 "cd src" | |
144 "make eigenstrat" | |
145 "mv eigenstrat ../bin" | |
146 | |
147 Note that some of our software will only compile if your system has the | |
148 lapack package installed. (This package is used to compute eigenvectors.) | |
149 Some users may need to change "blas-3" to "blas" in the Makefile, | |
150 depending on how blas and lapack are installed. | |
151 | |
152 If cc is not available on your system, try "cp Makefile.alt Makefile" | |
153 and then recompile. | |
154 | |
155 If you have trouble compiling and running our code, try compiling and | |
156 running the pcatoy program in the src directory: | |
157 "cd src" | |
158 "make pcatoy" | |
159 "./pcatoy" | |
160 If you are unable to run the pcatoy program successfully, please contact | |
161 your system administrator for help, as this is a systems issue which is | |
162 beyond our scope. Your system administrator will be able to troubleshoot | |
163 your systems issue using this trivial program. [You can also try running | |
164 the pcatoy program in the bin directory, which we have already compiled.] | |
165 </help> | |
166 </tool> | |
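The outlier parameters in the tool form above (m, t and s) describe a thresholding rule that may be easier to follow as code. The following is a minimal sketch of a single pass of such a rule, not the Eigensoft implementation: the function name `flag_outliers`, the use of the population standard deviation, and measuring distance from the column mean are all assumptions made for illustration. Eigensoft itself repeats outlier removal for up to m iterations, which this sketch does not attempt.

```python
from statistics import mean, pstdev


def flag_outliers(scores, t, s):
    """Return sorted indices of individuals flagged as outliers.

    scores: one list of principal component scores per individual.
    t: number of top principal components to inspect (hypothetical
       stand-in for the tool's "t" parameter).
    s: number of standard deviations an individual must exceed
       (hypothetical stand-in for the tool's "s" parameter).
    """
    flagged = set()
    for pc in range(t):
        column = [row[pc] for row in scores]
        centre = mean(column)
        spread = pstdev(column)  # assumption: population SD
        if spread == 0:
            continue  # no variation along this component, nothing to flag
        for index, value in enumerate(column):
            if abs(value - centre) / spread > s:
                flagged.add(index)
    return sorted(flagged)


if __name__ == "__main__":
    # Nine individuals at the origin and one far out along the first component.
    toy_scores = [[0.0, 0.0] for _ in range(9)] + [[10.0, 0.0]]
    print(flag_outliers(toy_scores, t=2, s=2))
```

An individual is flagged as soon as it exceeds the threshold on any one of the inspected components, which matches the "along one of the top principal components" wording of the help text.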
167 |
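The help text in the listing above describes the lped format as a pair of whitespace-delimited files. As a rough illustration, here is a minimal sketch of parsing one line of each file. The record and function names are hypothetical, and the map column order (chromosome, snp name, genetic offset, physical offset) follows the Plink documentation as I understand it; check it against your own files.

```python
from typing import List, NamedTuple, Tuple


class MapRecord(NamedTuple):
    chrom: str
    snp: str
    genetic_offset: float
    offset: int


class PedRecord(NamedTuple):
    family_id: str
    individual_id: str
    father_id: str
    mother_id: str
    gender: int      # 1=male, 2=female, 0=unknown
    affection: int   # 1=unaffected, 2=affected, 0=unknown
    genotypes: List[Tuple[str, str]]


def parse_map_line(line: str) -> MapRecord:
    """Parse one SNP row of a .map file (four columns assumed)."""
    chrom, snp, genetic_offset, offset = line.split()
    return MapRecord(chrom, snp, float(genetic_offset), int(offset))


def parse_ped_line(line: str) -> PedRecord:
    """Parse one individual row of a .ped file."""
    fields = line.split()
    fam, ind, father, mother, gender, affection = fields[:6]
    alleles = fields[6:]
    if len(alleles) % 2 != 0:
        raise ValueError("expected two allele columns per genotype")
    # Pair consecutive allele columns into one genotype per SNP.
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return PedRecord(fam, ind, father, mother,
                     int(gender), int(affection), genotypes)


if __name__ == "__main__":
    print(parse_map_line("1 rs123 0 1000"))
    print(parse_ped_line("FAM1 IND1 0 0 1 2 A A G T"))
```

The number of genotypes in each ped row should equal the number of rows in the matching map file, which is a cheap consistency check before uploading.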