Mercurial > repos > xuebing > sharplabtool
diff tools/rgenetics/rgEigPCA.xml @ 0:9071e359b9a3
Uploaded
author | xuebing |
---|---|
date | Fri, 09 Mar 2012 19:37:19 -0500 |
parents | |
children |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/rgenetics/rgEigPCA.xml Fri Mar 09 19:37:19 2012 -0500 @@ -0,0 +1,167 @@ +<tool id="rgEigPCA1" name="Eigensoft:"> + <description>PCA Ancestry using SNP</description> + + <command interpreter="python"> + rgEigPCA.py "$i.extra_files_path/$i.metadata.base_name" "$title" "$out_file1" + "$out_file1.files_path" "$k" "$m" "$t" "$s" "$pca" + </command> + + <inputs> + + <param name="i" type="data" label="Input genotype data file" + size="120" format="ldindep" /> + <param name="title" type="text" value="Ancestry PCA" label="Title for outputs from this run" + size="80" /> + <param name="k" type="integer" value="4" label="Number of principal components to output" + size="3" /> + <param name="m" type="integer" value="0" label="Max. outlier removal iterations" + help="To turn on outlier removal, set m=5 or so. Do this if you plan on adjusting any analyses" + size="3" /> + <param name="t" type="integer" value="5" label="# principal components used for outlier removal" + size="3" /> + <param name="s" type="integer" value="6" label="#SDs for outlier removal" + help = "Any individual with SD along one of k top principal components > s will be removed as an outlier." + size="3" /> + + </inputs> + + <outputs> + <data name="out_file1" format="html" label="${title}_rgEig.html"/> + <data name="pca" format="txt" label="${title}_rgEig.txt"/> + </outputs> + +<tests> + <test> + <param name='i' value='tinywga' ftype='ldindep' > + <metadata name='base_name' value='tinywga' /> + <composite_data value='tinywga.bim' /> + <composite_data value='tinywga.bed' /> + <composite_data value='tinywga.fam' /> + <edit_attributes type='name' value='tinywga' /> + </param> + <param name='title' value='rgEigPCAtest1' /> + <param name="k" value="4" /> + <param name="m" value="2" /> + <param name="t" value="2" /> + <param name="s" value="2" /> + <output name='out_file1' file='rgtestouts/rgEigPCA/rgEigPCAtest1.html' ftype='html' compare='diff' lines_diff='195'> + <extra_files type="file" name='rgEigPCAtest1_PCAPlot.pdf' value="rgtestouts/rgEigPCA/rgEigPCAtest1_PCAPlot.pdf" compare="sim_size" delta="3000"/> + </output> + <output name='pca' file='rgtestouts/rgEigPCA/rgEigPCAtest1.txt' compare='diff'/> + </test> +</tests> + +<help> + + +**Syntax** + +- **Genotype data** is an input genotype dataset in Plink lped (http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml) format. See below for notes +- **Title** is used to name the output files so you can remember what the outputs are for +- **Tuning parameters** are documented in the Eigensoft (http://genepath.med.harvard.edu/~reich/Software.htm) documentation - see below + + +----- + +**Summary** + +Eigensoft requires ld-reduced genotype data. +Galaxy has an automatic converter for genotype data in Plink linkage pedigree (lped) format. +For details of this generic genotype format, please see the Plink documentation at +http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml + +Reading that documentation, you'll see that the linkage pedigree format is really two related files with the same +file base name - a map and ped file - eg 'mygeno.ped' and 'mygeno.map'. +The map file has the chromosome, offset, genetic offset and snp name corresponding to each +genotype stored as separate alleles in the ped file. The ped file has family id, individual id, father id (or 0), mother id +(or 0), gender (1=male, 2=female, 0=unknown) and affection (1=unaffected, 2=affected, 0=unknown), +then two separate allele columns for each genotype. + +Once you have your data in the right format, you can upload those into your Galaxy history using the "upload" tool. + +To upload your lped data in the upload tool, choose 'lped' as the 'file format'. The tool form will change to +allow you to navigate to and select each member of the pair of ped and map files stored on your local computer +(or available at a public URL for Galaxy to grab). +Give the dataset a meaningful name (replace rgeneticsData with something more useful!) and click execute. + +When the upload is done, your new lped format dataset will appear in your history and then, +when you choose the ancestry tool, that history dataset will be available as input. + +**Warning for the Impatient** + +When you execute the tool, it will look like it has not started running for a while as the automatic converter +reduces the amount of LD - otherwise eigenstrat gives biased results. + + +**Attribution** + +This tool runs and relies on the work of many others, including the +maintainers of the Eigensoft program, and the R and +Bioconductor projects. For full attribution, source code and documentation, please see +http://genepath.med.harvard.edu/~reich/Software.htm, http://cran.r-project.org/ +and http://www.bioconductor.org/ respectively + +This implementation is a Galaxy tool wrapper around these third party applications. +It was originally designed and written for family based data from the CAMP Illumina run of 2007 by +ross lazarus (ross.lazarus@gmail.com) and incorporated into the rgenetics toolkit. + +copyright Ross Lazarus 2007 +Licensed under the terms of the LGPL as documented http://www.gnu.org/licenses/lgpl.html +but is about as useful as a sponge boat without EIGENSOFT pca code. + +**README from eigensoft2 distribution at http://genepath.med.harvard.edu/~reich/Software.htm** + +[rerla@beast eigensoft2]$ cat README +EIGENSOFT version 2.0, January 2008 (for Linux only) + +This is the same as our EIGENSOFT 2.0 BETA release with a few recent changes +as described at http://genepath.med.harvard.edu/~reich/New_In_EIGENSOFT.htm. + +Features of EIGENSOFT version 2.0 include: +-- Keeping track of ref/var alleles in all file formats: see CONVERTF/README +-- Handling data sets up to 8 billion genotypes: see CONVERTF/README +-- Output SNP weightings of each principal component: see POPGEN/README + +The EIGENSOFT package implements methods from the following 2 papers: +Patterson N. et al. 2006 PLoS Genetics in press (population structure) +Price A.L. et al. 2006 NG 38:904-9 (EIGENSTRAT stratification correction) + +See POPGEN/README for documentation of population structure programs. + +See EIGENSTRAT/README for documentation of EIGENSTRAT programs. + +See CONVERTF/README for documentation of programs for converting file formats. + + +Executables and source code: +---------------------------- +All C executables are in the bin/ directory. + +We have placed source code for all C executables in the src/ directory, +for users who wish to modify and recompile our programs. For example, to +recompile the eigenstrat program, type +"cd src" +"make eigenstrat" +"mv eigenstrat ../bin" + +Note that some of our software will only compile if your system has the +lapack package installed. (This package is used to compute eigenvectors.) +Some users may need to change "blas-3" to "blas" in the Makefile, +depending on how blas and lapack are installed. + +If cc is not available on your system, try "cp Makefile.alt Makefile" +and then recompile. + +If you have trouble compiling and running our code, try compiling and +running the pcatoy program in the src directory: +"cd src" +"make pcatoy" +"./pcatoy" +If you are unable to run the pcatoy program successfully, please contact +your system administrator for help, as this is a systems issue which is +beyond our scope. Your system administrator will be able to troubleshoot +your systems issue using this trivial program. [You can also try running +the pcatoy program in the bin directory, which we have already compiled.] +</help> +</tool> +