diff tools/rgenetics/rgEigPCA.xml @ 0:9071e359b9a3

Uploaded
author xuebing
date Fri, 09 Mar 2012 19:37:19 -0500
parents
children
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/rgenetics/rgEigPCA.xml	Fri Mar 09 19:37:19 2012 -0500
@@ -0,0 +1,167 @@
+<tool id="rgEigPCA1" name="Eigensoft:">
+    <description>PCA Ancestry using SNPs</description>
+
+    <command interpreter="python">
+    rgEigPCA.py "$i.extra_files_path/$i.metadata.base_name" "$title" "$out_file1"
+    "$out_file1.files_path" "$k" "$m" "$t" "$s" "$pca"
+    </command>
+
+    <inputs>
+
+       <param name="i"  type="data" label="Input genotype data file"
+          size="120" format="ldindep" />
+       <param name="title"  type="text" value="Ancestry PCA" label="Title for outputs from this run"
+          size="80"  />
+       <param name="k"  type="integer" value="4" label="Number of principal components to output"
+          size="3"  />
+       <param name="m"  type="integer" value="0" label="Max. outlier removal iterations"
+          help="The default of 0 turns outlier removal off. To turn it on, set m=5 or so. Do this if you plan to adjust any downstream analyses for ancestry"
+          size="3"  />
+       <param name="t"  type="integer" value="5" label="# principal components used for outlier removal"
+          size="3"  />
+       <param name="s"  type="integer" value="6" label="#SDs for outlier removal"
+          help="Any individual exceeding s SDs along one of the top t principal components (the value set above) will be removed as an outlier."
+          size="3"  />
+
+   </inputs>
+
+   <outputs>
+       <data name="out_file1" format="html" label="${title}_rgEig.html"/>
+       <data name="pca" format="txt" label="${title}_rgEig.txt"/>
+   </outputs>
+
+<tests>
+ <test>
+   <param name='i' value='tinywga' ftype='ldindep' >
+   <metadata name='base_name' value='tinywga' />
+   <composite_data value='tinywga.bim' />
+   <composite_data value='tinywga.bed' />
+   <composite_data value='tinywga.fam' />
+   <edit_attributes type='name' value='tinywga' /> 
+   </param>
+    <param name='title' value='rgEigPCAtest1' />
+    <param name="k" value="4" />
+    <param name="m" value="2" />
+    <param name="t" value="2" />
+    <param name="s" value="2" />
+    <output name='out_file1' file='rgtestouts/rgEigPCA/rgEigPCAtest1.html' ftype='html' compare='diff' lines_diff='195'>
+    <extra_files type="file" name='rgEigPCAtest1_PCAPlot.pdf' value="rgtestouts/rgEigPCA/rgEigPCAtest1_PCAPlot.pdf" compare="sim_size" delta="3000"/>
+    </output>
+    <output name='pca' file='rgtestouts/rgEigPCA/rgEigPCAtest1.txt' compare='diff'/>
+ </test>
+</tests>
+
+<help>
+
+
+**Syntax**
+
+- **Genotype data** is an input genotype dataset in Plink lped (http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml) format. See the notes below.
+- **Title** is used to name the output files so you can remember what the outputs are for.
+- **Tuning parameters** are documented in the Eigensoft (http://genepath.med.harvard.edu/~reich/Software.htm) documentation; see the README below.
+
+
+-----
+
+**Summary**
+
+Eigensoft requires LD-reduced genotype data.
+Galaxy has an automatic converter that produces this from genotype data in Plink linkage pedigree (lped) format.
+For details of this generic genotype format, please see the Plink documentation at
+http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml
+
+Reading that documentation, you'll see that the linkage pedigree format is really two related files with the same
+file base name: a map file and a ped file, e.g. 'mygeno.map' and 'mygeno.ped'.
+The map file has one row per SNP giving the chromosome, snp name, genetic offset and physical offset of each
+genotype stored as separate alleles in the ped file. The ped file has one row per individual giving family id, individual id, father id (or 0), mother id
+(or 0), gender (1=male, 2=female, 0=unknown) and affection (1=unaffected, 2=affected, 0=unknown),
+then two separate allele columns for each genotype.
+
+Once you have your data in the right format, you can upload it into your Galaxy history using the "upload" tool.
+
+To upload your lped data in the upload tool, choose 'lped' as the 'file format'. The tool form will change to
+allow you to navigate to and select each member of the pair of ped and map files stored on your local computer
+(or available at a public URL for Galaxy to grab).
+Give the dataset a meaningful name (replace rgeneticsData with something more useful!) and click execute.
+
+When the upload is done, your new lped format dataset will appear in your history and then, 
+when you choose the ancestry tool, that history dataset will be available as input.
+
+**Warning for the Impatient**
+
+When you execute the tool, it may appear not to have started for a while. During that time the automatic converter
+is reducing the amount of LD in the data; without this step eigenstrat gives biased results.
+
+
+**Attribution**
+
+This tool runs and relies on the work of many others, including the
+maintainers of the Eigensoft program and of the R and
+Bioconductor projects. For full attribution, source code and documentation, please see
+http://genepath.med.harvard.edu/~reich/Software.htm, http://cran.r-project.org/
+and http://www.bioconductor.org/ respectively.
+
+This implementation is a Galaxy tool wrapper around these third party applications.
+It was originally designed and written for family based data from the CAMP Illumina run of 2007 by
+Ross Lazarus (ross.lazarus@gmail.com) and incorporated into the rgenetics toolkit.
+
+Copyright Ross Lazarus 2007
+Licensed under the terms of the LGPL as documented at http://www.gnu.org/licenses/lgpl.html
+but is about as useful as a sponge boat without EIGENSOFT pca code.
+
+**README from eigensoft2 distribution at http://genepath.med.harvard.edu/~reich/Software.htm**
+
+[rerla@beast eigensoft2]$ cat README
+EIGENSOFT version 2.0, January 2008 (for Linux only)
+
+This is the same as our EIGENSOFT 2.0 BETA release with a few recent changes
+as described at http://genepath.med.harvard.edu/~reich/New_In_EIGENSOFT.htm.
+
+Features of EIGENSOFT version 2.0 include:
+-- Keeping track of ref/var alleles in all file formats: see CONVERTF/README
+-- Handling data sets up to 8 billion genotypes: see CONVERTF/README
+-- Output SNP weightings of each principal component: see POPGEN/README
+
+The EIGENSOFT package implements methods from the following 2 papers:
+Patterson N. et al. 2006 PLoS Genetics in press (population structure)
+Price A.L. et al. 2006 NG 38:904-9 (EIGENSTRAT stratification correction)
+
+See POPGEN/README for documentation of population structure programs.
+
+See EIGENSTRAT/README for documentation of EIGENSTRAT programs.
+
+See CONVERTF/README for documentation of programs for converting file formats.
+
+
+Executables and source code:
+----------------------------
+All C executables are in the bin/ directory.
+
+We have placed source code for all C executables in the src/ directory,
+for users who wish to modify and recompile our programs.  For example, to
+recompile the eigenstrat program, type
+"cd src"
+"make eigenstrat"
+"mv eigenstrat ../bin"
+
+Note that some of our software will only compile if your system has the
+lapack package installed.  (This package is used to compute eigenvectors.)
+Some users may need to change "blas-3" to "blas" in the Makefile,
+depending on how blas and lapack are installed.
+
+If cc is not available on your system, try "cp Makefile.alt Makefile"
+and then recompile.
+
+If you have trouble compiling and running our code, try compiling and
+running the pcatoy program in the src directory:
+"cd src"
+"make pcatoy"
+"./pcatoy"
+If you are unable to run the pcatoy program successfully, please contact
+your system administrator for help, as this is a systems issue which is
+beyond our scope.  Your system administrator will be able to troubleshoot
+your systems issue using this trivial program.  [You can also try running
+the pcatoy program in the bin directory, which we have already compiled.]
+</help>
+</tool>
+
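
Note on the lped layout described in the help text: the column layout of the ped and map pair can be sketched in Python as below. This is an illustration only and is not taken from rgEigPCA.py. The names `MapRecord`, `PedRecord`, `parse_map_line` and `parse_ped_line` are hypothetical, and the sketch assumes whitespace-delimited fields in the standard Plink column order, with integer gender and affection codes as listed in the help text.

```python
from typing import List, NamedTuple, Tuple


class MapRecord(NamedTuple):
    """One row of the .map file: one SNP."""
    chrom: str
    snp: str
    genetic_offset: float
    offset: int


class PedRecord(NamedTuple):
    """One row of the .ped file: one individual."""
    family_id: str
    individual_id: str
    father_id: str
    mother_id: str
    gender: int      # 1=male, 2=female, 0=unknown
    affection: int   # 1=unaffected, 2=affected, 0=unknown
    genotypes: List[Tuple[str, str]]  # one (allele1, allele2) pair per SNP


def parse_map_line(line: str) -> MapRecord:
    # Assumed column order: chromosome, snp name, genetic offset, physical offset.
    chrom, snp, genetic_offset, offset = line.split()
    return MapRecord(chrom, snp, float(genetic_offset), int(offset))


def parse_ped_line(line: str) -> PedRecord:
    fields = line.split()
    # Six leading columns, then exactly two allele columns per genotype.
    if len(fields) < 6 or (len(fields) - 6) % 2 != 0:
        raise ValueError("expected 6 leading columns plus allele pairs")
    family_id, individual_id, father_id, mother_id, gender, affection = fields[:6]
    alleles = fields[6:]
    # Pair consecutive allele columns: (a0, a1), (a2, a3), ...
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return PedRecord(family_id, individual_id, father_id, mother_id,
                     int(gender), int(affection), genotypes)
```

A ped row for an individual genotyped at N SNPs therefore has 6 + 2N columns, and the i-th allele pair corresponds to the i-th row of the map file.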