annotate SNP_Mapping.xml @ 5:fb735600b4ef

Uploaded
author gregory-minevich
date Tue, 27 Mar 2012 11:30:05 -0400
<tool id="snp_mapping_using_wgs" name="CloudMap: SNP mapping with WGS data">
    <description>Map a mutation by plotting recombination frequencies resulting from crossing to a highly polymorphic strain</description>
    <command interpreter="python">SNP_Mapping.py --sample_pileup $sample_pileup --haw_vcf $haw_vcf --loess_span $loess_span --d_yaxis $d_yaxis --h_yaxis $h_yaxis --points_color $points_color --loess_color $loess_color --output $output --location_plot_output $location_plot_output --standardize $standardize</command>
    <inputs>
        <param name="sample_pileup" size="125" type="data" format="pileup" label="WGS Mutant Pileup File" help="WGS pileup file from pooled F2 mutants that have been crossed to a mapping strain. The pileup should contain data from only mapping strain (e.g. Hawaiian) SNP positions" />
        <param name="haw_vcf" size="125" type="data" format="vcf" label="VCF of mapping strain (e.g. Hawaiian) SNPs" help="A VCF reference file that contains mapping strain SNP positions and reference base pairs at each position"/>
        <param name="loess_span" size="15" type="float" value=".01" label="Loess span" help="Parameter that controls the degree of data smoothing."/>
        <param name="d_yaxis" size="15" type="float" value=".7" label="Y-axis upper limit for dot plot" />
        <param name="h_yaxis" size="15" type="integer" value="500" label="Y-axis upper limit for histogram plot" />
        <param name="points_color" size="15" type="text" value="gray27" label="Color for data points" help="See below for list of supported colors"/>
        <param name="loess_color" size="15" type="text" value="red" label="Color for loess regression line" help="See below for list of supported colors"/>
        <param name="standardize" type="boolean" truevalue="true" falsevalue="false" checked="true" label="Standardize X-axis" help="Dot plots and histogram plots from separate chromosomes will have uniform X-axis spacing for comparison"/>
    </inputs>
    <outputs>
        <data name="output" type="text" format="tabular" />
        <data name="location_plot_output" format="pdf" />
    </outputs>
    <requirements>
        <requirement type="python-module">sys</requirement>
        <requirement type="python-module">optparse</requirement>
        <requirement type="python-module">csv</requirement>
        <requirement type="python-module">re</requirement>
        <requirement type="python-module">decimal</requirement>
        <requirement type="python-module">rpy</requirement>
    </requirements>
    <tests>
        <test>
            <param name="sample_pileup" value="" />
            <param name="haw_vcf" value="" />
            <output name="output" file="" />
            <output name="location_plot_output" file="" />
        </test>
    </tests>
    <help>
**What it does:**

This tool is part of the CloudMap pipeline for analysis of mutant genome sequences. For further details, please see `Gregory Minevich, Danny Park, Richard J. Poole and Oliver Hobert. CloudMap: A Cloud-based Pipeline for Analysis of Mutant Genome Sequences. (2012 In Preparation)`__

.. __: http://biochemistry.hs.columbia.edu/labs/hobert/literature.html

This tool improves upon the method described in Doitsidou et al., PLoS One 2010 for mapping causal mutations using whole genome sequencing data.

Sample output for a linked chromosome:

.. image:: http://biochemistry.hs.columbia.edu/labs/hobert/CloudMap/Linked_LG_500px.png


The polymorphic Hawaiian strain CB4856 is used as the mapping strain in most cases, but in principle any sequenced nematode strain that differs significantly from the mutant strain can be used for mapping. The tool plots the ratio of mapping strain (Hawaiian)/mutant strain (N2) nucleotides at all SNP positions, which reflects the number of recombinants in the sequenced pool of animals. Chromosomes that contain regions linked to the causal mutation will have regions where the ratio of mapping strain (Hawaiian)/total reads equals 0. The scatter plots for such linked regions will show a high number of data points lying exactly on the X axis. A loess regression line is plotted through all the points on a given chromosome, which helps to delimit the linked region more precisely.

Each scatter plot has a corresponding frequency plot that shows where 0-ratio SNP positions are concentrated along a linked chromosome. By default, 1 Mb bins of 0-ratio SNP positions are colored gray and 0.5 Mb bins are colored red.


The experimental design required to generate data for the plots is described in Doitsidou et al., PLoS One 2010 Figure 1:

.. image:: http://biochemistry.hs.columbia.edu/labs/hobert/CloudMap/Doitsidou_2010_PLoS_Fig.1.png


------

**Input:**


The input pileup files are generated by the SAMTools mpileup tool. Default SAMTools mpileup (and SAMTools filter pileup) parameters for mapping quality, base quality and coverage at each SNP position typically yield good results, though users may experiment with filtering SNP data by adjusting these parameters. In our testing, low-threshold filtering on base quality improved the accuracy of the plots, while high-threshold filtering on coverage skewed them.

This tool requires a pileup that has been created at each SNP position using SAMTools mpileup (http://samtools.sourceforge.net/samtools.shtml) and a BED file of all Hawaiian SNP positions. Download the Hawaiian SNP positions BED file here:
http://biochemistry.hs.columbia.edu/labs/hobert/protocols.html

The required VCF of mapping strain (e.g. Hawaiian) SNPs is a reference file that contains mapping strain SNP positions and reference base pairs at each position
(download the Hawaiian SNPs VCF from: http://biochemistry.hs.columbia.edu/labs/hobert/protocols.html). You may also make your own VCF of SNP positions following the steps described in the CloudMap paper.


**Output:**

The tool also provides a tabular output file that contains a count of the number of reference and alternate SNPs at each mapping strain SNP position as well as the ratio of reference/alternate SNPs. The position of each mapping strain SNP in map units and physical coordinates is also provided in the output file.


------

**Settings:**

.. class:: infomark

Information on loess regression and the loess span parameter:
http://en.wikipedia.org/wiki/Local_regression

.. class:: infomark

Based on our testing, we've settled on 0.01 as the default loess span. Larger values smooth the line so that it reflects trends at a more macro level. Smaller values produce loess lines that more closely follow local data fluctuations. Users looking at chromosome subregions will want to increase the loess span.

.. class:: infomark

Supported colors for data points and loess regression line:

http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf

http://research.stowers-institute.org/efg/R/Color/Chart/ColorChart.pdf



.. class:: warningmark

This tool requires that the statistical programming environment R has been installed on the system hosting Galaxy (http://www.r-project.org/). If you are accessing this tool on Galaxy via the Cloud, this does not apply to you.


------

**Citation:**

This tool is part of the CloudMap package from the Hobert Lab. If you use this tool, please cite `Gregory Minevich, Danny Park, Richard J. Poole and Oliver Hobert. CloudMap: A Cloud-based Pipeline for Analysis of Mutant Genome Sequences. (2012 In Preparation)`__

.. __: http://biochemistry.hs.columbia.edu/labs/hobert/literature.html

Correspondence to gm2123@columbia.edu (G.M.) or or38@columbia.edu (O.H.)

    </help>
</tool>
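
The help text of this tool describes two quantities: the ratio of mapping strain (Hawaiian) reads to total reads at each SNP position, and a frequency plot of positions where that ratio is 0, grouped into 1 Mb bins. The following Python sketch illustrates both ideas. It is not the code in SNP_Mapping.py: the function names, the `(position, alt_count, ref_count)` tuple layout, and the assumption that the alternate allele is the mapping strain allele are all hypothetical choices made for illustration.

```python
from collections import Counter


def hawaiian_ratio(alt_count, ref_count):
    """Fraction of reads carrying the mapping strain (assumed alternate) allele."""
    total = alt_count + ref_count
    if total == 0:
        return None  # no coverage at this SNP position
    return alt_count / total


def zero_ratio_bins(snps, bin_size=1_000_000):
    """Count SNP positions whose ratio is exactly 0, per bin of bin_size bp.

    snps: iterable of (position, alt_count, ref_count) tuples for one chromosome
    (a hypothetical layout, not the tool's actual tabular output).
    """
    bins = Counter()
    for position, alt_count, ref_count in snps:
        if hawaiian_ratio(alt_count, ref_count) == 0:
            bins[position // bin_size] += 1
    return dict(bins)


# Made-up read counts, for illustration only.
example = [
    (150_000, 0, 20),    # ratio 0, falls in bin 0
    (900_000, 0, 15),    # ratio 0, falls in bin 0
    (1_200_000, 5, 15),  # ratio 0.25, not counted
    (2_500_000, 0, 30),  # ratio 0, falls in bin 2
    (2_700_000, 0, 0),   # no coverage, skipped
]
print(zero_ratio_bins(example))
```

Passing `bin_size=500_000` gives the 0.5 Mb binning that the help text also mentions. Bins with many 0-ratio positions would correspond to the candidate linked region.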