comparison snp_analysis_conversion/gd_snp2vcf.xml @ 0:3871157bc013

Initial upload to toolshed.g2 via UI.
author cathy
date Tue, 28 May 2013 17:01:14 -0400
-1:000000000000 0:3871157bc013
1 <tool id="gd_snp2vcf" name="gd_snp to VCF" version="1.0.0" force_history_refresh="True">
2 <description>: Convert from gd_snp to VCF format, for submission to dbSNP</description>
3
4 <command interpreter="perl">
5 gd_snp2vcf.pl "$input" -handle=$hand -batch=$batch -ref=$ref -metaOut=$output2
6 #if $individuals.choice == '0'
7 #set $geno = ''
8 #for $individual_col in $input.dataset.metadata.individual_columns
9 ##need to check the number of cols per individual
10 #set $t = $individual_col + 2
11 #set $geno += "%d," % ($t)
12 #end for
13 #if $individuals.pall_id != ''
14 -population=$individuals.pall_id
15 #end if
16 #else if $individuals.choice == '1'
17 #set $geno = ''
18 #set $pop = ''
19 #for $population in $individuals.populations
20 -geno=`perl -ane 'print \$F[0]+2, ",";' $population.p1_input`
21 #set $pop += "%s," % ($population.p1_id)
22 #end for
23 -population=$pop
24 #else if $individuals.choice == '2'
25 #set $geno = $individuals.geno
26 #end if
27 -geno=$geno
28 #if $bioproj.value != ''
29 -bioproj=$bioproj
30 #end if
31 #if $biosamp.value != ''
32 -biosamp=$biosamp
33 #end if
34 > $output
35 </command>
36
37 <inputs>
38 <param name="input" type="data" format="gd_snp" label="SNP dataset" />
39 <conditional name="individuals">
40 <param name="choice" type="select" label="Generate dataset for">
41 <option value="0" selected="true">All individuals</option>
42 <option value="1">Individuals in populations</option>
43 <option value="2">A single individual</option>
44 </param>
45 <when value="0">
46 <param name="pall_id" type="text" size="20" label="ID for this population" help="Leaving this blank will omit allele counts from the output" />
47 </when>
48 <when value="1">
49 <repeat name="populations" title="Population" min="1">
50 <param name="p1_input" type="data" format="gd_indivs" label="Population individuals" />
51 <param name="p1_id" type="text" size="20" label="ID for this population" help="Leaving this blank will omit allele counts from the output" />
52 </repeat>
53 </when>
54 <when value="2">
55 <param name="geno" type="data_column" data_ref="input" label="Column containing genotype" value="8" />
56 </when>
57 </conditional>
58 <param name="hand" type="text" size="20" label="dbSNP handle" help="If you do not have a handle, request one at http://www.ncbi.nlm.nih.gov/projects/SNP/handle.html" />
59 <param name="batch" type="text" size="20" label="Batch ID" help="ID used to tie dbSNP metadata to the VCF submission" />
60 <param name="ref" type="text" size="20" label="Reference sequence ID" help="The RefSeq assembly accession.version on which the SNP positions are based (see http://www.ncbi.nlm.nih.gov/assembly/)" />
61 <param name="bioproj" type="text" size="20" label="Optional: Registered BioProject ID" />
62 <param name="biosamp" type="text" size="20" label="Optional: Comma-separated list of registered BioSample IDs" />
63 </inputs>
64
65 <outputs>
66 <data name="output" format="vcf" />
67 <data name="output2" format="text" />
68 </outputs>
69
70 <tests>
71 <test>
72 <param name="input" value="sample.gd_snp" ftype="gd_snp" />
73 <param name="choice" value="2" />
74 <param name="geno" value="11" />
75 <param name="hand" value="MyHandle" />
76 <param name="batch" value="Test1" />
77 <param name="ref" value="pb_000001.1" />
78 <output name="output" file="snpsForSubmission.vcf" ftype="vcf" compare="diff" />
79 <output name="output2" file="snpsForSubmission.text" ftype="text" compare="diff" />
80 </test>
81 </tests>
82
83 <help>
84
85 **Dataset formats**
86
87 The input dataset is in gd_snp_ format.
88 The output consists of two datasets needed for submitting SNPs:
89 a VCF_ file in the specific format required by dbSNP, and a partially
90 completed text_ file for the associated dbSNP metadata.
91 (`Dataset missing?`_)
92
93 .. _gd_snp: ./static/formatHelp.html#gd_snp
94 .. _VCF: ./static/formatHelp.html#vcf
95 .. _text: ./static/formatHelp.html#text
96 .. _Dataset missing?: ./static/formatHelp.html
97
98 -----
99
100 **What it does**
101
102 This tool converts a dataset in gd_snp format to a VCF file formatted
103 for submission to the dbSNP database at NCBI. It also creates a partially
104 filled-in template to assist you in preparing the required "metadata" file
105 describing the SNP submission.
106
107 -----
108
109 **Example**
110
111 - input::
112
113 #{"column_names":["scaf","pos","A","B","qual","ref","rpos","rnuc","1A","1B","1G","1Q","2A","2B","2G","2Q","3A","3B","3G","3Q","4A","4B","4G","4Q","5A","5B","5G","5Q","6A","6B","6G","6Q","pair","dist",
114 #"prim","rflp"],"dbkey":"canFam2","individuals":[["PB1",9],["PB2",13],["PB3",17],["PB4",21],["PB6",25],["PB8",29]],"pos":2,"rPos":7,"ref":6,"scaffold":1,"species":"bear"}
115 Contig161 115 C T 73.5 chr1 4641382 C 6 0 2 45 8 0 2 51 15 0 2 72 5 0 2 42 6 0 2 45 10 0 2 57 Y 54 0.323 0
116 Contig48 11 A G 94.3 chr1 10150264 A 1 0 2 30 1 0 2 30 1 0 2 30 3 0 2 36 1 0 2 30 1 0 2 30 Y 22 +99. 0
117 Contig20 66 C T 54.0 chr1 21313534 C 4 0 2 39 4 0 2 39 5 0 2 42 4 0 2 39 4 0 2 39 5 0 2 42 N 1 +99. 0
118 etc.
119
120 - VCF output (for all individuals, and giving a population ID)::
121
122 #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT PB
123 Contig161 115 Contig161;115 C T 73.5 . VRT=6 NA:AC 8:0
124 Contig48 11 Contig48;11 A G 94.3 . VRT=6 NA:AC 8:0
125 Contig20 66 Contig20;66 C T 54.0 . VRT=6 NA:AC 8:0
126 etc.
127
128 Note: This excerpt from the output does not show all of the headers. Also,
129 if the population ID had not been given, then the last two columns would not
130 appear in the output.
131
132 -----
133
134 **Reference**
135
136 Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K.
137 dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001
138 Jan 1;29(1):308-11.
139
140 </help>
141 </tool>
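
The "All individuals" branch of the command template builds the `-geno` argument by adding 2 to each individual's starting column, which in the example header moves from the first per-individual column (9, "1A") to the genotype column (11, "1G"). The following is a minimal Python sketch of that arithmetic, not the tool's actual code. It assumes that `individual_columns` corresponds to the second element of each entry in the header's `individuals` list; the function name `genotype_columns` is hypothetical.

```python
import json


def genotype_columns(header_lines):
    """Build a -geno style column list from gd_snp header lines (a sketch)."""
    # The gd_snp header is JSON spread over one or more lines starting with '#'.
    text = "".join(line.lstrip("#").strip() for line in header_lines)
    meta = json.loads(text)
    # Assumption: each entry is [name, first_column], and the genotype column
    # sits two columns to the right, as in the template's "$individual_col + 2".
    # The trailing comma mirrors the template's "%d," formatting.
    return "".join("%d," % (col + 2) for _name, col in meta["individuals"])


# Abbreviated header modelled on the example in the help text.
header = [
    '#{"column_names":["scaf","pos"],"individuals":[["PB1",9],["PB2",13]],',
    '#"pos":2,"scaffold":1}',
]
print(genotype_columns(header))
```

For the "A single individual" branch no such computation is needed, since the genotype column is chosen directly (the test above uses column 11).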