comparison snp_analysis_conversion/gd_snp2vcf.xml @ 0:3871157bc013
Initial upload to toolshed.g2 via UI.
author | cathy |
---|---|
date | Tue, 28 May 2013 17:01:14 -0400 |
parents | |
children |
-1:000000000000 | 0:3871157bc013 |
---|---|
1 <tool id="gd_snp2vcf" name="gd_snp to VCF" version="1.0.0" force_history_refresh="True"> | |
2 <description>: Convert from gd_snp to VCF format, for submission to dbSNP</description> | |
3 | |
4 <command interpreter="perl"> | |
5 gd_snp2vcf.pl "$input" -handle=$hand -batch=$batch -ref=$ref -metaOut=$output2 | |
6 #if $individuals.choice == '0' | |
7 #set $geno = '' | |
8 #for $individual_col in $input.dataset.metadata.individual_columns | |
9 ## need to check the number of cols per individual | |
10 #set $t = $individual_col + 2 | |
11 #set $geno += "%d," % ($t) | |
12 #end for | |
13 #if $individuals.pall_id != '' | |
14 -population=$individuals.pall_id | |
15 #end if | |
16 #else if $individuals.choice == '1' | |
17 #set $geno = '' | |
18 #set $pop = '' | |
19 #for $population in $individuals.populations | |
20 -geno=`perl -ane 'print \$F[0]+2, ",";' $population.p1_input` | |
21 #set $pop += "%s," % ($population.p1_id) | |
22 #end for | |
23 -population=$pop | |
24 #else if $individuals.choice == '2' | |
25 #set $geno = $individuals.geno | |
26 #end if | |
27 -geno=$geno | |
28 #if $bioproj.value != '' | |
29 -bioproj=$bioproj | |
30 #end if | |
31 #if $biosamp.value != '' | |
32 -biosamp=$biosamp | |
33 #end if | |
34 > $output | |
35 </command> | |
36 | |
37 <inputs> | |
38 <param name="input" type="data" format="gd_snp" label="SNP dataset" /> | |
39 <conditional name="individuals"> | |
40 <param name="choice" type="select" label="Generate dataset for"> | |
41 <option value="0" selected="true">All individuals</option> | |
42 <option value="1">Individuals in populations</option> | |
43 <option value="2">A single individual</option> | |
44 </param> | |
45 <when value="0"> | |
46 <param name="pall_id" type="text" size="20" label="ID for this population" help="Leaving this blank will omit allele counts from the output" /> | |
47 </when> | |
48 <when value="1"> | |
49 <repeat name="populations" title="Population" min="1"> | |
50 <param name="p1_input" type="data" format="gd_indivs" label="Population individuals" /> | |
51 <param name="p1_id" type="text" size="20" label="ID for this population" help="Leaving this blank will omit allele counts from the output" /> | |
52 </repeat> | |
53 </when> | |
54 <when value="2"> | |
55 <param name="geno" type="data_column" data_ref="input" label="Column containing genotype" value="8" /> | |
56 </when> | |
57 </conditional> | |
58 <param name="hand" type="text" size="20" label="dbSNP handle" help="If you do not have a handle, request one at http://www.ncbi.nlm.nih.gov/projects/SNP/handle.html" /> | |
59 <param name="batch" type="text" size="20" label="Batch ID" help="ID used to tie dbSNP metadata to the VCF submission" /> | |
60 <param name="ref" type="text" size="20" label="Reference sequence ID" help="The RefSeq assembly accession.version on which the SNP positions are based (see http://www.ncbi.nlm.nih.gov/assembly/)" /> | |
61 <param name="bioproj" type="text" size="20" label="Optional: Registered BioProject ID" /> | |
62 <param name="biosamp" type="text" size="20" label="Optional: Comma-separated list of registered BioSample IDs" /> | |
63 </inputs> | |
64 | |
65 <outputs> | |
66 <data name="output" format="vcf" /> | |
67 <data name="output2" format="text" /> | |
68 </outputs> | |
69 | |
70 <tests> | |
71 <test> | |
72 <param name="input" value="sample.gd_snp" ftype="gd_snp" /> | |
73 <param name="choice" value="2" /> | |
74 <param name="geno" value="11" /> | |
75 <param name="hand" value="MyHandle" /> | |
76 <param name="batch" value="Test1" /> | |
77 <param name="ref" value="pb_000001.1" /> | |
78 <output name="output" file="snpsForSubmission.vcf" ftype="vcf" compare="diff" /> | |
79 <output name="output2" file="snpsForSubmission.text" ftype="text" compare="diff" /> | |
80 </test> | |
81 </tests> | |
82 | |
83 <help> | |
84 | |
85 **Dataset formats** | |
86 | |
87 The input dataset is in gd_snp_ format. | |
88 The output consists of two datasets needed for submitting SNPs: | |
89 a VCF_ file in the specific format required by dbSNP, and a partially | |
90 completed text_ file for the associated dbSNP metadata. | |
91 (`Dataset missing?`_) | |
92 | |
93 .. _gd_snp: ./static/formatHelp.html#gd_snp | |
94 .. _VCF: ./static/formatHelp.html#vcf | |
95 .. _text: ./static/formatHelp.html#text | |
96 .. _Dataset missing?: ./static/formatHelp.html | |
97 | |
98 ----- | |
99 | |
100 **What it does** | |
101 | |
102 This tool converts a dataset in gd_snp format to a VCF file formatted | |
103 for submission to the dbSNP database at NCBI. It also creates a partially | |
104 filled-in template to assist you in preparing the required "metadata" file | |
105 describing the SNP submission. | |
106 | |
107 ----- | |
108 | |
109 **Example** | |
110 | |
111 - input:: | |
112 | |
113 #{"column_names":["scaf","pos","A","B","qual","ref","rpos","rnuc","1A","1B","1G","1Q","2A","2B","2G","2Q","3A","3B","3G","3Q","4A","4B","4G","4Q","5A","5B","5G","5Q","6A","6B","6G","6Q","pair","dist", | |
114 #"prim","rflp"],"dbkey":"canFam2","individuals":[["PB1",9],["PB2",13],["PB3",17],["PB4",21],["PB6",25],["PB8",29]],"pos":2,"rPos":7,"ref":6,"scaffold":1,"species":"bear"} | |
115 Contig161 115 C T 73.5 chr1 4641382 C 6 0 2 45 8 0 2 51 15 0 2 72 5 0 2 42 6 0 2 45 10 0 2 57 Y 54 0.323 0 | |
116 Contig48 11 A G 94.3 chr1 10150264 A 1 0 2 30 1 0 2 30 1 0 2 30 3 0 2 36 1 0 2 30 1 0 2 30 Y 22 +99. 0 | |
117 Contig20 66 C T 54.0 chr1 21313534 C 4 0 2 39 4 0 2 39 5 0 2 42 4 0 2 39 4 0 2 39 5 0 2 42 N 1 +99. 0 | |
118 etc. | |
119 | |
120 - VCF output (for all individuals, and giving a population ID):: | |
121 | |
122 #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT PB | |
123 Contig161 115 Contig161;115 C T 73.5 . VRT=6 NA:AC 8:0 | |
124 Contig48 11 Contig48;11 A G 94.3 . VRT=6 NA:AC 8:0 | |
125 Contig20 66 Contig20;66 C T 54.0 . VRT=6 NA:AC 8:0 | |
126 etc. | |
127 | |
128 Note: This excerpt from the output does not show all of the headers. Also, | |
129 if the population ID had not been given, then the last two columns would not | |
130 appear in the output. | |
131 | |
132 ----- | |
133 | |
134 **Reference** | |
135 | |
136 Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. | |
137 dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 | |
138 Jan 1;29(1):308-11. | |
139 | |
140 </help> | |
141 </tool> |
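
The "All individuals" branch of the command template builds the `-geno` argument by adding 2 to each individual's starting column. In the example header each individual owns four columns (A, B, G, Q), so the offset lands on the genotype column. The Python sketch below mirrors that arithmetic outside Galaxy. It is an illustration only: the function names are hypothetical, the header is shortened to one line, and it assumes that `individual_columns` holds the same 1-based column numbers as the `individuals` entries in the gd_snp header.

```python
import json


def genotype_columns(header_line):
    """Map each individual to its 1-based genotype column.

    Assumes (as in the example header) that every entry of "individuals"
    is [name, first_column] and that the genotype sits two columns later.
    """
    meta = json.loads(header_line.lstrip("#"))
    return [(name, first_col + 2) for name, first_col in meta["individuals"]]


def geno_argument(columns):
    """Join the columns the way the template does: '%d,' per individual."""
    return "".join("%d," % col for _, col in columns)


# Shortened, single-line version of the example gd_snp header (hypothetical).
header = ('#{"individuals":[["PB1",9],["PB2",13],["PB3",17],'
          '["PB4",21],["PB6",25],["PB8",29]]}')

columns = genotype_columns(header)
print(geno_argument(columns))  # prints 11,15,19,23,27,31,
```

The trailing comma is kept on purpose because the template produces one as well. Column 11 for PB1 matches the `geno` value used in the tool's own test case.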