Mercurial > repos > miller-lab > genome_diversity
comparison find_intervals.xml @ 28:184d14e4270d
Update to Miller Lab devshed revision 4ede22dd5500
author | Richard Burhans <burhans@bx.psu.edu> |
---|---|
date | Wed, 17 Jul 2013 12:46:46 -0400 |
parents | 8997f2ca8c7a |
children | a631c2f6d913 |
27:8997f2ca8c7a | 28:184d14e4270d |
---|---|
82 <output name="output" file="test_out/find_intervals/find_intervals.interval" /> | 82 <output name="output" file="test_out/find_intervals/find_intervals.interval" /> |
83 </test> | 83 </test> |
84 </tests> | 84 </tests> |
85 | 85 |
86 <help> | 86 <help> |
87 | |
88 **Dataset formats** | 87 **Dataset formats** |
89 | 88 |
90 The input dataset is tabular_, with required columns of chromosome, position, | 89 The input dataset is tabular_ (which includes gd_snp_ and gd_genotype_), |
91 and score (in any column). | 90 with required columns of chromosome, position, and score (in any column). |
92 The output dataset is interval_. (`Dataset missing?`_) | 91 The output dataset is interval_. (`Dataset missing?`_) |
93 | 92 |
| 93 .. _tabular: ./static/formatHelp.html#tab |
| 94 .. _gd_snp: ./static/formatHelp.html#gd_snp |
| 95 .. _gd_genotype: ./static/formatHelp.html#gd_genotype |
94 .. _interval: ./static/formatHelp.html#interval | 96 .. _interval: ./static/formatHelp.html#interval |
95 .. _tabular: ./static/formatHelp.html#tab | |
96 .. _Dataset missing?: ./static/formatHelp.html | 97 .. _Dataset missing?: ./static/formatHelp.html |
97 | 98 |
98 ----- | 99 ----- |
99 | 100 |
100 **What it does** | 101 **What it does** |
101 | 102 |
102 The user selects a tabular dataset (such as a gd_snp dataset) and | 103 The user selects a tabular dataset (such as the SNV formats gd_snp and |
103 if the dataset is not also gd_snp format, specifies | 104 gd_genotype) and if the dataset is not in an SNV format, specifies the |
104 the columns containing chromosome, position, and scores (such as an Fst-value for the SNP). | 105 columns containing chromosome, position, and scores (such as an FST-value |
105 For gd_snp format the metadata can be used to specify the chromosome and | 106 for the SNP). With SNV formats, the metadata tells which columns hold the |
106 position. | 107 chromosome and position. Other inputs include a percentage or raw score |
107 Other inputs include | 108 for the "score-shift" which should be greater than the average value |
108 a percentage or raw score for the "score-shift" which should be greater than the | 109 for the scores column. A higher value will give smaller intervals in |
109 average value for the scores column. A higher value will give smaller intervals | 110 the output. If a percentage (e.g. 95%) is specified then that percentile |
110 in the output. | 111 of the scores is used as the shift; percentile may not work well if many |
111 If a percentage (e.g. 95%) is specified | 112 rows or SNPs have the same score (in that case use a raw score). |
112 then that percentile of the scores is used as the shift; | 113 |
113 percentile may not work well if many rows or SNPs have the same score | 114 The program subtracts the shift from every score, then finds genomic |
114 (in that case use a raw score). The program subtracts the | 115 intervals (i.e., consecutive runs of SNPs) whose total score cannot be |
115 shift from every score, then finds genomic intervals (i.e., consecutive runs | 116 increased by adding or subtracting one or more adjusted scores at the |
116 of SNPs) whose total score cannot be increased by adding or subtracting one | 117 ends of the interval. Another input is the number of times the data |
117 or more adjusted scores at the ends of the interval. | 118 should be randomized (only intervals with score exceeding the maximum |
118 Another input is the number of times the | 119 for the randomized data are reported). If 100 shuffles are requested, |
119 data should be randomized (only intervals with score exceeding the maximum for | 120 then any interval reported by the tool has a score with probability |
120 the randomized data are reported). | 121 less than 0.01 of being equaled or exceeded by chance, assuming that |
121 If 100 shuffles are requested, then any interval reported by the tool has a | 122 the scores vary independently by position. |
122 score with probability less than 0.01 of being equaled or exceeded by chance. | |
123 | 123 |
124 ----- | 124 ----- |
125 | 125 |
126 **Example** | 126 **Example** |
127 | 127 |
128 - input (gd_snp):: | 128 - Input (showing only the chromosome, position, and score columns):: |
129 | 129 |
130 Contig222_chr2_9817738_9818143 220 C T 888.0 chr2 9817960 C 17 0 2 78 12 0 2 63 20 0 2 87 8 0 2 51 11 0 2 60 12 0 2 63 Y 76 0.093 1 | 130 chr2 39 0.40 |
131 Contig47_chr2_25470778_25471576 126 G A 888.0 chr2 25470896 G 12 0 2 63 14 0 2 69 14 0 2 69 10 0 2 57 18 0 2 81 13 0 2 66 N 11 0.289 1 | 131 chr2 103 0.97 |
| 132 chr2 188 0.72 |
| 133 chr2 203 0.68 |
| 134 chr2 321 0.92 |
132 ... | 135 ... |
133 Contig115_chr2_61631913_61632510 310 G T 999.3 chr2 61632216 G 7 0 2 48 9 0 2 54 7 0 2 48 11 0 2 60 10 0 2 57 10 0 2 57 N 13 0.184 0 | 136 chr2 1132 0.85 |
134 Contig31_chr2_67331584_67331785 39 C T 999.0 chr2 67331623 C 11 0 2 60 10 0 2 57 7 0 2 48 9 0 2 54 2 0 2 33 4 0 2 39 N 110 0.647 1 | 137 chr2 1321 0.34 |
135 etc. | 138 ... |
136 | 139 |
137 - output not reporting individual positions:: | 140 - Suppose the user-specified score-shift is 0.75. This value is subtracted from each score, giving:: |
138 | 141 |
139 chr2 9817960 67331624 1272.2000 | 142 chr2 39 -0.35 |
| 143 chr2 103 0.22 |
| 144 chr2 188 -0.03 |
| 145 chr2 203 -0.07 |
| 146 chr2 321 0.17 |
| 147 ... |
| 148 chr2 1132 0.10 |
| 149 chr2 1321 -0.41 |
| 150 ... |
140 | 151 |
| 152 - The output, not reporting individual positions, might be (depending on the values not shown above):: |
| 153 |
| 154 chr2 103 1132 1.42 |
141 </help> | 155 </help> |
142 </tool> | 156 </tool> |
157 | |
158 |
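
The procedure described in the revised help text (subtract the score-shift, find the best-scoring run of consecutive SNPs, and compare it against shuffled data) can be illustrated with a minimal Python sketch. This is not the tool's actual implementation: the function names `adjust`, `best_interval`, and `shuffle_threshold` are hypothetical, the sketch finds only the single best run using a Kadane-style scan rather than every maximal interval, and it uses only the seven rows visible in the example, so its total does not match the 1.42 in the diff, which depends on the elided rows.

```python
import random

# Only the rows shown in the help example; the elided ("...") rows are omitted.
ROWS = [
    ("chr2", 39, 0.40), ("chr2", 103, 0.97), ("chr2", 188, 0.72),
    ("chr2", 203, 0.68), ("chr2", 321, 0.92), ("chr2", 1132, 0.85),
    ("chr2", 1321, 0.34),
]


def adjust(scores, shift):
    """Subtract the score-shift from every score."""
    return [score - shift for score in scores]


def best_interval(adjusted):
    """Return (start_index, end_index, total) of the highest-scoring
    consecutive run, or (None, None, 0.0) if no score is positive."""
    best = (None, None, 0.0)
    run_start, run_total = 0, 0.0
    for i, value in enumerate(adjusted):
        if run_total <= 0.0:
            # A non-positive prefix cannot help; start a new run here.
            run_start, run_total = i, value
        else:
            run_total += value
        if run_total > best[2]:
            best = (run_start, i, run_total)
    return best


def shuffle_threshold(adjusted, n_shuffles, seed=0):
    """Largest best-run total seen over n_shuffles random permutations.
    The seed parameter is an assumption added for reproducibility."""
    rng = random.Random(seed)
    values = list(adjusted)
    maximum = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(values)
        maximum = max(maximum, best_interval(values)[2])
    return maximum


adjusted = adjust([score for _, _, score in ROWS], shift=0.75)
start, end, total = best_interval(adjusted)
threshold = shuffle_threshold(adjusted, n_shuffles=100)

print(ROWS[start][0], ROWS[start][1], ROWS[end][1], f"{total:.2f}")
print("reported" if total > threshold else "not reported", f"(threshold {threshold:.2f})")
```

On these seven rows the best run spans positions 103 to 1132 with a total of about 0.39. With so few data points the shuffled maximum can easily match or exceed that total, which is why the significance filter is only meaningful on full-size datasets.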