comparison find_intervals.xml @ 28:184d14e4270d

Update to Miller Lab devshed revision 4ede22dd5500
author Richard Burhans <burhans@bx.psu.edu>
date Wed, 17 Jul 2013 12:46:46 -0400
parents 8997f2ca8c7a
children a631c2f6d913
27:8997f2ca8c7a 28:184d14e4270d
82 <output name="output" file="test_out/find_intervals/find_intervals.interval" /> 82 <output name="output" file="test_out/find_intervals/find_intervals.interval" />
83 </test> 83 </test>
84 </tests> 84 </tests>
85 85
86 <help> 86 <help>
87
88 **Dataset formats** 87 **Dataset formats**
89 88
90 The input dataset is tabular_, with required columns of chromosome, position, 89 The input dataset is tabular_ (which includes gd_snp_ and gd_genotype_),
91 and score (in any column). 90 with required columns of chromosome, position, and score (in any column).
92 The output dataset is interval_. (`Dataset missing?`_) 91 The output dataset is interval_. (`Dataset missing?`_)
93 92
93 .. _tabular: ./static/formatHelp.html#tab
94 .. _gd_snp: ./static/formatHelp.html#gd_snp
95 .. _gd_genotype: ./static/formatHelp.html#gd_genotype
94 .. _interval: ./static/formatHelp.html#interval 96 .. _interval: ./static/formatHelp.html#interval
95 .. _tabular: ./static/formatHelp.html#tab
96 .. _Dataset missing?: ./static/formatHelp.html 97 .. _Dataset missing?: ./static/formatHelp.html
97 98
98 ----- 99 -----
99 100
100 **What it does** 101 **What it does**
101 102
102 The user selects a tabular dataset (such as a gd_snp dataset) and 103 The user selects a tabular dataset (such as the SNV formats gd_snp and
103 if the dataset is not also gd_snp format, specifies 104 gd_genotype) and if the dataset is not in an SNV format, specifies the
104 the columns containing chromosome, position, and scores (such as an Fst-value for the SNP). 105 columns containing chromosome, position, and scores (such as an FST-value
105 For gd_snp format the metadata can be used to specify the chromosome and 106 for the SNP). With SNV formats, the metadata tells which columns hold the
106 position. 107 chromosome and position. Other inputs include a percentage or raw score
107 Other inputs include 108 for the "score-shift" which should be greater than the average value
108 a percentage or raw score for the "score-shift" which should be greater than the 109 for the scores column. A higher value will give smaller intervals in
109 average value for the scores column. A higher value will give smaller intervals 110 the output. If a percentage (e.g. 95%) is specified then that percentile
110 in the output. 111 of the scores is used as the shift; percentile may not work well if many
111 If a percentage (e.g. 95%) is specified 112 rows or SNPs have the same score (in that case use a raw score).
112 then that percentile of the scores is used as the shift; 113
113 percentile may not work well if many rows or SNPs have the same score 114 The program subtracts the shift from every score, then finds genomic
114 (in that case use a raw score). The program subtracts the 115 intervals (i.e., consecutive runs of SNPs) whose total score cannot be
115 shift from every score, then finds genomic intervals (i.e., consecutive runs 116 increased by adding or subtracting one or more adjusted scores at the
116 of SNPs) whose total score cannot be increased by adding or subtracting one 117 ends of the interval. Another input is the number of times the data
117 or more adjusted scores at the ends of the interval. 118 should be randomized (only intervals with score exceeding the maximum
118 Another input is the number of times the 119 for the randomized data are reported). If 100 shuffles are requested,
119 data should be randomized (only intervals with score exceeding the maximum for 120 then any interval reported by the tool has a score with probability
120 the randomized data are reported). 121 less than 0.01 of being equaled or exceeded by chance, assuming that
121 If 100 shuffles are requested, then any interval reported by the tool has a 122 the scores vary independently by position.
122 score with probability less than 0.01 of being equaled or exceeded by chance.
123 123
124 ----- 124 -----
125 125
126 **Example** 126 **Example**
127 127
128 - input (gd_snp):: 128 - Input (showing only the chromosome, position, and score columns)::
129 129
130 Contig222_chr2_9817738_9818143 220 C T 888.0 chr2 9817960 C 17 0 2 78 12 0 2 63 20 0 2 87 8 0 2 51 11 0 2 60 12 0 2 63 Y 76 0.093 1 130 chr2 39 0.40
131 Contig47_chr2_25470778_25471576 126 G A 888.0 chr2 25470896 G 12 0 2 63 14 0 2 69 14 0 2 69 10 0 2 57 18 0 2 81 13 0 2 66 N 11 0.289 1 131 chr2 103 0.97
132 chr2 188 0.72
133 chr2 203 0.68
134 chr2 321 0.92
132 ... 135 ...
133 Contig115_chr2_61631913_61632510 310 G T 999.3 chr2 61632216 G 7 0 2 48 9 0 2 54 7 0 2 48 11 0 2 60 10 0 2 57 10 0 2 57 N 13 0.184 0 136 chr2 1132 0.85
134 Contig31_chr2_67331584_67331785 39 C T 999.0 chr2 67331623 C 11 0 2 60 10 0 2 57 7 0 2 48 9 0 2 54 2 0 2 33 4 0 2 39 N 110 0.647 1 137 chr2 1321 0.34
135 etc. 138 ...
136 139
137 - output not reporting individual positions:: 140 - Suppose the user-specified score-shift is 0.75. This value is subtracted from each score, giving::
138 141
139 chr2 9817960 67331624 1272.2000 142 chr2 39 -0.35
143 chr2 103 0.22
144 chr2 188 -0.03
145 chr2 203 -0.07
146 chr2 321 0.17
147 ...
148 chr2 1132 0.10
149 chr2 1321 -0.41
150 ...
140 151
152 - The output, not reporting individual positions, might be (depending on the values not shown above)::
153
154 chr2 103 1132 1.42
141 </help> 155 </help>
142 </tool> 156 </tool>
157
158
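
The revised help text describes subtracting the score-shift from every score, finding consecutive runs of SNPs whose total adjusted score cannot be improved at either end, and keeping only intervals that beat the best score from shuffled data. The sketch below illustrates that idea on the five rows visible in the example, with the shift of 0.75. It is not the tool's implementation. It assumes a single chromosome sorted by position, it uses Kadane's algorithm to find only the single highest-scoring run rather than every qualifying interval, and it assumes that "randomizing" means shuffling the scores among positions. The function names (`shift_scores`, `best_interval`, `max_shuffled_total`) are hypothetical.

```python
import random


def shift_scores(rows, shift):
    """Subtract the score-shift from every (chrom, pos, score) row."""
    return [(chrom, pos, score - shift) for chrom, pos, score in rows]


def best_interval(adjusted):
    """Return (start_pos, end_pos, total) for the highest-scoring run.

    Kadane's algorithm. Assumes the rows belong to one chromosome and
    are sorted by position (an assumption of this sketch).
    """
    best = None
    run_start, run_total = None, 0.0
    for _chrom, pos, score in adjusted:
        if run_start is None or run_total <= 0:
            # A run with a non-positive total cannot help; start a new one.
            run_start, run_total = pos, score
        else:
            run_total += score
        if best is None or run_total > best[2]:
            best = (run_start, pos, run_total)
    return best


def max_shuffled_total(adjusted, n_shuffles, seed=0):
    """Largest best-run total over n_shuffles random reorderings of the scores."""
    rng = random.Random(seed)
    scores = [score for _chrom, _pos, score in adjusted]
    top = float("-inf")
    for _ in range(n_shuffles):
        rng.shuffle(scores)
        shuffled = [(c, p, s) for (c, p, _old), s in zip(adjusted, scores)]
        top = max(top, best_interval(shuffled)[2])
    return top


# Only the rows shown in the help example; the elided rows are not guessed.
rows = [
    ("chr2", 39, 0.40),
    ("chr2", 103, 0.97),
    ("chr2", 188, 0.72),
    ("chr2", 203, 0.68),
    ("chr2", 321, 0.92),
]
adjusted = shift_scores(rows, 0.75)
start, end, total = best_interval(adjusted)
threshold = max_shuffled_total(adjusted, n_shuffles=100)
reported = total > threshold  # keep the interval only if it beats every shuffle
print(start, end, round(total, 2), reported)
```

On these five rows alone the best run spans positions 103 to 321 with a total of about 0.29. That differs from the 1.42 in the example output, which depends on rows not shown. With so few rows the shuffle threshold is not meaningful; it is included only to show where the comparison would sit.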