diff find_intervals.xml @ 28:184d14e4270d

Update to Miller Lab devshed revision 4ede22dd5500
author Richard Burhans <burhans@bx.psu.edu>
date Wed, 17 Jul 2013 12:46:46 -0400
parents 8997f2ca8c7a
children a631c2f6d913
--- a/find_intervals.xml	Mon Jul 15 10:47:35 2013 -0400
+++ b/find_intervals.xml	Wed Jul 17 12:46:46 2013 -0400
@@ -84,59 +84,75 @@
   </tests>
 
   <help>
-
 **Dataset formats**
 
-The input dataset is tabular_, with required columns of chromosome, position,
-and score (in any column).
-The output dataset is interval_.  (`Dataset missing?`_)
+The input dataset is tabular_ (which includes gd_snp_ and gd_genotype_),
+with required columns of chromosome, position, and score (in any column).
+The output dataset is interval_. (`Dataset missing?`_)
 
+.. _tabular: ./static/formatHelp.html#tab
+.. _gd_snp: ./static/formatHelp.html#gd_snp
+.. _gd_genotype: ./static/formatHelp.html#gd_genotype
 .. _interval: ./static/formatHelp.html#interval
-.. _tabular: ./static/formatHelp.html#tab
 .. _Dataset missing?: ./static/formatHelp.html
 
 -----
 
 **What it does**
 
-The user selects a tabular dataset (such as a gd_snp dataset) and 
-if the dataset is not also gd_snp format, specifies 
-the columns containing chromosome, position, and scores (such as an Fst-value for the SNP). 
-For gd_snp format the metadata can be used to specify the chromosome and 
-position.
-Other inputs include
-a percentage or raw score for the "score-shift" which should be greater than the 
-average value for the scores column.  A higher value will give smaller intervals
-in the output.
-If a percentage (e.g. 95%) is specified
-then that percentile of the scores is used as the shift; 
-percentile may not work well if many rows or SNPs have the same score
-(in that case use a raw score).  The program subtracts the
-shift from every score, then finds genomic intervals (i.e., consecutive runs
-of SNPs) whose total score cannot be increased by adding or subtracting one
-or more adjusted scores at the ends of the interval.
-Another input is the number of times the
-data should be randomized (only intervals with score exceeding the maximum for
-the randomized data are reported).  
-If 100 shuffles are requested, then any interval reported by the tool has a 
-score with probability less than 0.01 of being equaled or exceeded by chance.
+The user selects a tabular dataset (such as the SNV formats gd_snp and
+gd_genotype) and, if the dataset is not in an SNV format, specifies the
+columns containing chromosome, position, and scores (such as an FST-value
+for the SNP).  With SNV formats, the metadata tells which columns hold the
+chromosome and position.  Other inputs include a percentage or raw score
+for the "score-shift", which should be greater than the average value
+for the scores column.  A higher value will give smaller intervals in
+the output.  If a percentage (e.g. 95%) is specified, then that percentile
+of the scores is used as the shift; the percentile may not work well if
+many rows or SNPs have the same score (in that case use a raw score).
+
+The program subtracts the shift from every score, then finds genomic
+intervals (i.e., consecutive runs of SNPs) whose total score cannot be
+increased by adding or subtracting one or more adjusted scores at the
+ends of the interval.  Another input is the number of times the data
+should be randomized (only intervals with score exceeding the maximum
+for the randomized data are reported).  If 100 shuffles are requested,
+then any interval reported by the tool has a score with probability
+less than 0.01 of being equaled or exceeded by chance, assuming that
+the scores vary independently by position.
 
 -----
 
 **Example**
 
-- input (gd_snp)::
+- Input (showing only the chromosome, position, and score columns)::
 
-    Contig222_chr2_9817738_9818143   220     C       T       888.0   chr2    9817960         C       17      0       2       78      12      0       2       63      20      0       2       87      8       0       2       51      11      0       2       60      12      0       2       63      Y       76      0.093   1
-    Contig47_chr2_25470778_25471576  126     G       A       888.0   chr2    25470896        G       12      0       2       63      14      0       2       69      14      0       2       69      10      0       2       57      18      0       2       81      13      0       2       66      N       11      0.289   1
+    chr2      39      0.40
+    chr2     103      0.97
+    chr2     188      0.72
+    chr2     203      0.68
+    chr2     321      0.92
+    ...
+    chr2    1132      0.85
+    chr2    1321      0.34
     ...
-    Contig115_chr2_61631913_61632510 310     G       T       999.3   chr2    61632216        G       7       0       2       48      9       0       2       54      7       0       2       48      11      0       2       60      10      0       2       57      10      0       2       57      N       13      0.184   0
-    Contig31_chr2_67331584_67331785  39      C       T       999.0   chr2    67331623        C       11      0       2       60      10      0       2       57      7       0       2       48      9       0       2       54      2       0       2       33      4       0       2       39      N       110     0.647   1
-    etc.
+
+- Suppose the user-specified score-shift is 0.75.  This value is subtracted from each score, giving::
 
-- output not reporting individual positions::
+    chr2      39     -0.35
+    chr2     103      0.22
+    chr2     188     -0.03
+    chr2     203     -0.07
+    chr2     321      0.17
+    ...
+    chr2    1132      0.10
+    chr2    1321     -0.41
+    ...
 
-    chr2    9817960 67331624        1272.2000
+- The output, not reporting individual positions, might be (depending on the values not shown above)::
 
+    chr2    103    1132    1.42
   </help>
 </tool>
+
+
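The procedure described in the help text (subtract the score-shift, look for high-scoring runs of consecutive SNPs, then compare against shuffled data) can be illustrated with a minimal Python sketch. It is not the tool's implementation, which this diff does not show. The function names are hypothetical, the input is assumed to be a single chromosome already sorted by position, and Kadane's maximum-sum scan is swapped in as a stand-in: it finds only the single best run, not every interval the tool would report. Permuting the scores across positions is likewise an assumed reading of "randomized".

```python
import random


def shift_scores(rows, shift):
    """Subtract the score-shift from every (chrom, pos, score) row."""
    return [(chrom, pos, score - shift) for chrom, pos, score in rows]


def best_interval(adjusted):
    """Return (start_pos, end_pos, total) for the highest-scoring run.

    Kadane's scan: a run is restarted whenever its running total is
    no longer positive. Assumes one chromosome, sorted by position.
    """
    best = None
    run_total = 0.0
    run_start = None
    for _chrom, pos, score in adjusted:
        if run_start is None or run_total <= 0:
            run_total, run_start = 0.0, pos  # start a new run here
        run_total += score
        if best is None or run_total > best[2]:
            best = (run_start, pos, run_total)
    return best


def shuffle_threshold(adjusted, n_shuffles, seed=0):
    """Largest best-run total seen over n_shuffles permutations.

    Assumption: randomizing means permuting the adjusted scores
    across the fixed positions.
    """
    rng = random.Random(seed)
    scores = [score for _, _, score in adjusted]
    places = [(chrom, pos) for chrom, pos, _ in adjusted]
    top = float("-inf")
    for _ in range(n_shuffles):
        rng.shuffle(scores)
        shuffled = [(c, p, s) for (c, p), s in zip(places, scores)]
        top = max(top, best_interval(shuffled)[2])
    return top


# The five rows visible at the start of the help example (elided rows omitted).
rows = [
    ("chr2", 39, 0.40),
    ("chr2", 103, 0.97),
    ("chr2", 188, 0.72),
    ("chr2", 203, 0.68),
    ("chr2", 321, 0.92),
]
adjusted = shift_scores(rows, 0.75)
start, end, total = best_interval(adjusted)
threshold = shuffle_threshold(adjusted, 100)
```

On these five rows alone, the best run spans positions 103 to 321 with a total of about 0.29. That differs from the 1.42 in the help example because the elided rows are not part of this input. An interval would be reported only if its total exceeded `threshold`.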