comparison annotatePeaks.xml @ 28:f0b5827b6051 draft default tip

Uploaded
author kevyin
date Thu, 20 Dec 2012 18:28:03 -0500
27:d27851e0cbbd 28:f0b5827b6051
1 <tool id="homer_annotatePeaks" name="homer_annotatePeaks" version="0.0.5">
2 <requirements>
3 <requirement type="package" version="4.1">homer</requirement>
4 </requirements>
5 <description></description>
6 <!--<version_command></version_command>-->
7 <command>
8 annotatePeaks.pl $input_bed $genome_selector $options 1&gt; $out_annotated
9 2&gt; $out_log || echo "Error running annotatePeaks." >&amp;2
10 </command>
11 <inputs>
12 <param format="tabular,bed" name="input_bed" type="data" label="Homer peaks OR BED format"/>
13 <param name="genome_selector" type="select" label="Genome version">
14 <option value="hg19" selected="true">hg19</option>
15 </param>
16 <param type="text" name="options" label="Extra options" value="" help="See link below for more options">
17 <sanitizer>
18 <valid initial="string.printable">
19 <remove value="&apos;"/>
20 <remove value="/"/>
21 </valid>
22 <mapping initial="none">
23 <add source="&apos;" target="__sq__"/>
24 </mapping>
25 </sanitizer>
26 </param>
27 </inputs>
28 <outputs>
29 <!--<data format="html" name="html_outfile" label="index" />-->
30 <!--<data format="html" hidden="True" name="html_outfile" label="index.html" />-->
31 <data format="tabular" name="out_annotated" label="${tool.name} on #echo os.path.splitext(str($input_bed.name))[0]#_genome_${genome_selector}" />
32 <data format="txt" name="out_log" label="${tool.name} on #echo os.path.splitext(str($input_bed.name))[0]#_genome_${genome_selector}.log" />
33 </outputs>
34 <tests>
35 <test>
36 <!--<param name="input_file" value="extract_genomic_dna.fa" />-->
37 <!--<output name="html_file" file="sample_output.html" ftype="html" />-->
38 </test>
39 </tests>
40
41 <help>
42
43 .. class:: infomark
44
45 **Homer annotatePeaks**
46
47 More information on accepted formats and options:
48
49 http://biowhat.ucsd.edu/homer/ngs/annotation.html
50
51 TIP: Use homer_bed2pos and homer_pos2bed to convert between the HOMER peak position format and the BED format.
52
53 **Parameter list**
54
55 Command line options (not all of them are supported)::
56
57 Usage: annotatePeaks.pl &lt;peak file | tss&gt; &lt;genome version&gt; [additional options...]
58
59 Available Genomes (required argument): (name,org,directory,default promoter set)
60 -- or --
61 Custom: provide the path to genome FASTA files (directory or single file)
62
63 User defined annotation files (default is UCSC refGene annotation):
64 annotatePeaks.pl accepts GTF (gene transfer formatted) files to annotate positions relative
65 to custom annotations, such as those from de novo transcript discovery or Gencode.
66 -gtf &lt;gtf format file&gt; (-gff and -gff3 can work for those files, but GTF is better)
67
68 Peak vs. tss/tts/rna mode (works with custom GTF file):
69 If the first argument is &quot;tss&quot; (i.e. annotatePeaks.pl tss hg18 ...) then a TSS centric
70 analysis will be carried out. Tag counts and motifs will be found relative to the TSS.
71 (no position file needed) [&quot;tts&quot; now works too - e.g. 3&apos; end of gene]
72 [&quot;rna&quot; specifies gene bodies, will automatically set &quot;-size given&quot;]
73 NOTE: The default TSS peak size is 4000 bp, i.e. +/- 2kb (change with -size option)
74 -list &lt;gene id list&gt; (subset of genes to perform analysis [unigene, gene id, accession,
75 probe, etc.], default = all promoters)
76 -cTSS &lt;promoter position file i.e. peak file&gt; (should be centered on TSS)
77
78 Primary Annotation Options:
79 -mask (Masked repeats, can also add &apos;r&apos; to end of genome name)
80 -m &lt;motif file 1&gt; [motif file 2] ... (list of motifs to find in peaks)
81 -mscore (reports the highest log-odds score within the peak)
82 -nmotifs (reports the number of motifs per peak)
83 -mdist (reports distance to closest motif)
84 -mfasta &lt;filename&gt; (reports sites in a fasta file - for building new motifs)
85 -fm &lt;motif file 1&gt; [motif file 2] (list of motifs to filter from above)
86 -rmrevopp &lt;#&gt; (only count sites found within &lt;#&gt; on both strands once, i.e. palindromic)
87 -matrix &lt;prefix&gt; (outputs motif co-occurrence files:
88 prefix.count.matrix.txt - number of peaks with motif co-occurrence
89 prefix.ratio.matrix.txt - ratio of observed vs. expected co-occurrence
90 prefix.logPvalue.matrix.txt - co-occurrence enrichment
91 prefix.stats.txt - table of pair-wise motif co-occurrence statistics
92 additional options:
93 -matrixMinDist &lt;#&gt; (minimum distance between motif pairs - to avoid overlap)
94 -matrixMaxDist &lt;#&gt; (maximum distance between motif pairs)
95 -mbed &lt;filename&gt; (Output motif positions to a BED file to load at UCSC (or -mpeak))
96 -mlogic &lt;filename&gt; (will output stats on common motif orientations)
97 -d &lt;tag directory 1&gt; [tag directory 2] ... (list of experiment directories to show
98 tag counts for) NOTE: -dfile &lt;file&gt; where file is a list of directories in first column
99 -bedGraph &lt;bedGraph file 1&gt; [bedGraph file 2] ... (read coverage counts from bedGraph files)
100 -wig &lt;wiggle file 1&gt; [wiggle file 2] ... (read coverage counts from wiggle files)
101 -p &lt;peak file&gt; [peak file 2] ... (to find nearest peaks)
102 -pdist to report only distance (-pdist2 gives directional distance)
103 -pcount to report number of peaks within region
104 -vcf &lt;VCF file&gt; (annotate peaks with genetic variation information, one col per individual)
105 -editDistance (Computes the # bp changes relative to reference)
106 -individuals &lt;name1&gt; [name2] ... (restrict analysis to these individuals)
107 -gene &lt;data file&gt; ... (Adds additional data to result based on the closest gene.
108 This is useful for adding gene expression data. The file must have a header,
109 and the first column must be a GeneID, Accession number, etc. If the peak
110 cannot be mapped to data in the file then the entry will be left empty.)
111 -go &lt;output directory&gt; (perform GO analysis using genes near peaks)
112 -genomeOntology &lt;output directory&gt; (perform genomeOntology analysis on peaks)
113 -gsize &lt;#&gt; (Genome size for genomeOntology analysis, default: 2e9)
114
115 Annotation vs. Histogram mode:
116 -hist &lt;bin size in bp&gt; (e.g. 1, 2, 5, 10, 20, 50, 100, etc.)
117 The -hist option can be used to generate histograms of position dependent features relative
118 to the center of peaks. This is primarily meant to be used with -d and -m options to map
119 distribution of motifs and ChIP-Seq tags. For ChIP-Seq peaks for a Transcription factor
120 you might want to use the -center option (below) to center peaks on the known motif
121 ** If using &quot;-size given&quot;, histogram will be scaled to each region (i.e. 0-100%), with
122 the -hist parameter being the number of bins to divide each region into.
123 Histogram Mode specific Options:
124 -nuc (calculate mononucleotide frequencies at each position,
125 will report by default if extracting sequence for other purposes like motifs)
126 -di (calculate dinucleotide frequencies at each position)
127 -histNorm &lt;#&gt; (normalize the total tag count for each region to 1, where &lt;#&gt; is the
128 minimum tag total per region - use to avoid tag spikes from low coverage)
129 -ghist (outputs profiles for each gene, for peak shape clustering)
130 -rm &lt;#&gt; (remove occurrences of same motif that occur within # bp)
131
132 Peak Centering: (other options are ignored)
133 -center &lt;motif file&gt; (This will re-center peaks on the specified motif, or remove the peak
134 if there is no motif in the peak. ONLY recentering will be performed, and all other
135 options will be ignored. This will output a new peak file that can then be reanalyzed
136 to reveal fine-grain structure in peaks. It is advised to use -size &lt; 200 with this
137 to keep peaks from moving too far. -mirror flips the position.)
138 -multi (returns genomic positions of all sites instead of just the closest to center)
139
140 Advanced Options:
141 -len &lt;#&gt; / -fragLength &lt;#&gt; (Fragment length, default=auto, might want to set to 0 for RNA)
142 -size &lt;#&gt; (Peak size[from center of peak], default=inferred from peak file)
143 -size #,# (i.e. -size -10,50 count tags from -10 bp to +50 bp from center)
144 -size &quot;given&quot; (count tags etc. using the actual regions - for variable length regions)
145 -log (output tag counts as log2(x+1+rand) values - for scatter plots)
146 -sqrt (output tag counts as sqrt(x+rand) values - for scatter plots)
147 -strand &lt;+|-|both&gt; (Count tags on specific strands relative to peak, default: both)
148 -pc &lt;#&gt; (maximum number of tags to count per bp, default=0 [no maximum])
149 -cons (Retrieve conservation information for peaks/sites)
150 -CpG (Calculate CpG/GC content)
151 -ratio (process tag values as ratios - i.e. chip-seq, or mCpG/CpG)
152 -nfr (report nucleosome free region scores instead of tag counts, also -nfrSize &lt;#&gt;)
153 -norevopp (do not search for motifs on the opposite strand [works with -center too])
154 -noadj (do not adjust the tag counts based on total tags sequenced)
155 -norm &lt;#&gt; (normalize tags to this tag count, default=1e7, 0=average tag count in all directories)
156 -pdist (only report distance to nearest peak using -p, not peak name)
157 -map &lt;mapping file&gt; (mapping between peak IDs and promoter IDs, overrides closest assignment)
158 -noann, -nogene (skip genome annotation step, skip TSS annotation)
159 -homer1/-homer2 (by default, the new version of homer [-homer2] is used for finding motifs)
160
161
162 </help>
163 </tool>
164
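
The sanitizer declared on the `options` parameter above can be read as a per-character filter: characters in the valid set pass through, mapped characters are replaced, and anything else is substituted. The following is a minimal Python sketch of that reading, not Galaxy's own code. It assumes that characters which are neither valid nor mapped are replaced with a placeholder, taken here to be `X`; the function name `sanitize_options` is hypothetical.

```python
import string

# Characters allowed through unchanged: string.printable minus the two
# characters removed in the tool's <valid> block.
VALID_CHARS = set(string.printable) - {"'", "/"}

# Replacements declared in the tool's <mapping> block.
MAPPING = {"'": "__sq__"}

# Assumption: placeholder for characters that are neither valid nor mapped.
INVALID_CHAR = "X"


def sanitize_options(text):
    """Filter text the way the tool's sanitizer is assumed to behave."""
    out = []
    for ch in text:
        if ch in VALID_CHARS:
            out.append(ch)            # valid character, kept as-is
        elif ch in MAPPING:
            out.append(MAPPING[ch])   # mapped character, replaced
        else:
            out.append(INVALID_CHAR)  # everything else, substituted
    return "".join(out)


if __name__ == "__main__":
    print(sanitize_options("-size 200 -strand both"))
    print(sanitize_options("-gtf /data/genes.gtf"))
```

Under these assumptions a value containing `/` does not survive intact, so options that take file paths (for example `-gtf`) could not be passed through the "Extra options" field.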