<tool id="homer_annotatePeaks" name="homer_annotatePeaks" version="0.0.4">
    <requirements>
        <requirement type="package" version="4.1">homer</requirement>
    </requirements>
    <description></description>
    <!--<version_command></version_command>-->
    <command><![CDATA[
        annotatePeaks.pl $input_bed $genome_selector $options 1> $out_annotated
        2> $out_log || echo "Error running annotatePeaks." >&2
    ]]></command>
    <inputs>
        <param format="tabular,bed" name="input_bed" type="data" label="Homer peaks OR BED format"/>
        <param name="genome_selector" type="select" label="Genome version">
            <option value="hg19" selected="true">hg19</option>
        </param>
        <param type="text" name="options" label="Extra options" value="" help="See link below for more options">
            <sanitizer>
                <valid initial="string.printable">
                    <remove value="'"/>
                    <remove value="/"/>
                </valid>
                <mapping initial="none">
                    <add source="'" target="__sq__"/>
                </mapping>
            </sanitizer>
        </param>
    </inputs>
    <outputs>
        <!--<data format="html" name="html_outfile" label="index" />-->
        <!--<data format="html" hidden="True" name="html_outfile" label="index.html" />-->
        <data format="csv" name="out_annotated" label="${tool.name} on #echo os.path.splitext(str($input_bed.name))[0]#_genome_${genome_selector}" />
        <data format="txt" name="out_log" label="${tool.name} on #echo os.path.splitext(str($input_bed.name))[0]#_genome_${genome_selector}.log" />
    </outputs>
    <tests>
        <test>
            <!--<param name="input_file" value="extract_genomic_dna.fa" />-->
            <!--<output name="html_file" file="sample_output.html" ftype="html" />-->
        </test>
    </tests>

    <help>

.. class:: infomark

**Homer annotatePeaks**

More information on accepted formats and options:

http://biowhat.ucsd.edu/homer/ngs/annotation.html

TIP: use homer_bed2pos and homer_pos2bed to convert between the HOMER peak position format and the BED format.

**Parameter list**

Command line options (not all of them are supported)::

    Usage: annotatePeaks.pl <peak file | tss> <genome version> [additional options...]

    Available Genomes (required argument): (name,org,directory,default promoter set)
            -- or --
        Custom: provide the path to genome FASTA files (directory or single file)

    User defined annotation files (default is UCSC refGene annotation):
        annotatePeaks.pl accepts GTF (gene transfer formatted) files to annotate positions relative
        to custom annotations, such as those from de novo transcript discovery or Gencode.
        -gtf <gtf format file> (-gff and -gff3 can work for those files, but GTF is better)

    Peak vs. tss/tts/rna mode (works with custom GTF file):
        If the first argument is "tss" (i.e. annotatePeaks.pl tss hg18 ...) then a TSS centric
        analysis will be carried out. Tag counts and motifs will be found relative to the TSS.
        (no position file needed) ["tts" now works too - e.g. 3' end of gene]
        ["rna" specifies gene bodies, will automatically set "-size given"]
        NOTE: The default TSS peak size is 4000 bp, i.e. +/- 2kb (change with -size option)
        -list <gene id list> (subset of genes to perform analysis [unigene, gene id, accession,
            probe, etc.], default = all promoters)
        -cTSS <promoter position file i.e. peak file> (should be centered on TSS)

    Primary Annotation Options:
        -mask (Masked repeats, can also add 'r' to end of genome name)
        -m <motif file 1> [motif file 2] ... (list of motifs to find in peaks)
            -mscore (reports the highest log-odds score within the peak)
            -nmotifs (reports the number of motifs per peak)
            -mdist (reports distance to closest motif)
            -mfasta <filename> (reports sites in a fasta file - for building new motifs)
            -fm <motif file 1> [motif file 2] (list of motifs to filter from above)
            -rmrevopp <#> (only count sites found within <#> on both strands once, i.e. palindromic)
            -matrix <prefix> (outputs motif co-occurrence files:
                prefix.count.matrix.txt - number of peaks with motif co-occurrence
                prefix.ratio.matrix.txt - ratio of observed vs. expected co-occurrence
                prefix.logPvalue.matrix.txt - co-occurrence enrichment
                prefix.stats.txt - table of pair-wise motif co-occurrence statistics
                additional options:
                -matrixMinDist <#> (minimum distance between motif pairs - to avoid overlap)
                -matrixMaxDist <#> (maximum distance between motif pairs)
            -mbed <filename> (Output motif positions to a BED file to load at UCSC (or -mpeak))
            -mlogic <filename> (will output stats on common motif orientations)
        -d <tag directory 1> [tag directory 2] ... (list of experiment directories to show
            tag counts for) NOTE: -dfile <file> where file is a list of directories in first column
        -bedGraph <bedGraph file 1> [bedGraph file 2] ... (read coverage counts from bedGraph files)
        -wig <wiggle file 1> [wiggle file 2] ... (read coverage counts from wiggle files)
        -p <peak file> [peak file 2] ... (to find nearest peaks)
            -pdist to report only distance (-pdist2 gives directional distance)
            -pcount to report number of peaks within region
        -vcf <VCF file> (annotate peaks with genetic variation information, one col per individual)
            -editDistance (Computes the # bp changes relative to reference)
            -individuals <name1> [name2] ... (restrict analysis to these individuals)
        -gene <data file> ... (Adds additional data to result based on the closest gene.
            This is useful for adding gene expression data. The file must have a header,
            and the first column must be a GeneID, Accession number, etc. If the peak
            cannot be mapped to data in the file then the entry will be left empty.)
        -go <output directory> (perform GO analysis using genes near peaks)
        -genomeOntology <output directory> (perform genomeOntology analysis on peaks)
            -gsize <#> (Genome size for genomeOntology analysis, default: 2e9)

    Annotation vs. Histogram mode:
        -hist <bin size in bp> (i.e. 1, 2, 5, 10, 20, 50, 100 etc.)
        The -hist option can be used to generate histograms of position dependent features relative
        to the center of peaks. This is primarily meant to be used with -d and -m options to map
        distribution of motifs and ChIP-Seq tags. For ChIP-Seq peaks for a Transcription factor
        you might want to use the -center option (below) to center peaks on the known motif
        ** If using "-size given", histogram will be scaled to each region (i.e. 0-100%), with
        the -hist parameter being the number of bins to divide each region into.
            Histogram Mode specific Options:
            -nuc (calculated mononucleotide frequencies at each position,
                Will report by default if extracting sequence for other purposes like motifs)
            -di (calculated dinucleotide frequencies at each position)
            -histNorm <#> (normalize the total tag count for each region to 1, where <#> is the
                minimum tag total per region - use to avoid tag spikes from low coverage)
            -ghist (outputs profiles for each gene, for peak shape clustering)
            -rm <#> (remove occurrences of same motif that occur within # bp)

    Peak Centering: (other options are ignored)
        -center <motif file> (This will re-center peaks on the specified motif, or remove peak
            if there is no motif in the peak. ONLY recentering will be performed, and all other
            options will be ignored. This will output a new peak file that can then be reanalyzed
            to reveal fine-grain structure in peaks (It is advised to use -size < 200) with this
            to keep peaks from moving too far (-mirror flips the position)
        -multi (returns genomic positions of all sites instead of just the closest to center)

    Advanced Options:
        -len <#> / -fragLength <#> (Fragment length, default=auto, might want to set to 0 for RNA)
        -size <#> (Peak size[from center of peak], default=inferred from peak file)
            -size #,# (i.e. -size -10,50 count tags from -10 bp to +50 bp from center)
            -size "given" (count tags etc. using the actual regions - for variable length regions)
        -log (output tag counts as log2(x+1+rand) values - for scatter plots)
        -sqrt (output tag counts as sqrt(x+rand) values - for scatter plots)
        -strand <+|-|both> (Count tags on specific strands relative to peak, default: both)
        -pc <#> (maximum number of tags to count per bp, default=0 [no maximum])
        -cons (Retrieve conservation information for peaks/sites)
        -CpG (Calculate CpG/GC content)
        -ratio (process tag values as ratios - i.e. chip-seq, or mCpG/CpG)
        -nfr (report nucleosome free region scores instead of tag counts, also -nfrSize <#>)
        -norevopp (do not search for motifs on the opposite strand [works with -center too])
        -noadj (do not adjust the tag counts based on total tags sequenced)
        -norm <#> (normalize tags to this tag count, default=1e7, 0=average tag count in all directories)
        -pdist (only report distance to nearest peak using -p, not peak name)
        -map <mapping file> (mapping between peak IDs and promoter IDs, overrides closest assignment)
        -noann, -nogene (skip genome annotation step, skip TSS annotation)
        -homer1/-homer2 (by default, the new version of homer [-homer2] is used for finding motifs)
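
The three forms of the -size option listed above can be easier to follow with a small illustration. The sketch below is not part of HOMER or of this wrapper: ``size_to_window`` is a hypothetical helper that only mirrors the wording of the usage text (a single number is a total width split around the peak center, ``#,#`` gives explicit offsets, and ``given`` keeps the original region). How HOMER itself rounds odd widths is not stated in the usage text, so the integer division here is an assumption.

```python
def size_to_window(size):
    """Translate a -size value into (left, right) offsets from the peak center.

    Hypothetical helper for illustration only; it is not part of HOMER.
    Returns None for "given", meaning the actual region boundaries are used.
    """
    text = str(size).strip()
    if text == "given":
        return None
    if "," in text:
        # Explicit form, e.g. "-10,50": offsets are taken exactly as written.
        left, right = (int(part) for part in text.split(","))
        return (left, right)
    # Single number: total width, split evenly around the center.
    # Assumption: odd widths are rounded down with integer division.
    half = int(text) // 2
    return (-half, half)


print(size_to_window("4000"))    # (-2000, 2000)
print(size_to_window("-10,50"))  # (-10, 50)
print(size_to_window("given"))   # None
```

The first call matches the NOTE in the usage text that the default TSS size of 4000 bp corresponds to +/- 2kb.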

    </help>
</tool>