comparison annotatePeaks.xml @ 16:687df269e597 draft
Uploaded
author | kevyin |
---|---|
date | Wed, 19 Dec 2012 17:28:55 -0500 |
parents | |
children | b3b65304ee72 |
15:529485c1dda1 | 16:687df269e597 |
---|---|
1 <tool id="homer_annotatePeaks" name="homer_annotatePeaks" version="0.0.4"> | |
2 <requirements> | |
3 <requirement type="package" version="4.1">homer</requirement> | |
4 </requirements> | |
5 <description></description> | |
6 <!--<version_command></version_command>--> | |
7 <command> | |
8 annotatePeaks.pl $input_bed $genome_selector 1> $out_annotated | |
9 2> $out_log || echo "Error running annotatePeaks." >&2 | |
10 </command> | |
11 <inputs> | |
12 <param format="tabular,bed" name="input_bed" type="data" label="Homer peaks OR BED format"/> | |
13 <param name="genome_selector" type="select" label="Genome version"> | |
14 <option value="hg19" selected="true">hg19</option> | |
15 </param> | |
16 <param type="text" name="options" label="Extra options" value="" help="See link below for more options"> | |
17 <sanitizer> | |
18 <valid initial="string.printable"> | |
19 <remove value="'"/> | |
20 <remove value="/"/> | |
21 </valid> | |
22 <mapping initial="none"> | |
23 <add source="'" target="__sq__"/> | |
24 </mapping> | |
25 </sanitizer> | |
26 </param> | |
27 </inputs> | |
28 <outputs> | |
29 <!--<data format="html" name="html_outfile" label="index" />--> | |
30 <!--<data format="html" hidden="True" name="html_outfile" label="index.html" />--> | |
31 <data format="tabular" name="out_annotated" label="${tool.name} on #echo os.path.splitext(str($input_bed.name))[0]#_genome_${genome_selector}" /> | |
32 <data format="txt" name="out_log" label="${tool.name} on #echo os.path.splitext(str($input_bed.name))[0]#_genome_${genome_selector}.log" /> | |
33 </outputs> | |
34 <tests> | |
35 <test> | |
36 <!--<param name="input_file" value="extract_genomic_dna.fa" />--> | |
37 <!--<output name="html_file" file="sample_output.html" ftype="html" />--> | |
38 </test> | |
39 </tests> | |
40 | |
41 <help> | |
42 | |
43 .. class:: infomark | |
44 | |
45 **Homer annotatePeaks** | |
46 | |
47 More information on accepted formats and options | |
48 | |
49 http://biowhat.ucsd.edu/homer/ngs/annotation.html | |
50 | |
51 TIP: use homer_bed2pos and homer_pos2bed to convert between the homer peak positions and the BED format. | |
52 | |
53 **Parameter list** | |
54 | |
55 Command line options (not all of them are supported):: | |
56 | |
57 Usage: annotatePeaks.pl <peak file | tss> <genome version> [additional options...] | |
58 | |
59 Available Genomes (required argument): (name,org,directory,default promoter set) | |
60 -- or -- | |
61 Custom: provide the path to genome FASTA files (directory or single file) | |
62 | |
63 User defined annotation files (default is UCSC refGene annotation): | |
64 annotatePeaks.pl accepts GTF (gene transfer formatted) files to annotate positions relative | |
65 to custom annotations, such as those from de novo transcript discovery or Gencode. | |
66 -gtf <gtf format file> (-gff and -gff3 can work for those files, but GTF is better) | |
67 | |
68 Peak vs. tss/tts/rna mode (works with custom GTF file): | |
69 If the first argument is "tss" (i.e. annotatePeaks.pl tss hg18 ...) then a TSS centric | |
70 analysis will be carried out. Tag counts and motifs will be found relative to the TSS. | |
71 (no position file needed) ["tts" now works too - e.g. 3' end of gene] | |
72 ["rna" specifies gene bodies, will automatically set "-size given"] | |
73 NOTE: The default TSS peak size is 4000 bp, i.e. +/- 2kb (change with -size option) | |
74 -list <gene id list> (subset of genes to perform analysis [unigene, gene id, accession, | |
75 probe, etc.], default = all promoters) | |
76 -cTSS <promoter position file i.e. peak file> (should be centered on TSS) | |
77 | |
78 Primary Annotation Options: | |
79 -mask (Masked repeats, can also add 'r' to end of genome name) | |
80 -m <motif file 1> [motif file 2] ... (list of motifs to find in peaks) | |
81 -mscore (reports the highest log-odds score within the peak) | |
82 -nmotifs (reports the number of motifs per peak) | |
83 -mdist (reports distance to closest motif) | |
84 -mfasta <filename> (reports sites in a fasta file - for building new motifs) | |
85 -fm <motif file 1> [motif file 2] (list of motifs to filter from above) | |
86 -rmrevopp <#> (only count sites found within <#> on both strands once, i.e. palindromic) | |
87 -matrix <prefix> (outputs motif co-occurrence files: | |
88 prefix.count.matrix.txt - number of peaks with motif co-occurrence | |
89 prefix.ratio.matrix.txt - ratio of observed vs. expected co-occurrence | |
90 prefix.logPvalue.matrix.txt - co-occurrence enrichment | |
91 prefix.stats.txt - table of pair-wise motif co-occurrence statistics | |
92 additional options: | |
93 -matrixMinDist <#> (minimum distance between motif pairs - to avoid overlap) | |
94 -matrixMaxDist <#> (maximum distance between motif pairs) | |
95 -mbed <filename> (Output motif positions to a BED file to load at UCSC (or -mpeak)) | |
96 -mlogic <filename> (will output stats on common motif orientations) | |
97 -d <tag directory 1> [tag directory 2] ... (list of experiment directories to show | |
98 tag counts for) NOTE: -dfile <file> where file is a list of directories in first column | |
99 -bedGraph <bedGraph file 1> [bedGraph file 2] ... (read coverage counts from bedGraph files) | |
100 -wig <wiggle file 1> [wiggle file 2] ... (read coverage counts from wiggle files) | |
101 -p <peak file> [peak file 2] ... (to find nearest peaks) | |
102 -pdist to report only distance (-pdist2 gives directional distance) | |
103 -pcount to report number of peaks within region | |
104 -vcf <VCF file> (annotate peaks with genetic variation information, one col per individual) | |
105 -editDistance (Computes the # bp changes relative to reference) | |
106 -individuals <name1> [name2] ... (restrict analysis to these individuals) | |
107 -gene <data file> ... (Adds additional data to result based on the closest gene. | |
108 This is useful for adding gene expression data. The file must have a header, | |
109 and the first column must be a GeneID, Accession number, etc. If the peak | |
110 cannot be mapped to data in the file then the entry will be left empty.) | |
111 -go <output directory> (perform GO analysis using genes near peaks) | |
112 -genomeOntology <output directory> (perform genomeOntology analysis on peaks) | |
113 -gsize <#> (Genome size for genomeOntology analysis, default: 2e9) | |
114 | |
115 Annotation vs. Histogram mode: | |
116 -hist <bin size in bp> (e.g. 1, 2, 5, 10, 20, 50, 100 etc.) | |
117 The -hist option can be used to generate histograms of position dependent features relative | |
118 to the center of peaks. This is primarily meant to be used with -d and -m options to map | |
119 distribution of motifs and ChIP-Seq tags. For ChIP-Seq peaks for a Transcription factor | |
120 you might want to use the -center option (below) to center peaks on the known motif | |
121 ** If using "-size given", histogram will be scaled to each region (i.e. 0-100%), with | |
122 the -hist parameter being the number of bins to divide each region into. | |
123 Histogram Mode specific Options: | |
124 -nuc (calculated mononucleotide frequencies at each position, | |
125 Will report by default if extracting sequence for other purposes like motifs) | |
126 -di (calculated dinucleotide frequencies at each position) | |
127 -histNorm <#> (normalize the total tag count for each region to 1, where <#> is the | |
128 minimum tag total per region - use to avoid tag spikes from low coverage) | |
129 -ghist (outputs profiles for each gene, for peak shape clustering) | |
130 -rm <#> (remove occurrences of same motif that occur within # bp) | |
131 | |
132 Peak Centering: (other options are ignored) | |
133 -center <motif file> (This will re-center peaks on the specified motif, or remove peak | |
134 if there is no motif in the peak. ONLY recentering will be performed, and all other | |
135 options will be ignored. This will output a new peak file that can then be reanalyzed | |
136 to reveal fine-grain structure in peaks (It is advised to use -size < 200) with this | |
137 to keep peaks from moving too far (-mirror flips the position) | |
138 -multi (returns genomic positions of all sites instead of just the closest to center) | |
139 | |
140 Advanced Options: | |
141 -len <#> / -fragLength <#> (Fragment length, default=auto, might want to set to 0 for RNA) | |
142 -size <#> (Peak size[from center of peak], default=inferred from peak file) | |
143 -size #,# (i.e. -size -10,50 count tags from -10 bp to +50 bp from center) | |
144 -size "given" (count tags etc. using the actual regions - for variable length regions) | |
145 -log (output tag counts as log2(x+1+rand) values - for scatter plots) | |
146 -sqrt (output tag counts as sqrt(x+rand) values - for scatter plots) | |
147 -strand <+|-|both> (Count tags on specific strands relative to peak, default: both) | |
148 -pc <#> (maximum number of tags to count per bp, default=0 [no maximum]) | |
149 -cons (Retrieve conservation information for peaks/sites) | |
150 -CpG (Calculate CpG/GC content) | |
151 -ratio (process tag values as ratios - i.e. chip-seq, or mCpG/CpG) | |
152 -nfr (report nucleosome free region scores instead of tag counts, also -nfrSize <#>) | |
153 -norevopp (do not search for motifs on the opposite strand [works with -center too]) | |
154 -noadj (do not adjust the tag counts based on total tags sequenced) | |
155 -norm <#> (normalize tags to this tag count, default=1e7, 0=average tag count in all directories) | |
156 -pdist (only report distance to nearest peak using -p, not peak name) | |
157 -map <mapping file> (mapping between peak IDs and promoter IDs, overrides closest assignment) | |
158 -noann, -nogene (skip genome annotation step, skip TSS annotation) | |
159 -homer1/-homer2 (by default, the new version of homer [-homer2] is used for finding motifs) | |
160 | |
161 | |
162 </help> | |
163 </tool> | |
164 |
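Note on the "Extra options" parameter: the wrapper declares a sanitizer for `options`, but the `<command>` block in this revision does not reference `$options`. The sketch below is not part of the tool. It illustrates, under stated assumptions, what the sanitizer rules would do to a user-supplied string and how the result might be appended to the `annotatePeaks.pl` call. The replacement character `X` for rejected characters is assumed to match Galaxy's default, and `sanitize_options` and `build_command` are hypothetical helper names.

```python
import shlex
import string

# Mirrors the <sanitizer> block: every printable character is valid
# except the single quote and the forward slash.
VALID_CHARS = set(string.printable) - {"'", "/"}

# Mirrors <mapping initial="none"> with one added rule.
CHAR_MAPPING = {"'": "__sq__"}

# Assumption: characters that are neither valid nor mapped are replaced by "X".
INVALID_CHAR = "X"


def sanitize_options(text):
    """Keep valid characters, map known ones, replace everything else."""
    out = []
    for ch in text:
        if ch in VALID_CHARS:
            out.append(ch)
        elif ch in CHAR_MAPPING:
            out.append(CHAR_MAPPING[ch])
        else:
            out.append(INVALID_CHAR)
    return "".join(out)


def build_command(input_bed, genome, options=""):
    """Return an argument list for annotatePeaks.pl (hypothetical helper)."""
    extra = shlex.split(sanitize_options(options))
    return ["annotatePeaks.pl", input_bed, genome] + extra


if __name__ == "__main__":
    print(build_command("peaks.bed", "hg19", "-size 200 -CpG"))
    print(sanitize_options("-m 'a/b'"))
```

Because `/` is removed from the valid set and has no mapping, any option that takes a file path (for example `-gtf` or `-m`) would not survive sanitization intact under these rules.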