comparison repenrich.xml @ 0:1435d142041b draft
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit d5ebd581fa3a22ca61ce07a31c01bb70610fbcf5
author | drosofff |
---|---|
date | Tue, 23 May 2017 18:37:22 -0400 |
parents | |
children | 54a3f3a195d6 |
-1:000000000000 | 0:1435d142041b |
---|---|
1 <tool id="repenrich" name="RepEnrich" version="0.2.0"> | |
2 <description>Repeat Element Profiling</description> | |
3 <requirements> | |
4 <requirement type="package" version="1.2.0">bowtie</requirement> | |
5 <requirement type="package" version="0.1.19">samtools</requirement> | |
6 <requirement type="package" version="2.20.1">bedtools</requirement> | |
7 <requirement type="package" version="1.69">biopython</requirement> | |
8 </requirements> | |
9 <stdio> | |
10 <exit_code range="1:" level="fatal" description="Tool exception" /> | |
11 </stdio> | |
12 <command detect_errors="exit_code"><![CDATA[ | |
13 #import re | |
14 #set input_base = re.sub('\.fastq$', '', str($input_fastq.element_identifier)) | |
15 #set baseReference = re.sub('[^\w\-]', '_', str($genome.element_identifier)) | |
16 #set baseReference = re.sub('.fa$', '', $baseReference) | |
17 ln -f -s '$genome' '${baseReference}.fa' && | |
18 ln -f -s '$input_fastq' '${input_base}.fastq' && | |
19 bowtie-build '$genome' ${baseReference} && | |
20 python $__tool_directory__/RepEnrich_setup.py $repeatmasker ${baseReference}.fa setup_folder_${baseReference} && | |
21 bowtie $baseReference -p \${GALAXY_SLOTS:-4} -t -m 1 -S --max ${input_base}_multimap.fastq ${input_base}.fastq ${input_base}_unique.sam && | |
22 samtools view -bS ${input_base}_unique.sam > ${input_base}_unique.bam && | |
23 samtools sort ${input_base}_unique.bam ${input_base}_unique_sorted && | |
24 mv ${input_base}_unique_sorted.bam ${input_base}_unique.bam && | |
25 samtools index ${input_base}_unique.bam && | |
26 rm ${input_base}_unique.sam && | |
27 python $__tool_directory__/RepEnrich.py $repeatmasker ${input_base} ${input_base} setup_folder_${baseReference} ${input_base}_multimap.fastq ${input_base}_unique.bam --cpus "\${GALAXY_SLOTS:-4}" && | |
28 cp $input_base/${input_base}_class_fraction_counts.txt class_fraction_counts.tabular && | |
29 cp $input_base/${input_base}_family_fraction_counts.txt family_fraction_counts.tabular && | |
30 cp $input_base/${input_base}_fraction_counts.txt fraction_counts.tabular | |
31 | |
32 ]]></command> | |
33 <!-- basic error handling --> | |
34 <inputs> | |
35 <param format="fasta" label="Reference genome in fasta format" name="genome" type="data" /> | |
36 <param format="fastq,fastqsanger" label="Single-read sequencing dataset" name="input_fastq" type="data" help="accepted formats: fastq, fastqsanger" /> | |
37 <param format="txt" label="RepeatMasker description file" name="repeatmasker" type="data" help="see help section"/> | |
38 </inputs> | |
39 | |
40 <outputs> | |
41 <data format="tabular" name="class_fraction_counts" label="RepEnrich on ${on_string}: class fraction counts" from_work_dir="class_fraction_counts.tabular"> | |
42 </data> | |
43 <data format="tabular" name="family_fraction_counts" label="RepEnrich on ${on_string}: family fraction counts" from_work_dir="family_fraction_counts.tabular"> | |
44 </data> | |
45 <data format="tabular" name="fraction_counts" label="RepEnrich on ${on_string}: fraction counts" from_work_dir="fraction_counts.tabular"> | |
46 </data> | |
47 </outputs> | |
48 | |
49 <tests> | |
50 <test> | |
51 <param name="input_fastq" value="Samp.fastq" ftype="fastq"/> | |
52 <param name="genome" value="chrM.fa" ftype="fasta"/> | |
53 <param name="repeatmasker" value="chrM_repeatmasker.txt" ftype="txt"/> | |
54 <output name="class_fraction_counts" file="Samp_class_fraction_counts.tabular" ftype="tabular"/> | |
55 <output name="family_fraction_counts" file="Samp_family_fraction_counts.tabular" ftype="tabular"/> | |
56 <output name="fraction_counts" file="Samp_fraction_counts.tabular" ftype="tabular"/> | |
57 </test> | |
58 </tests> | |
59 | |
60 <help> | |
61 | |
62 **What it does** | |
63 | |
64 Reads are mapped to the genome using the Bowtie1 aligner. Reads mapping uniquely to the genome are assigned to subfamilies of repetitive elements based on their degree of overlap with RepeatMasker-annotated genomic instances of each repetitive element subfamily. Reads mapping to multiple locations are separately mapped to repetitive element assemblies (referred to as repetitive element pseudogenomes) built from RepeatMasker-annotated genomic instances of repetitive element subfamilies. RepEnrich then returns tables of counts merged from both strategies, which can be further processed in statistical analyses of differential expression. For detailed information see the `original publication`_. | |
65 | |
66 .. _original publication: https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-15-583 | |
67 | |
68 **Inputs** | |
69 | |
70 *Reference genome* : reference genome in fasta format | |
71 | |
72 *Sequencing dataset*: single-read sequencing dataset. Paired-end sequencing datasets are not supported yet. | |
73 | |
74 *RepeatMasker description file*: a txt repeatmasker file which can be downloaded from http://www.repeatmasker.org/genomicDatasets/RMGenomicDatasets.html | |
75 | |
76 This file looks like: | |
77 | |
78 <![CDATA[ | |
79 | |
80 SW perc perc perc query position in query matching repeat position in repeat | |
81 | |
82 score div. del. ins. sequence begin end (left) repeat class/family begin end (left) ID | |
83 | |
84 16 20.2 5.9 0.0 chrM 1211 1261 (18263) + (TTTTA)n Simple_repeat 1 54 (0) 84486 | |
85 | |
86 13 23.9 2.2 2.2 chrM 2014 2059 (17465) + (TTA)n Simple_repeat 1 46 (0) 84487 | |
87 | |
88 24 18.8 5.3 2.6 chrM 3924 3999 (15525) + (TAT)n Simple_repeat 1 78 (0) 84488 | |
89 | |
90 18 4.5 0.0 0.0 chrM 5961 5983 (13541) + (AT)n Simple_repeat 1 23 (0) 84489 | |
91 | |
92 13 25.9 4.0 4.0 chrM 6247 6320 (13204) + (ATTTAT)n Simple_repeat 1 74 (0) 84490 | |
93 | |
94 11 14.6 7.5 2.4 chrM 8783 8822 (10702) + (CTAATT)n Simple_repeat 1 42 (0) 84491 | |
95 | |
96 17 19.0 0.0 8.6 chrM 9064 9126 (10398) + A-rich Low_complexity 1 58 (0) 84492 | |
97 | |
98 13 21.0 5.9 1.9 chrM 11723 11773 (7751) + (ATA)n Simple_repeat 1 53 (0) 84493 | |
99 | |
100 66 20.4 12.3 12.3 chrM 12823 13001 (6523) C LSU-rRNA_Cel rRNA (1) 2431 2253 84494 | |
101 | |
102 16 16.6 0.0 2.9 chrM 14361 14396 (5128) + (ATT)n Simple_repeat 1 35 (0) 84495 | |
103 | |
104 44 2.4 0.0 0.0 chrM 15966 16007 (3517) + (TA)n Simple_repeat 1 42 (0) 84496 | |
105 | |
106 35 5.3 0.0 0.0 chrM 16559 16597 (2927) + (AT)n Simple_repeat 1 39 (0) 84497 | |
107 | |
108 36 2.9 0.0 0.0 chrM 16922 16956 (2568) + (AT)n Simple_repeat 1 35 (0) 84498 | |
109 | |
110 37 0.0 0.0 0.0 chrM 17040 17071 (2453) + (TA)n Simple_repeat 1 32 (0) 84499 | |
111 | |
112 20 4.3 0.0 0.0 chrM 17417 17440 (2084) + (T)n Simple_repeat 1 24 (0) 84500 | |
113 | |
114 31 6.9 6.3 1.5 chrM 17451 17513 (2011) + (TA)n Simple_repeat 1 66 (0) 84501 | |
115 | |
116 26 17.0 0.0 0.0 chrM 19469 19514 (10) + A-rich Low_complexity 1 46 (0) 84502 | |
117 | |
118 ]]> | |
119 | |
120 Users may filter this file so that it contains only desired items (for instance only satellites, repeats and transposons) | |
121 | |
122 **Outputs** | |
123 | |
124 (1) Fraction counts, (2) Family fraction counts and (3) Class fraction counts are returned in tabular format for further statistical tests, differential expression analysis or graphics. | |
125 | |
126 **RepEnrich** | |
127 | |
128 This Galaxy tool is a wrapper of the RepEnrich tool by steven_criscione@brown.edu et al. whose code and manual are available in `GitHub`_. | |
129 | |
130 .. _GitHub: https://github.com/nskvir/RepEnrich | |
131 | |
132 Python scripts RepEnrich.py and RepEnrich_setup.py have been adapted to Python 3. Note that the row order of the Fraction counts, Family fraction counts and Class fraction counts tables differs between this Galaxy wrapper and RepEnrich as found in the `RepEnrich code repository`_. This difference in sorting does not affect subsequent statistical analyses. | |
133 | |
134 .. _RepEnrich code repository: https://github.com/nskvir/RepEnrich | |
135 | |
136 **Execution time** | |
137 | |
138 .. class:: warningmark | |
139 | |
140 This tool includes steps to index the reference genome, index repeat sequences and align reads to these indexes. Therefore the run time may be **long to very long**. | |
141 | |
142 .. class:: infomark | |
143 | |
144 For more information on the tools, please visit our `code repository`_. | |
145 | |
146 If you would like to give us feedback or you run into any trouble, please send an email to artbio.ibps@gmail.com | |
147 | |
148 This tool wrapper is developed by the `ARTbio team`_ at the `Institut de Biologie Paris Seine (IBPS)`_. | |
149 | |
150 .. _code repository: https://github.com/ARTbio/tools-artbio/tree/master/tools/ | |
151 .. _ARTbio team: http://artbio.fr | |
152 .. _Institut de Biologie Paris Seine (IBPS): http://www.ibps.upmc.fr/en/core-facilities/bioinformatics | |
153 | |
154 </help> | |
155 | |
156 <citations> | |
157 <citation type="doi">10.1186/1471-2164-15-583</citation> | |
158 </citations> | |
159 </tool> |
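
The Cheetah directives on lines 13 to 16 of the wrapper derive the working file names from the Galaxy dataset identifiers using Python's `re` module. The following is a minimal sketch of those same substitutions in plain Python, intended only to make the resulting names easier to predict. The function names `derive_input_base` and `derive_base_reference` are hypothetical and do not appear in the wrapper; the regular expressions are copied from the wrapper.

```python
import re


def derive_input_base(fastq_identifier: str) -> str:
    """Hypothetical helper mirroring wrapper line 14.

    Removes a trailing '.fastq' from the dataset name; any other
    suffix is left untouched.
    """
    return re.sub(r'\.fastq$', '', fastq_identifier)


def derive_base_reference(genome_identifier: str) -> str:
    """Hypothetical helper mirroring wrapper lines 15 and 16."""
    # Line 15: every character that is not a word character or '-'
    # is replaced by '_', so 'chrM.fa' becomes 'chrM_fa'.
    sanitized = re.sub(r'[^\w\-]', '_', genome_identifier)
    # Line 16: the dot in '.fa$' is unescaped, so it matches any single
    # character followed by 'fa' at the end of the string. That includes
    # the '_fa' produced by the previous step.
    return re.sub('.fa$', '', sanitized)


if __name__ == "__main__":
    print(derive_input_base("Samp.fastq"))         # Samp
    print(derive_base_reference("chrM.fa"))        # chrM
    print(derive_base_reference("my genome.fasta"))  # my_genome_fasta
```

With the test data named in the wrapper (`Samp.fastq` and `chrM.fa`), the sketch yields `Samp` and `chrM`, which matches the `Samp_*` file names used in the test outputs. Note that, unlike the genome name, the read dataset name is not sanitized in the wrapper, so this sketch leaves spaces and other special characters in `input_base` as they are.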