comparison filter_assembly.xml @ 1:a83562c0719f draft

planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 68979144b9949c27bcc3340a9e8375de1391526c
author abims-sbr
date Mon, 03 Feb 2025 14:37:31 +0000
parents 7a813e633d1c
children 000dbfafe31d
0:7a813e633d1c 1:a83562c0719f
1 <tool name="Filter assemblies" id="filter_assemblies" version="2.0.3"> 1 <tool name="Filter assemblies" id="filter_assemblies" version="2.0.4">
2 2
3 <description> 3 <description>
4 Filter the outputs of Velvet or Trinity assemblies 4 Filter the outputs of Velvet or Trinity assemblies
5 </description> 5 </description>
6 6
7 <macros> 7 <macros>
8 <import>macros.xml</import> 8 <import>macros.xml</import>
9 </macros> 9 </macros>
10 10
11 <requirements> 11 <requirements>
12 <expand macro="python_required" /> 12 <expand macro="python3_required" />
13 <requirement type="package" version="0.0.14">fastx_toolkit</requirement>
14 <requirement type="package" version="10.2011">cap3</requirement> 13 <requirement type="package" version="10.2011">cap3</requirement>
15 </requirements> 14 </requirements>
16 15
17 <command> 16 <command>
18 <![CDATA[ 17 <![CDATA[
21 ln -s '$input' '$input.element_identifier'; 20 ln -s '$input' '$input.element_identifier';
22 #set $infiles = $infiles + $input.element_identifier + "," 21 #set $infiles = $infiles + $input.element_identifier + ","
23 #end for 22 #end for
24 #set $infiles = $infiles[:-1] 23 #set $infiles = $infiles[:-1]
25 24
26 ln -s '$__tool_directory__/scripts/S02a_remove_redondancy_from_velvet_oases.py' . &&
27 ln -s '$__tool_directory__/scripts/S02b_format_fasta_name_trinity.py' . &&
28 ln -s '$__tool_directory__/scripts/S03_choose_one_variants_per_locus_trinity.py' . &&
29 ln -s '$__tool_directory__/scripts/S04_find_orf.py' . &&
30 ln -s '$__tool_directory__/scripts/S05_filter.py' . &&
31
32 python '$__tool_directory__/scripts/S01_script_to_choose.py' 25 python '$__tool_directory__/scripts/S01_script_to_choose.py'
33 26
34 '$infiles' 27 '$infiles'
35 $length_seq_max 28 $length_seq_max
36 $percent_identity 29 $percent_identity
37 $overlap_length 30 $overlap_length
38 > ${log} 31 > '${log}'
39 ]]> 32 ]]>
40 </command> 33 </command>
41 34
42 <inputs> 35 <inputs>
43 <param name="inputs" type="data" format="fasta" multiple="true" label="Input files" /> 36 <param name="inputs" type="data" format="fasta" multiple="true" label="Input files" />
104 97
105 <![CDATA[ 98 <![CDATA[
106 99
107 **Description** 100 **Description**
108 101
109 This tool reformats Velvet Oases or Trinity assemblies for the AdaptSearch galaxy suite and selects only one variant per gene according to its length and quality check. 102 This tool runs the CAP3 software on assembly FASTA data, merges singlets and contigs, and then reformats headers so that assemblies from any tool can be used.
110 103
111 --------- 104 ---------
112 105
113 **Input format** 106 **Input format**
114 107
115 (1) Sequences are in the sequential format: 108 Sequences are in the FASTA format:
116 109
117 | >seqname1 110 | >seqname1
118 | AAAGAGAGACCACATGTCAGTAGC -on one or several lines - 111 | AAAGAGAGACCACATGTCAGTAGC -on one or several lines -
119 | >seqname2 112 | >seqname2
120 | AAGGCCTGACCACATGAGTTAAGC -on one or several lines - 113 | AAGGCCTGACCACATGAGTTAAGC -on one or several lines -
121 | etc ... 114 | etc ...
122 | 115 |
123
124 (2) The file name should begin with a two-letter abbreviation of the species name (for instance, 'Ap' if the species is Alvinella pompejana).
125
126 **For Velvet Oases assemblies input**
127
128 The headers must be as follows: *>Locus_i_Transcript_i/j_Confidence_x.xxx_Length_N* where i is the locus number, j the transcript variant among all versions of the transcript, x.xxx the confidence value and N the length.
129
130 **For Trinity assemblies inputs**
131
132 The headers must be as follows: *>cj_gj_ij Len=j path=[j:0-j]* where all the j are integers (locus number, transcript variant, length, position...)
133
134 **The tool handles the case if input files come from both assemblers (there is no need for input files to be exclusively from one or another assembler).**
135 116
136 --------- 117 ---------
137 118
138 **Parameters** 119 **Parameters**
139 120
148 --------- 129 ---------
149 130
150 **Steps**: 131 **Steps**:
151 132
152 The tool: 133 The tool:
153 1) Modifies the sequence name to add the species abbreviation using the 2 first letters of the name of the transcriptome file : note that each species abbreviation must be unique 134 1) Performs a CAP3 from the full set of ORFs to minimize redundancy
154 2) Selects one allelic sequence from each transcript (c or locus) using the length of the sequence and its level of confidence 135 2) Merges singlets and contigs identified by CAP3
155 3) Selects the best ORF from the sequence between two stop codons 136 3) Reformats headers of the FASTA records by adding a specified prefix (defined from the original filename) and ensures that sequences are on a single line
156 4) Performs a CAP3 from the full set of ORFs to minimize redundancy
157 5) Retrieves the initial transcript sequences from the remaining set of processed ORF sequences
158 137
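Step 3 of the new revision (prefixing headers and putting each sequence on a single line) can be sketched as follows. This is a minimal illustration only, not the code shipped in the repository: the function name `reformat_fasta`, the explicit `prefix` argument, and the `_` separator between prefix and original name are all assumptions, since the diff does not show how the prefix is derived from the filename or joined to the header.

```python
def reformat_fasta(lines, prefix):
    """Yield (header, sequence) pairs from FASTA lines.

    Hypothetical helper: each header gets `prefix` prepended (the "_"
    separator is an assumption) and multi-line sequences are joined
    into a single line.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any
            if header is not None:
                yield header, "".join(chunks)
            header = ">" + prefix + "_" + line[1:]
            chunks = []
        elif header is not None:
            # Sequence line belonging to the current record
            chunks.append(line)
    # Emit the last record
    if header is not None:
        yield header, "".join(chunks)
```

For example, `list(reformat_fasta([">seq1", "AAA", "GGG"], "Ap"))` gives one record whose header carries the `Ap` prefix and whose sequence is the two input lines joined together.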
159 **Outputs** 138 **Outputs**
160 139
161 - 'Filter Assemblies Summary' : the log file. 140 - 'Filter Assemblies Summary' : the log file.
162 - 'Filter Assemblies outputs' : the main results. 141 - 'Filter Assemblies outputs' : the main results.
170 --------- 149 ---------
171 150
172 Changelog 151 Changelog
173 --------- 152 ---------
174 153
154
155 **Version 2.2 - 07/10/2024**
156
157 - Input files can come from any assembly tool
158
175 **Version 2.1 - 15/01/2018** 159 **Version 2.1 - 15/01/2018**
176 160
177 - Input files can be a mix of files coming from either the Trinity or Velvet Oases assemblers 161 - Input files can be a mix of files coming from either the Trinity or Velvet Oases assemblers
178 162
179 **Version 2.0 - 14/04/2017** 163 **Version 2.0 - 14/04/2017**