annotate dante_gff_output_filtering.xml @ 24:df99812ded92 draft

"planemo upload commit a0a9b02c60a91942a271b8b35648c0b152fe1ebd-dirty"
author petr-novak
date Fri, 27 Jan 2023 08:15:31 +0000
parents e2bbc79f0fac
children 02c6dff8c381
rev   line source
24
df99812ded92 "planemo upload commit a0a9b02c60a91942a271b8b35648c0b152fe1ebd-dirty"
petr-novak
parents: 23
diff changeset
1 <tool id="domains_filter" name="Protein Domains Filter" version="1.1.5">
0
77d9f2ecb28a Uploaded
petr-novak
parents:
diff changeset
2 <description> Tool for filtering GFF3 output from DANTE. Filtering can be performed based on domain type and alignment quality. </description>
23
e2bbc79f0fac "planemo upload commit baf4ca09569b1b709c37f2df712e778da05edaf9-dirty"
petr-novak
parents: 22
diff changeset
3 <requirements>
24
4 <requirement type="package">dante=0.1.5</requirement>
23
5 </requirements>
6 <stdio>
7 <regex match="Traceback" source="stderr" level="fatal" description="Unknown error" />
8 <regex match="error" source="stderr" level="fatal" description="Unknown error" />
0
9 <regex match="Traceback" source="stderr" level="fatal" description="Unknown error" />
10 <regex match="error" source="stderr" level="fatal" description="Unknown error" />
11 </stdio>
12 <command>
23
13 dante_gff_output_filtering.py --dom_gff ${DomGff} --domains_prot_seq ${dom_prot_seq} --domains_filtered ${dom_filtered} --selected_dom ${selected_domain} --th_identity ${th_identity} --th_similarity ${th_similarity} --th_length ${th_length} --interruptions ${interruptions} --max_len_proportion ${th_len_ratio} --element_type '${element_type}'
0
14
15 </command>
16 <inputs>
17 <param format="gff" type="data" name="DomGff" label="Choose primary GFF3 file of all domains from Protein Domains Finder" />
18 <param name="th_identity" type="float" value="0.35" min="0" max="1" label="Minimum identity" help="Protein sequence identity threshold between input and mapped protein from db [0-1]" />
19 <param name="th_similarity" type="float" value="0.45" min="0" max="1" label="Minimum similarity" help="Protein sequence similarity threshold between input and mapped protein from db [0-1]" />
20 <param name="th_length" type="float" value="0.8" min="0" max="1" label="Minimum alignment length" help="Proportion of the hit length without gaps to the length of the database sequence [0-1]" />
21 <param name="interruptions" type="integer" value="3" label="Interruptions [frameshifts + stop codons]" help="Tolerance threshold per each started 100 amino acids of the alignment sequence" />
22 <param name="th_len_ratio" type="float" value="1.2" label="Maximal length proportion" help="Maximum ratio of the alignment length to the original length of the protein domain in the database (including indels)" />
23 <param name="selected_domain" type="select" label="Select protein domain type" >
23
24 <option value="All" selected="true">All</option>
25 <option value="GAG">GAG</option>
26 <option value="INT">INT</option>
27 <option value="PROT">PROT</option>
28 <option value="RH">RH</option>
29 <option value="RT">RT</option>
30 <option value="aRH">aRH</option>
31 <option value="CHDCR">CHDCR</option>
32 <option value="CHDII">CHDII</option>
33 <option value="TPase">TPase</option>
34 <option value="YR">YR</option>
35 <option value="HEL1">HEL1</option>
36 <option value="HEL2">HEL2</option>
37 <option value="ENDO">ENDO</option>
0
38 </param>
10
d0431a839606 Uploaded
petr-novak
parents: 0
diff changeset
39 <param name="element_type" type="text" value="" label="Filter based on classification" help="Use a preset option or enter an arbitrary string to select a repetitive element type at any classification level. The string must be a contiguous substring of the Final_Classification attribute of the GFF3 file, in which classification levels are separated by the | character. Filtering is case-sensitive.">
0
40 <option value="Ty1/copia">Ty1/copia</option>
10
41 <option value="Ty3/gypsy">Ty3/gypsy</option>
0
42 <option value="Class_I|">Class_I|</option>
43 <option value="Class_II|">Class_II|</option>
44 <sanitizer>
45 <valid initial="string.ascii_letters,string.digits">
46 <add value="_" />
47 <add value="/" />
48 <add value="|" />
49 </valid>
50 </sanitizer>
51 </param>
52 </inputs>
53 <outputs>
54 <data format="gff3" name="dom_filtered" label="Filtered GFF3 file of ${selected_domain} domains from dataset ${DomGff.hid}" />
55 <data format="fasta" name="dom_prot_seq" label="Protein sequences of ${selected_domain} domains from dataset ${DomGff.hid}" />
56
57
58 </outputs>
59
60 <help>
61
62 **WHAT IT DOES**
63
64 This tool filters either the primary GFF3 file of all domains (i.e. the output of the *Protein Domains Finder* tool) or an already filtered GFF3 file. Domains can be filtered based on:
65
66 **Quality of alignment such as**:
67 - alignment sequence identity
68 - alignment similarity
69 - alignment length as a proportion of the database sequence length
70 - number of interruptions (frameshifts or stop codons) per 100 AA
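
The quality criteria listed above can be sketched as a single predicate. This is a minimal illustration, not the code of dante_gff_output_filtering.py: the `Hit` fields and the `passes_quality` function are hypothetical names, the default values are the ones shown in the tool form, and the way the interruption tolerance scales with alignment length is an assumed reading of "per 100 AA".

```python
import math
from dataclasses import dataclass


@dataclass
class Hit:
    # Hypothetical field names; the real GFF3 attribute names may differ.
    identity: float      # fraction of identical residues, 0-1
    similarity: float    # fraction of similar residues, 0-1
    rel_length: float    # ungapped hit length / database sequence length
    interruptions: int   # frameshifts + stop codons in the alignment
    aln_length: int      # alignment length in amino acids
    len_ratio: float     # alignment length / database domain length


def passes_quality(hit, th_identity=0.35, th_similarity=0.45, th_length=0.8,
                   interruptions=3, max_len_ratio=1.2):
    """Return True if the hit satisfies every quality threshold."""
    # Assumption: the tolerance applies to each started block of 100 residues.
    allowed = interruptions * math.ceil(hit.aln_length / 100)
    return (hit.identity >= th_identity
            and hit.similarity >= th_similarity
            and hit.rel_length >= th_length
            and allowed >= hit.interruptions
            and max_len_ratio >= hit.len_ratio)
```

Under these assumptions, a 150 AA alignment with the default tolerance of 3 may contain up to 6 interruptions, while an 80 AA alignment may contain up to 3.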
71
72 **Protein domain type**
73 This filter is based on the "Name" attribute of the GFF3 file.
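
For orientation, GFF3 stores attributes in the ninth column as `key=value` pairs separated by semicolons. The sketch below shows one way to read them; `parse_attributes` is a hypothetical helper, the sample attribute string is illustrative rather than taken from real DANTE output, and percent-decoding of escaped characters is deliberately left out.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict of key -> value."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            # Split on the first "=" only, so values may contain "=".
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


# Illustrative attribute string (not real tool output).
example = "Name=RT;Final_Classification=Class_I|LTR|Ty1/copia|Ale"
domain_type = parse_attributes(example)["Name"]
```

Here `domain_type` holds the value that would be compared against the selected protein domain type.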
74
75 **Repetitive element classification**
76 In the text field you can specify the classification string to filter on. This filter is based on the "Final_Classification" attribute of the GFF3 file, so the string must be in the proper form (classification levels are separated by "|"). To see which classifications occur in your data, check the Classification summary table output. If you leave the field blank, domains of all classifications are reported.
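
The classification filter described above amounts to a case-sensitive substring test. The following is a minimal sketch under that reading; `matches_classification` is a hypothetical name and the classification strings are illustrative examples.

```python
def matches_classification(final_classification, element_type=""):
    """Return True if the record passes the classification filter."""
    # A blank filter keeps records of every classification.
    if not element_type:
        return True
    # Case-sensitive match on a contiguous substring.
    return element_type in final_classification


# Illustrative classification strings (not real tool output).
copia = "Class_I|LTR|Ty1/copia|Ale"
hat = "Class_II|Subclass_1|TIR|hAT"
kept = [c for c in (copia, hat) if matches_classification(c, "Class_I|")]
```

With a plain substring test, the trailing "|" in the preset `Class_I|` matters: without it, `Class_I` would also match every string that starts with `Class_II`.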
77
78 All records with an ambiguous domain type (e.g. RH/INT) are filtered out automatically: they are excluded from the filtered GFF3 file, and no protein sequences are derived from these potentially chimeric domains. For general use, the default quality filtering parameters should give good results, as they are suitable for finding all types of protein domains. Even so, assess the results critically in light of your input data.
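
The handling of ambiguous domain types can be pictured as follows. This sketch assumes that an ambiguous record is one whose "Name" value joins several domain types with "/", as in the RH/INT example; `is_ambiguous` and `keep_record` are hypothetical names, and the actual script may detect ambiguity differently.

```python
def is_ambiguous(name):
    """Assumed rule: several domain types joined by "/" mark an ambiguous record."""
    return "/" in name


def keep_record(name, selected_domain="All"):
    """Drop ambiguous records, then apply the selected domain type."""
    if is_ambiguous(name):
        return False
    return selected_domain == "All" or name == selected_domain


names = ["RT", "RH/INT", "INT", "GAG"]
kept_all = [n for n in names if keep_record(n)]
kept_rt = [n for n in names if keep_record(n, "RT")]
```

In this sketch the ambiguous record is removed even when the domain type selection is "All".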
79
80
81 **OUTPUTS PRODUCED:**
82 1. Filtered GFF3 file
83 2. Translated protein sequences of the filtered domain regions of the original DNA sequence, in FASTA format
84
85 *Translated sequences are taken from the best alignment (Best_Hit attribute) within a domain region; however, this alignment does not necessarily cover the whole region reported as a domain in the GFF3 file.*
86
87
88 </help>
89 </tool>
90