<tool id="removeFastaSubSequence" name="Remove Fasta Substring Sequence" version="1.0.0">
    <description>Removes sequences from a query FASTA file that are present, identically or as substrings, in a reference FASTA file.</description>
    <requirements>
        <requirement type="package" version="1.70">biopython</requirement>
    </requirements>
    <command><![CDATA[python '$__tool_directory__/removeFastaSubSequence.py' '$ref_fastafile' '$query_fastafile' '$output']]></command>
    <inputs>
        <param name="ref_fastafile" type="data" format="fasta">
            <label>Input reference FASTA file</label>
        </param>
        <param name="query_fastafile" type="data" format="fasta">
            <label>Input query FASTA file</label>
        </param>
    </inputs>
    <outputs>
        <data format="fasta" name="output" label="uniqSeq_${query_fastafile.name.rsplit('.',1)[0]}.fasta" />
    </outputs>
    <tests>
        <test>
            <param name="ref_fastafile" value="test_ref.fasta" />
            <param name="query_fastafile" value="test_query.fasta" />
            <output name="output" file="uniqSeq_test_query.fasta">
                <assert_contents>
                    <has_text text="ENSMUST00000193003" />
                </assert_contents>
            </output>
        </test>
    </tests>
    <help>
This tool removes every sequence in the query FASTA file that also occurs in the reference FASTA file, either as an identical sequence or as a substring of a reference sequence.

EXAMPLE:

----

Reference sequences::

    >reference_seq_1
    TSLDKDHLELCCTLSLPFSWACSWVLVLRLSINGQLPRSRLWAAHCLWGVP
    >reference_seq_2
    RGLCISGLEKEVQVQSRQAEGPVHLWLRKGSTSAE

----

Query sequences::

    >query_seq_1
    TKTILNYAVLSPCLSPGHVLGC
    >query_seq_2
    LDKDHLELCCTLSLPFSWACSWVLVL
    >query_seq_3
    LWGVPRGLCISG

----

Output sequences::

    >query_seq_1
    TKTILNYAVLSPCLSPGHVLGC
    >query_seq_3
    LWGVPRGLCISG

----

The output file contains only query_seq_1 and query_seq_3. query_seq_2 is removed because its sequence "LDKDHLELCCTLSLPFSWACSWVLVL" occurs as a substring of reference_seq_1 ("TSLDKDHLELCCTLSLPFSWACSWVLVLRLSINGQLPRSRLWAAHCLWGVP"). query_seq_3 is kept even though it spans the junction between the end of reference_seq_1 and the start of reference_seq_2, because each reference sequence is searched separately.
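For readers who want the rule spelled out, here is a minimal Python sketch of the filtering described above. It is an illustration under stated assumptions, not the code of removeFastaSubSequence.py (which the tool runs with Biopython): it assumes an exact, case-sensitive substring test of each query sequence against each reference sequence separately, and the helper names read_fasta and remove_sub_sequences are hypothetical.

```python
# Hypothetical sketch of the filtering rule; not the actual removeFastaSubSequence.py.


def read_fasta(lines):
    """Parse FASTA lines into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)  # a sequence may be wrapped over several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def remove_sub_sequences(ref_records, query_records):
    """Keep query records whose sequence is not found inside any single reference sequence.

    Assumption: exact, case-sensitive substring test. An identical sequence
    counts as a substring of itself, so exact duplicates are removed too.
    """
    ref_seqs = [seq for _, seq in ref_records]
    return [
        (name, seq)
        for name, seq in query_records
        if not any(seq in ref for ref in ref_seqs)
    ]


if __name__ == "__main__":
    ref = read_fasta([
        ">reference_seq_1",
        "TSLDKDHLELCCTLSLPFSWACSWVLVLRLSINGQLPRSRLWAAHCLWGVP",
        ">reference_seq_2",
        "RGLCISGLEKEVQVQSRQAEGPVHLWLRKGSTSAE",
    ])
    query = read_fasta([
        ">query_seq_1", "TKTILNYAVLSPCLSPGHVLGC",
        ">query_seq_2", "LDKDHLELCCTLSLPFSWACSWVLVL",
        ">query_seq_3", "LWGVPRGLCISG",
    ])
    print([name for name, _ in remove_sub_sequences(ref, query)])
    # prints ['query_seq_1', 'query_seq_3']
```

Python's substring operator (in) treats an identical sequence as a substring of itself, so a single check covers both exact duplicates and contained sequences, which matches the behaviour described in the first paragraph.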
    </help>
</tool>