comparison scripts/ReMatCh/utils/README.md @ 0:965517909457 draft

planemo upload commit 15239f1674081ab51ab8dd75a9a40cf1bfaa93e8
author cstrittmatter
date Wed, 22 Jan 2020 08:41:44 -0500
parents
children 0cbed1c0a762
-1:000000000000 0:965517909457
1 ReMatCh
2 =======
3 *Reads mapping against target sequences, checking mapping and consensus sequences production*
4
5 <https://github.com/B-UMMI/ReMatCh>
6
7 Table of Contents
8 --
9
10 [Combine alignment consensus](#combine-alignment-consensus)
11 [Convert Ns to gaps](#convert-ns-to-gaps)
12 [gffParser](#gffparser)
13 [Restart ReMatCh](#restart-rematch)
14 [Strip Alignment](#strip-alignment)
15
16
17 ## Combine Alignment Consensus
18
19 Combine the alignment consensus sequences from the first ReMatCh run into a single file per reference sequence.
20
21 **Dependencies**
22 - Python (2.7.x)
23
24 **Usage**
25
26 usage: combine_alignment_consensus.py [-h] [--version]
27 -w /path/to/rematch/working/directory/
28 [-o /path/to/output/directory/]
29
30 Combine the alignment consensus sequences from ReMatCh first run by reference sequences into single files
31
32 optional arguments:
33 -h, --help show this help message and exit
34 --version Version information
35
36 Required options:
37 -w /path/to/rematch/working/directory/
38 --workdir /path/to/rematch/working/directory/ Path to the directory where ReMatCh was running (default: None)
39
40 General facultative options:
41 -o --outdir /path/to/output/directory/ Path to the directory where the combined sequence files will be stored (default: .)
42
43
44
45 ## Convert Ns to Gaps
46
47
48 Convert the Ns into gaps in a fasta file.
49
50 **Dependencies**
51 - Python (2.7.x)
52
53 **Usage**
54
55 usage: convert_Ns_to_gaps.py [-h] [--version]
56 -i /path/to/input/file.fasta
57 -o /path/to/converted/output/file.fasta
58
59 Convert the Ns into gaps
60
61 optional arguments:
62 -h, --help show this help message and exit
63 --version Version information
64
65 Required options:
66 -i --infile /path/to/input/file.fasta Path to the fasta file (default: None)
67 -o --outfile /path/to/converted/output/file.fasta Converted output fasta file (default: converted_Ns_to_gaps.fasta)
68
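As a rough illustration of the conversion described above, the sketch below replaces every `N` with `-` in the sequence lines of a FASTA file and leaves header lines untouched. It is not the code of `convert_Ns_to_gaps.py`: the function name `convert_ns_to_gaps` is hypothetical, handling lowercase `n` is an assumption, and the sketch targets Python 3 rather than the Python 2.7 the scripts require.

```python
def convert_ns_to_gaps(fasta_lines):
    """Replace N/n with '-' in sequence lines; header lines are kept as-is."""
    converted = []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Headers may legitimately contain the letter N, so do not touch them.
            converted.append(line)
        else:
            # Assumption: both upper- and lowercase Ns count as missing bases.
            converted.append(line.replace("N", "-").replace("n", "-"))
    return converted


example = [">sample_1", "ACGTNNACGT", "nnACGT"]
print("\n".join(convert_ns_to_gaps(example)))
```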
69
70
71 ## gffParser
72
73
74 Parser for GFF3 files, such as those produced by [PROKKA](https://github.com/tseemann/prokka). These files must contain both the features and the sequence. The parser retrieves the CDS sequences in the GFF file and can extend each one by the number of nucleotides specified in `--extraSeq`. To parse only a selection of CDS, give `--select` a txt file with the IDs of interest, one per line. Alternatively, use the `--fromFile` option to retrieve sequences from the GFF file based on a txt file containing the contig ID, start and end position of each sequence of interest, one per line. `--extraSeq` can also be used with this method.
75
76 **Dependencies**
77 - Python (2.7.x)
78 - [Biopython](http://biopython.org/) (1.68 or similar)
79
80 **Usage**
81
82 usage: gffParser.py [-h]
83 -i INPUT [-x EXTRASEQ] [-k] [-o OUTPUTDIR]
84 [-s SELECT] [-f FROMFILE] [--version]
85
86 GFF3 parser for feature sequence retrieval, containing both sequences and annotations.
87
88 optional arguments:
89 -h, --help Show this help message and exit
90 -i --input INPUT
91 GFF3 file to parse, containing both sequences and annotations (like the one obtained from PROKKA).
92 -x --extraSeq EXTRASEQ
93 Extra sequence to retrieve per feature in gff.
94 -k, --keepTemporaryFiles
95 Keep temporary gff (without sequence) and fasta files.
96 -o --outputDir OUTPUTDIR
97 Path to where the output is to be saved.
98 -s --select SELECT
99 txt file with the IDs of interest, one per line
100 -f --fromFile FROMFILE
101 Sequence coordinates to be retrieved. Requires contig ID and coords (contig,start,end) in a csv file, one per line.
102 --version Display version, and exit.
103
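The `--fromFile` input is described above as one `contig,start,end` record per line. The sketch below shows one way such a file could be read. It is only an illustration, not the parser used by `gffParser.py`: the function name `read_coordinates` is hypothetical, and skipping blank lines and trimming spaces around fields are assumptions. It targets Python 3.

```python
import csv
import io


def read_coordinates(handle):
    """Yield (contig, start, end) tuples from 'contig,start,end' lines."""
    for row in csv.reader(handle):
        if not row:
            continue  # assumption: blank lines are ignored
        contig, start, end = (field.strip() for field in row)
        yield contig, int(start), int(end)


# In-memory stand-in for a coordinates file; the contig names are made up.
example = io.StringIO("contig_1,100,250\ncontig_2,5,80\n")
print(list(read_coordinates(example)))
```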
104 **Output**
105
106 *<filename>.fasta*
107 Multi-fasta file with the retrieved sequences.
108 Headers will contain the feature ID, followed by '=', and the position of that feature in the sequence: the original sequence ID, a '#', and the start and end coordinates separated by '_' (>featureID=contig#start_end).
109 If the `--fromFile` option is used, there is no feature ID, so the header will contain only its position in the original sequence: the sequence ID, a '#', and the start and end coordinates separated by '_' (>contig#start_end).
110
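For downstream scripts that need to read these headers back, the two layouts described above (`>featureID=contig#start_end` and `>contig#start_end`) can be split as sketched below. The helper `parse_header` is hypothetical and not part of ReMatCh. It assumes that the feature ID contains no '=' and that the coordinates are integers; contig names containing '#' or '_' are handled by splitting from the right. It targets Python 3.

```python
def parse_header(header):
    """Split a gffParser-style FASTA header into its parts.

    Returns (feature_id, contig, start, end); feature_id is None for
    headers of the form '>contig#start_end'.
    """
    text = header.lstrip(">").strip()
    feature_id = None
    if "=" in text:
        # Assumption: the first '=' separates the feature ID from the position.
        feature_id, text = text.split("=", 1)
    # Split from the right so '#' or '_' inside the contig name is preserved.
    contig, coords = text.rsplit("#", 1)
    start, end = coords.rsplit("_", 1)
    return feature_id, contig, int(start), int(end)


print(parse_header(">geneA=contig_1#100_250"))
print(parse_header(">contig_1#100_250"))
```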
111 *<filename>.txt*
112 Feature IDs of the sequences that failed to be retrieved because the start or end position fell outside the sequence containing the feature (due to the `--extraSeq` option).
113
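The failure case above comes down to a bounds check: extending a feature by `--extraSeq` nucleotides on each side can push it past either end of its contig. The sketch below shows that check in isolation. The function `extend_coordinates` is hypothetical, and it assumes 1-based, inclusive coordinates with the same extension applied to both sides; the actual script may use a different convention.

```python
def extend_coordinates(start, end, extra, contig_length):
    """Return the extended (start, end), or None if it leaves the contig.

    Assumption: coordinates are 1-based and inclusive.
    """
    new_start = start - extra
    new_end = end + extra
    if new_start < 1 or new_end > contig_length:
        return None  # this feature would be reported as failed
    return new_start, new_end


print(extend_coordinates(100, 200, 50, 1000))  # fits inside the contig
print(extend_coordinates(10, 200, 50, 1000))   # start would fall before position 1
```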
114
115
116 ## Restart ReMatCh
117
118
119 Restart a ReMatCh run that was abruptly terminated
120
121 **Dependencies**
122 - Python (2.7.x)
123
124 **Usage**
125
126 usage: restart_rematch.py [-h] [--version] -i
127 /path/to/initial/workdir/directory/
128 [-w /path/to/workdir/directory/] [-j N]
129 [--runFailedSamples]
130
131 Restart a ReMatCh run abruptly terminated
132
133 optional arguments:
134 -h, --help show this help message and exit
135 --version Version information
136
137 Required options:
138 -i /path/to/initial/workdir/directory/, --initialWorkdir /path/to/initial/workdir/directory/
139 Path to the directory where ReMatCh was running (default: None)
140
141 General facultative options:
142 -w, --workdir /path/to/workdir/directory/
143 Path to the directory where ReMatCh will run again (default: .)
144 -j N, --threads N
145 Number of threads to use instead of the ones set in initial ReMatCh run (default: None)
146 --runFailedSamples
147 Will run ReMatCh for those samples missing, as well as for samples that did not run successfully in initial ReMatCh run (default: False)
148
149
150
151 ## Strip Alignment
152
153
154 Strip alignment positions containing gaps,
155 missing data and invariable positions.
156
157 **Dependencies**
158 - Python (2.7.x)
159 - [Biopython](http://biopython.org/) (1.68 or similar)
160
161 **Usage**
162
163 usage: strip_alignment.py [-h] [--version]
164 -i /path/to/aligned/input/file.fasta -o /path/to/stripped/output/file.fasta [--notGAPs]
165 [--notMissing] [--notInvariable]
166
167 Strip alignment positions containing gaps, missing data and invariable positions
168
169 optional arguments:
170 -h, --help show this help message and exit
171 --version Version information
172
173 Required options:
174 -i, --infile /path/to/aligned/input/file.fasta
175 Path to the aligned fasta file (default: None)
176 -o, --outfile /path/to/stripped/output/file.fasta
177 Stripped output fasta file (default: alignment_stripped.fasta)
178
179 General facultative options:
180 --notGAPs Do not strip positions with GAPs (default: False)
181 --notMissing Do not strip positions with missing data (default: False)
182 --notInvariable
183 Do not strip invariable sites (default: False)
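
The column filtering described in this section can be pictured with the minimal sketch below, which works on a plain dictionary of aligned sequences instead of a FASTA file. It is not the code of `strip_alignment.py`: the function name and keyword arguments are hypothetical, treating `N` as the missing-data symbol is an assumption, and sequences are uppercased before comparison. Setting a keyword to `False` plays the role of the matching `--notGAPs`, `--notMissing` or `--notInvariable` flag. It targets Python 3.

```python
def strip_alignment(sequences, gaps=True, missing=True, invariable=True):
    """Return the aligned sequences with unwanted columns removed.

    sequences: dict mapping sequence name -> aligned sequence (equal lengths).
    """
    names = list(sequences)
    # Transpose the alignment so each item is one column across all sequences.
    columns = zip(*(sequences[name].upper() for name in names))
    kept = []
    for column in columns:
        if gaps and "-" in column:
            continue  # column contains a gap
        if missing and "N" in column:
            continue  # assumption: 'N' marks missing data
        if invariable and len(set(column)) == 1:
            continue  # every sequence has the same character here
        kept.append(column)
    return {
        name: "".join(column[index] for column in kept)
        for index, name in enumerate(names)
    }


alignment = {"a": "ACGT-A", "b": "ACTTNA", "c": "ACGTAC"}
print(strip_alignment(alignment))
```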