Mercurial > repos > iss > eurl_vtec_wgs_pt
view scripts/ReMatCh/utils/README.md @ 6:20ff3dca457f draft default tip
planemo upload commit 6857c749c21f580c828aba3543e294b69d32b662
| author | iss | 
|---|---|
| date | Mon, 23 Oct 2023 11:45:36 +0000 | 
| parents | c6bab5103a14 | 
| children | 
line wrap: on
 line source
ReMatCh ======= *Reads mapping against target sequences, checking mapping and consensus sequences production* <https://github.com/B-UMMI/ReMatCh> Table of Contents -- [Combine alignment consensus](#combine-alignment-consensus) [Convert Ns to gaps](#convert-ns-to-gaps) [gffParser](#gffparser) [Restart ReMatCh](#restart-rematch) [Strip Alignment](#strip-alignment) ## Combine Alignment Consensus Combine the alignment consensus sequences from ReMatCh first run by reference sequences into single files. **Dependencies** - Python v3 **Usage** usage: combine_alignment_consensus.py [-h] [--version] -w /path/to/rematch/working/directory/ [-o /path/to/output/directory/] Combine the alignment consensus sequences from ReMatCh first run by reference sequences into single files optional arguments: -h, --help show this help message and exit --version Version information Required options: -w /path/to/rematch/working/directory/ --workdir /path/to/rematch/working/directory/ Path to the directory where ReMatCh was running (default: None) General facultative options: -o --outdir /path/to/output/directory/ Path to the directory where the combined sequence files will stored (default: .) ## Convert Ns to Gaps Convert the Ns into gaps in a fasta file. **Dependencies** - Python (2.7.x) **Usage** usage: convert_Ns_to_gaps.py [-h] [--version] -i /path/to/input/file.fasta -o /path/to/converted/output/file.fasta Convert the Ns into gaps optional arguments: -h, --help show this help message and exit --version Version information Required options: -i --infile /path/to/input/file.fasta Path to the fasta file (default: None) -o --outfile /path/to/converted/output/file.fasta Converted output fasta file (default: converted_Ns_to_gaps.fasta) ## gffParser Parser for GFF3 files, as the ones obtained by [PROKKA](https://github.com/tseemann/prokka). This files require to have both the features and sequence. It will retrieve the CDS sequences in the GFF file, allowing these to be extended by the number of nucleotides specifiend in `--extraSeq`. A selection of CDS of interest to be parsed can also be obtained by providing `--select` with a txt file of the IDs of interest, one per line. As an alternative, wanted sequences can be obtained from the GFF file from a txt file containing the coontig ID, start and end position (one per line) of the sequences of interest, using the `-fromFile` option. `-extraSeq` can also be obtain through this method. **Dependencies** - Python (2.7.x) - [Biopython](http://biopython.org/) (1.68 or similar) **Usage** usage: gffParser.py [-h] -i INPUT [-x EXTRASEQ] [-k] [-o OUTPUTDIR] [-s SELECT] [-f FROMFILE] [--version] GFF3 parser for feature sequence retrival, containing both sequences and annotations. optional arguments: -h, --help Show this help message and exit -i --input INPUT GFF3 file to parse, containing both sequences and annotations (like the one obtained from PROKKA). -x --extraSeq EXTRASEQ Extra sequence to retrieve per feature in gff. -k, --keepTemporaryFiles Keep temporary gff(without sequence) and fasta files. -o --outputDir OUTPUTDIR Path to where the output is to be saved. -s --select SELECT txt file with the IDs of interest, one per line -f --fromFile FROMFILE Sequence coordinates to be retrieved. Requires contig ID and coords (contig,strart,end) in a csv file, one per line. --version Display version, and exit. **Output** *<filename>.fasta* Multi-fasta file with the retrieved sequences. Headers will contain the feature ID, followed by '=', and the position of that feature in the sequence, starting with the original sequence ID,a '# and' the start and end coordinates separated with '_' (>featureID=contig#start_end). If the `--fromFile` option is used, there's no feature ID, so the header will only contain it's position in the original sequence, followed by the start and end coordinates separated with '_' (>contig#start_end). *<filename>.txt* Feature ID of the sequences that failed to be retireved, due to the start position or end position being outside of the sequence where the feature is (due to the `--extraSeq` option). ## Restart ReMatCh Restart a ReMatCh run abruptly terminated **Dependencies** - Python (2.7.x) **Usage** usage: restart_rematch.py [-h] [--version] -i /path/to/initial/workdir/directory/ [-w /path/to/workdir/directory/] [-j N] [--runFailedSamples] Restart a ReMatCh run abruptly terminated optional arguments: -h, --help show this help message and exit --version Version information Required options: -i /path/to/initial/workdir/directory/, --initialWorkdir /path/to/initial/workdir/directory/ Path to the directory where ReMatCh was running (default: None) General facultative options: -w, --workdir /path/to/workdir/directory/ Path to the directory where ReMatCh will run again (default: .) -j N, --threads N Number of threads to use instead of the ones set in initial ReMatCh run (default: None) --runFailedSamples Will run ReMatCh for those samples missing, as well as for samples that did not run successfully in initial ReMatCh run (default: False) ## Strip Alignment Strip alignment positions containing gaps, missing data and invariable positions. **Dependencies** - Python (2.7.x) - [Biopython](http://biopython.org/) (1.68 or similar) **Usage** usage: strip_alignment.py [-h] [--version] -i /path/to/aligned/input/file.fasta -o /path/to/stripped/output/file.fasta [--notGAPs] [--notMissing] [--notInvariable] Strip alignment positions containing gaps, missing data and invariable positions optional arguments: -h, --help show this help message and exit --version Version information Required options: -i, --infile /path/to/aligned/input/file.fasta Path to the aligned fasta file (default: None) -o, --outfile /path/to/stripped/output/file.fasta Stripped output fasta file (default: alignment_stripped.fasta) General facultative options: --notGAPs Not strip positions with GAPs (default: False) --notMissing Not strip positions with missing data (default: False) --notInvariable Not strip invariable sites (default: False)
