diff MIRUReader/README.md @ 0:f0e3646a4e45 draft
Uploaded
author: dcouvin
date: Tue, 17 Aug 2021 19:15:15 +0000
# MIRUReader

## Description

Identifies 24-locus MIRU-VNTR profiles of the *Mycobacterium tuberculosis* complex (MTBC) directly from long reads generated by Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio). It also works on assembled genomes.

## Requirements

* Linux
* primersearch from [EMBOSS](http://emboss.sourceforge.net/download/)
  * install from the official website, or
  * install via conda: `conda install -c bioconda emboss`
  * Ensure that primersearch is on your environment path, so that it can be run directly by typing `primersearch` on the command line
* [*pandas*](https://pandas.pydata.org/)
  * can be installed via conda (`conda install pandas`) or via PyPI (`pip install pandas`)
* [*statistics*](https://pypi.org/project/statistics/)
  * can be installed via PyPI: `pip install statistics`

## Installation

`git clone https://github.com/phglab/MIRUReader.git`

## Change log
#### 13/09/2019
- Added a check that primersearch is executable before MIRUReader runs
- Updated the README documentation

#### 04/07/2019
- Updated the output format of the `--details` option.

#### 14/06/2019
- FASTQ input is now automatically converted to FASTA.

## Usage example

For a single sample:
```
python /your/path/to/MIRUReader.py -r sample.fasta -p sampleID > miru.txt
```

For multiple samples:
1. Create a mapping file (mappingFile.txt) that looks like:

    sample_001.fasta sample_001 \
    sample_002.fasta sample_002 \
    ...

2. Then run the program:
   ```
   while read -r reads prefix; do python /your/path/to/MIRUReader.py -r "$reads" -p "$prefix"; done < mappingFile.txt > miru.multiple.txt
   ```

## Output example

```
sample_prefix 0154 0424 0577 0580 0802 0960 1644 1955 2059 2163b 2165 2347 2401 2461 2531 2687 2996 3007 3171 3192 3690 4052 4156 4348
sample_001 2 4 4 2 3 3 3 2 2 5 4 4 4 2 5 1 6 3 3 5 3 7 2 3
```

Notes:
* The program is compatible with both Python 2 and Python 3.
* Accepted read file formats are '.fastq', '.fastq.gz', '.fasta', and '.fasta.gz'.
* The output is tab-delimited plain text, which can be pasted into or opened in an Excel spreadsheet.

## Full usage

| Main options | Description |
| ------------ | ----------- |
| -r READS | Input reads file in FASTQ or FASTA format, gzipped or uncompressed. |
| -p PREFIX | Sample ID, required for naming the output file. |
| --table TABLE | Allele-calling table; default is MIRU_table. A user-defined table in the same fixed format can be supplied, but custom allele-calling tables for other VNTRs have not been tested. |
| --primers PRIMERS | Primer sequences; default is MIRU_primers. A user-defined file in the same fixed format can be supplied. |


| Additional options | Description |
| ------------------ | ----------- |
| --amplicons | Use output from primersearch ("prefix.18.primersearch.out") to summarize the MIRU profile directly. |
| --details | For further inspection: displays the repeat count details for each locus, along with the total mismatch error in the primer sequence alignment. |
| --nofasta | Delete the FASTA file generated when the input reads are in FASTQ format. |

## FAQ
1. **Why are there two MIRU allele calling tables (MIRU_table and MIRU_table_0580)?**

   MIRU locus 0580 (MIRU_table_0580) uses a different numbering system for determining repeat numbers than the other 23 MIRU loci (MIRU_table) in MTBC isolates.


## Troubleshooting
1. If the error message `OSError: primersearch is not found.` appears, ensure that the `primersearch` executable is in a directory on your environment path (check with `echo $PATH`) and can be called directly.
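You can check the Troubleshooting item above before running MIRUReader. The sketch below is a minimal, hypothetical helper: `find_primersearch` is not part of MIRUReader. It uses Python's standard `shutil.which` to look up `primersearch` on `PATH` the same way the shell does. It assumes Python 3.3 or later, because `shutil.which` is not available in Python 2.

```python
import shutil


def find_primersearch(executable="primersearch"):
    """Return the full path of the primersearch executable, or None if absent.

    Hypothetical helper, not part of MIRUReader. shutil.which searches the
    directories listed in PATH, the same lookup the shell performs when you
    type `primersearch` on the command line.
    """
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_primersearch()
    if path is None:
        print("primersearch was not found on PATH; install EMBOSS "
              "or add the directory containing primersearch to PATH")
    else:
        print("primersearch found at", path)
```

If this reports that primersearch was not found, MIRUReader is likely to stop with the `OSError` described in the Troubleshooting section.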
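The output is tab-delimited plain text, so the combined `miru.multiple.txt` from the multi-sample loop can be read back into Python to compare profiles. The sketch below is a hypothetical parser: `load_miru_profiles` is not part of MIRUReader, and it uses only the standard `csv` module. It makes three assumptions:

* Every run prints the `sample_prefix` header row shown in the output example.
* Repeated headers and blank lines in a concatenated file can therefore be skipped.
* Repeat counts should be kept as strings, in case some calls are not plain integers.

```python
import csv
import io

HEADER_FIRST_FIELD = "sample_prefix"  # first column name in the output example


def load_miru_profiles(lines):
    """Parse MIRUReader tab-delimited output into {sample_id: {locus: call}}.

    Assumes every run prints the header row shown in the output example, so a
    file built by looping over several samples may repeat that header; repeated
    headers and blank lines are skipped.
    """
    header = None
    profiles = {}
    for fields in csv.reader(lines, delimiter="\t"):
        if not any(field.strip() for field in fields):
            continue  # blank line (csv.reader yields [] for an empty line)
        if fields[0] == HEADER_FIRST_FIELD:
            header = fields[1:]  # locus names, e.g. "0154", "0424", ...
            continue
        if header is None:
            raise ValueError("data row found before a 'sample_prefix' header")
        # Calls are kept as strings so that non-numeric values are not lost.
        profiles[fields[0]] = dict(zip(header, fields[1:]))
    return profiles


if __name__ == "__main__":
    # Toy input: sample_001 uses the first two loci of the output example;
    # the sample_002 values are made up purely for illustration.
    example = (
        "sample_prefix\t0154\t0424\n"
        "sample_001\t2\t4\n"
        "sample_prefix\t0154\t0424\n"
        "sample_002\t3\t4\n"
    )
    print(load_miru_profiles(io.StringIO(example)))
```

To read a real file, pass an open handle instead:

```python
with open("miru.multiple.txt", newline="") as handle:
    profiles = load_miru_profiles(handle)
```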