annotate MIRUReader/README.md @ 0:f0e3646a4e45 draft

Uploaded
author dcouvin
date Tue, 17 Aug 2021 19:15:15 +0000
parents
children
# MIRUReader

## Description

Identify 24-locus MIRU-VNTR profiles for *Mycobacterium tuberculosis* complex (MTBC) directly from long reads generated by Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) platforms. It also works on assembled genomes.

## Requirements

* Linux
* primersearch from [EMBOSS](http://emboss.sourceforge.net/download/)
  * install from the official website or
  * install via conda `conda install -c bioconda emboss`
  * Ensure the `primersearch` command is in your environment path, so that the program can be run directly by typing `primersearch` on the command line
* [*pandas*](https://pandas.pydata.org/)
  * can be installed via conda `conda install pandas` or via PyPI `pip install pandas`
* [*statistics*](https://pypi.org/project/statistics/) (needed only on Python 2; the module is part of the standard library from Python 3.4 onward)
  * can be installed via PyPI `pip install statistics`

## Installation

`git clone https://github.com/phglab/MIRUReader.git`

## Change log
#### 13/09/2019
- Added a check to ensure primersearch is executable prior to MIRUReader program execution.
- Updated the README documentation.

#### 04/07/2019
- Updated output format for option '--details'.

#### 14/06/2019
- Auto convert fastq to fasta.

## Usage example

For single-sample analysis:
```
python /your/path/to/MIRUReader.py -r sample.fasta -p sampleID > miru.txt
```

For multi-sample analysis:
1. Create a mapping file (mappingFile.txt) that looks like:

sample_001.fasta sample_001 \
sample_002.fasta sample_002 \
...

2. Then run the program:
```
cat mappingFile.txt | while read -a line; do python /your/path/to/MIRUReader.py -r ${line[0]} -p ${line[1]}; done > miru.multiple.txt
```

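The multi-sample loop can also be driven from Python, which may be convenient where `read -a` is unavailable. The following is a minimal sketch, not part of MIRUReader itself: the helper names `parse_mapping`, `build_command`, and `run_all` are hypothetical, and it assumes each mapping line holds a reads file and a sample ID separated by whitespace. Only the `-r` and `-p` options documented in this README are used.

```python
import subprocess
import sys


def parse_mapping(text):
    """Return (reads_file, sample_id) pairs from whitespace-separated lines."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        pairs.append((fields[0], fields[1]))
    return pairs


def build_command(script, reads, prefix):
    """Build the argument list for one MIRUReader run (-r READS -p PREFIX)."""
    return [sys.executable, script, "-r", reads, "-p", prefix]


def run_all(script, mapping_path, out_path):
    """Run MIRUReader once per mapping line, collecting stdout in out_path."""
    with open(mapping_path) as handle:
        pairs = parse_mapping(handle.read())
    with open(out_path, "w") as out:
        for reads, prefix in pairs:
            # check=True stops at the first failing sample (a design assumption)
            subprocess.run(build_command(script, reads, prefix), stdout=out, check=True)
```

A call such as `run_all("/your/path/to/MIRUReader.py", "mappingFile.txt", "miru.multiple.txt")` would then play the role of the shell loop.
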
## Output example

```
sample_prefix 0154 0424 0577 0580 0802 0960 1644 1955 2059 2163b 2165 2347 2401 2461 2531 2687 2996 3007 3171 3192 3690 4052 4156 4348
sample_001 2 4 4 2 3 3 3 2 2 5 4 4 4 2 5 1 6 3 3 5 3 7 2 3
```

Notes:
* The program is compatible with Python 2 and Python 3.
* Accepted reads file formats are '.fastq', '.fastq.gz', '.fasta', and '.fasta.gz'.
* The program output is tab-delimited plain text that can be copied into or opened in an Excel spreadsheet.

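Because the output is tab-delimited, it can also be read programmatically. The sketch below uses only the Python standard library; the function name `read_miru_table` is hypothetical. It assumes the first row is the header shown above and that a concatenated multi-sample file may repeat that header, which has not been verified against the tool. Repeat counts are kept as strings, since non-numeric calls may occur.

```python
import csv
import io


def read_miru_table(handle):
    """Parse tab-delimited MIRU output into {sample: {locus: repeat_call}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header is None:
        return {}  # empty input
    loci = header[1:]
    profiles = {}
    for row in reader:
        if not row or row == header:
            continue  # skip blank lines and repeated header rows (assumption)
        profiles[row[0]] = dict(zip(loci, row[1:]))
    return profiles
```

For example, `read_miru_table(open("miru.txt"))` would return one dictionary per sample, keyed by locus name.
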
## Full usage

| Main options | Description |
| ------------ | ----------- |
| -r READS | Input reads file in fastq/fasta format, gzipped or uncompressed. |
| -p PREFIX | Sample ID required for naming output file. |
| --table TABLE | Allele calling table, default is MIRU_table. Can be user-defined in fixed format. However, providing a custom allele calling table for other VNTR is not tested. |
| --primers PRIMERS | Primer sequences, default is MIRU_primers. Can be user-defined in fixed format. |


| Optional options | Description |
| ---------------- | ----------- |
| --amplicons | Use output from primersearch ("prefix.18.primersearch.out") and summarize MIRU profile directly. |
| --details | This option is for further inspection. It displays details of the repeat count for each locus with the total mismatch error in the primer sequence alignment. |
| --nofasta | Delete the fasta file generated if your input reads are in fastq format. |

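To illustrate what the fastq-to-fasta step behind `--nofasta` involves, here is a rough sketch. It is not MIRUReader's own code: the names `open_reads` and `fastq_to_fasta` are hypothetical, and it assumes plain four-line FASTQ records (header, sequence, `+`, quality).

```python
import gzip
import io


def open_reads(path):
    """Open a reads file as text, decompressing when the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Write one FASTA record per 4-line FASTQ record; return the record count."""
    count = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of input
        if not header.strip():
            continue  # tolerate stray blank lines between records
        if not header.startswith("@"):
            raise ValueError("expected '@' at start of FASTQ record")
        sequence = fastq_handle.readline().rstrip("\n")
        fastq_handle.readline()  # '+' separator line, ignored
        fastq_handle.readline()  # quality line, ignored
        fasta_handle.write(">" + header[1:].rstrip("\n") + "\n" + sequence + "\n")
        count += 1
    return count
```

Usage would look like `fastq_to_fasta(open_reads("sample.fastq.gz"), open("sample.fasta", "w"))`.
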
## FAQ
1. **Why are there two MIRU allele calling tables (MIRU_table and MIRU_table_0580)?**

MIRU locus 0580 (MIRU_table_0580) uses a different numbering system for determining repeat numbers than the other 23 MIRU loci (MIRU_table) for MTBC isolates.


## Troubleshooting
1. If the error message `OSError: primersearch is not found.` appears, ensure that the `primersearch` executable is in your environment path (`echo $PATH`) and can be called directly.
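
A quick way to confirm that `primersearch` is reachable from Python is to look it up on the path before running the program. This is a small sketch using the standard library's `shutil.which` (Python 3.3+); the helper name `find_executable` is hypothetical and is not necessarily how MIRUReader performs its own check.

```python
import shutil


def find_executable(name="primersearch"):
    """Return the full path of `name` if it is on PATH, else raise OSError."""
    path = shutil.which(name)
    if path is None:
        raise OSError("{} is not found on PATH".format(name))
    return path
```

Calling `find_executable()` returns the resolved location of `primersearch`, or raises `OSError` when the EMBOSS install directory still needs to be added to `PATH`.
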