#+TITLE:  Sequence Read Simulator
#+AUTHOR: Petr Novak

Create pseudo short reads (Illumina-like) from long reads.

* Requirements
- Python version > 3.4
- Biopython

* Available tools
** long_reads_sampling
#+BEGIN_EXAMPLE
usage: long_reads_sampling.py [-h] [-i INPUT] [-o OUTPUT] [-l TOTAL_LENGTH]
                              [-s SEED]

Create sample of long reads, instead of setting number of reads to be sampled,
total length of all sampled sequences is defined

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        file with long reads in fasta format (default: None)
  -o OUTPUT, --output OUTPUT
                        Output file name (default: None)
  -l TOTAL_LENGTH, --total_length TOTAL_LENGTH
                        total length of sampled output (default: None)
  -s SEED, --seed SEED  random number generator seed (default: 123)
#+END_EXAMPLE

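The idea of sampling by total length rather than by read count can be sketched as follows. This is a minimal illustration, not the code of =long_reads_sampling.py=: the function name =sample_by_total_length= and the in-memory list of =(name, sequence)= tuples are assumptions made for the example, and the actual script may select reads differently.

```python
import random


def sample_by_total_length(reads, total_length, seed=123):
    """Pick reads in random order until their summed length reaches total_length.

    `reads` is a list of (name, sequence) tuples. Hypothetical helper for
    illustration only; it is not taken from long_reads_sampling.py.
    """
    rng = random.Random(seed)  # seeded generator makes the sample reproducible
    order = list(range(len(reads)))
    rng.shuffle(order)

    sampled = []
    accumulated = 0
    for i in order:
        # Stop as soon as the requested amount of sequence has been collected.
        if accumulated >= total_length:
            break
        name, seq = reads[i]
        sampled.append((name, seq))
        accumulated += len(seq)
    return sampled


# Toy input: four reads with a combined length of 2000 bases.
reads = [("r1", "A" * 500), ("r2", "C" * 300), ("r3", "G" * 800), ("r4", "T" * 400)]
sample = sample_by_total_length(reads, total_length=1000, seed=123)
print(sum(len(seq) for _, seq in sample))
```

Because whole reads are added, the sampled total in this sketch can overshoot the requested length by up to one read.
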
** long2short
#+BEGIN_EXAMPLE
usage: long2short.py [-h] [-i INPUT] [-o OUTPUT] [-cov COVERAGE]
                     [-L INSERT_LENGTH] [-l READ_LENGTH]

Creates pseudo short reads from long oxford nanopore reads

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        file with long reads in fasta format (default: None)
  -o OUTPUT, --output OUTPUT
                        Output file name (default: None)
  -cov COVERAGE, --coverage COVERAGE
                        samplig coverage (default: 0.1)
  -L INSERT_LENGTH, --insert_length INSERT_LENGTH
                        length of insert, must be longer than read length
                        (default: 600)
  -l READ_LENGTH, --read_length READ_LENGTH
                        read length (default: 100)
#+END_EXAMPLE
The resulting reads are written in FASTA format. Each read name includes the following information:
 - index of the original long read
 - position of the pseudo forward read in the long read

Forward and reverse reads are interleaved, and each reverse read is the reverse complement of the original long sequence.
Example output:
#+BEGIN_EXAMPLE
>1_1_101_f
TGGTACTTGCGGTTACGTATTGCTAGCTAGTCTCCATTTGTCCGTTGGTCTTAGGTGATT
TTCCAAGCTTTGTGTGTAAATGTAAGGATCCTCATTTGTA
>1_1_101_r
GTTTTGTTATCGTGATCCACAGATCAGAAGATATCGCCGCTCACCTGTCAATTAATCTTA
ACTTAATGTACACTAGGGTTTTGGTTTTAACTGCTATCTT
>1_2001_2101_f
CTGAGTTGGGCAACATAGCCGACAAATTTGAACAATAAGCCGGTCCAGCCTTCTTTCTCA
GCTGATACATGAAACAAATCAAAGGAGCATTGTAAAGGCG
>1_2001_2101_r
TTTTGAATGATGGCACTACCGTGATCAAGGACGATGGTCTCCGTTCACTCGCTTTTGTTG
TACGTTCTCTATGAACTTGGTTTCTTTGCATTCGGTTCTT
>1_4001_4101_f
GAAGTTGAAGGAACATTTGGAAAGGTGTGTGAAGACTAATTTGGTCT
#+END_EXAMPLE
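
The read naming and pairing described above can be illustrated with a short sketch. It is not the code of =long2short.py=. Two things are assumptions inferred from the example output and the default options: that inserts start at regular intervals of =2 * read_length / coverage= bases (2000 with the defaults), and that names follow the pattern =index_start_end_f= / =index_start_end_r=. The function names =reverse_complement= and =long_to_short= are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (N stays N)."""
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(complement)[::-1]


def long_to_short(index, seq, coverage=0.1, insert_length=600, read_length=100):
    """Yield (name, sequence) tuples of interleaved pseudo paired-end reads.

    Assumptions (not confirmed by the source): inserts start every
    2 * read_length / coverage bases, and names mimic the example output.
    """
    if insert_length <= read_length:
        raise ValueError("insert length must be longer than read length")
    # Two reads of read_length bases per insert give the requested coverage.
    step = max(1, int(round(2 * read_length / coverage)))
    for start in range(0, len(seq) - insert_length + 1, step):
        insert = seq[start:start + insert_length]
        label = "{}_{}_{}".format(index, start + 1, start + read_length + 1)
        # Forward read: first read_length bases of the insert.
        yield label + "_f", insert[:read_length]
        # Reverse read: reverse complement of the last read_length bases.
        yield label + "_r", reverse_complement(insert[-read_length:])


# Toy long read of 4800 bases, written out as interleaved FASTA records.
long_read = "ACGT" * 1200
for name, short_seq in long_to_short(1, long_read):
    print(">" + name)
    print(short_seq)
```

With the default options, this sketch places inserts at positions 1, 2001 and 4001 of the toy read, which is the same spacing seen in the example output.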