#+TITLE:  Sequence Read Simulator
#+AUTHOR: Petr Novak

Create pseudo short reads (Illumina-like) from long reads.

* Requirements
- Python version > 3.4
- Biopython

* Available tools
** long_reads_sampling
#+BEGIN_EXAMPLE
usage: long_reads_sampling.py [-h] [-i INPUT] [-o OUTPUT] [-l TOTAL_LENGTH]
                              [-s SEED]

Create sample of long reads, instead of setting number of reads to be sampled,
total length of all sampled sequences is defined

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        file with long reads in fasta format (default: None)
  -o OUTPUT, --output OUTPUT
                        Output file name (default: None)
  -l TOTAL_LENGTH, --total_length TOTAL_LENGTH
                        total length of sampled output (default: None)
  -s SEED, --seed SEED  random number generator seed (default: 123)
#+END_EXAMPLE
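
The idea of sampling by total length rather than by read count can be illustrated with a minimal sketch. This is not the code of =long_reads_sampling.py=: the function name =sample_by_total_length= is hypothetical. Sampling without replacement and keeping the last read whole (so the result may slightly overshoot the target) are assumptions of the sketch. Only the seed default of 123 is taken from the help text above.

#+BEGIN_SRC python
import random


def sample_by_total_length(reads, total_length, seed=123):
    """Return randomly chosen reads whose summed length reaches total_length.

    reads is a list of (name, sequence) tuples. Sampling is without
    replacement and the last read is kept whole, so the result may
    overshoot total_length (both are assumptions of this sketch).
    """
    rng = random.Random(seed)  # private generator, so runs are reproducible
    order = list(range(len(reads)))
    rng.shuffle(order)

    sampled = []
    cumulative = 0
    for i in order:
        if cumulative >= total_length:
            break
        name, seq = reads[i]
        sampled.append((name, seq))
        cumulative += len(seq)
    return sampled


if __name__ == "__main__":
    # Toy input: ten reads of lengths 100, 200, ..., 1000.
    toy_reads = [("read{}".format(n), "A" * (100 * n)) for n in range(1, 11)]
    subset = sample_by_total_length(toy_reads, total_length=2000)
    print(len(subset), sum(len(seq) for _, seq in subset))
#+END_SRC

Because the generator is seeded, repeated calls with the same input and seed return the same subset.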

** long2short
#+BEGIN_EXAMPLE
usage: long2short.py [-h] [-i INPUT] [-o OUTPUT] [-cov COVERAGE]
                     [-L INSERT_LENGTH] [-l READ_LENGTH]

Creates pseudo short reads from long oxford nanopore reads

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        file with long reads in fasta format (default: None)
  -o OUTPUT, --output OUTPUT
                        Output file name (default: None)
  -cov COVERAGE, --coverage COVERAGE
                        samplig coverage (default: 0.1)
  -L INSERT_LENGTH, --insert_length INSERT_LENGTH
                        length of insert, must be longer than read length
                        (default: 600)
  -l READ_LENGTH, --read_length READ_LENGTH
                        read length (default: 100)
#+END_EXAMPLE

The resulting reads are written in FASTA format. Each read name includes the following information:
 - index of the original long read
 - position of the pseudo forward read within the long read
Forward and reverse reads are interleaved. Reverse reads are the reverse complement of the original long read sequence. In the example below, the =_f= and =_r= suffixes mark the forward and reverse read of each pair.

Example output:
#+BEGIN_EXAMPLE
>1_1_101_f
TGGTACTTGCGGTTACGTATTGCTAGCTAGTCTCCATTTGTCCGTTGGTCTTAGGTGATT
TTCCAAGCTTTGTGTGTAAATGTAAGGATCCTCATTTGTA
>1_1_101_r
GTTTTGTTATCGTGATCCACAGATCAGAAGATATCGCCGCTCACCTGTCAATTAATCTTA
ACTTAATGTACACTAGGGTTTTGGTTTTAACTGCTATCTT
>1_2001_2101_f
CTGAGTTGGGCAACATAGCCGACAAATTTGAACAATAAGCCGGTCCAGCCTTCTTTCTCA
GCTGATACATGAAACAAATCAAAGGAGCATTGTAAAGGCG
>1_2001_2101_r
TTTTGAATGATGGCACTACCGTGATCAAGGACGATGGTCTCCGTTCACTCGCTTTTGTTG
TACGTTCTCTATGAACTTGGTTTCTTTGCATTCGGTTCTT
>1_4001_4101_f
GAAGTTGAAGGAACATTTGGAAAGGTGTGTGAAGACTAATTTGGTCT
#+END_EXAMPLE
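
The pairing and naming scheme described above can be sketched as follows. This is an illustration only, not the code of =long2short.py=: the names =pseudo_pairs= and =reverse_complement= are hypothetical. The coordinate convention in the read names and the spacing between pairs are both inferred from the example output, not taken from the tool. The assumed step, =2 * read_length / coverage=, gives 2000 with the default settings, which would match positions 1, 2001 and 4001 if the example was produced with those defaults.

#+BEGIN_SRC python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (other symbols pass through)."""
    return seq.translate(_COMPLEMENT)[::-1]


def pseudo_pairs(seq, index, read_length=100, insert_length=600, coverage=0.1):
    """Yield (name, sequence) records, forward and reverse read interleaved.

    The forward read is the start of each insert; the reverse read is the
    reverse complement of the end of the same insert.
    """
    if insert_length <= read_length:
        raise ValueError("insert_length must be longer than read_length")
    # Assumed spacing: each pair contributes 2 * read_length bases, so this
    # step yields roughly the requested coverage of the long read.
    step = max(1, int(round(2 * read_length / coverage)))
    for start in range(0, len(seq) - insert_length + 1, step):
        insert_end = start + insert_length
        forward = seq[start:start + read_length]
        reverse = reverse_complement(seq[insert_end - read_length:insert_end])
        # Assumed label: long read index, then forward read position
        # (1-based start and start + read_length), shared by both mates.
        label = "{}_{}_{}".format(index, start + 1, start + read_length + 1)
        yield label + "_f", forward
        yield label + "_r", reverse


if __name__ == "__main__":
    long_read = "ACGT" * 750  # toy 3000 bp long read
    for name, read in pseudo_pairs(long_read, index=1):
        print(">" + name)
        print(read)
#+END_SRC

Both mates of a pair share the forward read position in their names, so a pair can be matched by stripping the =_f= / =_r= suffix.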