comparison README.md @ 12:ff01d4263391 draft

"planemo upload commit 414119ad7c44562d2e956b765e97ca113bc35b2b-dirty"
author petr-novak
date Thu, 21 Jul 2022 08:23:15 +0000
parents 9de392f2fc02
Comparison of revisions 11:54bd36973253 and 12:ff01d4263391
## Usage

### Detection of complete LTR retrotransposons

```shell
Usage: ./detect_putative_ltr.R COMMAND [OPTIONS]


Options:
    -g GFF3, --gff3=GFF3
        gff3 with dante results
...
```

#### Example:

```shell
mkdir -p tmp
./detect_putative_ltr.R -g test_data/sample_DANTE.gff3 -s test_data/sample_genome.fasta -o tmp/ltr_annotation
```

#### Files in the output of `detect_putative_ltr.R`:

- `prefix.gff3` - annotation of all identified elements
- `prefix_DLTP.fasta` - elements with **d**omains, **L**TR, **T**SD and **P**BS
- `prefix_DLP.fasta` - elements with **d**omains, **L**TR and **P**BS
- `prefix_DLT.fasta` - elements with **d**omains, **L**TR and **T**SD
- `prefix_statistics.csv` - number of elements in individual categories
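
To cross-check the category counts, you can count the records in each category FASTA file. The sketch below makes two assumptions. First, it uses the output prefix `tmp/ltr_annotation` from the example above. Second, it assumes each element is written as one FASTA record whose header line starts with `>`. The function name `count_categories` is hypothetical and is not part of the tool. The sketch does not parse `prefix_statistics.csv`, because its exact layout is not described here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the tool): count FASTA records in each
# category file written for a given output prefix.
# Assumption: one element = one FASTA record with a '>' header line.
set -euo pipefail

count_categories() {
  local prefix=$1 category file
  for category in DLTP DLP DLT; do
    file="${prefix}_${category}.fasta"
    if [ -f "$file" ]; then
      # grep -c prints the number of header lines (0 if the file is empty)
      printf '%s\t%s\n' "$category" "$(grep -c '^>' "$file")"
    else
      printf '%s\tmissing\n' "$category"
    fi
  done
}

# prefix used in the example command above
count_categories tmp/ltr_annotation
```

Each output line is a category name followed by a tab and either a record count or `missing`.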

For large genomes, you can use `detect_putative_ltr_wrapper.py` instead. This script splits the input FASTA file into smaller chunks and runs `detect_putative_ltr.R` on each chunk to limit memory usage. The outputs are merged after all chunks have been processed.

```shell
usage: detect_putative_ltr_wrapper.py [-h] -g GFF3 -s REFERENCE_SEQUENCE -o
                                      OUTPUT [-c CPU] [-M MAX_MISSING_DOMAINS]
                                      [-L MIN_RELATIVE_LENGTH]
                                      [-S MAX_CHUNK_SIZE]

detect_putative_ltr_wrapper.py is a wrapper for
detect_putative_ltr.R

optional arguments:
  -h, --help            show this help message and exit
  -g GFF3, --gff3 GFF3  gff3 file
  -s REFERENCE_SEQUENCE, --reference_sequence REFERENCE_SEQUENCE
                        reference sequence as fasta file
  -o OUTPUT, --output OUTPUT
                        output file path and prefix
  -c CPU, --cpu CPU     number of CPUs
  -M MAX_MISSING_DOMAINS, --max_missing_domains MAX_MISSING_DOMAINS
  -L MIN_RELATIVE_LENGTH, --min_relative_length MIN_RELATIVE_LENGTH
                        Minimum relative length of protein domain to be considered
                        for retrotransposon detection
  -S MAX_CHUNK_SIZE, --max_chunk_size MAX_CHUNK_SIZE
                        If size of reference sequence is greater than this value,
                        reference is analyzed in chunks of this size. This is
                        just approximate value - sequences which are longer
                        are not split, default is 100000000
```
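
The approximate chunking described for `--max_chunk_size` can be pictured as packing whole sequences into groups of roughly `MAX_CHUNK_SIZE` bases. The sketch below illustrates that idea with a simple greedy pass in input order, written in `awk`. It is an assumption for illustration only and is not the wrapper's actual code. The function name `assign_chunks` is hypothetical. For each sequence, it prints the name, the length and the chunk it would fall into. A sequence longer than the limit gets a chunk of its own and is never split.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of approximate chunking (not the wrapper's code):
# whole FASTA records are packed greedily, in input order, into chunks of at
# most max_size bases; a record longer than max_size is never split and ends
# up alone in its chunk.
set -euo pipefail

assign_chunks() {
  local fasta=$1 max_size=${2:-100000000}   # default taken from the -S help text
  awk -v max="$max_size" '
    # emit the record collected so far together with its chunk index
    function flush() {
      if (name == "") return
      # open a new chunk when this record would overflow a non-empty one
      if (used > 0 && used + len > max) { chunk++; used = 0 }
      used += len
      printf "%s\t%d\tchunk_%d\n", name, len, chunk
    }
    /^>/ { flush(); name = substr($1, 2); len = 0; next }
    { len += length($0) }
    END { flush() }
  ' "$fasta"
}

# toy input: two short sequences and one longer than the limit
demo=$(mktemp)
trap 'rm -f "$demo"' EXIT
printf '>seq1\nACGT\n>seq2\nAC\nGT\n>seq3\nACGTACGTACGT\n' > "$demo"
assign_chunks "$demo" 10
```

With this toy input and a limit of 10, `seq1` and `seq2` (4 bases each) share `chunk_0`. The 12-base `seq3` exceeds the limit, so it is placed alone in `chunk_1` rather than being split.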

### Validation of LTR retrotransposons detected in the previous step:

```shell
./clean_ltr.R --help