annotate README.md @ 10:a004cd05177d draft

"planemo upload commit ca5a8b9bbf761419a408bce11a17e880d1b1152c"
author petr-novak
date Wed, 13 Jul 2022 10:59:58 +0000
parents 9de392f2fc02
children ff01d4263391

# DANTE_LTR

Tool for identifying complete LTR retrotransposons based on analysis of protein domains identified with the [DANTE tool](https://github.com/kavonrtep/dante). Both DANTE and DANTE_LTR are available on the [Galaxy server](https://repeatexplorer-elixir.cerit-sc.cz/).

## Principle of DANTE_LTR
Complete retrotransposons are identified as clusters of protein domains recognized by the DANTE tool. The domains in a cluster must be assigned to a single retrotransposon lineage by DANTE. In addition, the orientation and order of the protein domains, as well as the distances between them, must conform to the characteristics of elements from the REXdb database [Neumann et al. (2019)](https://mobilednajournal.biomedcentral.com/articles/10.1186/s13100-018-0144-1).
In the next step, the 5' and 3' regions of the putative retrotransposon are examined for the presence of 5' and 3' long terminal repeats (LTRs). If both LTRs are detected, the element is searched for a target site duplication (TSD) and a primer binding site (PBS). The detected LTR retrotransposons are classified into five categories:
- Elements with protein domains, 5' LTR, 3' LTR, TSD and PBS - rank **DLTP**
- Elements with protein domains, 5' LTR, 3' LTR and PBS (TSD was not found) - rank **DLP**
- Elements with protein domains, 5' LTR, 3' LTR and TSD (PBS was not found) - rank **DLT**
- Elements with protein domains, 5' LTR and 3' LTR (PBS and TSD were not found) - rank **DL**
- Elements as clusters of protein domains with the same classification, no LTRs - rank **D**
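
The rank labels follow directly from which features were found for an element. As a minimal sketch of that naming scheme (the function `dante_ltr_rank` and its 0/1 flag arguments are hypothetical and are not part of DANTE_LTR), the label can be composed like this:

```shell
# Hypothetical helper, not part of DANTE_LTR.
# Arguments are 0/1 flags: LTRs found, TSD found, PBS found.
dante_ltr_rank() {
    has_ltr=$1; has_tsd=$2; has_pbs=$3
    rank="D"                      # every reported element has protein domains
    if [ "$has_ltr" -eq 1 ]; then
        rank="${rank}L"
        # TSD and PBS are only searched for when both LTRs were detected
        if [ "$has_tsd" -eq 1 ]; then rank="${rank}T"; fi
        if [ "$has_pbs" -eq 1 ]; then rank="${rank}P"; fi
    fi
    echo "$rank"
}
```

For example, `dante_ltr_rank 1 0 1` prints `DLP`.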

![dante_ltr_workflow.png](dante_ltr_workflow.png)


## Installation:

```shell
conda create -n dante_ltr -c bioconda -c conda-forge -c petrnovak dante_ltr
```

## Input data
One input is a reference sequence in FASTA format. The second input is an annotation of the reference genome in GFF3 format, produced by the DANTE tool. For better results, use the unfiltered full output of the DANTE pipeline.
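
Because the two inputs must describe the same sequences, a quick consistency check before running the tool can save time. The following is only a sketch: `missing_seqids` is a hypothetical helper, and it assumes that the first word of each FASTA header is the sequence ID used in column 1 of the GFF3 file.

```shell
# Hypothetical helper, not part of DANTE_LTR.
# Prints every GFF3 sequence ID that has no matching FASTA header.
missing_seqids() {
    gff=$1
    fasta=$2
    awk '
        NR == FNR {                      # first file: the FASTA reference
            if (/^>/) { sub(/^>/, "", $1); seen[$1] = 1 }
            next
        }
        /^#/ || NF == 0 { next }         # skip GFF3 comments and blank lines
        !($1 in seen) && !reported[$1]++ { print $1 }
    ' "$fasta" "$gff"
}
```

Running `missing_seqids test_data/sample_DANTE.gff3 test_data/sample_genome.fasta` should print nothing when every annotated sequence is present in the reference.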


## Usage

### Detection of complete LTR retrotransposons

```shell
Usage: ./extract_putative_ltr.R COMMAND [OPTIONS]


Options:
    -g GFF3, --gff3=GFF3
        gff3 with dante results

    -s REFERENCE_SEQUENCE, --reference_sequence=REFERENCE_SEQUENCE
        reference sequence as fasta

    -o OUTPUT, --output=OUTPUT
        output file path and prefix

    -c NUMBER, --cpu=NUMBER
        Number of cpu to use [default 5]

    -M NUMBER, --max_missing_domains=NUMBER
        Maximum number of missing domains is retrotransposon [default 0]

    -L NUMBER, --min_relative_length=NUMBER
        Minimum relative length of protein domain to be considered for retrostransposon detection [default 0.6]

    -h, --help
        Show this help message and exit

```

#### Example:

```shell
mkdir -p tmp
./extract_putative_ltr.R -g test_data/sample_DANTE.gff3 -s test_data/sample_genome.fasta -o tmp/ltr_annotation
```

#### Files in the output of `extract_putative_ltr.R`:

- `prefix.gff3` - annotation of all identified elements
- `prefix_D.fasta` - partial elements with protein **d**omains
- `prefix_DL.fasta` - elements with protein **d**omains and **L**TR
- `prefix_DLTP.fasta` - elements with **d**omains, **L**TR, **T**SD and **P**BS
- `prefix_DLP.fasta` - elements with **d**omains, **L**TR and **P**BS
- `prefix_DLT.fasta` - elements with **d**omains, **L**TR, **T**SD
- `prefix_statistics.csv` - number of elements in individual categories
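
A quick way to inspect a finished run is to count the sequences in each per-rank FASTA file. The sketch below assumes the file naming listed above. `count_ltr_outputs` is a hypothetical helper, and it reports a file as missing rather than assuming that every rank file is always written.

```shell
# Hypothetical helper, not part of DANTE_LTR.
# Prints "<rank> <number of FASTA records>" for each per-rank output file.
count_ltr_outputs() {
    prefix=$1
    for rank in D DL DLT DLP DLTP; do
        file="${prefix}_${rank}.fasta"
        if [ -f "$file" ]; then
            # grep -c counts header lines; it exits 1 on zero matches, hence "|| true"
            n=$(grep -c '^>' "$file" || true)
            echo "$rank $n"
        else
            echo "$rank missing"
        fi
    done
}
```

With the example command above, the call would be `count_ltr_outputs tmp/ltr_annotation`.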



### Validation of LTR retrotransposons detected in the previous step:

```shell
./clean_ltr.R --help
Usage: ./clean_ltr.R COMMAND [OPTIONS]


Options:
    -g GFF3, --gff3=GFF3
        gff3 with LTR Transposable elements

    -s REFERENCE_SEQUENCE, --reference_sequence=REFERENCE_SEQUENCE
        reference sequence as fasta

    -o OUTPUT, --output=OUTPUT
        output file prefix

    -c NUMBER, --cpu=NUMBER
        Number of cpu to use [default 5]

    -h, --help
        Show this help message and exit
```

This script checks for potentially chimeric elements and removes them from the GFF3 file.

#### Example
```shell
./clean_ltr.R -g test_data/sample_DANTE_LTR_annotation.gff3 -s test_data/sample_genome.fasta -o tmp/ltr_annotation_clean
```
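
To see how many elements the cleaning step removed, one option is to compare feature counts in the GFF3 files before and after cleaning. This is a sketch only: `gff3_type_counts` is a hypothetical helper, and it simply tallies column 3 of any GFF3 file without assuming specific feature type names.

```shell
# Hypothetical helper, not part of DANTE_LTR.
# Prints "<feature type><TAB><count>" for each feature type in a GFF3 file.
gff3_type_counts() {
    awk -F '\t' '
        !/^#/ && NF >= 9 { n[$3]++ }     # data lines have nine tab-separated columns
        END { for (t in n) print t "\t" n[t] }
    ' "$1" | sort
}
```

Run it once on `test_data/sample_DANTE_LTR_annotation.gff3` and once on the GFF3 file that `clean_ltr.R` wrote under the `tmp/ltr_annotation_clean` prefix. The exact output file name is not documented here, so check the `tmp` directory.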