annotate README.md @ 9:1aa578e6c8b3 draft
"planemo upload commit 9488b982bae902f1868785ec4ad47134dac50ff3"
author | petr-novak |
---|---|
date | Wed, 29 Jun 2022 09:25:54 +0000 |
parents | 9de392f2fc02 |
children | ff01d4263391 |
# DANTE_LTR

Tool for identifying complete LTR retrotransposons based on analysis of protein domains identified with the [DANTE tool](https://github.com/kavonrtep/dante). Both DANTE and DANTE_LTR are available on the [Galaxy server](https://repeatexplorer-elixir.cerit-sc.cz/).

## Principle of DANTE_LTR
Complete retrotransposons are identified as clusters of protein domains recognized by the DANTE tool. All domains in a cluster must be assigned to a single retrotransposon lineage by DANTE. In addition, the orientation and order of the protein domains, as well as the distances between them, must conform to the characteristics of elements from the REXdb database [Neumann et al. (2019)](https://mobilednajournal.biomedcentral.com/articles/10.1186/s13100-018-0144-1).
In the next step, the 5' and 3' regions of the putative retrotransposon are examined for the presence of 5' and 3' long terminal repeats (LTRs). If both LTRs are detected, the element is then searched for a target site duplication (TSD) and a primer binding site (PBS). The detected LTR retrotransposons are classified into five categories:
- Elements with protein domains, 5'LTR, 3'LTR, TSD and PBS - rank **DLTP**.
- Elements with protein domains, 5'LTR, 3'LTR and PBS (TSD was not found) - rank **DLP**.
- Elements with protein domains, 5'LTR, 3'LTR and TSD (PBS was not found) - rank **DLT**.
- Elements with protein domains, 5'LTR and 3'LTR (PBS and TSD were not found) - rank **DL**.
- Elements as clusters of protein domains with the same classification, no LTRs - rank **D**.
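
The rank letters compose in a fixed order: **D** for domains, **L** for LTRs, **T** for TSD, **P** for PBS. The following is only an illustrative sketch of that naming rule, not code taken from DANTE_LTR; the function name `rank_label` and its yes/no arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DANTE_LTR): build the rank label from
# which features were found. Arguments are "yes"/"no" for LTRs, TSD and PBS.
rank_label() {
  local ltr="$1" tsd="$2" pbs="$3"
  local rank="D"                 # every reported element has protein domains
  if [ "$ltr" = "yes" ]; then
    rank="${rank}L"
    # TSD and PBS are only searched for when both LTRs were detected
    if [ "$tsd" = "yes" ]; then rank="${rank}T"; fi
    if [ "$pbs" = "yes" ]; then rank="${rank}P"; fi
  fi
  printf '%s\n' "$rank"
}

rank_label yes yes yes   # prints DLTP
rank_label yes no yes    # prints DLP
rank_label no no no      # prints D
```

Because TSD and PBS detection depends on the LTRs, a label such as "DT" or "DP" cannot occur under this rule.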

![dante_ltr_workflow.png](dante_ltr_workflow.png)


## Installation

```shell
conda create -n dante_ltr -c bioconda -c conda-forge -c petrnovak dante_ltr
```

## Input data
DANTE_LTR takes two inputs: a reference sequence in FASTA format and an annotation of the same reference genome produced by the DANTE tool, in GFF3 format. For better results, use the unfiltered full output of the DANTE pipeline.
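
Since the GFF3 annotation must refer to the same sequences as the FASTA file, a quick consistency check before running the tool can save time. The sketch below is an optional pre-flight check and not part of DANTE_LTR; the function name `missing_seqids` is hypothetical, and it assumes sequence IDs are the first whitespace-delimited token of each FASTA header and that the FASTA file is not empty.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of DANTE_LTR): print every
# sequence ID used in the GFF3 file that has no header in the FASTA file.
missing_seqids() {
  local fasta="$1" gff3="$2"
  awk '
    # First file (FASTA): remember the ID that follows ">" on header lines
    NR == FNR {
      if (/^>/) { sub(/^>/, "", $1); ids[$1] = 1 }
      next
    }
    # Second file (GFF3): skip comments, directives and empty lines
    /^#/ || NF == 0 { next }
    # Report each unknown seqid (column 1) once
    !($1 in ids) && !seen[$1]++ { print $1 }
  ' "$fasta" "$gff3"
}

# Usage (paths are the test files mentioned in this README):
#   missing_seqids test_data/sample_genome.fasta test_data/sample_DANTE.gff3
```

Empty output means every annotated sequence ID was found in the reference.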

## Usage

### Detection of complete LTR retrotransposons

```shell
Usage: ./extract_putative_ltr.R COMMAND [OPTIONS]


Options:
        -g GFF3, --gff3=GFF3
                gff3 with dante results

        -s REFERENCE_SEQUENCE, --reference_sequence=REFERENCE_SEQUENCE
                reference sequence as fasta

        -o OUTPUT, --output=OUTPUT
                output file path and prefix

        -c NUMBER, --cpu=NUMBER
                Number of cpu to use [default 5]

        -M NUMBER, --max_missing_domains=NUMBER
                Maximum number of missing domains is retrotransposon [default 0]

        -L NUMBER, --min_relative_length=NUMBER
                Minimum relative length of protein domain to be considered for retrostransposon detection [default 0.6]
        -h, --help
                Show this help message and exit

```

#### Example

```shell
mkdir -p tmp
./extract_putative_ltr.R -g test_data/sample_DANTE.gff3 -s test_data/sample_genome.fasta -o tmp/ltr_annotation
```

#### Files in the output of `extract_putative_ltr.R`

- `prefix.gff3` - annotation of all identified elements
- `prefix_D.fasta` - partial elements with protein **d**omains
- `prefix_DL.fasta` - elements with protein **d**omains and **L**TR
- `prefix_DLTP.fasta` - elements with **d**omains, **L**TR, **T**SD and **P**BS
- `prefix_DLP.fasta` - elements with **d**omains, **L**TR and **P**BS
- `prefix_DLT.fasta` - elements with **d**omains, **L**TR and **T**SD
- `prefix_statistics.csv` - number of elements in individual categories
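
To get a quick per-rank overview directly from the FASTA outputs, the records in each file can be counted. This is a minimal sketch that relies only on the file naming listed above; the function name `count_ranked_elements` is hypothetical, and the layout of `prefix_statistics.csv` is not assumed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DANTE_LTR): count FASTA records
# (lines starting with ">") in each rank file found under an output prefix.
count_ranked_elements() {
  local prefix="$1" rank file
  for rank in D DL DLT DLP DLTP; do
    file="${prefix}_${rank}.fasta"
    if [ -f "$file" ]; then
      # grep -c prints the number of matching lines, 0 if there are none
      printf '%s\t%s\n' "$rank" "$(grep -c '^>' "$file")"
    fi
  done
}

# Usage, with the prefix from the example above:
#   count_ranked_elements tmp/ltr_annotation
```

Rank files that do not exist are skipped silently, so the output lists only the categories for which a file was written.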

### Validation of LTR retrotransposons detected in the previous step

```shell
./clean_ltr.R --help
Usage: ./clean_ltr.R COMMAND [OPTIONS]


Options:
        -g GFF3, --gff3=GFF3
                gff3 with LTR Transposable elements

        -s REFERENCE_SEQUENCE, --reference_sequence=REFERENCE_SEQUENCE
                reference sequence as fasta

        -o OUTPUT, --output=OUTPUT
                output file prefix

        -c NUMBER, --cpu=NUMBER
                Number of cpu to use [default 5]

        -h, --help
                Show this help message and exit
```

This script checks for potentially chimeric elements and removes them from the GFF3 file.

#### Example

```shell
./clean_ltr.R -g test_data/sample_DANTE_LTR_annotation.gff3 -s test_data/sample_genome.fasta -o tmp/ltr_annotation_clean
```
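
The two steps can be chained in a small wrapper. The sketch below assumes that the `prefix.gff3` file written by `extract_putative_ltr.R` is a suitable `-g` input for `clean_ltr.R`; the README's own example uses a separate test file, so treat this as an assumption to verify. The names `run_step`, `dante_ltr_pipeline` and the `DRY_RUN` variable are hypothetical. By default the commands are only printed for review.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of DANTE_LTR).
# With DRY_RUN=1 (the default here) commands are printed, not executed.
run_step() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Arguments: DANTE GFF3, reference FASTA, output prefix
dante_ltr_pipeline() {
  local dante_gff="$1" genome="$2" prefix="$3"
  # Step 1: detect elements; step 2 runs only if step 1 succeeded.
  # ASSUMPTION: step 1 writes "${prefix}.gff3", which step 2 can read.
  run_step ./extract_putative_ltr.R -g "$dante_gff" -s "$genome" -o "$prefix" &&
    run_step ./clean_ltr.R -g "${prefix}.gff3" -s "$genome" -o "${prefix}_clean"
}

# Review the commands first, then rerun with DRY_RUN=0 to execute them:
dante_ltr_pipeline test_data/sample_DANTE.gff3 test_data/sample_genome.fasta tmp/ltr_annotation
```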