#+AUTHOR: petr-novak
#+DATE: Wed, 03 May 2023 11:12:41 +0000
#+TITLE: RepeatExplorer-based Assembly Annotation Pipeline

* Tools in repository
** Extract Repeat Library from RepeatExplorer Archive
(=extract_re_contigs.xml=)

This tool extracts a library of repeats from a RepeatExplorer2 analysis and provides it as a FASTA file. It also filters out contig parts whose read depth and length fall below the set thresholds. Parts of contigs with read depth below the threshold are hardmasked (replaced with N), and contigs that end up fully hardmasked are removed completely.
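The following minimal Python sketch illustrates the depth filter described above. It assumes that a per-position read-depth profile is already available for each contig. How the tool reads depth from the RepeatExplorer2 archive is not shown. The functions =hardmask_by_depth= and =filter_library=, the contig names, and the handling of the length threshold are hypothetical illustrations and do not reproduce the tool's actual code. In this sketch, the length threshold masks any unmasked stretch shorter than =min_length=, which is an assumed reading of the description.

```python
import re
from typing import Dict, List


def hardmask_by_depth(seq: str, depths: List[int], min_depth: int,
                      min_length: int) -> str:
    """Hardmask low-depth positions, then mask surviving stretches shorter
    than min_length (an assumed interpretation of the length threshold)."""
    if len(seq) != len(depths):
        raise ValueError("sequence and depth profile must have equal length")
    # Step 1: replace every base whose read depth is below the threshold with N.
    masked = "".join(base if depth >= min_depth else "N"
                     for base, depth in zip(seq, depths))
    # Step 2: hardmask unmasked stretches that are too short to keep.
    return re.sub(r"[^N]+",
                  lambda m: m.group(0) if len(m.group(0)) >= min_length
                  else "N" * len(m.group(0)),
                  masked)


def filter_library(contigs: Dict[str, str], depths: Dict[str, List[int]],
                   min_depth: int, min_length: int) -> Dict[str, str]:
    """Return hardmasked contigs, dropping those left without any unmasked base."""
    library = {}
    for name, seq in contigs.items():
        masked = hardmask_by_depth(seq, depths[name], min_depth, min_length)
        if masked.strip("N"):  # keep the contig only if something survived
            library[name] = masked
    return library


# Hypothetical example: CL2Contig1 has low depth everywhere and is removed.
contigs = {"CL1Contig1": "ACGTACGTAC", "CL2Contig1": "GGGGCCCC"}
depths = {"CL1Contig1": [1, 1, 9, 9, 9, 9, 9, 9, 1, 1], "CL2Contig1": [2] * 8}
print(filter_library(contigs, depths, min_depth=5, min_length=4))
# {'CL1Contig1': 'NNGTACGTNN'}
```

The sketch masks low-depth positions first and only then drops contigs. A contig is therefore discarded only when hardmasking leaves nothing of it, which matches the order given in the description.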
** Format repeat library
(=format_repeat_library.xml=)

This tool appends the repeat classification to the sequences of the repeat library. The repeat type then becomes part of each sequence name, in the format:

~>sequence_id#classification_level1/classification_level2/...~

This format makes it possible to specify a classification hierarchy. The classification of library sequences is taken from =CLUSTER_TABLE.csv= (part of the RepeatExplorer2 output).

The formatted library can then be used to annotate repeats in your assembly with the Repeat Annotation tool.
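The header format can be illustrated with a short Python sketch. It assumes that the classification of each sequence has already been read from =CLUSTER_TABLE.csv= into a mapping from sequence ID to a list of classification levels. Parsing that table is not shown. The functions =format_header=, =parse_header= and =annotate_fasta=, along with the example names, are hypothetical.

```python
from typing import Dict, List, Tuple


def format_header(seq_id: str, classification: List[str]) -> str:
    """Build '>sequence_id#level1/level2/...' from an ID and its class path."""
    if not classification:
        raise ValueError(f"no classification for {seq_id}")
    return f">{seq_id}#{'/'.join(classification)}"


def parse_header(header: str) -> Tuple[str, List[str]]:
    """Split a formatted header back into its ID and classification levels."""
    seq_id, _, path = header.lstrip(">").partition("#")
    return seq_id, (path.split("/") if path else [])


def annotate_fasta(fasta_text: str, classes: Dict[str, List[str]]) -> str:
    """Rewrite every FASTA header using `classes`, a hypothetical mapping
    from sequence ID to classification path."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            seq_id = line[1:].split()[0]  # use the first word as the ID
            line = format_header(seq_id, classes[seq_id])
        out.append(line)
    return "\n".join(out)


print(annotate_fasta(">CL1Contig1 len=10\nACGTACGTAC",
                     {"CL1Contig1": ["repeat", "satellite"]}))
```

Splitting on the first =#= and then on =/= recovers the hierarchy, so downstream steps can compare classifications level by level.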
** Repeat Annotation
(=repeat_annotate_custom.xml=)

Internally, the annotation is performed with a RepeatMasker search. The RepeatMasker output is then parsed to remove duplicated and overlapping annotations, and conflicts between annotations are resolved using the hierarchical repeat classification provided in the custom database.
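The exact conflict-resolution rule is not described here, so the following Python sketch adopts one plausible rule as an assumption. Overlapping hits are split into non-overlapping segments, and each segment receives the deepest classification level shared by all hits that cover it. Identical (duplicated) hits collapse naturally. Coordinates are 0-based and half-open. The names =common_path= and =resolve=, the placeholder label ="unresolved"=, and the classification strings are all hypothetical.

```python
from typing import List, Tuple

Hit = Tuple[int, int, str]  # (start, end, "level1/level2/..."), 0-based half-open


def common_path(paths: List[str]) -> str:
    """Deepest classification shared by all paths (common prefix of levels)."""
    shared = []
    for levels in zip(*(p.split("/") for p in paths)):
        if len(set(levels)) != 1:
            break
        shared.append(levels[0])
    return "/".join(shared)


def resolve(hits: List[Hit]) -> List[Hit]:
    """Flatten overlapping hits into non-overlapping, classified segments."""
    bounds = sorted({b for s, e, _ in hits for b in (s, e)})
    segments: List[Hit] = []
    for left, right in zip(bounds, bounds[1:]):
        # Classifications of all hits spanning this elementary segment.
        covering = [p for s, e, p in hits if s <= left and right <= e]
        if not covering:
            continue  # gap between hits: nothing annotated here
        label = common_path(covering) or "unresolved"  # hypothetical placeholder
        if segments and segments[-1][1] == left and segments[-1][2] == label:
            # Same label continues: extend the previous segment.
            segments[-1] = (segments[-1][0], right, label)
        else:
            segments.append((left, right, label))
    return segments


hits = [(0, 100, "repeat/mobile_element/LTR/Ty1_copia"),
        (0, 100, "repeat/mobile_element/LTR/Ty1_copia"),  # duplicated hit
        (80, 150, "repeat/mobile_element/LTR/Ty3_gypsy")]
for segment in resolve(hits):
    print(segment)
```

The overlap 80 to 100 falls back to the shared level =repeat/mobile_element/LTR=. This brute-force version checks every hit against every segment to keep the code short. A genome-scale implementation would sweep over sorted hits instead.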
** TODO Summarize Annotation
This tool will create a summary table from the GFF annotation.
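This tool is still a TODO, so the following Python sketch only illustrates one possible summary: the total annotated length per classification. It assumes GFF3 input with the classification stored in a feature attribute. The attribute is named =Name= here purely as a placeholder, and the pipeline's actual attribute may differ. The function =summarize_gff= is hypothetical.

```python
from collections import defaultdict
from typing import Dict


def summarize_gff(gff_text: str, attribute: str = "Name") -> Dict[str, int]:
    """Total annotated length (bp) per classification stored in `attribute`."""
    totals: Dict[str, int] = defaultdict(int)
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a GFF3 feature line
        start, end = int(cols[3]), int(cols[4])
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # GFF coordinates are 1-based and inclusive, hence the +1.
        totals[attrs.get(attribute, "unknown")] += end - start + 1
    return dict(totals)


gff = ("##gff-version 3\n"
       "chr1\tRM\trepeat_region\t1\t100\t.\t+\t.\tName=repeat/satellite\n"
       "chr1\tRM\trepeat_region\t201\t250\t.\t-\t.\tName=repeat/satellite\n"
       "chr1\tRM\trepeat_region\t301\t310\t.\t+\t.\tName=repeat/mobile_element\n")
print(summarize_gff(gff))
```

This sketch assumes the annotation is already free of overlaps, as the Repeat Annotation step aims to produce. Overlapping features would otherwise be counted twice.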
* Test data

- ~test_assembly_1.fasta~ with ~test_db_1_satellites.fasta~ (sequence names include CLASS followed by a double underscore; syntax 1)
- ~test_assembly_2.fasta~ with ~test_db_2_RE_repeats.fasta~ (sequence names include the full hierarchical classification)

#+begin_comment
# create tarball for toolshed:
tar -czvf ../repeat_annotation_pipeline.tar.gz --exclude test_data \
    --exclude .git --exclude tmp --exclude hg_repository --exclude .idea --exclude .gitignore .
#+end_comment
- TODO RepeatMasker (RM) takes only the short name of sequences; validate the names or adjust them.
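The following Python sketch shows one possible check for this TODO. It assumes that the "short name" is the first whitespace-delimited word of a FASTA header. The check reports names that become empty or duplicated after truncation and, optionally, names longer than a given limit. The function =validate_short_names= and its =max_length= parameter are hypothetical, and the sketch assumes no particular RepeatMasker length limit.

```python
from collections import Counter
from typing import List, Optional


def validate_short_names(fasta_text: str,
                         max_length: Optional[int] = None) -> List[str]:
    """Report problems with header names reduced to their first word."""
    short = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            short.append(words[0] if words else "")  # assumed "short name"
    problems = []
    for name, count in Counter(short).items():
        if not name:
            problems.append("empty sequence name")
        elif count > 1:
            problems.append(f"duplicate short name: {name} ({count}x)")
        if max_length is not None and len(name) > max_length:
            problems.append(f"name longer than {max_length}: {name}")
    return problems


fasta = ">chr1 assembled part A\nACGT\n>chr1 part B\nACGT\n>scaffold_0002\nAC\n"
print(validate_short_names(fasta, max_length=10))
```

Two headers that differ only after the first space collapse into the same short name. Such clashes are worth catching before annotation.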