annotate lib/tarean_output_help.org @ 0:1d1b9e1b2e2f draft

Uploaded
author petr-novak
date Thu, 19 Dec 2019 10:24:45 -0500
#+TITLE: TAREAN output description
#+HTML_HEAD_EXTRA: <link rel="stylesheet" type="text/css" href="style1.css" />
#+LANGUAGE: en

* Introduction
TAREAN output includes an *HTML report* listing all analyzed clusters. The clusters are classified into five categories:
+ high confidence satellites
+ low confidence satellites
+ potential LTR elements
+ rDNA
+ other clusters
Each cluster for which consensus sequences were reconstructed also has its own detailed report, linked from the main report.

* Main HTML report
This report contains basic information about all clusters larger than the specified threshold (the default value is 0.01% of analyzed reads).
** Table legend
+ Cluster :: Cluster identifier
+ Genome Proportion[%] :: /(Number of sequences in cluster / Number of sequences in clustering) x 100%/
+ Size :: Number of reads in the cluster
+ Satellite probability :: Empirical probability estimate that cluster sequences
  are derived from a satellite repeat. This estimate is based on analysis of more
  than xxx clusters including yyy manually annotated and zzz experimentally
  validated satellite repeats
+ Consensus :: The consensus sequence is the outcome of k-mer-based
  analysis and represents the most probable satellite monomer
  sequence
+ Kmer analysis ::
  Link to the analysis report for the individual cluster
+ Graph layout :: Graph-based visualization of similarities among sequence
  reads
+ Connected component index :: Proportion of nodes of the graph which are part
  of the largest strongly connected component
+ Pair completeness index :: Proportion of reads with an available
  mate-pair within the same cluster
+ Kmer coverage :: Sum of relative frequencies of all k-mers used for consensus
  sequence reconstruction
+ |V| :: Number of vertices of the graph
+ |E| :: Number of edges of the graph
+ PBS score :: Primer binding site detection score
+ The longest ORF length :: Length of the longest open reading frame found in
  any of the six possible reading frames. The search was done on a dimer of the
  consensus, so ORFs can be longer than the 'monomer' length
+ Similarity-based annotation :: Annotation based on
  similarity search using blastn/blastx against a database of known
  repeats.
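
The "longest ORF length" entry above can be illustrated with a small Python sketch. It is not TAREAN's own code: the function names are hypothetical, and an ORF is assumed here to be any stretch of codons without a stop codon (TAREAN may additionally require a start codon). The search runs on a dimer of the monomer in three frames on each strand, which is why the result can exceed the monomer length.

```python
# Sketch only: assumes an ORF is a stop-free run of codons (no start codon required).
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_in_frame(seq, frame):
    """Length (nt) of the longest stop-free codon run in one reading frame."""
    best = current = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            current = 0  # a stop codon ends the current run
        else:
            current += 3
            best = max(best, current)
    return best


def longest_orf_length(monomer):
    """Search a dimer of the monomer in all six reading frames."""
    dimer = monomer.upper() * 2
    strands = (dimer, reverse_complement(dimer))
    return max(longest_orf_in_frame(s, f) for s in strands for f in range(3))
```

For the made-up 9 nt monomer =ATGAAATAG=, =longest_orf_length= returns 18, because the reverse strand of the dimer contains no stop codon in its first frame.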
* Detailed cluster report
The cluster report includes a list of major monomer sequence variants reconstructed from the most frequent k-mers. The reconstructed consensus sequences are sorted by their significance (that is, by the proportion of k-mers they represent).
** Table legend
- kmer :: length of the k-mer used for consensus reconstruction.
- variant :: identifier of the consensus variant.
- total score :: measure of significance of the consensus variant. The score is calculated as the sum of weights of all k-mers used for consensus reconstruction.
- monomer length :: length of the consensus.
- consensus :: consensus sequence without ambiguous bases.
- graph image :: part of the de Bruijn graph based on the abundant k-mers. The size of
  vertices corresponds to k-mer frequencies. Paths in the graph which were used
  for reconstruction of consensus sequences are colored gray.
- logo image :: consensus sequences shown as a DNA logo. The height of letters corresponds to k-mer frequencies. Logo images are linked to the corresponding position probability matrices.
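
As a rough illustration of how a consensus without ambiguous bases relates to a position probability matrix, the Python sketch below picks the most probable base at each position. The in-memory layout (one row per position, columns in the order A, C, G, T) is an assumption made for this example; it is not taken from the =ppm_*.csv= files, and the function name is hypothetical.

```python
# Assumed column order of the matrix; the real ppm_*.csv layout may differ.
BASES = "ACGT"


def consensus_from_ppm(ppm):
    """Return the most probable base at each position of the matrix.

    `ppm` is a list of rows, one per sequence position, each holding four
    probabilities in the order given by BASES. Ties go to the first base.
    """
    consensus = []
    for row in ppm:
        best_index = max(range(len(BASES)), key=lambda i: row[i])
        consensus.append(BASES[best_index])
    return "".join(consensus)


# Made-up matrix for three positions.
example_ppm = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.0, 0.0, 0.1, 0.9],
]
print(consensus_from_ppm(example_ppm))
```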

* Structure of the output archive
Complete results from the TAREAN analysis can be downloaded as a zip archive which contains the following
files and directories:

#+BEGIN_SRC files & directories
.
├── clusters_info.csv               <------------ list of clusters in tab delimited format
├── index.html                      <------------ main html report
├── seqclust
│   ├── assembly                    # not implemented yet
│   ├── blastn                      <------------ results of read comparison with DNA database
│   ├── blastx                      <------------ results of read comparison with protein database
│   ├── clustering
│   │   ├── clusters
│   │   │   ├── dir_CL0001          <----┐- detailed information about clusters
│   │   │   ├── dir_CL0002          <----│
│   │   │   ├── dir_CL0003          <----│
│   │   │   ....                    <----┘
│   │   │
│   │   └── hitsort.cls             <--------- list of reads in individual clusters
│   ├── mgblast
│   ├── prerun
│   └── sequences                   <--------- input reads
├── summary                         # not implemented yet
├── TR_consensus_rank_1_.fasta      <-- reconstructed monomer sequences for HIGH confidence satellites
├── TR_consensus_rank_2_.fasta      <-- reconstructed monomer sequences for LOW confidence satellites
├── TR_consensus_rank_3_.fasta      <-- reconstructed sequences of potential LTR elements
└── TR_consensus_rank_4_.fasta      <-- reconstructed consensus for rDNA

#+END_SRC
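
The four =TR_consensus_rank_N_.fasta= files listed above can be read with a few lines of Python. The sketch below is only an illustration: the function names are hypothetical, the FASTA reader is deliberately minimal, and nothing is assumed about how TAREAN names the individual sequences inside these files.

```python
import re

# Rank-to-category mapping as given in the archive listing above.
RANK_CATEGORIES = {
    1: "high confidence satellites",
    2: "low confidence satellites",
    3: "potential LTR elements",
    4: "rDNA",
}


def category_for(filename):
    """Return the category for a TR_consensus_rank_N_.fasta name, else None."""
    match = re.fullmatch(r"TR_consensus_rank_(\d+)_\.fasta", filename)
    if match is None:
        return None
    return RANK_CATEGORIES.get(int(match.group(1)))


def parse_fasta(lines):
    """Minimal FASTA reader: map each header (without '>') to its sequence."""
    records = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line  # sequences may span several lines
    return records
```

A file can be passed directly as an iterable of lines, for example =parse_fasta(open("TR_consensus_rank_1_.fasta"))=.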

The list of all clusters shown in the HTML file =index.html= is also
available in tab-delimited format in the file =clusters_info.csv=, which can be
easily viewed and edited in spreadsheet programs. The list of all clusters
and their corresponding reads is in the file =hitsort.cls=, which has the following
format:

: >CL1 11
: 134234r 55494f 85525f 136746r 96742f 91926f 239729r 105445f 222518r 136402r 9013
: >CL2 10
: 76205r 120735r 69527r 12235r 176778f 189307f 131952f 163507f 100038r 178475r
: >CL3 6
: 99835r 222598f 29715r 102023f 99524r 30116f
: >CL4 6
: 51723r 69073r 218774r 146425f 136314r 41744f
: >CL5 5
: 70686f 65565f 234078r 50430r 68247r

where =CL1 11= is the cluster ID followed by the number of reads in the cluster;
the next line lists the names of all reads belonging to the cluster.
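
The =hitsort.cls= format described above is simple enough to parse by hand. The following Python sketch shows one possible way to do it; =parse_hitsort= is a hypothetical helper, not part of TAREAN, and it assumes that header lines always start with =>= and that read names are separated by whitespace.

```python
def parse_hitsort(lines):
    """Parse hitsort.cls-style lines into {cluster_id: {"size": n, "reads": [...]}}."""
    clusters = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: cluster ID followed by the number of reads.
            cluster_id, size = line[1:].split()
            current = cluster_id
            clusters[current] = {"size": int(size), "reads": []}
        elif current is not None:
            # Read names belonging to the most recent header.
            clusters[current]["reads"].extend(line.split())
    return clusters


# Two records copied from the example above.
example = [
    ">CL3 6",
    "99835r 222598f 29715r 102023f 99524r 30116f",
    ">CL5 5",
    "70686f 65565f 234078r 50430r 68247r",
]
clusters = parse_hitsort(example)
```

Comparing the declared size with =len(clusters[cid]["reads"])= is a cheap consistency check after parsing.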
** Structure of cluster directories

Detailed information for each cluster is stored in subdirectories:

#+BEGIN_SRC folder directories
dir_CL0011
├── blast.csv                   <------------tab delimited file, all-to-all comparison of reads within cluster
├── CL11_directed_graph.RData   <----directed graph representation of cluster saved as R igraph object
├── CL11.GL                     <-----------------undirected graph representation of cluster saved as R igraph object
├── CL11.png                    <-----------┐- images with graph visualization
├── CL11_tmb.png                <-----------┘
├── dna_database_annotation.csv <-- annotation of cluster reads based on the DNA database of repeats
├── reads_all.fas               <---------------- all reads included in the cluster in fasta format
├── reads.fas                   <---------------- subset of reads used for monomer reconstruction
├── reads_oriented.fas          <------------ subset of reads all in the same orientation
└── tarean
    ├── consensus.fasta <----------- fasta file with tandem repeat consensus variants
    ├── ggmin.RData
    ├── img
    │   ├── graph_11mer_1.png <-----┐
    │   ├── graph_11mer_2.png <-----│
    │   ├── graph_15mer_2.png <-----│
    │   ├── graph_15mer_3.png <-----│
    │   ├── graph_15mer_4.png <-----│ images of kmer-based graphs used for reconstruction of
    │   ├── graph_19mer_2.png <-----│ monomer variants
    │   ├── graph_19mer_4.png <-----│
    │   ├── graph_19mer_5.png <-----│
    │   ├── graph_23mer_2.png <-----│
    │   ├── graph_27mer_3.png <-----┘
    │   │
    │   ├── logo_11mer_1.png <-----┐
    │   ├── logo_11mer_2.png <-----│
    │   ├── logo_15mer_2.png <-----│
    │   ├── logo_15mer_3.png <-----│
    │   ├── logo_15mer_4.png <-----│ images with DNA logos representing consensus sequences
    │   ├── logo_19mer_2.png <-----│ of monomer variants
    │   ├── logo_19mer_4.png <-----│
    │   ├── logo_19mer_5.png <-----│
    │   ├── logo_23mer_2.png <-----│
    │   └── logo_27mer_3.png <-----┘
    │
    ├── ppm_11mer_1.csv <-----┐
    ├── ppm_11mer_2.csv <-----│
    ├── ppm_15mer_2.csv <-----│
    ├── ppm_15mer_3.csv <-----│
    ├── ppm_15mer_4.csv <-----│ position probability matrices for individual monomer
    ├── ppm_19mer_2.csv <-----│ variants derived from k-mer frequencies
    ├── ppm_19mer_4.csv <-----│
    ├── ppm_19mer_5.csv <-----│
    ├── ppm_23mer_2.csv <-----│
    ├── ppm_27mer_3.csv <-----┘
    │
    ├── reads_oriented.fas_11.kmers <-----┐
    ├── reads_oriented.fas_15.kmers <-----│
    ├── reads_oriented.fas_19.kmers <-----│ k-mer frequencies calculated on oriented reads
    ├── reads_oriented.fas_23.kmers <-----│ for k-mer lengths 11 - 27
    ├── reads_oriented.fas_27.kmers <-----┘
    ├── reads_oriented.fasblast_out.cvs       <----┐ results of blastn search against database of tRNA
    ├── reads_oriented.fasblast_out.cvs_L.csv <----│ for purposes of LTR detection
    ├── reads_oriented.fasblast_out.cvs_R.csv <----┘
    └── report.html <--- cluster analysis HTML summary
#+END_SRC
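
The file names in the =tarean= subdirectory encode the k-mer length and the variant number, so the graph image, logo image and position probability matrix of one variant can be matched by name. The Python sketch below shows one way to do this. The helper names are hypothetical, and the zero-padded directory name is an assumption inferred only from the =CL11= / =dir_CL0011= example above.

```python
import re

# Matches names such as graph_15mer_3.png, logo_15mer_3.png or ppm_15mer_3.csv.
VARIANT_FILE = re.compile(r"(graph|logo|ppm)_(\d+)mer_(\d+)\.(?:png|csv)")


def cluster_dir_name(cluster_number):
    """Directory name for a cluster (assumed four-digit zero padding)."""
    return f"dir_CL{cluster_number:04d}"


def parse_variant_file(filename):
    """Return (kind, kmer_length, variant) or None for unrelated files."""
    match = VARIANT_FILE.fullmatch(filename)
    if match is None:
        return None
    kind, kmer_length, variant = match.groups()
    return kind, int(kmer_length), int(variant)


def group_variant_files(filenames):
    """Group file names by (kmer_length, variant), keyed by kind."""
    grouped = {}
    for name in filenames:
        parsed = parse_variant_file(name)
        if parsed is None:
            continue  # e.g. report.html or consensus.fasta
        kind, kmer_length, variant = parsed
        grouped.setdefault((kmer_length, variant), {})[kind] = name
    return grouped
```

For example, =group_variant_files(["graph_15mer_3.png", "logo_15mer_3.png", "ppm_15mer_3.csv"])= collects all three files under the key =(15, 3)=.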