annotate README.md @ 14:a6c55d1bdb6c draft

Uploaded
author petr-novak
date Wed, 28 Aug 2019 08:08:47 -0400
parents 77d9f2ecb28a
children
# Domain based annotation of transposable elements - DANTE #

### Authors
Nina Hostakova, Petr Novak, Pavel Neumann, Jiri Macas
Biology Centre CAS, Czech Republic


### Introduction

* Protein Domains Finder [dante.py]
    * Scans the given DNA sequence(s) in (multi)fasta format for protein domains using our protein domains database.
    * The domain search is performed with the LASTAL alignment tool.
    * Domains are subsequently annotated and classified. If a domain has multiple annotations assigned, its classification is derived from the classification level common to all of them.

* Protein Domains Filter [dante_gff_output_filtering.py]
    * Filters the GFF3 output of the previous step to select a particular kind of domain and/or to adjust the quality filtering.

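The "common classification level" rule from the introduction can be illustrated with a minimal sketch. It assumes that each annotation is a string of classification levels joined by `|`; the separator, the function name `common_classification` and the example lineages are assumptions made for illustration and are not taken from dante.py.

```python
def common_classification(annotations, sep="|"):
    """Return the deepest classification prefix shared by all annotations.

    Assumption: levels are ordered from the most general to the most
    specific and are joined by `sep`.
    """
    if not annotations:
        return ""
    split = [a.split(sep) for a in annotations]
    common = []
    # zip() stops at the shortest annotation, so the result is never deeper
    # than the least specific input.
    for levels in zip(*split):
        if len(set(levels)) != 1:
            break
        common.append(levels[0])
    return sep.join(common)


# Hypothetical annotations assigned to one domain
hits = [
    "Class_I|LTR|Ty3/gypsy|chromovirus|Tekay",
    "Class_I|LTR|Ty3/gypsy|chromovirus|CRM",
    "Class_I|LTR|Ty3/gypsy|non-chromovirus|Athila",
]
print(common_classification(hits))
```

With these three inputs the annotations agree only down to the third level, so that level is what gets reported.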
### DEPENDENCIES ###

* python3.4 or higher with packages:
    * numpy
    * biopython
* [lastal](http://last.cbrc.jp/doc/last.html) 744 or higher
* ProfRep/DANTE modules:
    * configuration.py

### Protein Domains Finder ###

This tool provides **preliminary** output of all domain types, not yet filtered for quality.

#### INPUTS ####

* DNA sequence [multiFasta]

#### OUTPUTS ####

* **All protein domains GFF3** - individual domains are reported one per line as regions (start-end) on the original DNA sequence, including the sequence ID and strand orientation. The last "Attributes" column contains several comma-separated pieces of information related to the domain annotation, the alignment and its quality. This file can be filtered further using the Protein Domains Filter tool.

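As a rough illustration of how such a GFF3 line could be read, the sketch below splits the nine tab-separated columns and turns the attributes column into a dictionary. It assumes the standard GFF3 convention of `key=value` pairs separated by semicolons; if the attributes in your file are separated differently, pass another `attr_sep`. The function name, the example line and its source/type values are hypothetical; only the `Name` and `Final_Classification` attribute names appear in this README.

```python
def parse_gff3_line(line, attr_sep=";"):
    """Parse one GFF3 feature line into a dict (illustrative helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated GFF3 columns")
    seqid, source, ftype, start, end, score, strand, phase, attrs = fields
    attributes = {}
    for item in attrs.split(attr_sep):
        if "=" in item:
            key, value = item.split("=", 1)
            attributes[key.strip()] = value.strip()
    return {
        "seqid": seqid,
        "start": int(start),  # GFF3 coordinates are 1-based, inclusive
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


# Hypothetical record; real DANTE output carries more attributes
example = "\t".join([
    "seq1", "dante", "protein_domain", "1200", "1850", ".", "+", ".",
    "Name=RT;Final_Classification=Class_I|LTR|Ty3/gypsy",
])
record = parse_gff3_line(example)
print(record["attributes"]["Name"], record["start"], record["end"])
```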
#### USAGE ####

    usage: dante.py [-h] -q QUERY -pdb PROTEIN_DATABASE -cs
                    CLASSIFICATION [-oug DOMAIN_GFF] [-nld NEW_LDB]
                    [-dir OUTPUT_DIR] [-thsc THRESHOLD_SCORE]
                    [-wd WIN_DOM] [-od OVERLAP_DOM]

    optional arguments:
      -h, --help            show this help message and exit
      -oug DOMAIN_GFF, --domain_gff DOMAIN_GFF
                            output domains gff format (default: None)
      -nld NEW_LDB, --new_ldb NEW_LDB
                            create indexed database files for lastal in case of
                            working with new protein db (default: False)
      -dir OUTPUT_DIR, --output_dir OUTPUT_DIR
                            specify if you want to change the output directory
                            (default: None)
      -thsc THRESHOLD_SCORE, --threshold_score THRESHOLD_SCORE
                            percentage of the best score in the cluster to be
                            tolerated when assigning annotations per base
                            (default: 80)
      -wd WIN_DOM, --win_dom WIN_DOM
                            window to process large input sequences sequentially
                            (default: 10000000)
      -od OVERLAP_DOM, --overlap_dom OVERLAP_DOM
                            overlap of sequences in two consecutive windows
                            (default: 10000)

    required named arguments:
      -q QUERY, --query QUERY
                            input DNA sequence to search for protein domains in a
                            fasta format. Multifasta format allowed. (default:
                            None)
      -pdb PROTEIN_DATABASE, --protein_database PROTEIN_DATABASE
                            protein domains database file (default: None)
      -cs CLASSIFICATION, --classification CLASSIFICATION
                            protein domains classification file (default: None)

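The interplay of `-wd` (window) and `-od` (overlap) can be pictured with a small sketch that cuts a long sequence into consecutive windows sharing a fixed overlap. This is only an illustration of the general technique under stated assumptions: the 0-based half-open coordinates and the function name `windows` are choices made here, and dante.py may split sequences differently.

```python
def windows(seq_len, win=10000000, overlap=10000):
    """Yield (start, end) pairs, 0-based and half-open, covering seq_len.

    Consecutive windows share `overlap` positions, so a domain lying on a
    window boundary is still fully contained in at least one window
    (provided it is shorter than the overlap).
    """
    if win <= overlap:
        raise ValueError("window must be larger than the overlap")
    if seq_len <= 0:
        return
    step = win - overlap
    start = 0
    while True:
        end = min(start + win, seq_len)
        yield start, end
        if end >= seq_len:
            break
        start += step


# Small numbers keep the example readable
print(list(windows(25, win=10, overlap=2)))
```

With the default values, a 25 Mbp sequence would be processed in three windows, each starting 9,990,000 bp after the previous one.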
#### HOW TO RUN EXAMPLE ####

    ./dante.py -q PATH_TO_INPUT_SEQ -pdb PATH_TO_PROTEIN_DB -cs PATH_TO_CLASSIFICATION_FILE

When running for the first time with a new database, use the -nld option so that lastal can create the indexed database files:

    -nld True

Use the other arguments if you wish to rename your outputs; otherwise they are created automatically with standard names.

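When DANTE is called from another Python script, the command line can be assembled as a list and handed to `subprocess`. The sketch below uses only options listed in the usage text; the helper names, the script path and the file names are placeholders, and actually running the command requires DANTE and lastal to be installed.

```python
import subprocess


def build_dante_command(query, protein_db, classification,
                        new_ldb=False, domain_gff=None, script="./dante.py"):
    """Assemble the argument list for dante.py (illustrative helper)."""
    cmd = [script, "-q", query, "-pdb", protein_db, "-cs", classification]
    if new_ldb:
        # First run with a new database: let lastal build its index files
        cmd += ["-nld", "True"]
    if domain_gff is not None:
        cmd += ["-oug", domain_gff]
    return cmd


def run_dante(cmd):
    """Run the assembled command; raises CalledProcessError on failure."""
    subprocess.check_call(cmd)


# Placeholder file names
cmd = build_dante_command("genome.fasta", "protein_db.fasta",
                          "classification.tsv", new_ldb=True)
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.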
### Protein Domains Filter ###

The script filters the output of Protein Domains Finder for quality and/or extracts a specific type of protein domain or mobile element of origin. For the filtered domains it reports the protein sequences translated from the original DNA.

WHEN NO PARAMETERS ARE GIVEN, IT PERFORMS QUALITY FILTERING USING THE DEFAULT PARAMETERS (optimized for Viridiplantae species)

#### INPUTS ####
* GFF3 file produced by dante.py OR already filtered GFF3

#### Filtering options ####
* QUALITY:
    - Min relative length of alignment to the protein domain from DB (without gaps)
    - Identity
    - Similarity (scoring matrix: BLOSUM80)
    - Interruptions in the reading frame (frameshifts + stop codons) per every starting 100 AA
    - Max alignment proportion to the original length of database domain sequence
* DOMAIN TYPE: 'Name' attribute in GFF - see choices below.
  Records with an ambiguous domain type (e.g. INT/RH) are filtered out automatically.

* MOBILE ELEMENT TYPE:
  arbitrary substring of the element classification ('Final_Classification' attribute in GFF)

#### OUTPUTS ####
* filtered GFF3 file
* fasta file of translated protein sequences for the aligned domains that match the filtering criteria
    * Note: as the sequence is taken from the best hit alignment reported by LAST, it does not necessarily cover the whole region reported as a domain in the GFF.

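The domain type and mobile element type options can be read as two simple tests on the GFF attributes. The sketch below is an illustration, not the filter's actual code: it assumes each record is a dictionary holding the `Name` and `Final_Classification` attributes and that ambiguous domain types are recognisable by a `/` in the name, as in the INT/RH example.

```python
def select_domains(records, selected_dom="All", element_type=""):
    """Keep records matching the domain type and the element substring."""
    kept = []
    for rec in records:
        name = rec["Name"]
        if "/" in name:
            # Ambiguous domain type such as INT/RH (assumed marker)
            continue
        if selected_dom != "All" and name != selected_dom:
            continue
        # An empty substring matches every classification
        if element_type not in rec["Final_Classification"]:
            continue
        kept.append(rec)
    return kept


# Hypothetical records
records = [
    {"Name": "INT", "Final_Classification": "Class_I|LTR|Ty3/gypsy|chromovirus"},
    {"Name": "INT/RH", "Final_Classification": "Class_I|LTR|Ty3/gypsy"},
    {"Name": "RT", "Final_Classification": "Class_I|LTR|Ty1/copia|Ale"},
    {"Name": "INT", "Final_Classification": "Class_I|LTR|Ty1/copia|Ale"},
]
gypsy_int = select_domains(records, selected_dom="INT", element_type="Ty3/gypsy")
print(len(gypsy_int))
```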
#### USAGE ####

    usage: dante_gff_output_filtering.py [-h] -dg DOM_GFF [-ouf DOMAINS_FILTERED]
                                         [-dps DOMAINS_PROT_SEQ]
                                         [-thl {float range 0.0..1.0}]
                                         [-thi {float range 0.0..1.0}]
                                         [-ths {float range 0.0..1.0}] [-ir INTERRUPTIONS]
                                         [-mlen MAX_LEN_PROPORTION]
                                         [-sd {All,GAG,INT,PROT,RH,RT,aRH,CHDCR,CHDII,TPase,YR,HEL1,HEL2,ENDO}]
                                         [-el ELEMENT_TYPE] [-dir OUTPUT_DIR]

    optional arguments:
      -h, --help            show this help message and exit
      -ouf DOMAINS_FILTERED, --domains_filtered DOMAINS_FILTERED
                            output filtered domains gff file (default: None)
      -dps DOMAINS_PROT_SEQ, --domains_prot_seq DOMAINS_PROT_SEQ
                            output file containg domains protein sequences
                            (default: None)
      -thl {float range 0.0..1.0}, --th_length {float range 0.0..1.0}
                            proportion of alignment length threshold (default:
                            0.8)
      -thi {float range 0.0..1.0}, --th_identity {float range 0.0..1.0}
                            proportion of alignment identity threshold (default:
                            0.35)
      -ths {float range 0.0..1.0}, --th_similarity {float range 0.0..1.0}
                            threshold for alignment proportional similarity
                            (default: 0.45)
      -ir INTERRUPTIONS, --interruptions INTERRUPTIONS
                            interruptions (frameshifts + stop codons) tolerance
                            threshold per 100 AA (default: 3)
      -mlen MAX_LEN_PROPORTION, --max_len_proportion MAX_LEN_PROPORTION
                            maximal proportion of alignment length to the original
                            length of protein domain from database (default: 1.2)
      -sd {All,GAG,INT,PROT,RH,RT,aRH,CHDCR,CHDII,TPase,YR,HEL1,HEL2,ENDO}, --selected_dom {All,GAG,INT,PROT,RH,RT,aRH,CHDCR,CHDII,TPase,YR,HEL1,HEL2,ENDO}
                            filter output domains based on the domain type
                            (default: All)
      -el ELEMENT_TYPE, --element_type ELEMENT_TYPE
                            filter output domains by typing substring from
                            classification (default: )
      -dir OUTPUT_DIR, --output_dir OUTPUT_DIR
                            specify if you want to change the output directory
                            (default: None)

    required named arguments:
      -dg DOM_GFF, --dom_gff DOM_GFF
                            basic unfiltered gff file of all domains (default:
                            None)

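The quality thresholds can be combined into a single pass/fail test. The sketch below uses the default values from the usage text and interprets "per every starting 100 AA" as scaling the allowed number of interruptions by the number of started hundreds of amino acids. The parameter names and the inclusive comparisons are assumptions; the real script may compute or compare these values differently.

```python
import math


def passes_quality(rel_length, identity, similarity, interruptions,
                   aln_length_aa, len_proportion,
                   th_length=0.8, th_identity=0.35, th_similarity=0.45,
                   th_interruptions=3, max_len_proportion=1.2):
    """Return True if an alignment satisfies all quality thresholds."""
    # e.g. a 250 AA alignment has 3 started hundreds -> 3 * 3 = 9 allowed
    allowed = th_interruptions * math.ceil(aln_length_aa / 100)
    return (rel_length >= th_length
            and identity >= th_identity
            and similarity >= th_similarity
            and interruptions <= allowed
            and len_proportion <= max_len_proportion)


# Hypothetical alignment statistics
print(passes_quality(0.9, 0.5, 0.6, interruptions=2,
                     aln_length_aa=250, len_proportion=1.0))
```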
#### HOW TO RUN EXAMPLE ####
e.g. getting quality-filtered integrase (INT) domains of all gypsy transposable elements:

    ./dante_gff_output_filtering.py --dom_gff PATH_TO_INPUT_GFF --selected_dom INT --element_type Ty3/gypsy

### Extract Domains Nucleotide Sequences ###

This tool extracts the nucleotide sequences of protein domains from the reference DNA based on DANTE's output. It can be used e.g. for deriving phylogenetic relations of individual mobile element classes within a species.

#### INPUTS ####

* original DNA sequence in multifasta format to extract the domains from
* GFF3 file of protein domains (**DANTE's output** - preferably filtered for quality and specific domain type)
* Domains database classification table (to check the classification level)

#### OUTPUTS ####

* fasta files of domain nucleotide sequences for individual transposon lineages
* txt file of domain counts extracted for individual lineages

**- For GALAXY usage all concatenated in a single fasta file**

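The core of such an extraction is slicing the reference by the GFF3 coordinates and taking the strand into account. The sketch below is a dependency-free illustration: GFF3 coordinates are 1-based and inclusive, while the reverse-complementing of minus-strand domains and the helper names are assumptions about how one would typically do this, not a description of dante_gff_to_dna.py.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_domain(reference, start, end, strand):
    """Return the domain sequence for a GFF3 region.

    start and end are 1-based and inclusive, as in GFF3.
    """
    region = reference[start - 1:end]
    if strand == "-":
        return reverse_complement(region)
    return region


# Toy reference sequence
reference = "AAACCCGGGTTT"
print(extract_domain(reference, 1, 6, "+"))
print(extract_domain(reference, 1, 6, "-"))
```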
#### USAGE ####

    usage: dante_gff_to_dna.py [-h] -i INPUT_DNA -d DOMAINS_GFF -cs
                               CLASSIFICATION [-out OUT_DIR] [-ex EXTENDED]

    optional arguments:
      -h, --help            show this help message and exit
      -i INPUT_DNA, --input_dna INPUT_DNA
                            path to input DNA sequence
      -d DOMAINS_GFF, --domains_gff DOMAINS_GFF
                            GFF file of protein domains
      -cs CLASSIFICATION, --classification CLASSIFICATION
                            protein domains classification file
      -out OUT_DIR, --out_dir OUT_DIR
                            output directory
      -ex EXTENDED, --extended EXTENDED
                            extend the domains edges if not the whole datatabase
                            sequence was aligned

#### HOW TO RUN EXAMPLE ####

    ./dante_gff_to_dna.py --domains_gff PATH_PROTEIN_DOMAINS_GFF --input_dna PATH_TO_INPUT_DNA --classification PROTEIN_DOMAINS_DB_CLASS_TBL --extended True