annotate fetch_fasta_from_NCBI.py @ 5:706fe8139955 draft
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
author | artbio |
---|---|
date | Tue, 16 Mar 2021 23:26:58 +0000 |
parents | c667d0ee39f5 |
children | 4af77e1af12a |

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Retrieve all the nucleotide sequences for a taxonomy ID.
Returns a multi-FASTA (nucleotide or protein) file.

Entrez Database  UID common name  E-utility Database Name
Nucleotide       GI number        nuccore
Protein          GI number        protein

Retrieve strategy:

esearch to get total number of UIDs (count)
esearch to get UIDs in batches
loop until end of UIDs list:
  epost to put a batch of UIDs in the history server
  efetch to retrieve info from previous post

retmax of efetch is 1/10 of the value declared by NCBI

queries are delayed by 1 sec to satisfy NCBI guidelines
(more than what they request)
"""
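The batching step of the retrieve strategy described in the docstring can be sketched as follows. This is a minimal illustration, not the tool's own code: `plan_batches` is a hypothetical helper name, and the UID count used in the example is made up. It only shows how a total count is split into `(retstart, retmax)` windows, one per request.

```python
import math


def plan_batches(count, batch_size):
    """Return (retstart, retmax) pairs that together cover `count` UIDs.

    Hypothetical helper for illustration only; it is not part of the tool.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Ceiling division: no extra empty batch when count divides evenly.
    num_batches = math.ceil(count / batch_size)
    return [(n * batch_size, min(batch_size, count - n * batch_size))
            for n in range(num_batches)]


if __name__ == "__main__":
    # Illustrative numbers: 1250 UIDs requested 500 at a time.
    for retstart, retmax in plan_batches(1250, 500):
        print(retstart, retmax)
```

Using ceiling division rather than `int(count / batch_size) + 1` avoids a trailing empty batch when the count is an exact multiple of the batch size.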
import argparse
import http.client
import logging
import re
import sys
import time
from urllib import error, parse, request


LOG_FORMAT = '%(asctime)s|%(levelname)-8s|%(message)s'
LOG_DATEFMT = '%Y-%m-%d %H:%M:%S'
LOG_LEVELS = ['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']


class QueryException(Exception):
    pass


class Eutils:

    def __init__(self, options, logger):
        """
        Initialize retrieval parameters
        """
        self.logger = logger
        self.base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
        self.query_string = options.query_string
        self.dbname = options.dbname
        # Always define the attribute so that retrieve() can test it safely
        self.get_fasta = options.get_fasta
        self.ids = []
        self.retmax_esearch = 100000
        self.retmax_efetch = 500
        self.webenv = ''
        self.usehistory = ''
        self.query_key = ''
        self.iuds_file = options.iuds_file
        if self.iuds_file:
            with open(self.iuds_file, 'r') as f:
                for line in f:
                    self.ids.append(line.rstrip())
        self.count = len(self.ids)  # 0 if query, some value if iuds_file

    def retrieve(self):
        """
        Retrieve the UIDs and FASTA sequences corresponding to the query
        """
        if len(self.ids) == 0:  # retrieving from query (not required for file)
            self.count = self.ecount()
        # If no UIDs were found from query or file, exit
        if self.count == 0:
            self.logger.info("found no UIDs. Exiting script.")
            sys.exit(-1)
        if not self.iuds_file:
            self.get_uids_list()
            self.print_uids_list()
        else:
            # self.ids is already populated from the UIDs file
            self.print_uids_list()
        if self.get_fasta:
            try:
                self.get_sequences()
            except QueryException as e:
                self.logger.error("Exiting script.")
                raise e

    def print_uids_list(self):
        with open("retrieved_uid_list.txt", 'w') as f:
            f.write('\n'.join(self.ids))

    def ecount(self):
        """
        Run esearch only to retrieve Count (the number of UIDs).

        About retmax:
        Total number of UIDs from the retrieved set to be shown in the XML
        output (default=20). By default, ESearch only includes the first 20
        UIDs retrieved in the XML output. If usehistory is set to 'y',
        the remainder of the retrieved set will be stored on the History server
        http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch
        """
        querylog = self.esearch(self.dbname, self.query_string, '', '',
                                'count')
        self.logger.debug("Query response:")
        for line in querylog:
            line = line.decode('utf-8')
            self.logger.debug(line.rstrip())
            if '</Count>' in line:
                count = int(line.split("<Count>")[1].split("</Count>")[0])
        self.logger.info("Found %d UIDs" % count)
        return count
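The `ecount` method above extracts the count by scanning the response line by line for `</Count>`. As an alternative sketch (this is not the approach the script uses), the same value can be read with the standard-library XML parser. `parse_count` is a hypothetical helper name, and `SAMPLE` is a made-up, abbreviated response used only for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up, abbreviated ESearch-style response (illustrative only).
SAMPLE = b"""<eSearchResult>
  <Count>1250</Count>
  <RetMax>0</RetMax>
  <RetStart>0</RetStart>
</eSearchResult>
"""


def parse_count(xml_bytes):
    """Return the integer in the top-level <Count> element.

    Hypothetical helper for illustration only; it is not part of the tool.
    """
    root = ET.fromstring(xml_bytes)
    # find() with a bare tag name only looks at direct children of the root.
    node = root.find("Count")
    if node is None or node.text is None:
        raise ValueError("no <Count> element in response")
    return int(node.text)


if __name__ == "__main__":
    print(parse_count(SAMPLE))
```

Raising an explicit error when `<Count>` is missing makes a malformed response easier to diagnose than an unbound local variable would.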

    def get_uids_list(self):
        """
        Increasing retmax allows more of the retrieved UIDs to be included in
        the XML output, up to a maximum of 100,000 records.
        from http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.ESearch
        """
        retmax = self.retmax_esearch
        self.logger.info("retmax = %s, self.count = %s" % (retmax, self.count))
        if (int(self.count) > retmax):
            num_batches = int(self.count / retmax) + 1
            self.usehistory = 'y'
            self.logger.info("Batch size for esearch action: %d UIDs" % retmax)
            self.logger.info("Number of batches for esearch action: %s"
                             % num_batches)
            querylog = self.esearch(self.dbname, self.query_string, '', '', '')
            for line in querylog:
                line = line.decode('utf-8')
                self.logger.debug(line.rstrip())
                if '<WebEnv>' in line:
                    self.webenv = line.split("<WebEnv>")[1].split(
                        "</WebEnv>")[0]
            self.logger.info("Will use webenv %s" % self.webenv)
            for n in range(num_batches):
                querylog = self.esearch(self.dbname, self.query_string,
                                        n*retmax, retmax, '')
                for line in querylog:
                    line = line.decode('utf-8')
changeset
|
141 if '<Id>' in line and '</Id>' in line: |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
142 uid = line.split("<Id>")[1].split("</Id>")[0] |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
143 self.ids.append(uid) |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
144 self.logger.info("Retrieved %d UIDs" % len(self.ids)) |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
145 |
1
7e41bbb94159
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit 6008aafac37eec1916d6b72c05d9cfcb002b8095
artbio
parents:
diff
changeset
|
146 else: |
5
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
147 self.logger.info("Batch size for esearch action: %d UIDs" % retmax) |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
148 self.logger.info("Number of batches for esearch action: 1") |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
149 querylog = self.esearch(self.dbname, self.query_string, 0, |
3
8be88084f89c
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ab45487db8cc69750f92a40d763d51ffac940e25
artbio
parents:
2
diff
changeset
|
150 retmax, '') |
1
7e41bbb94159
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit 6008aafac37eec1916d6b72c05d9cfcb002b8095
artbio
parents:
diff
changeset
|
151 for line in querylog: |
5
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
152 line = line.decode('utf-8') |
1
7e41bbb94159
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit 6008aafac37eec1916d6b72c05d9cfcb002b8095
artbio
parents:
diff
changeset
|
153 if '<Id>' in line and '</Id>' in line: |
5
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
154 uid = line.split("<Id>")[1].split("</Id>")[0] |
1
7e41bbb94159
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit 6008aafac37eec1916d6b72c05d9cfcb002b8095
artbio
parents:
diff
changeset
|
155 self.ids.append(uid) |
7e41bbb94159
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit 6008aafac37eec1916d6b72c05d9cfcb002b8095
artbio
parents:
diff
changeset
|
156 self.logger.info("Retrieved %d UIDs" % len(self.ids)) |
5
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
157 return self.ids |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
158 |
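The UID and WebEnv extraction above splits each response line on literal tags. As a hedged alternative sketch (not the tool's actual method), the same fields could be read with the standard-library XML parser. The function name `parse_esearch` and the sample payload, including its WebEnv value, are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET


def parse_esearch(xml_bytes):
    """Return (webenv, uids) from an eSearch-style XML payload.

    Hypothetical helper: it assumes the payload has a top-level
    <WebEnv> element and <Id> elements nested under <IdList>.
    """
    root = ET.fromstring(xml_bytes)
    webenv = root.findtext("WebEnv")  # None when the element is absent
    uids = [node.text for node in root.findall("./IdList/Id")]
    return webenv, uids


# Made-up payload shaped like an eSearch result (values are illustrative).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<eSearchResult>
  <Count>3</Count>
  <RetMax>3</RetMax>
  <RetStart>0</RetStart>
  <WebEnv>HYPOTHETICAL_WEBENV</WebEnv>
  <IdList>
    <Id>111</Id>
    <Id>222</Id>
    <Id>333</Id>
  </IdList>
</eSearchResult>"""

webenv, uids = parse_esearch(SAMPLE)
```

Parsing the whole document avoids depending on each `<Id>` element sitting alone on one line, at the cost of holding the full response in memory.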
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
159 def get_sequences(self): |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
160 batch_size = self.retmax_efetch |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
161 count = self.count |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
162 uids_list = self.ids |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
163 self.logger.info("Batch size for efetch action: %d" % batch_size) |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
164 self.logger.info("Number of batches for efetch action: %d" % |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
165 ((count + batch_size - 1) // batch_size)) |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
166 with open(self.get_fasta, 'w') as out: |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
167 for start in range(0, count, batch_size): |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
168 end = min(count, start+batch_size) |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
169 batch = uids_list[start:end] |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
170 self.logger.info("retrieving batch %d" % |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
171 ((start // batch_size) + 1)) |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
172 try: |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
173 mfasta = self.efetch(self.dbname, ','.join(batch)) |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
174 out.write(mfasta + '\n') |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
175 except QueryException as e: |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
176 self.logger.error("%s" % e.message) |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
177 raise e |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
178 request.urlcleanup() |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
179 |
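The slicing done in `get_sequences` can be isolated as a small sketch. The helper names `number_of_batches` and `batch_ranges` are hypothetical; they only illustrate the arithmetic, using integer (ceiling) division so that the batch count is exact even when `count` is a multiple of the batch size.

```python
def number_of_batches(count, batch_size):
    """Ceiling division: how many slices of batch_size cover count items."""
    return -(-count // batch_size)


def batch_ranges(count, batch_size):
    """Yield (batch_number, start, end) for each slice, 1-based numbering."""
    for start in range(0, count, batch_size):
        end = min(count, start + batch_size)
        yield start // batch_size + 1, start, end


# Illustrative values: 7 UIDs fetched 3 at a time.
uids = ["u%d" % i for i in range(7)]
slices = [uids[start:end] for _, start, end in batch_ranges(len(uids), 3)]
```

With these illustrative values the last slice holds the single remaining UID, and `number_of_batches(7, 3)` matches the number of slices produced.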
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
180 def efetch(self, db, uid_list): |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
181 url = self.base + "efetch.fcgi" |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
182 self.logger.debug("url_efetch: %s" % url) |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
183 values = {'db': db, |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
184 'id': uid_list, |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
185 'rettype': "fasta", |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
186 'retmode': "text", |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
187 'usehistory': self.usehistory, |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
188 'WebEnv': self.webenv} |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
189 data = parse.urlencode(values) |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
190 req = request.Request(url, data.encode('utf-8')) |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
191 self.logger.debug("data: %s" % str(data)) |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
192 serverTransaction = False |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
193 counter = 0 |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
194 response_code = 0 |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
195 while not serverTransaction: |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
196 counter += 1 |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
197 self.logger.info("Server Transaction Trial: %s" % (counter)) |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
198 try: |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
199 self.logger.debug("Going to open") |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
200 response = request.urlopen(req) |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
201 self.logger.debug("Going to get code") |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
202 response_code = response.getcode() |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
203 self.logger.debug("Going to read, the code was: %s", |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
204 str(response_code)) |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
205 fasta = response.read() |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
206 self.logger.debug("Did all that") |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
207 response.close() |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
208 if((response_code != 200) or |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
209 (b"Resource temporarily unavailable" in fasta) or |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
210 (b"Error" in fasta) or (not fasta.startswith(b">"))): |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
211 serverTransaction = False |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
212 if (response_code != 200): |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
213 self.logger.info("urlopen error: Response code is not " |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
214 "200") |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
215 elif (b"Resource temporarily unavailable" in fasta): |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
216 self.logger.info("Resource temporarily unavailable") |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
217 elif (b"Error" in fasta): |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
218 self.logger.info("Error in fasta") |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
219 else: |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
220 self.logger.info("Fasta doesn't start with '>'") |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
221 else: |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
222 serverTransaction = True |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
223 except error.HTTPError as e: |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
224 serverTransaction = False |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
225 self.logger.info("urlopen error:%s, %s" % (e.code, e.read())) |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
226 except error.URLError as e: |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
227 serverTransaction = False |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
228 self.logger.info("urlopen error: Failed to reach a server") |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
229 self.logger.info("Reason :%s" % (e.reason)) |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
230 except http.client.IncompleteRead as e: |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
231 serverTransaction = False |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
232 self.logger.info("IncompleteRead error: %s" % (e.partial)) |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
233 if (counter > 500): |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
234 serverTransaction = True |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
235 if (counter > 500): |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
236 raise QueryException({"message": |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
237 "500 Server Transaction Trials attempted for " |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
238 "this batch. Aborting."}) |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
239 fasta = self.sanitiser(self.dbname, fasta.decode('utf-8')) |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
240 time.sleep(0.1) |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
241 return fasta |
1
7e41bbb94159
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit 6008aafac37eec1916d6b72c05d9cfcb002b8095
artbio
parents:
diff
changeset
|
242 |
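The retry loop in `efetch` repeats the request until the payload looks like FASTA or a trial limit is reached. A minimal sketch of that pattern, detached from the network, might look like the following. `fetch_with_retries`, `RetryError`, and the `base_delay` back-off are assumptions made for illustration; `RetryError` merely stands in for the script's `QueryException`, whose definition is not shown here.

```python
import http.client
import time


class RetryError(Exception):
    """Hypothetical stand-in for the script's QueryException."""


def fetch_with_retries(fetch, is_valid, max_trials=5, base_delay=0.0):
    """Call fetch() until is_valid(payload) holds or max_trials is reached.

    Returns (payload, trial_number). Network-style errors are swallowed and
    retried; urllib.error.URLError is a subclass of OSError.
    """
    for trial in range(1, max_trials + 1):
        try:
            payload = fetch()
            if is_valid(payload):
                return payload, trial
        except (OSError, http.client.HTTPException):
            pass  # transient failure: fall through to the next trial
        time.sleep(base_delay * trial)  # linear back-off (an assumption)
    raise RetryError("%d trials attempted for this batch. Aborting."
                     % max_trials)


# Simulated server: the first answer is an error page, the second is FASTA.
responses = iter([b"Error: busy", b">seq1\nACGT\n"])
payload, trials = fetch_with_retries(lambda: next(responses),
                                     lambda p: p.startswith(b">"))
```

Passing the fetch and validation steps as callables keeps the loop testable without contacting a server; the validity check shown is only the "starts with `>`" condition, not the full set of checks in `efetch`.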
3
8be88084f89c
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ab45487db8cc69750f92a40d763d51ffac940e25
artbio
parents:
2
diff
changeset
|
243 def esearch(self, db, term, retstart, retmax, rettype): |
1
7e41bbb94159
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit 6008aafac37eec1916d6b72c05d9cfcb002b8095
artbio
parents:
diff
changeset
|
244 url = self.base + "esearch.fcgi" |
7e41bbb94159
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit 6008aafac37eec1916d6b72c05d9cfcb002b8095
artbio
parents:
diff
changeset
|
245 self.logger.debug("url: %s" % url) |
7e41bbb94159
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit 6008aafac37eec1916d6b72c05d9cfcb002b8095
artbio
parents:
diff
changeset
|
246 values = {'db': db, |
7e41bbb94159
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit 6008aafac37eec1916d6b72c05d9cfcb002b8095
artbio
parents:
diff
changeset
|
247 'term': term, |
7e41bbb94159
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit 6008aafac37eec1916d6b72c05d9cfcb002b8095
artbio
parents:
diff
changeset
|
248 'rettype': rettype, |
7e41bbb94159
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit 6008aafac37eec1916d6b72c05d9cfcb002b8095
artbio
parents:
diff
changeset
|
249 'retstart': retstart, |
5
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
250 'retmax': retmax, |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
251 'usehistory': self.usehistory, |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
252 'WebEnv': self.webenv} |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
253 data = parse.urlencode(values) |
1
7e41bbb94159
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit 6008aafac37eec1916d6b72c05d9cfcb002b8095
artbio
parents:
diff
changeset
|
254 self.logger.debug("data: %s" % str(data)) |
5
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
255 req = request.Request(url, data.encode('utf-8')) |
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
256 response = request.urlopen(req) |
1
7e41bbb94159
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit 6008aafac37eec1916d6b72c05d9cfcb002b8095
artbio
parents:
diff
changeset
|
257 querylog = response.readlines() |
7e41bbb94159
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit 6008aafac37eec1916d6b72c05d9cfcb002b8095
artbio
parents:
diff
changeset
|
258 response.close() |
7e41bbb94159
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit 6008aafac37eec1916d6b72c05d9cfcb002b8095
artbio
parents:
diff
changeset
|
259 time.sleep(1) |
7e41bbb94159
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit 6008aafac37eec1916d6b72c05d9cfcb002b8095
artbio
parents:
diff
changeset
|
260 return querylog |
7e41bbb94159
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit 6008aafac37eec1916d6b72c05d9cfcb002b8095
artbio
parents:
diff
changeset
|
261 |
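`esearch` URL-encodes a dictionary of parameters and sends it as a POST body. The sketch below builds such a request without sending it. `build_esearch_request` is a hypothetical helper, `BASE` is an assumption about the value of `self.base` (the public E-utilities endpoint), and dropping empty parameters is a design choice of this sketch rather than what the script does.

```python
from urllib import parse, request

# Assumption: self.base points at the public E-utilities endpoint.
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_esearch_request(db, term, retstart=None, retmax=None,
                          usehistory="y", webenv=None):
    """Return an unsent urllib Request carrying the form-encoded query."""
    values = {"db": db, "term": term, "retstart": retstart,
              "retmax": retmax, "usehistory": usehistory, "WebEnv": webenv}
    # Keep 0 (a valid retstart) but drop None and empty strings.
    values = {k: v for k, v in values.items() if v not in (None, "")}
    data = parse.urlencode(values).encode("utf-8")
    # Supplying a data payload makes urllib issue a POST.
    return request.Request(BASE + "esearch.fcgi", data=data)


# Illustrative query; the search term is made up for the example.
req = build_esearch_request("nuccore", "txid10239[orgn]",
                            retstart=0, retmax=100)
```

Building the request separately from `urlopen` makes it possible to inspect `req.data` and `req.full_url` in a debug log before any network traffic happens.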
4
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
262 def sanitiser(self, db, fastaseq): |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
263 if db not in ("nuccore", "protein"): |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
264 return fastaseq |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
265 regex = re.compile(r"[ACDEFGHIKLMNPQRSTVWYBZ]{49,}") |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
266 sane_seqlist = [] |
5
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
267 seqlist = fastaseq.split('\n\n') |
4
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
268 for seq in seqlist[:-1]: |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
269 fastalines = seq.split("\n") |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
270 if len(fastalines) < 2: |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
271 self.logger.info("Empty sequence for %s" % |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
272 ("|".join(fastalines[0].split("|")[:4]))) |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
273 self.logger.info("%s download is skipped" % |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
274 ("|".join(fastalines[0].split("|")[:4]))) |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
275 continue |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
276 if db == "nuccore": |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
277 badnuc = 0 |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
278 for nucleotide in fastalines[1]: |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
279 if nucleotide not in "ATGC": |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
280 badnuc += 1 |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
281 if float(badnuc)/len(fastalines[1]) > 0.4: |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
282 self.logger.info("%s ambiguous nucleotides in %s\ |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
283 or download interrupted at this offset\ |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
284 | %s" % (float(badnuc)/len(fastalines[1]), |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
285 "|".join(fastalines[0].split("|") |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
286 [:4]), |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
287 fastalines[1])) |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
288 self.logger.info("%s download is skipped" % |
5
706fe8139955
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit b5ef783237b244d684e26b1ed1cc333a8305ce3e"
artbio
parents:
4
diff
changeset
|
289 fastalines[0].split("|")[:4]) |
4
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
290 continue |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
291 """ remove spaces and trim the header to 100 chars """ |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
292 fastalines[0] = fastalines[0].replace(" ", "_")[:100] |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
293 cleanseq = "\n".join(fastalines) |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
294 sane_seqlist.append(cleanseq) |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
295 elif db == "protein": |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
296 fastalines[0] = fastalines[0][0:100] |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
297 fastalines[0] = fastalines[0].replace(" ", "_") |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
298 fastalines[0] = fastalines[0].replace("[", "_") |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
299 fastalines[0] = fastalines[0].replace("]", "_") |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
300 fastalines[0] = fastalines[0].replace("=", "_") |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
301 """ because blast makedb doesn't like it """ |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
302 fastalines[0] = fastalines[0].rstrip("_") |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
303 fastalines[0] = re.sub(regex, "_", fastalines[0]) |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
304 cleanseq = "\n".join(fastalines) |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
305 sane_seqlist.append(cleanseq) |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
306 self.logger.info("clean sequences appended: %d" % (len(sane_seqlist))) |
c667d0ee39f5
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/fetch_fasta_from_ncbi commit ca3070e85c370b914ffa0562afe12b363e05aea4
artbio
parents:
3
diff
changeset
|
307 return "\n".join(sane_seqlist) |
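The nucleotide filter in `sanitiser` can be tried in isolation. The following is a minimal sketch rather than the tool's own code: `ambiguous_fraction` and `keep_record` are hypothetical helper names, the 0.4 threshold is taken from the method above, and, as in the method, only the first sequence line of each record is inspected. Treating an empty sequence line as fully ambiguous is an assumption made here.

```python
def ambiguous_fraction(sequence_line):
    """Return the fraction of characters that are not A, T, G or C."""
    if not sequence_line:
        # Assumption: an empty line counts as fully ambiguous.
        return 1.0
    bad = sum(1 for nucleotide in sequence_line if nucleotide not in "ATGC")
    return bad / len(sequence_line)


def keep_record(record, threshold=0.4):
    """Decide whether a FASTA record (header plus sequence lines) is kept."""
    lines = record.split("\n")
    if len(lines) < 2:
        # Header without any sequence line: nothing to keep.
        return False
    return ambiguous_fraction(lines[1]) <= threshold


# Illustrative records, not real NCBI output.
records = [">seq1 good\nATGCATGCAT", ">seq2 bad\nNNNNNATGCA", ">seq3 empty"]
kept = [record for record in records if keep_record(record)]
```

With these three made-up records, only the first one passes: the second has half of its first line made of `N`, and the third has no sequence line at all.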


def command_parse():
    parser = argparse.ArgumentParser(description='Retrieve data from NCBI')
    parser.add_argument('--query', '-i', dest='query_string',
                        default=None, help='NCBI Query String')
    parser.add_argument('--iud_file', dest='iuds_file', default=None,
                        help='input list of UIDs to be fetched')
    parser.add_argument('--dbname', '-d', dest='dbname', help='database type')
    parser.add_argument('--fasta', '-F', dest='get_fasta', default=False,
                        help='file with retrieved fasta sequences')
    parser.add_argument('--logfile', '-l', help='log file (default=stderr)')
    parser.add_argument('--loglevel', choices=LOG_LEVELS, default='INFO',
                        help='logging level (default: INFO)')
    args = parser.parse_args()

    # The two input modes are mutually exclusive.
    if args.query_string is not None and args.iuds_file is not None:
        parser.error('Please choose either fetching by query (--query) '
                     'or by uid list (--iud_file)')
    return args
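argparse can also enforce the "either a query or a UID list" rule by itself through a mutually exclusive group. The sketch below shows that alternative; it is not what the tool does. `build_parser` is a hypothetical name, only the two relevant options are reproduced, and the query string and file name are placeholders.

```python
import argparse


def build_parser():
    """Build a parser that rejects --query and --iud_file used together."""
    parser = argparse.ArgumentParser(description='Retrieve data from NCBI')
    source = parser.add_mutually_exclusive_group()
    source.add_argument('--query', '-i', dest='query_string', default=None,
                        help='NCBI Query String')
    source.add_argument('--iud_file', dest='iuds_file', default=None,
                        help='input list of UIDs to be fetched')
    return parser


parser = build_parser()

# A single input mode parses normally.
args = parser.parse_args(['--query', 'txid10239[orgn]'])

# Both modes at once make argparse print a usage error and raise SystemExit.
try:
    parser.parse_args(['--query', 'x', '--iud_file', 'uids.txt'])
    both_rejected = False
except SystemExit:
    both_rejected = True
```

The manual check in `command_parse` keeps its custom error message; the group trades that for a shorter function and a generated usage line that marks the options as alternatives.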


def __main__():
    """ main function """
    args = command_parse()
    log_level = getattr(logging, args.loglevel)
    kwargs = {'format': LOG_FORMAT,
              'datefmt': LOG_DATEFMT,
              'level': log_level}
    if args.logfile:
        kwargs['filename'] = args.logfile
    logging.basicConfig(**kwargs)
    logger = logging.getLogger('data_from_NCBI')
    E = Eutils(args, logger)
    try:
        E.retrieve()
    except Exception:
        # Record the traceback before exiting with a non-zero status.
        logger.exception("Retrieval failed")
        sys.exit(-1)


if __name__ == "__main__":
    __main__()