sra_tools (matt-shirley): annotated source of sra.py at revision 0:cdcc400dcafc (draft)
Migrated separate tools fastq_dump, sam_dump, and sra_fetch to this repository for further development.
author: matt-shirley <mdshw5@gmail.com>
date: Tue, 27 Nov 2012 13:31:09 -0500
"""
Galaxy datatype for NCBI Sequence Read Archive (SRA) files.
"""

import galaxy.datatypes.binary
from galaxy.datatypes.binary import Binary
# Absolute import: a bare `import data` only resolves when this module sits inside galaxy/datatypes.
from galaxy.datatypes import data
import logging, binascii
from galaxy.datatypes.metadata import MetadataElement
from galaxy.datatypes import metadata
from galaxy.datatypes.sniff import *
from galaxy import eggs
import pkg_resources
pkg_resources.require( "bx-python" )
import os, subprocess, tempfile
import struct

class Sra( Binary ):
    """ Sequence Read Archive (SRA) """
    file_ext = "sra"

    def __init__( self, **kwd ):
        Binary.__init__( self, **kwd )

    def sniff( self, filename ):
        # The first 8 bytes of an NCBI SRA file are the magic string 'NCBI.sra', and the
        # file is binary. EBI and DDBJ files may differ. For details about the format, see
        # http://www.ncbi.nlm.nih.gov/books/n/helpsra/SRA_Overview_BK/#SRA_Overview_BK.4_SRA_Data_Structure
        try:
            # Read in binary mode and close the handle promptly; compare the raw
            # bytes directly rather than hex-encoding both sides.
            with open( filename, 'rb' ) as fh:
                header = fh.read( 8 )
            return header == b'NCBI.sra'
        except ( IOError, OSError ):
            # Missing or unreadable files are not SRA.
            return False

    def set_peek( self, dataset, is_multi_byte=False ):
        # Binary SRA content is not human-readable, so the peek is a fixed label
        # and the blurb reports the file size.
        if not dataset.dataset.purged:
            dataset.peek = "Binary sra file"
            dataset.blurb = data.nice_size( dataset.get_size() )
        else:
            dataset.peek = 'file does not exist'
            dataset.blurb = 'file purged from disk'

    def display_peek( self, dataset ):
        try:
            return dataset.peek
        except Exception:
            # Fall back to a label built from the file size if no peek is available.
            return "Binary sra file (%s)" % ( data.nice_size( dataset.get_size() ) )

Binary.register_sniffable_binary_format("sra", "sra", Sra)