annotate sra.py @ 0:cdcc400dcafc draft

Migrated separate tools fastq_dump, sam_dump, and sra_fetch to this repository for further development.
author matt-shirley <mdshw5@gmail.com>
date Tue, 27 Nov 2012 13:31:09 -0500
parents
children
"""
Sra class
"""

import galaxy.datatypes.binary
from galaxy.datatypes.binary import Binary
# Absolute import of the Galaxy data module (provides nice_size) instead of the implicit relative "import data"
from galaxy.datatypes import data
import logging, binascii
from galaxy.datatypes.metadata import MetadataElement
from galaxy.datatypes import metadata
from galaxy.datatypes.sniff import *
from galaxy import eggs
import pkg_resources
pkg_resources.require( "bx-python" )
import os, subprocess, tempfile
import struct

class Sra( Binary ):
    """ Sequence Read Archive (SRA) """
    file_ext = "sra"

    def __init__( self, **kwd ):
        Binary.__init__( self, **kwd )
    def sniff( self, filename ):
        # The first 8 bytes of any NCBI sra file are 'NCBI.sra', and the file is binary. EBI and DDBJ files may differ. For details
        # about the format, see http://www.ncbi.nlm.nih.gov/books/n/helpsra/SRA_Overview_BK/#SRA_Overview_BK.4_SRA_Data_Structure
        try:
            # Read in binary mode and close the handle once the header has been read.
            with open( filename, 'rb' ) as handle:
                header = handle.read(8)
            if binascii.b2a_hex( header ) == binascii.hexlify( b'NCBI.sra' ):
                return True
            return False
        except Exception:
            return False
    def set_peek( self, dataset, is_multi_byte=False ):
        if not dataset.dataset.purged:
            dataset.peek = "Binary sra file"
            dataset.blurb = data.nice_size( dataset.get_size() )
        else:
            dataset.peek = 'file does not exist'
            dataset.blurb = 'file purged from disk'
    def display_peek( self, dataset ):
        try:
            return dataset.peek
        except Exception:
            # Fall back to a generic description if no peek has been set.
            return "Binary sra file (%s)" % ( data.nice_size( dataset.get_size() ) )

Binary.register_sniffable_binary_format("sra", "sra", Sra)