#!/usr/bin/env python

"""
Runs Lastz
Written for Lastz v. 1.01.88.

usage: lastz_wrapper.py [options]
    --ref_name: The reference name to change all output matches to
    --ref_source: Whether the reference is cached or from the history
    --source_select: Whether to use pre-set options or specify all parameters individually
    --input1: The name of the reference file if using history or reference base name if using cached
    --input2: The reads file to align
    --ref_sequences: The number of sequences in the reference file if using one from history
    --pre_set_options: Which of the pre-set options to use, if using pre-sets
    --strand: Which strand of the read to search, if specifying all parameters
    --seed: Seeding settings, if specifying all parameters
    --gfextend: Whether to perform gap-free extension of seed hits to HSPs (high-scoring segment pairs), if specifying all parameters
    --chain: Whether to perform chaining of HSPs, if specifying all parameters
    --transition: Number of transitions to allow in each seed hit, if specifying all parameters
    --O: Gap-opening penalty, if specifying all parameters
    --E: Gap-extension penalty, if specifying all parameters
    --X: X-drop threshold, if specifying all parameters
    --Y: Y-drop threshold, if specifying all parameters
    --K: Threshold for HSPs, if specifying all parameters
    --L: Threshold for gapped alignments, if specifying all parameters
    --entropy: Whether to involve entropy when filtering HSPs, if specifying all parameters
    --identity_min: Minimum identity (don't report matches under this identity)
    --identity_max: Maximum identity (don't report matches above this identity)
    --coverage: The minimum coverage value (don't report matches covering less than this)
    --unmask: Whether to convert lowercase bases to uppercase
    --out_format: The format of the output file (sam, diffs, or tabular (general))
    --output: The name of the output file
    --lastzSeqsFileDir: Directory of local lastz_seqs.loc file
"""
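# Example invocation (illustrative only; all file names, the reference base
# name, and the pre-set name below are hypothetical):
#
#   lastz_wrapper.py --ref_source=cached --input1=/path/to/hg18.2bit \
#       --input2=reads.fasta --source_select=pre_set --pre_set_options=yasra95short \
#       --unmask=yes --identity_min=90 --identity_max=100 --coverage=75 \
#       --out_format=sam --output=out.sam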
import optparse, os, subprocess, shutil, sys, tempfile, threading, time
from Queue import Queue

from galaxy import eggs
import pkg_resources
pkg_resources.require( 'bx-python' )
from bx.seq.twobit import *
from bx.seq.fasta import FastaReader
from galaxy.util.bunch import Bunch

# Sentinel placed on a queue to tell its worker threads to shut down
STOP_SIGNAL = object()
# Number of concurrent lastz processes
WORKERS = 4
# Maximum number of queued jobs; put() blocks once the queue is full
SLOTS = 128

def stop_err( msg ):
    sys.stderr.write( "%s" % msg )
    sys.exit()

def stop_queues( lastz, combine_data ):
    # This method should only be called if an error has been encountered.
    # Send STOP_SIGNAL to all worker threads.
    for t in lastz.threads:
        lastz.put( STOP_SIGNAL, True )
    combine_data.put( STOP_SIGNAL, True )

class BaseQueue( object ):
    def __init__( self, num_threads, slots=-1 ):
        # Initialize the queue and worker threads
        self.queue = Queue( slots )
        self.threads = []
        for i in range( num_threads ):
            worker = threading.Thread( target=self.run_next )
            worker.start()
            self.threads.append( worker )
    def run_next( self ):
        # Run the next job, waiting until one is available if necessary
        while True:
            job = self.queue.get()
            if job is STOP_SIGNAL:
                return self.shutdown()
            self.run_job( job )
            time.sleep( 1 )
    def run_job( self, job ):
        stop_err( 'Not Implemented' )
    def put( self, job, block=False ):
        # Add a job to the queue
        self.queue.put( job, block )
    def shutdown( self ):
        return

class LastzJobQueue( BaseQueue ):
    """
    A queue that runs commands in parallel.  Blocking is done so the queue will
    not consume much memory.
    """
    def run_job( self, job ):
        # Execute the job's command.  Use communicate() rather than wait() so
        # stderr is drained while the process runs, avoiding a deadlock if the
        # child fills the pipe buffer.
        proc = subprocess.Popen( args=job.command, shell=True, stderr=subprocess.PIPE )
        stderr = proc.communicate()[1]
        if stderr:
            stop_queues( self, job.combine_data_queue )
            stop_err( stderr )
        job.combine_data_queue.put( job )

class CombineDataQueue( BaseQueue ):
    """
    A queue that concatenates files in serial.  Blocking is not done since this
    queue is not expected to grow larger than the command queue.
    """
    def __init__( self, output_filename, num_threads=1 ):
        BaseQueue.__init__( self, num_threads )
        self.CHUNK_SIZE = 2**20 # 1 MB
        self.output_file = open( output_filename, 'wb' )
    def run_job( self, job ):
        # Append the job's output file to the combined output, then remove the
        # job's temporary files.
        in_file = open( job.output, 'rb' )
        while True:
            chunk = in_file.read( self.CHUNK_SIZE )
            if not chunk:
                in_file.close()
                break
            self.output_file.write( chunk )
        for file_name in job.cleanup:
            os.remove( file_name )
    def shutdown( self ):
        self.output_file.close()
        return

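# Minimal usage sketch for the queue classes above (illustrative only, not
# executed by this tool): LastzJobQueue workers run each job's shell command
# in parallel, then hand the finished job to CombineDataQueue, whose single
# worker appends job.output to the combined file and removes job.cleanup files.
#
#   jobs = LastzJobQueue( WORKERS, slots=SLOTS )
#   combiner = CombineDataQueue( 'combined.out' )
#   job = Bunch( command='echo hello > hello.out', output='hello.out',
#                cleanup=[ 'hello.out' ], combine_data_queue=combiner )
#   jobs.put( job, block=True )
#   for t in jobs.threads:
#       jobs.put( STOP_SIGNAL, True )
#   combiner.put( STOP_SIGNAL )
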
def __main__():
    # Parse the command line
    parser = optparse.OptionParser()
    parser.add_option( '--ref_name', dest='ref_name', help='The reference name to change all output matches to' )
    parser.add_option( '--ref_source', dest='ref_source', help='Whether the reference is cached or from the history' )
    parser.add_option( '--ref_sequences', dest='ref_sequences', help='Number of sequences in the reference dataset' )
    parser.add_option( '--source_select', dest='source_select', help='Whether to use pre-set options or specify all parameters individually' )
    parser.add_option( '--input1', dest='input1', help='The name of the reference file if using history or reference base name if using cached' )
    parser.add_option( '--input2', dest='input2', help='The reads file to align' )
    parser.add_option( '--pre_set_options', dest='pre_set_options', help='Which of the pre-set options to use, if using pre-sets' )
    parser.add_option( '--strand', dest='strand', help='Which strand of the read to search, if specifying all parameters' )
    parser.add_option( '--seed', dest='seed', help='Seeding settings, if specifying all parameters' )
    parser.add_option( '--transition', dest='transition', help='Number of transitions to allow in each seed hit, if specifying all parameters' )
    parser.add_option( '--gfextend', dest='gfextend', help='Whether to perform gap-free extension of seed hits to HSPs (high-scoring segment pairs), if specifying all parameters' )
    parser.add_option( '--chain', dest='chain', help='Whether to perform chaining of HSPs, if specifying all parameters' )
    parser.add_option( '--O', dest='O', help='Gap-opening penalty, if specifying all parameters' )
    parser.add_option( '--E', dest='E', help='Gap-extension penalty, if specifying all parameters' )
    parser.add_option( '--X', dest='X', help='X-drop threshold, if specifying all parameters' )
    parser.add_option( '--Y', dest='Y', help='Y-drop threshold, if specifying all parameters' )
    parser.add_option( '--K', dest='K', help='Threshold for HSPs, if specifying all parameters' )
    parser.add_option( '--L', dest='L', help='Threshold for gapped alignments, if specifying all parameters' )
    parser.add_option( '--entropy', dest='entropy', help='Whether to involve entropy when filtering HSPs, if specifying all parameters' )
    parser.add_option( '--identity_min', dest='identity_min', help="Minimum identity (don't report matches under this identity)" )
    parser.add_option( '--identity_max', dest='identity_max', help="Maximum identity (don't report matches above this identity)" )
    parser.add_option( '--coverage', dest='coverage', help="The minimum coverage value (don't report matches covering less than this)" )
    parser.add_option( '--unmask', dest='unmask', help='Whether to convert lowercase bases to uppercase' )
    parser.add_option( '--out_format', dest='format', help='The format of the output file (sam, diffs, or tabular (general))' )
    parser.add_option( '--output', dest='output', help='The output file' )
    parser.add_option( '--lastzSeqsFileDir', dest='lastzSeqsFileDir', help='Directory of local lastz_seqs.loc file' )
    ( options, args ) = parser.parse_args()

    # Output the version of lastz being used
    try:
        tmp = tempfile.NamedTemporaryFile().name
        tmp_stdout = open( tmp, 'wb' )
        proc = subprocess.Popen( args='lastz -v', shell=True, stdout=tmp_stdout )
        tmp_stdout.close()
        returncode = proc.wait()
        stdout = None
        for line in open( tmp_stdout.name, 'rb' ):
            if line.lower().find( 'version' ) >= 0:
                stdout = line.strip()
                break
        if stdout:
            sys.stdout.write( '%s\n' % stdout )
        else:
            raise Exception
    except:
        sys.stdout.write( 'Could not determine Lastz version\n' )

    if options.unmask == 'yes':
        unmask = '[unmask]'
    else:
        unmask = ''
    if options.ref_name:
        ref_name = '[nickname=%s]' % options.ref_name
    else:
        ref_name = ''
    # Prepare for commonly used pre-set options
    if options.source_select == 'pre_set':
        set_options = '--%s' % options.pre_set_options
    # Prepare for user-specified options
    else:
        set_options = '--%s --%s --gapped --strand=%s --seed=%s --%s O=%s E=%s X=%s Y=%s K=%s L=%s --%s' % \
                      ( options.gfextend, options.chain, options.strand, options.seed, options.transition,
                        options.O, options.E, options.X, options.Y, options.K, options.L, options.entropy )
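    # For illustration, with hypothetical values gfextend='gfextend', chain='chain',
    # strand='both', seed='12of19', transition='transition', O=400, E=30, X=910,
    # Y=9370, K=3000, L=3000, entropy='entropy', set_options expands to:
    #   --gfextend --chain --gapped --strand=both --seed=12of19 --transition
    #   O=400 E=30 X=910 Y=9370 K=3000 L=3000 --entropy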
    # Specify input2 and add [fullnames] modifier if output format is diffs
    if options.format == 'diffs':
        input2 = '%s[fullnames]' % options.input2
    else:
        input2 = options.input2
    if options.format == 'tabular':
        # Change output format to general if it's tabular, and add field names
        # for tabular output.  The trailing '-' tells lastz to omit the header line.
        format = 'general-'
        tabular_fields = ':score,name1,strand1,size1,start1,zstart1,end1,length1,text1,name2,strand2,size2,start2,zstart2,end2,start2+,zstart2+,end2+,length2,text2,diff,cigar,identity,coverage,gaprate,diagonal,shingle'
    elif options.format == 'sam':
        # We currently ALWAYS suppress SAM headers ('sam-' is headerless SAM).
        format = 'sam-'
        tabular_fields = ''
    else:
        format = options.format
        tabular_fields = ''

    # Set up our queues
    lastz_job_queue = LastzJobQueue( WORKERS, slots=SLOTS )
    combine_data_queue = CombineDataQueue( options.output )

    if options.ref_source == 'history':
        # Reference is a fasta dataset from the history, so split the job across
        # the number of sequences in the dataset ( this could be a HUGE number ).
        try:
            # Ensure there is at least 1 sequence in the dataset ( this may not be necessary ).
            error_msg = "The reference dataset is missing metadata, click the pencil icon in the history item and 'auto-detect' the metadata attributes."
            ref_sequences = int( options.ref_sequences )
            if ref_sequences < 1:
                stop_queues( lastz_job_queue, combine_data_queue )
                stop_err( error_msg )
        except:
            stop_queues( lastz_job_queue, combine_data_queue )
            stop_err( error_msg )
        seqs = 0
        fasta_reader = FastaReader( open( options.input1 ) )
        while True:
            # Read the next sequence from the reference dataset
            seq = fasta_reader.next()
            if not seq:
                break
            seqs += 1
            # Create a temporary file to contain the current sequence as input to lastz
            tmp_in_fd, tmp_in_name = tempfile.mkstemp( suffix='.in' )
            tmp_in = os.fdopen( tmp_in_fd, 'wb' )
            # Write the current sequence to the temporary input file
            tmp_in.write( '>%s\n%s\n' % ( seq.name, seq.text ) )
            tmp_in.close()
            # Create a 2nd temporary file to contain the output from lastz execution on the current sequence
            tmp_out_fd, tmp_out_name = tempfile.mkstemp( suffix='.out' )
            os.close( tmp_out_fd )
            # Generate the command line for calling lastz on the current sequence
            command = 'lastz %s%s%s %s %s --ambiguousn --nolaj --identity=%s..%s --coverage=%s --format=%s%s > %s' % \
                ( tmp_in_name, unmask, ref_name, input2, set_options, options.identity_min,
                  options.identity_max, options.coverage, format, tabular_fields, tmp_out_name )
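            # For illustration, a generated command might look like this (the
            # temporary file names and option values are hypothetical):
            #   lastz /tmp/tmpXXXX.in[unmask][nickname=chrM] reads.fasta
            #       --yasra95short --ambiguousn --nolaj --identity=90..100
            #       --coverage=75 --format=sam- > /tmp/tmpYYYY.out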
            # Create a job object
            job = Bunch()
            job.command = command
            job.output = tmp_out_name
            job.cleanup = [ tmp_in_name, tmp_out_name ]
            job.combine_data_queue = combine_data_queue
            # Add another job to the lastz_job_queue.  Execution
            # will wait at this point if the queue is full.
            lastz_job_queue.put( job, block=True )
        # Make sure the value of sequences in the metadata is the same as the
        # number of sequences read from the dataset ( this may not be necessary ).
        if ref_sequences != seqs:
            stop_queues( lastz_job_queue, combine_data_queue )
            stop_err( "The value of metadata.sequences (%d) differs from the number of sequences read from the reference (%d)." % ( ref_sequences, seqs ) )
    else:
        # Reference is a locally cached 2bit file, so split the job across the
        # number of chroms in the 2bit file.  Open in binary mode since 2bit
        # is a binary format.
        tbf = TwoBitFile( open( options.input1, 'rb' ) )
        for chrom in tbf.keys():
            # Create a temporary file to contain the output from lastz execution on the current chrom
            tmp_out_fd, tmp_out_name = tempfile.mkstemp( suffix='.out' )
            os.close( tmp_out_fd )
            command = 'lastz %s/%s%s%s %s %s --ambiguousn --nolaj --identity=%s..%s --coverage=%s --format=%s%s >> %s' % \
                ( options.input1, chrom, unmask, ref_name, input2, set_options, options.identity_min,
                  options.identity_max, options.coverage, format, tabular_fields, tmp_out_name )
            # Create a job object
            job = Bunch()
            job.command = command
            job.output = tmp_out_name
            job.cleanup = [ tmp_out_name ]
            job.combine_data_queue = combine_data_queue
            # Add another job to the lastz_job_queue.  Execution
            # will wait at this point if the queue is full.
            lastz_job_queue.put( job, block=True )

    # Stop the lastz_job_queue by sending each of its worker threads a STOP_SIGNAL
    for t in lastz_job_queue.threads:
        lastz_job_queue.put( STOP_SIGNAL, True )
    # Although all jobs have been submitted to the lastz_job_queue, we can't shut
    # down the combine_data_queue until we know that all jobs have been submitted
    # to its queue.  We do this by checking whether all of the threads in the
    # lastz_job_queue have terminated: a count of 2 means only the main thread
    # and the combine_data_queue's single worker thread remain.
    while threading.activeCount() > 2:
        time.sleep( 1 )
    # Now it's safe to stop the combine_data_queue
    combine_data_queue.put( STOP_SIGNAL )

if __name__ == "__main__":
    __main__()