tmhmm_and_signalp: tools/protein_analysis/tmhmm2.py comparison

comparison tools/protein_analysis/tmhmm2.py @ 1:3ff1dcbb9440

Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository

author	peterjc
date	Tue, 07 Jun 2011 18:04:05 -0400
parents	bca9bc7fdaef
children	6901298ac16c

-:bca9bc7fdaef
+:3ff1dcbb9440
 The second major potential feature is taking advantage of multiple cores
 (since TMHMM v2.0 itself is single threaded) by dividing the input FASTA file
 into chunks and running multiple copies of TMHMM in parallel. I would normally
 use Python's multiprocessing library in this situation but it requires at
 least Python 2.6 and at the time of writing Galaxy still supports Python 2.4.
+Also tmhmm2 can fail without returning an error code, for example if run on a
+64 bit machine with only the 32 bit binaries installed. This script will spot
+when there is no output from tmhmm2, and raise an error.
 """
 import sys
 import os
 from seq_analysis_utils import stop_err, split_fasta, run_jobs
 stop_err("Threads argument %s is not a positive integer" % sys.argv[1])
 fasta_file = sys.argv[2]
 tabular_file = sys.argv[3]
 def clean_tabular(raw_handle, out_handle):
-    """Clean up tabular TMHMM output."""
+    """Clean up tabular TMHMM output, returns output line count."""
+    count = 0
     for line in raw_handle:
         if not line:
             continue
         parts = line.rstrip("\r\n").split("\t")
         try:
         first60 = first60[8:]
         assert predhel.startswith("PredHel="), line
         predhel = predhel[8:]
         assert topology.startswith("Topology="), line
         topology = topology[9:]
-	out_handle.write("%s\t%s\t%s\t%s\t%s\t%s\n" \
+        out_handle.write("%s\t%s\t%s\t%s\t%s\t%s\n" \
                    % (identifier, length, expAA, first60, predhel, topology))
+        count += 1
+    return count
+#Note that if the input FASTA file contains no sequences,
+#split_fasta returns an empty list (i.e. zero temp files).
 fasta_files = split_fasta(fasta_file, tabular_file, FASTA_CHUNK)
 temp_files = [f+".out" for f in fasta_files]
 jobs = ["tmhmm %s > %s" % (fasta, temp)
 for fasta, temp in zip(fasta_files, temp_files)]
 out_handle = open(tabular_file, "w")
 out_handle.write("#ID\tlen\tExpAA\tFirst60\tPredHel\tTopology\n")
 for temp in temp_files:
     data_handle = open(temp)
-    clean_tabular(data_handle, out_handle)
+    count = clean_tabular(data_handle, out_handle)
     data_handle.close()
+    if not count:
+        clean_up(fasta_files)
+        clean_up(temp_files)
+        stop_err("No output from tmhmm2")
 out_handle.close()
 clean_up(fasta_files)
 clean_up(temp_files)
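
The check this changeset introduces (count the cleaned output lines per chunk and abort when a chunk yields none) can be sketched in isolation. This is a minimal illustration, not the tool's code: `copy_nonblank_lines` and `merge_chunks` are hypothetical names, the cleaning step is reduced to skipping blank lines, and a `RuntimeError` stands in for the script's `stop_err` helper.

```python
import io


def copy_nonblank_lines(raw_handle, out_handle):
    """Copy non-blank lines to out_handle; return how many were written."""
    count = 0
    for line in raw_handle:
        if not line.strip():
            continue
        out_handle.write(line)
        count += 1
    return count


def merge_chunks(chunk_handles, out_handle):
    """Merge per-chunk outputs, failing loudly if any chunk is empty.

    An empty chunk is treated as a silent failure of the external
    program, since every input chunk is expected to produce output.
    """
    total = 0
    for index, handle in enumerate(chunk_handles):
        count = copy_nonblank_lines(handle, out_handle)
        if not count:
            # Stand-in for stop_err(); the real script also removes
            # its temporary files before exiting.
            raise RuntimeError("No output from chunk %i" % index)
        total += count
    return total


if __name__ == "__main__":
    merged = io.StringIO()
    chunks = [io.StringIO("seq1\t100\n"), io.StringIO("seq2\t200\n")]
    print(merge_chunks(chunks, merged))
```

Because an input with no sequences produces zero chunks, the loop body never runs in that case and no error is raised, which matches the note in the diff about `split_fasta` returning an empty list.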
