Repository: peterjc / tmhmm_and_signalp (Mercurial)
comparison tools/protein_analysis/tmhmm2.py @ 2:6901298ac16c
Migrated tool version 0.0.5 from old tool shed archive to new tool shed repository
author | peterjc |
---|---|
date | Tue, 07 Jun 2011 18:04:39 -0400 |
parents | 3ff1dcbb9440 |
children | 9b45a8743100 |
1:3ff1dcbb9440 | 2:6901298ac16c |
---|---|
4 This script takes exactly two command line arguments - an input protein FASTA | 4 This script takes exactly two command line arguments - an input protein FASTA |
5 filename and an output tabular filename. It then calls the standalone TMHMM | 5 filename and an output tabular filename. It then calls the standalone TMHMM |
6 v2.0 program (not the webservice) requesting the short output (one line per | 6 v2.0 program (not the webservice) requesting the short output (one line per |
7 protein). | 7 protein). |
8 | 8 |
9 First major feature is cleaning up the tabular output. The raw output from | 9 The first major feature is cleaning up the tabular output. The short form raw |
10 TMHMM v2.0 looks like this (six columns tab separated): | 10 output from TMHMM v2.0 looks like this (six columns tab separated): |
11 | 11 |
12 gi|2781234|pdb|1JLY|B len=304 ExpAA=0.01 First60=0.00 PredHel=0 Topology=o | 12 gi|2781234|pdb|1JLY|B len=304 ExpAA=0.01 First60=0.00 PredHel=0 Topology=o |
13 gi|4959044|gb|AAD34209.1|AF069992_1 len=600 ExpAA=0.00 First60=0.00 PredHel=0 Topology=o | 13 gi|4959044|gb|AAD34209.1|AF069992_1 len=600 ExpAA=0.00 First60=0.00 PredHel=0 Topology=o |
14 gi|671626|emb|CAA85685.1| len=473 ExpAA=0.19 First60=0.00 PredHel=0 Topology=o | 14 gi|671626|emb|CAA85685.1| len=473 ExpAA=0.19 First60=0.00 PredHel=0 Topology=o |
15 gi|3298468|dbj|BAA31520.1| len=107 ExpAA=59.37 First60=31.17 PredHel=3 Topology=o23-45i52-74o89-106i | 15 gi|3298468|dbj|BAA31520.1| len=107 ExpAA=59.37 First60=31.17 PredHel=3 Topology=o23-45i52-74o89-106i |
16 | |
17 If there are any additional 'comment' lines starting with the hash (#) | |
18 character these are ignored by this script. | |
16 | 19 |
17 In order to make it easier to use in Galaxy, this wrapper script simplifies | 20 In order to make it easier to use in Galaxy, this wrapper script simplifies |
18 this to remove the redundant tags, and instead adds a comment line at the | 21 this to remove the redundant tags, and instead adds a comment line at the |
19 top with the column names: | 22 top with the column names: |
20 | 23 |
53 | 56 |
54 def clean_tabular(raw_handle, out_handle): | 57 def clean_tabular(raw_handle, out_handle): |
55 """Clean up tabular TMHMM output, returns output line count.""" | 58 """Clean up tabular TMHMM output, returns output line count.""" |
56 count = 0 | 59 count = 0 |
57 for line in raw_handle: | 60 for line in raw_handle: |
58 if not line: | 61 if not line.strip() or line.startswith("#"): |
62 #Ignore any blank lines or comment lines | |
59 continue | 63 continue |
60 parts = line.rstrip("\r\n").split("\t") | 64 parts = line.rstrip("\r\n").split("\t") |
61 try: | 65 try: |
62 identifier, length, expAA, first60, predhel, topology = parts | 66 identifier, length, expAA, first60, predhel, topology = parts |
63 except: | 67 except: |
80 | 84 |
81 #Note that if the input FASTA file contains no sequences, | 85 #Note that if the input FASTA file contains no sequences, |
82 #split_fasta returns an empty list (i.e. zero temp files). | 86 #split_fasta returns an empty list (i.e. zero temp files). |
83 fasta_files = split_fasta(fasta_file, tabular_file, FASTA_CHUNK) | 87 fasta_files = split_fasta(fasta_file, tabular_file, FASTA_CHUNK) |
84 temp_files = [f+".out" for f in fasta_files] | 88 temp_files = [f+".out" for f in fasta_files] |
85 jobs = ["tmhmm %s > %s" % (fasta, temp) | 89 jobs = ["tmhmm -short %s > %s" % (fasta, temp) |
86 for fasta, temp in zip(fasta_files, temp_files)] | 90 for fasta, temp in zip(fasta_files, temp_files)] |
87 | 91 |
88 def clean_up(file_list): | 92 def clean_up(file_list): |
89 for f in file_list: | 93 for f in file_list: |
90 if os.path.isfile(f): | 94 if os.path.isfile(f): |
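
The filtering change in this revision can be illustrated with a minimal, self-contained sketch. It is not the wrapper's actual code: the names `TAGS`, `strip_tags` and `clean_lines` are hypothetical, and the tag prefixes are taken only from the sample short-form output quoted in the docstring above. The header line with column names is left out because the column names do not appear in this excerpt. The point of the sketch is the new test: a line read from a file handle still carries its newline, so the old `if not line:` check never skipped a blank line, whereas `not line.strip()` does.

```python
import io

# Assumed tag prefixes, copied from the sample short-form output in the docstring.
TAGS = ("len=", "ExpAA=", "First60=", "PredHel=", "Topology=")


def strip_tags(line):
    """Split one short-form TMHMM line and drop the redundant 'tag=' prefixes."""
    parts = line.rstrip("\r\n").split("\t")
    if len(parts) != 6:
        raise ValueError("Expected 6 tab separated columns, got %i" % len(parts))
    values = []
    for tag, field in zip(TAGS, parts[1:]):
        if not field.startswith(tag):
            raise ValueError("Expected %r prefix in %r" % (tag, field))
        values.append(field[len(tag):])
    return [parts[0]] + values


def clean_lines(raw_handle, out_handle):
    """Copy cleaned rows to out_handle, returning the number of rows written."""
    count = 0
    for line in raw_handle:
        # A blank line is "\n" (truthy), so test the stripped text instead.
        if not line.strip() or line.startswith("#"):
            continue
        out_handle.write("\t".join(strip_tags(line)) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    raw = io.StringIO(
        "# a comment line\n"
        "\n"
        "gi|2781234|pdb|1JLY|B\tlen=304\tExpAA=0.01\tFirst60=0.00\tPredHel=0\tTopology=o\n"
    )
    out = io.StringIO()
    print(clean_lines(raw, out))
    print(out.getvalue(), end="")
```

Raising `ValueError` on an unexpected column count or prefix is a choice made for this sketch; the wrapper's own error handling is only partly visible in the diff above.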