annotate tools/protein_analysis/psortb.py @ 22:e1afa4b0b682 draft

"This is v0.2.12 with black formating and Python 3 next fix etc"
author peterjc
date Thu, 17 Jun 2021 08:34:58 +0000
parents 238eae32483c
children
#!/usr/bin/env python
"""Wrapper for psortb for use in Galaxy.

This script takes exactly seven command line arguments: the number of
threads, the organism type, the output type, the cutoff and divergent
values, the input protein FASTA filename and the output tabular filename.
It then splits up the FASTA input and calls multiple copies of the
standalone psortb v3 program, then collates the output.
For example, rather than this:

psort $type -c $cutoff -d $divergent -o long $sequence > $outfile

call this:

psortb.py $threads $type $output $cutoff $divergent $sequence $outfile

If omitting the -c or -d options, set $cutoff and $divergent to zero or blank.

Note that this is somewhat redundant with job-splitting available in Galaxy
itself (see the SignalP XML file for settings), but both can be applied.

Additionally it ensures the header line (with the column names) starts
with a # character as used elsewhere in Galaxy.
"""

from __future__ import print_function

import os
import sys
import tempfile

from seq_analysis_utils import run_jobs, split_fasta, thread_count

FASTA_CHUNK = 500

if "-v" in sys.argv or "--version" in sys.argv:
    # Report the underlying PSORTb version
    sys.exit(os.system("psort --version"))

if len(sys.argv) != 8:
    sys.exit(
        "Require 7 arguments, number of threads (int), type (-n, -p or -a), "
        "output (e.g. terse/normal/long), cutoff, divergent, input protein "
        "FASTA file & output tabular file"
    )

num_threads = thread_count(sys.argv[1], default=4)
org_type = sys.argv[2]
out_type = sys.argv[3]
cutoff = sys.argv[4]
if cutoff.strip() and float(cutoff.strip()) != 0.0:
    cutoff = "-c %s" % cutoff
else:
    cutoff = ""
divergent = sys.argv[5]
if divergent.strip() and float(divergent.strip()) != 0.0:
    divergent = "-d %s" % divergent
else:
    divergent = ""
fasta_file = sys.argv[6]
tabular_file = sys.argv[7]

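The cutoff and divergent handling follows one pattern: a blank or zero value drops the flag, anything else becomes a `-c` or `-d` option. A minimal sketch of that pattern as a single function is shown here. The name `optional_flag` is hypothetical and not part of the wrapper, and unlike the script it strips surrounding whitespace from the value before formatting it.

```python
def optional_flag(flag, value):
    """Return 'flag value', or '' if value is blank or numerically zero.

    Hypothetical helper for illustration; not defined in psortb.py.
    """
    text = value.strip()
    if text and float(text) != 0.0:
        return "%s %s" % (flag, text)
    return ""


print(repr(optional_flag("-c", "7.5")))  # '-c 7.5'
print(repr(optional_flag("-d", "0")))  # ''
print(repr(optional_flag("-d", "  ")))  # ''
```

As in the script, a value that is not blank and not a number raises `ValueError` from `float`.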
if out_type == "terse":
    header = ["SeqID", "Localization", "Score"]
elif out_type == "normal":
    sys.exit("Normal output not implemented yet, sorry.")
elif out_type == "long":
    if org_type == "-n":
        # Gram negative bacteria
        header = [
            "SeqID",
            "CMSVM-_Localization",
            "CMSVM-_Details",
            "CytoSVM-_Localization",
            "CytoSVM-_Details",
            "ECSVM-_Localization",
            "ECSVM-_Details",
            "ModHMM-_Localization",
            "ModHMM-_Details",
            "Motif-_Localization",
            "Motif-_Details",
            "OMPMotif-_Localization",
            "OMPMotif-_Details",
            "OMSVM-_Localization",
            "OMSVM-_Details",
            "PPSVM-_Localization",
            "PPSVM-_Details",
            "Profile-_Localization",
            "Profile-_Details",
            "SCL-BLAST-_Localization",
            "SCL-BLAST-_Details",
            "SCL-BLASTe-_Localization",
            "SCL-BLASTe-_Details",
            "Signal-_Localization",
            "Signal-_Details",
            "Cytoplasmic_Score",
            "CytoplasmicMembrane_Score",
            "Periplasmic_Score",
            "OuterMembrane_Score",
            "Extracellular_Score",
            "Final_Localization",
            "Final_Localization_Details",
            "Final_Score",
            "Secondary_Localization",
            "PSortb_Version",
        ]
    elif org_type == "-p":
        # Gram positive bacteria
        header = [
            "SeqID",
            "CMSVM+_Localization",
            "CMSVM+_Details",
            "CWSVM+_Localization",
            "CWSVM+_Details",
            "CytoSVM+_Localization",
            "CytoSVM+_Details",
            "ECSVM+_Localization",
            "ECSVM+_Details",
            "ModHMM+_Localization",
            "ModHMM+_Details",
            "Motif+_Localization",
            "Motif+_Details",
            "Profile+_Localization",
            "Profile+_Details",
            "SCL-BLAST+_Localization",
            "SCL-BLAST+_Details",
            "SCL-BLASTe+_Localization",
            "SCL-BLASTe+_Details",
            "Signal+_Localization",
            "Signal+_Details",
            "Cytoplasmic_Score",
            "CytoplasmicMembrane_Score",
            "Cellwall_Score",
            "Extracellular_Score",
            "Final_Localization",
            "Final_Localization_Details",
            "Final_Score",
            "Secondary_Localization",
            "PSortb_Version",
        ]
    elif org_type == "-a":
        # Archaea
        header = [
            "SeqID",
            "CMSVM_a_Localization",
            "CMSVM_a_Details",
            "CWSVM_a_Localization",
            "CWSVM_a_Details",
            "CytoSVM_a_Localization",
            "CytoSVM_a_Details",
            "ECSVM_a_Localization",
            "ECSVM_a_Details",
            "ModHMM_a_Localization",
            "ModHMM_a_Details",
            "Motif_a_Localization",
            "Motif_a_Details",
            "Profile_a_Localization",
            "Profile_a_Details",
            "SCL-BLAST_a_Localization",
            "SCL-BLAST_a_Details",
            "SCL-BLASTe_a_Localization",
            "SCL-BLASTe_a_Details",
            "Signal_a_Localization",
            "Signal_a_Details",
            "Cytoplasmic_Score",
            "CytoplasmicMembrane_Score",
            "Cellwall_Score",
            "Extracellular_Score",
            "Final_Localization",
            "Final_Localization_Details",
            "Final_Score",
            "Secondary_Localization",
            "PSortb_Version",
        ]
    else:
        sys.exit("Expected -n, -p or -a for the organism type, not %r" % org_type)
else:
    sys.exit("Expected terse, normal or long for the output type, not %r" % out_type)

tmp_dir = tempfile.mkdtemp()


def clean_tabular(raw_handle, out_handle):
    """Clean up tabular PSORTb output, returns output line count."""
    global header
    count = 0
    for line in raw_handle:
        if not line.strip() or line.startswith("#"):
            # Ignore any blank lines or comment lines
            continue
        parts = [x.strip() for x in line.rstrip("\r\n").split("\t")]
        if parts == header:
            # Ignore the header line
            continue
        if not parts[-1] and len(parts) == len(header) + 1:
            # Ignore dummy blank extra column, e.g.
            # "...2.0\t\tPSORTb version 3.0\t\n"
            parts = parts[:-1]
        # Report the number of fields found, not the length of the raw line
        assert len(parts) == len(header), "%i fields, not %i, in line:\n%r" % (
            len(parts),
            len(header),
            line,
        )
        out_handle.write(line)
        count += 1
    return count
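
To see the filtering rules of `clean_tabular` in isolation, the sketch below runs the same checks on in-memory data. It is an illustration under stated assumptions, not the wrapper's code: `clean_rows` is a hypothetical variant that takes the header as a parameter instead of a global and raises `ValueError` rather than using `assert`, and the sample rows are invented, not real PSORTb output.

```python
import io


def clean_rows(raw_handle, out_handle, header):
    """Copy data rows, skipping blank lines, comments and the header row.

    Hypothetical variant of clean_tabular, for illustration only.
    Returns the number of rows written.
    """
    count = 0
    for line in raw_handle:
        if not line.strip() or line.startswith("#"):
            continue
        parts = [x.strip() for x in line.rstrip("\r\n").split("\t")]
        if parts == header:
            continue
        if not parts[-1] and len(parts) == len(header) + 1:
            # Tolerate a single trailing empty column
            parts = parts[:-1]
        if len(parts) != len(header):
            raise ValueError(
                "%i fields, not %i, in line:\n%r" % (len(parts), len(header), line)
            )
        # Like the original, write the line unchanged
        out_handle.write(line)
        count += 1
    return count


# Invented sample data, not real PSORTb output
terse_header = ["SeqID", "Localization", "Score"]
raw = io.StringIO(
    u"SeqID\tLocalization\tScore\n"
    u"\n"
    u"seq1\tCytoplasmic\t9.97\n"
    u"seq2\tUnknown\t2.00\t\n"
)
cleaned = io.StringIO()
row_count = clean_rows(raw, cleaned, terse_header)
print(row_count)  # 2
```

The header row and the blank line are dropped, and the second data row passes because its extra trailing column is empty.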


# Note that if the input FASTA file contains no sequences,
# split_fasta returns an empty list (i.e. zero temp files).
fasta_files = split_fasta(fasta_file, os.path.join(tmp_dir, "tmhmm"), FASTA_CHUNK)
temp_files = [f + ".out" for f in fasta_files]
jobs = [
    "psort %s %s %s -o %s %s > %s"
    % (org_type, cutoff, divergent, out_type, fasta, temp)
    for fasta, temp in zip(fasta_files, temp_files)
]
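
Each job is a shell command string with output redirection, and the file paths are interpolated without quoting. The sketch below shows one possible way to build equivalent commands with quoted paths. It assumes Python 3 (for `shlex.quote`) and that the commands are run through a shell, as the `>` redirection suggests; `build_psort_commands` is a hypothetical name, and the wrapper itself does not do this.

```python
import shlex


def build_psort_commands(org_type, cutoff, divergent, out_type, fasta_files):
    """Return one 'psort ... > output' command string per FASTA chunk.

    Hypothetical helper for illustration; empty option strings are skipped
    and file paths are shell-quoted.
    """
    commands = []
    for fasta in fasta_files:
        parts = ["psort", org_type, cutoff, divergent, "-o", out_type]
        parts.append(shlex.quote(fasta))
        command = " ".join(p for p in parts if p)
        command += " > " + shlex.quote(fasta + ".out")
        commands.append(command)
    return commands


# Invented example path containing a space
commands = build_psort_commands(
    "-n", "-c 7.5", "", "terse", ["/tmp/my dir/chunk.fasta"]
)
print(commands[0])
```

Skipping empty option strings also avoids the doubled spaces that appear when `cutoff` or `divergent` is blank.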


def clean_up(file_list):
    """Remove the given temp files, then the temp directory if it is empty."""
    for f in file_list:
        if os.path.isfile(f):
            os.remove(f)
    try:
        os.rmdir(tmp_dir)
    except Exception:
        pass
99b82a2b1272 Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff changeset
227
20
a19b3ded8f33 v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 19
diff changeset
228
if len(jobs) > 1 and num_threads > 1:
    # A small "info" message for Galaxy to show the user.
    print("Using %i threads for %i tasks" % (min(num_threads, len(jobs)), len(jobs)))
results = run_jobs(jobs, num_threads)
for fasta, temp, cmd in zip(fasta_files, temp_files, jobs):
    error_level = results[cmd]
    if error_level:
        try:
            output = open(temp).readline()
        except IOError:
            output = ""
        clean_up(fasta_files + temp_files)
        # sys.exit() accepts at most one argument, so report the message
        # on stderr and exit with the failed task's return code.
        sys.stderr.write(
            "One or more tasks failed, e.g. %i from %r gave:\n%s\n"
            % (error_level, cmd, output.rstrip())
        )
        sys.exit(error_level)
del results
del jobs

out_handle = open(tabular_file, "w")
out_handle.write("#%s\n" % "\t".join(header))
count = 0
for temp in temp_files:
    data_handle = open(temp)
    count += clean_tabular(data_handle, out_handle)
    data_handle.close()
if not count:
    clean_up(fasta_files + temp_files)
    sys.exit("No output from psortb")
out_handle.close()
print("%i records" % count)

clean_up(fasta_files + temp_files)