annotate design_primers.py @ 1:a0689dc29b7f draft

Updated vcf to gff conversion tool
author john-mccallum
date Tue, 31 Jul 2012 00:33:11 -0400
parents 21053f7f9ed1
children b321e0517be3
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
1 #!/usr/local/bin/python2.6
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
2 ##design primers to features in multiple sequences
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
3 ##usage: python design_primers.py <fasta-file> <gff file> <file of target IDs> <prod_min_size> <prod_max_size>
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
4
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
5
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
6 #Copyright 2012 John McCallum & Leshi Chen
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
7 #New Zealand Institute for Plant and Food Research
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
8 #This program is free software: you can redistribute it and/or modify
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
9 # it under the terms of the GNU General Public License as published by
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
10 # the Free Software Foundation, either version 3 of the License, or
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
11 # (at your option) any later version.
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
12 #
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
13 # This program is distributed in the hope that it will be useful,
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
14 # but WITHOUT ANY WARRANTY; without even the implied warranty of
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
15 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
16 # GNU General Public License for more details.
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
17 #
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
18 # You should have received a copy of the GNU General Public License
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
19 # along with this program. If not, see <http://www.gnu.org/licenses/>.
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
20
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
21
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
22 import os
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
23 import StringIO
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
24 import re
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
25 import tempfile
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
26 import subprocess
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
27 import copy
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
28 import sys
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
29 from BCBio import GFF
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
30 from BCBio.GFF import GFFExaminer
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
31 from Bio import SeqIO
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
32 from Bio.Emboss.Applications import Primer3Commandline
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
33 from Bio.Emboss import Primer3
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
34
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
35
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
36 in_file = sys.argv[1]
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
37 gff_file = sys.argv[2]
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
38 target_file = sys.argv[3]
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
39 prod_min_size = int(sys.argv[4])
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
40 prod_max_size = int(sys.argv[5])
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
41
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
42 max_tm_diff=1 ##
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
43 opt_GC_percent=50 ##
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
44 maxpolyx=4 ##
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
45 optimum_length=20
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
46 ##target is specified in start, end format
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
47 productsizerange = str(prod_min_size) + "," + str(prod_max_size)
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
48 #open input files
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
49 in_seq_handle = open(in_file)
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
50 in_gff_handle = open(gff_file)
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
51 in_target_handle=open(target_file)
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
52 #read target feature IDs into list
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
53 targets=in_target_handle.readlines()
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
54 in_target_handle.close()
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
55 ##and create a hit list of sequences from this
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
56 target_seq_id_list = list(set([line.split(":")[0] for line in targets]))
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
57 ##create iterator returning sequence records
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
58 for myrec in SeqIO.parse(in_seq_handle, "fasta"):
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
59 #check if this sequence is included in the target list
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
60 if myrec.id in target_seq_id_list:
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
61 ##create sequence dictionary so we can add in gff annotations
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
62 seq_dict = {myrec.id : myrec}
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
63 ##just limit to gff annotations for this sequence
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
64 limit_info = dict(gff_id = [ myrec.id ])
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
65 ##rewind gff filehandle
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
66 in_gff_handle.seek(0)
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
67 ##read annotations into sequence dictionary for this sequence record only
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
68 annotations = [r for r in GFF.parse(in_gff_handle, base_dict=seq_dict,limit_info=limit_info)]
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
69 ##if there are any annotations, then proceed.
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
70 if annotations:
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
71 rec=annotations[0]
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
72 ##iterate over list of target IDs
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
73 for t in targets:
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
74 target_ID = t.strip('\n')
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
75 target_annotations = [f for f in rec.features if f.id == target_ID ]
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
76 if target_annotations:
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
77 mytarget = target_annotations[0]
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
78 #create temporary files
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
79 tempfastaFile = tempfile.mktemp()
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
80 tempproutfile = tempfile.mktemp()
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
81 #just consider slice of sequence in a window of +/- prod_max_size bp
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
82 ##from feature UNLESS feature is close to end
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
83 ##Note that slice is zero-based
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
84 featLocation = mytarget.location.start.position
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
85 if featLocation > prod_max_size:
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
86 slice_start = featLocation - prod_max_size
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
87 featPosition = prod_max_size
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
88 else:
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
89 slice_start = 0
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
90 featPosition = featLocation
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
91 if (len(rec) - featLocation) < prod_max_size:
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
92 slice_end = len(rec)
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
93 else:
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
94 slice_end = featLocation + prod_max_size
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
95 ###grab slice of sequence fom this window.
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
96 targetRec = rec[slice_start:slice_end]
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
97 matching_feature = [f for f in targetRec.features if f.id == mytarget.id]
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
98 if matching_feature:
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
99 target_feat = matching_feature[0]
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
100 if target_feat.location.start.position == 0:
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
101 target_feat.location.start.position = 1
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
102 #we get the mask features by removing the target...all features are masked as just using snp and indels
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
103 ##a smarter filter could be added
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
104 ##note use of list copy to avoid possible side-effects
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
105 exclude_feat = list(targetRec.features)
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
106 exclude_feat.remove(target_feat)
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
107 ##print'targetRec.features', targetRec.features ##for debug
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
108 mask_str=map(lambda f: str(f.location.start.position+1) + "," + str(f.location.end.position + 1) ,exclude_feat)
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
109 #mask_str=map(lambda f: str(f.location).strip('[]'),exclude_feat)
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
110 p3_exclude_str = str(mask_str).replace('\', \'',':')
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
111 p3_target = str(target_feat.location.start.position +1) + "," + str(target_feat.location.end.position +1)
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
112 #write sequence record into template file as fasta
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
113 t_output_handle = open(tempfastaFile, "w")
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
114 SeqIO.write([targetRec], t_output_handle, "fasta")
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
115 t_output_handle.close()
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
116 #create Primer3Commandline() for emboss tool
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
117 primer_cl = Primer3Commandline()
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
118 #set the emboss tool to suppress output as this will make Galaxy think it is error message although it is a message to state run success
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
119 primer_cl.set_parameter("-auto",'1')
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
120 #pass sequence file to emboss
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
121 primer_cl.set_parameter("-sequence",tempfastaFile)
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
122 #add target location
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
123 primer_cl.set_parameter("-target", p3_target)
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
124 ##mask off other features...dumb masking of everything at present, beware
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
125 if (p3_exclude_str != ""):
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
126 primer_cl.set_parameter("-excludedregion", p3_exclude_str)
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
127 #add temporary output file to get the result
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
128 primer_cl.set_parameter("-outfile", tempproutfile)
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
129 #specify maximum different of tm
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
130 primer_cl.set_parameter("-maxdifftm",max_tm_diff )
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
131 #other useful parameters
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
132 primer_cl.set_parameter("-ogcpercent", opt_GC_percent)
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
133 primer_cl.set_parameter("-opolyxmax", maxpolyx)
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
134 primer_cl.set_parameter("-osize", optimum_length)
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
135 #set product size range
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
136 primer_cl.set_parameter("-prange", productsizerange)
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
137 #using python subprocess method to run emboss command line programe with the parameters given
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
138 fnull = open(os.devnull, 'w')
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
139 result=subprocess.check_call(str(primer_cl),shell=True ,stdout = fnull, stderr = fnull)
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
140 #read temporary outputfile
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
141 handle = open(tempproutfile)
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
142 record = Primer3.read(handle)
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
143 ##just return first set, if there is one
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
144 if len(record.primers) > 0:
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
145 primer= record.primers[0]
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
146 outputstr=[mytarget.id, primer.forward_seq,primer.reverse_seq,primer.size]
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
147 else:
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
148 outputstr=[mytarget.id,"NONE","NONE","NONE"]
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
149 print('\t'.join(map(str,outputstr)))
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
150
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
151
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
152 in_gff_handle.close()
21053f7f9ed1 First upload of PCR Marker tools
john-mccallum
parents:
diff changeset
153 in_seq_handle.close()