annotate find_str.py @ 1:9669aa9b75f4 draft

planemo upload for repository https://github.com/fubar2/microsatbed commit 7ceb6658309a7ababe622b5d92e729e5470e22f0-dirty
author fubar
date Sat, 13 Jul 2024 23:05:50 +0000
parents 4ff60fb9ca4d
children 01c16e8fbc91
rev   line source
0
4ff60fb9ca4d planemo upload for repository https://github.com/fubar2/microsatbed commit 7ceb6658309a7ababe622b5d92e729e5470e22f0-dirty
fubar
parents:
diff changeset
import argparse

import pytrf  # 1.3.0
from pyfastx import Fastx  # 0.5.2

"""
Writes all STRs, or only those for a subset of motifs, to a BED file.
Designed to build some of the microsatellite tracks from https://github.com/arangrhie/T2T-Polish/tree/master/pattern for the VGP.
"""


def write_ssrs(args):
    """
    The integers in the STRFinder call set the minimum repeat counts for mono-, di-, tri-, tetra-, penta- and hexa-nucleotide motifs, e.g.
    ssrs = pytrf.STRFinder(name, seq, 10, 6, 4, 3, 3, 3)
    NOTE: Dinucleotides GA and AG are reported separately by https://github.com/marbl/seqrequester.
    The reversed pair STRs are about as common in the documentation sample.
    Sequence read bias might be influenced by GC density or some other specific motif.
    """
    bed = []
    specific = None
    if args.specific:
        specific = args.specific.upper().split(",")
    fa = Fastx(args.fasta, uppercase=True)
    for name, seq in fa:
        if args.specific:
            ssrs = pytrf.STRFinder(
                name,
                seq,
                args.minreps,
                args.minreps,
                args.minreps,
                args.minreps,
                args.minreps,
                args.minreps,
            )
        else:
            ssrs = pytrf.STRFinder(
                name,
                seq,
                args.monomin,
                args.dimin,
                args.trimin,
                args.tetramin,
                args.pentamin,
                args.hexamin,
            )
        for ssr in ssrs:
            row = (
                ssr.chrom,
                ssr.start - 1,
                ssr.end,
                ssr.motif,
                ssr.repeat,
                ssr.length,
            )
            # pytrf reports a 1-based start position, so start - 1 gives the 0-based BED start
            # and keeps end - start equal to the repeat length
            if args.specific and ssr.motif in specific:
                bed.append(row)
            elif args.mono and len(ssr.motif) == 1:
                bed.append(row)
            elif args.di and len(ssr.motif) == 2:
                bed.append(row)
            elif args.tri and len(ssr.motif) == 3:
                bed.append(row)
            elif args.tetra and len(ssr.motif) == 4:
                bed.append(row)
            elif args.penta and len(ssr.motif) == 5:
                bed.append(row)
            elif args.hexa and len(ssr.motif) == 6:
                bed.append(row)
    bed.sort()
    obed = ["%s\t%d\t%d\t%s_%d\t%d" % x for x in bed]
    with open(args.bed, "w") as outbed:
        outbed.write("\n".join(obed))
        outbed.write("\n")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    a = parser.add_argument
    a("--di", action="store_true")
    a("--tri", action="store_true")
    a("--tetra", action="store_true")
    a("--penta", action="store_true")
    a("--hexa", action="store_true")
    a("--mono", action="store_true")
    a("--dimin", default=2, type=int)
    a("--trimin", default=2, type=int)
    a("--tetramin", default=2, type=int)
    a("--pentamin", default=2, type=int)
    a("--hexamin", default=2, type=int)
    a("--monomin", default=2, type=int)
    a("-f", "--fasta", default="humsamp.fa")
    a("-b", "--bed", default="humsamp.bed")
    a("--specific", default=None)
    a("--minreps", default=2, type=int)
    args = parser.parse_args()
    write_ssrs(args)
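
The comment inside write_ssrs says pytrf reports a 1-based start, which is why the script writes start - 1 to the BED file. The sketch below illustrates only that conversion and the output row format, without importing pytrf. Repeat and to_bed_line are hypothetical names introduced here for illustration; the field names are copied from the attributes the script reads, and the example values are made up.

```python
from collections import namedtuple

# Hypothetical stand-in for one pytrf repeat record; only the attributes
# read by find_str.py are modelled, and the values below are invented.
Repeat = namedtuple("Repeat", "chrom start end motif repeat length")


def to_bed_line(ssr):
    """Format one repeat as: chrom, start, end, motif_repeats, length."""
    # 1-based start -> 0-based start; the end is left unchanged, so
    # end - start in the BED row equals the repeat length in bases.
    row = (ssr.chrom, ssr.start - 1, ssr.end, ssr.motif, ssr.repeat, ssr.length)
    return "%s\t%d\t%d\t%s_%d\t%d" % row


# An AG dinucleotide repeated 6 times, covering bases 101..112 (1-based).
example = Repeat("chr1", 101, 112, "AG", 6, 12)
line = to_bed_line(example)
print(line)
```

With these values the BED interval is 100 to 112, so its width matches the 12-base repeat length, which is the property the original comment refers to.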