annotate RepEnrich_setup.py @ 11:6bba3e33c2e7 draft

planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/main/tools/repenrich commit 11df9f0b68d35d3a9424a17e4cefee6cfb9d4c19
author artbio
date Sat, 09 Mar 2024 22:32:46 +0000
parents f6f0f1e5e940
children 89e05f831259
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
1 #!/usr/bin/env python
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
2 import argparse
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
3 import csv
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
4 import os
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
5 import shlex
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
6 import subprocess
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
7 import sys
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
8
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
9 from Bio import SeqIO
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
10 from Bio.Seq import Seq
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
11 from Bio.SeqRecord import SeqRecord
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
12
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
13 parser = argparse.ArgumentParser(description='''
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
14 Part I: Prepartion of repetive element psuedogenomes and repetive\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
15 element bamfiles. This script prepares the annotation used by\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
16 downstream applications to analyze for repetitive element\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
17 enrichment. For this script to run properly bowtie must be\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
18 loaded. The repeat element psuedogenomes are prepared in order\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
19 to analyze reads that map to multiple locations of the genome.\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
20 The repeat element bamfiles are prepared in order to use a\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
21 region sorter to analyze reads that map to a single location\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
22 of the genome. You will 1) annotation_file:\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
23 The repetitive element annotation file downloaded from\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
24 RepeatMasker.org database for your organism of interest.\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
25 2) genomefasta: Your genome of interest in fasta format,\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
26 3)setup_folder: a folder to contain repeat element setup files\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
27 command-line usage
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
28 EXAMPLE: python master_setup.py\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
29 /users/nneretti/data/annotation/mm9/mm9_repeatmasker.txt\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
30 /users/nneretti/data/annotation/mm9/mm9.fa\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
31 /users/nneretti/data/annotation/mm9/setup_folder''',
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
32 prog='getargs_genome_maker.py')
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
33 parser.add_argument('--version', action='version', version='%(prog)s 0.1')
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
34 parser.add_argument('annotation_file', action='store',
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
35 metavar='annotation_file',
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
36 help='''List annotation file. The annotation file contains\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
37 the repeat masker annotation for the genome of\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
38 interest and may be downloaded at RepeatMasker.org\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
39 Example /data/annotation/mm9/mm9.fa.out''')
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
40 parser.add_argument('genomefasta', action='store', metavar='genomefasta',
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
41 help='''File name and path for genome of interest in fasta\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
42 format. Example /data/annotation/mm9/mm9.fa''')
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
43 parser.add_argument('setup_folder', action='store', metavar='setup_folder',
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
44 help='''List folder to contain bamfiles for repeats and\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
45 repeat element psuedogenomes.\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
46 Example /data/annotation/mm9/setup''')
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
47 parser.add_argument('--nfragmentsfile1', action='store',
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
48 dest='nfragmentsfile1', metavar='nfragmentsfile1',
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
49 default='./repnames_nfragments.txt',
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
50 help='''Output location of a description file that saves\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
51 the number of fragments processed per repname.
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
52 Default ./repnames_nfragments.txt''')
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
53 parser.add_argument('--gaplength', action='store', dest='gaplength',
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
54 metavar='gaplength', default='200', type=int,
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
55 help='Length of the spacer used to build\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
56 repeat psuedogeneomes. Default 200')
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
57 parser.add_argument('--flankinglength', action='store', dest='flankinglength',
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
58 metavar='flankinglength', default='25', type=int,
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
59 help='Length of the flanking region adjacent to the repeat\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
60 element that is used to build repeat psuedogeneomes.\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
61 The flanking length should be set according to the\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
62 length of your reads. Default 25')
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
63 parser.add_argument('--is_bed', action='store', dest='is_bed',
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
64 metavar='is_bed', default='FALSE',
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
65 help='''Is the annotation file a bed file. This is also a\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
66 compatible format. The file needs to be a tab\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
67 separated bed with optional fields.
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
68 Ex. format:
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
69 chr\tstart\tend\tName_element\tclass\tfamily.
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
70 The class and family should identical to name_element\
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
71 if not applicable. Default FALSE change to TRUE''')
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
72 args = parser.parse_args()
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
73
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
74 # parameters and paths specified in args_parse
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
75 gapl = args.gaplength
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
76 flankingl = args.flankinglength
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
77 annotation_file = args.annotation_file
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
78 genomefasta = args.genomefasta
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
79 setup_folder = args.setup_folder
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
80 nfragmentsfile1 = args.nfragmentsfile1
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
81 is_bed = args.is_bed
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
82
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
83 ##############################################################################
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
84 # check that the programs we need are available
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
85 try:
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
86 subprocess.call(shlex.split("bowtie --version"),
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
87 stdout=open(os.devnull, 'wb'),
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
88 stderr=open(os.devnull, 'wb'))
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
89 except OSError:
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
90 print("Error: Bowtie or BEDTools not loaded")
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
91 raise
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
92
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
93 ##############################################################################
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
94 # Define a text importer
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
95 csv.field_size_limit(sys.maxsize)
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
96
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
97
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
98 def import_text(filename, separator):
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
99 for line in csv.reader(open(os.path.realpath(filename)),
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
100 delimiter=separator, skipinitialspace=True):
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
101 if line:
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
102 yield line
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
103
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
104
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
105 # Make a setup folder
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
106 if not os.path.exists(setup_folder):
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
107 os.makedirs(setup_folder)
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
108 ##############################################################################
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
109 # load genome into dictionary
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
110 print("loading genome...")
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
111 g = SeqIO.to_dict(SeqIO.parse(genomefasta, "fasta"))
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
112
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
113 print("Precomputing length of all chromosomes...")
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
114 idxgenome = {}
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
115 lgenome = {}
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
116 genome = {}
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
117 allchrs = g.keys()
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
118 k = 0
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
119 for chr in allchrs:
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
120 genome[chr] = str(g[chr].seq)
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
121 # del g[chr]
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
122 lgenome[chr] = len(genome[chr])
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
123 idxgenome[chr] = k
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
124 k = k + 1
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
125 del g
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
126
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
127 ##############################################################################
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
128 # Build a bedfile of repeatcoordinates to use by RepEnrich region_sorter
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
129 if is_bed == "FALSE":
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
130 repeat_elements = []
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
131 fout = open(os.path.realpath(setup_folder + os.path.sep
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
132 + 'repnames.bed'), 'w')
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
133 fin = import_text(annotation_file, ' ')
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
134 x = 0
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
135 rep_chr = {}
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
136 rep_start = {}
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
137 rep_end = {}
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
138 x = 0
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
139 for line in fin:
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
140 if x > 2:
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
141 line9 = line[9].replace("(", "_").replace(")",
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
142 "_").replace("/", "_")
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
143 repname = line9
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
144 if repname not in repeat_elements:
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
145 repeat_elements.append(repname)
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
146 repchr = line[4]
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
147 repstart = int(line[5])
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
148 repend = int(line[6])
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
149 fout.write(str(repchr) + '\t' + str(repstart) + '\t' + str(repend)
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
150 + '\t' + str(repname) + '\n')
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
151 if repname in rep_chr:
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
152 rep_chr[repname].append(repchr)
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
153 rep_start[repname].append(int(repstart))
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
154 rep_end[repname].append(int(repend))
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
155 else:
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
156 rep_chr[repname] = [repchr]
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
157 rep_start[repname] = [int(repstart)]
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
158 rep_end[repname] = [int(repend)]
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
159 x += 1
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
160 if is_bed == "TRUE":
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
161 repeat_elements = []
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
162 fout = open(os.path.realpath(setup_folder + os.path.sep + 'repnames.bed'),
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
163 'w')
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
164 fin = open(os.path.realpath(annotation_file), 'r')
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
165 x = 0
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
166 rep_chr = {}
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
167 rep_start = {}
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
168 rep_end = {}
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
169 x = 0
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
170 for line in fin:
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
171 line = line.strip('\n')
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
172 line = line.split('\t')
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
173 line3 = line[3].replace("(", "_").replace(")", "_").replace("/", "_")
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
174 repname = line3
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
175 if repname not in repeat_elements:
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
176 repeat_elements.append(repname)
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
177 repchr = line[0]
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
178 repstart = int(line[1])
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
179 repend = int(line[2])
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
180 fout.write(str(repchr) + '\t' + str(repstart) + '\t' +
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
181 str(repend) + '\t' + str(repname) + '\n')
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
182 # if rep_chr.has_key(repname):
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
183 if repname in rep_chr:
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
184 rep_chr[repname].append(repchr)
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
185 rep_start[repname].append(int(repstart))
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
186 rep_end[repname].append(int(repend))
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
187 else:
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
188 rep_chr[repname] = [repchr]
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
189 rep_start[repname] = [int(repstart)]
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
190 rep_end[repname] = [int(repend)]
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
191
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
192 fin.close()
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
193 fout.close()
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
194 repeat_elements = sorted(repeat_elements)
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
195 print("Writing a key for all repeats...")
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
196 # print to fout the binary key that contains each repeat type with the
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
197 # associated binary number; sort the binary key:
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
198 fout = open(os.path.realpath(setup_folder + os.path.sep +
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
199 'repgenomes_key.txt'), 'w')
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
200 x = 0
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
201 for repeat in repeat_elements:
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
202 # print >> fout, str(repeat) + '\t' + str(x)
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
203 fout.write(str(repeat) + '\t' + str(x) + '\n')
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
204 x += 1
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
205 fout.close()
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
206 ##############################################################################
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
207 # generate spacer for psuedogenomes
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
208 spacer = ""
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
209 for i in range(gapl):
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
210 spacer = spacer + "N"
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
211
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
212 # save file with number of fragments processed per repname
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
213 print("Saving number of fragments processed per repname to "
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
214 + nfragmentsfile1)
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
215 fout1 = open(os.path.realpath(nfragmentsfile1), "w")
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
216 for repname in rep_chr.keys():
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
217 rep_chr_current = rep_chr[repname]
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
218 # print >>fout1, str(len(rep_chr[repname])) + "\t" + repname
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
219 fout1.write(str(len(rep_chr[repname])) + "\t" + repname + '\n')
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
220 fout1.close()
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
221
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
222 # generate metagenomes and save them to FASTA files
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
223 k = 1
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
224 nrepgenomes = len(rep_chr.keys())
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
225 for repname in rep_chr.keys():
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
226 metagenome = ""
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
227 newname = repname.replace("(", "_").replace(")", "_").replace("/", "_")
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
228 print("processing repgenome " + newname + ".fa" + " (" + str(k)
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
229 + " of " + str(nrepgenomes) + ")")
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
230 rep_chr_current = rep_chr[repname]
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
231 rep_start_current = rep_start[repname]
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
232 rep_end_current = rep_end[repname]
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
233 print("-------> " + str(len(rep_chr[repname])) + " fragments")
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
234 for i in range(len(rep_chr[repname])):
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
235 try:
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
236 chr = rep_chr_current[i]
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
237 rstart = max(rep_start_current[i] - flankingl, 0)
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
238 rend = min(rep_end_current[i] + flankingl, lgenome[chr]-1)
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
239 metagenome = metagenome + spacer + genome[chr][rstart:(rend+1)]
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
240 except KeyError:
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
241 print("Unrecognised Chromosome: "+chr)
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
242 pass
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
243 # Convert metagenome to SeqRecord object (required by SeqIO.write)
11
6bba3e33c2e7 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/main/tools/repenrich commit 11df9f0b68d35d3a9424a17e4cefee6cfb9d4c19
artbio
parents: 0
diff changeset
244 record = SeqRecord(Seq(metagenome), id="repname",
0
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
245 name="", description="")
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
246 print("saving repgenome " + newname + ".fa" + " (" + str(k) + " of "
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
247 + str(nrepgenomes) + ")")
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
248 fastafilename = os.path.realpath(setup_folder + os.path.sep
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
249 + newname + ".fa")
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
250 SeqIO.write(record, fastafilename, "fasta")
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
251 print("indexing repgenome " + newname + ".fa" + " (" +
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
252 str(k) + " of " + str(nrepgenomes) + ")")
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
253 command = shlex.split('bowtie-build -f ' + fastafilename + ' ' +
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
254 setup_folder + os.path.sep + newname)
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
255 p = subprocess.Popen(command).communicate()
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
256 k += 1
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
257
f6f0f1e5e940 planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff changeset
258 print("... Done")