annotate RepEnrich_setup.py @ 3:1c9810ba0638 draft
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 50a80e047ef74664d616a332f93c84f27cb6b7a0
author | artbio |
---|---|
date | Fri, 22 Sep 2017 03:19:23 -0400 |
parents | f6f0f1e5e940 |
children | 6bba3e33c2e7 |
line source rev | f6f0f1e5e940 (artbio): planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310 |
#!/usr/bin/env python
import argparse
import csv
import os
import shlex
import subprocess
import sys

from Bio import SeqIO
from Bio.Alphabet import IUPAC
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

parser = argparse.ArgumentParser(description='''
    Part I: Preparation of repetitive element pseudogenomes and repetitive\
    element bamfiles. This script prepares the annotation used by\
    downstream applications to analyze for repetitive element\
    enrichment. For this script to run properly bowtie must be\
    loaded. The repeat element pseudogenomes are prepared in order\
    to analyze reads that map to multiple locations of the genome.\
    The repeat element bamfiles are prepared in order to use a\
    region sorter to analyze reads that map to a single location\
    of the genome. You will need 1) annotation_file:\
    The repetitive element annotation file downloaded from\
    RepeatMasker.org database for your organism of interest.\
    2) genomefasta: Your genome of interest in fasta format,\
    3) setup_folder: a folder to contain repeat element setup files\
    command-line usage
    EXAMPLE: python master_setup.py\
    /users/nneretti/data/annotation/mm9/mm9_repeatmasker.txt\
    /users/nneretti/data/annotation/mm9/mm9.fa\
    /users/nneretti/data/annotation/mm9/setup_folder''',
                                 prog='getargs_genome_maker.py')
parser.add_argument('--version', action='version', version='%(prog)s 0.1')
parser.add_argument('annotation_file', action='store',
                    metavar='annotation_file',
                    help='''List annotation file. The annotation file contains\
                    the repeat masker annotation for the genome of\
                    interest and may be downloaded at RepeatMasker.org.\
                    Example /data/annotation/mm9/mm9.fa.out''')
parser.add_argument('genomefasta', action='store', metavar='genomefasta',
                    help='''File name and path for genome of interest in fasta\
                    format. Example /data/annotation/mm9/mm9.fa''')
parser.add_argument('setup_folder', action='store', metavar='setup_folder',
                    help='''List folder to contain bamfiles for repeats and\
                    repeat element pseudogenomes.\
                    Example /data/annotation/mm9/setup''')
parser.add_argument('--nfragmentsfile1', action='store',
                    dest='nfragmentsfile1', metavar='nfragmentsfile1',
                    default='./repnames_nfragments.txt',
                    help='''Output location of a description file that saves\
                    the number of fragments processed per repname.
                    Default ./repnames_nfragments.txt''')
parser.add_argument('--gaplength', action='store', dest='gaplength',
                    metavar='gaplength', default=200, type=int,
                    help='Length of the spacer used to build\
                    repeat pseudogenomes. Default 200')
parser.add_argument('--flankinglength', action='store', dest='flankinglength',
                    metavar='flankinglength', default=25, type=int,
                    help='Length of the flanking region adjacent to the repeat\
                    element that is used to build repeat pseudogenomes.\
                    The flanking length should be set according to the\
                    length of your reads. Default 25')
parser.add_argument('--is_bed', action='store', dest='is_bed',
                    metavar='is_bed', default='FALSE',
                    help='''Is the annotation file a bed file. This is also a\
                    compatible format. The file needs to be a tab\
                    separated bed with optional fields.
                    Ex. format:
                    chr\tstart\tend\tName_element\tclass\tfamily.
                    The class and family should be identical to name_element\
                    if not applicable. Default FALSE, change to TRUE''')
args = parser.parse_args()

# parameters and paths specified in args_parse
gapl = args.gaplength
flankingl = args.flankinglength
annotation_file = args.annotation_file
genomefasta = args.genomefasta
setup_folder = args.setup_folder
nfragmentsfile1 = args.nfragmentsfile1
is_bed = args.is_bed
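
As an aside on the option handling above, here is a minimal sketch that is not part of RepEnrich_setup.py. It builds a reduced parser with only the three optional arguments and parses a made-up argument list. The `build_parser` helper, the `setup_sketch` program name, and the sample values are assumptions for illustration. It shows that `type=int` converts values given on the command line, and that `--is_bed` stays a plain string, which the script compares against `"FALSE"`.

```python
import argparse


def build_parser():
    """Hypothetical helper: a reduced parser with the optional arguments only."""
    parser = argparse.ArgumentParser(prog='setup_sketch')
    parser.add_argument('--gaplength', type=int, default=200)
    parser.add_argument('--flankinglength', type=int, default=25)
    # is_bed is a string flag ("TRUE"/"FALSE"), not a boolean switch
    parser.add_argument('--is_bed', default='FALSE')
    return parser


# Sample argument list (made up for this sketch)
sketch_args = build_parser().parse_args(
    ['--gaplength', '150', '--is_bed', 'TRUE'])
print(sketch_args.gaplength, sketch_args.flankinglength, sketch_args.is_bed)
```

Because `--is_bed` is a string, any value other than the exact text `FALSE` (for example `false`) would not match the `is_bed == "FALSE"` comparison.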

##############################################################################
# check that the programs we need are available
try:
    subprocess.call(shlex.split("bowtie --version"),
                    stdout=open(os.devnull, 'wb'),
                    stderr=open(os.devnull, 'wb'))
except OSError:
    print("Error: Bowtie or BEDTools not loaded")
    raise
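
The check above runs `bowtie --version` and relies on `OSError` when the executable is missing. A possible alternative, sketched here under the assumption of Python 3.3 or later, is to look the program up on `PATH` with `shutil.which` without running it. This is a swapped-in technique, not what the script does; `missing_programs` and the placeholder program name are hypothetical.

```python
import shutil


def missing_programs(names):
    """Return the names from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# 'bowtie' is the program the script checks; the second name is a
# deliberately nonexistent placeholder used to show the failure case.
print(missing_programs(['bowtie', 'no-such-program-xyz']))
```

Unlike the `subprocess.call` approach, this only confirms that an executable with that name exists; it does not confirm that it starts correctly.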

##############################################################################
# Define a text importer
csv.field_size_limit(sys.maxsize)


def import_text(filename, separator):
    for line in csv.reader(open(os.path.realpath(filename)),
                           delimiter=separator, skipinitialspace=True):
        if line:
            yield line
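
To make the column layout concrete, the sketch below splits one made-up line laid out like a RepeatMasker `.out` record and picks the fields the script indexes (`line[9]` for the repeat name). It uses `str.split()` on runs of whitespace, which is a simpler stand-in and not the `csv.reader` approach of `import_text`. The helper names, the sample line, and the reading of columns 4 to 6 as sequence name, begin and end are assumptions for illustration. `sanitize` applies the same character replacement the script applies to repeat names.

```python
def split_fields(line):
    """Split one annotation line on runs of whitespace (str.split, not csv)."""
    return line.split()


def sanitize(name):
    """Replace '(', ')' and '/' with '_', as the script does for repeat names."""
    return name.replace("(", "_").replace(")", "_").replace("/", "_")


# Made-up line laid out like a RepeatMasker .out record
sample = ("  463   1.3  0.6  1.7  chr1  3000002 3000156 (194195276) "
          "C  (CA)n  Simple_repeat  1 155 (0) 1")
fields = split_fields(sample)
print(fields[4], fields[5], fields[6], sanitize(fields[9]))
```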


# Make a setup folder
if not os.path.exists(setup_folder):
    os.makedirs(setup_folder)
##############################################################################
# load genome into dictionary
print("loading genome...")
g = SeqIO.to_dict(SeqIO.parse(genomefasta, "fasta"))

print("Precomputing length of all chromosomes...")
idxgenome = {}
lgenome = {}
genome = {}
allchrs = g.keys()
k = 0
for chr in allchrs:
    genome[chr] = str(g[chr].seq)
    # del g[chr]
    lgenome[chr] = len(genome[chr])
    idxgenome[chr] = k
    k = k + 1
del g
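
The three dictionaries built above map each sequence name to its sequence, its length, and a running index. The following sketch reproduces that bookkeeping on a tiny made-up genome without Biopython. `read_fasta` is a hypothetical pure-Python stand-in for `SeqIO.to_dict(SeqIO.parse(...))`, written only for illustration, and it assumes well-formed FASTA input and Python 3.7+ dictionary ordering.

```python
def read_fasta(lines):
    """Parse FASTA lines into {record_id: sequence}; id is the first header word."""
    sequences = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith('>'):
            current = line[1:].split()[0]
            sequences[current] = []
        elif current is not None:
            sequences[current].append(line)
    return {name: ''.join(parts) for name, parts in sequences.items()}


# Tiny made-up genome with two records
toy = [">chr1 toy", "ACGT", "AC", ">chr2", "GGG"]
toy_genome = read_fasta(toy)
toy_lengths = {name: len(seq) for name, seq in toy_genome.items()}
toy_index = {name: i for i, name in enumerate(toy_genome)}
print(toy_lengths, toy_index)
```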

##############################################################################
# Build a bedfile of repeat coordinates for use by RepEnrich region_sorter
if is_bed == "FALSE":
    repeat_elements = []
    fout = open(os.path.realpath(setup_folder + os.path.sep
                                 + 'repnames.bed'), 'w')
    fin = import_text(annotation_file, ' ')
    x = 0
    rep_chr = {}
    rep_start = {}
    rep_end = {}
    x = 0
    for line in fin:
        if x > 2:
            line9 = line[9].replace("(", "_").replace(")",
                                                      "_").replace("/", "_")
            repname = line9
            if repname not in repeat_elements:
f6f0f1e5e940
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff
changeset
|
146 repeat_elements.append(repname) |
f6f0f1e5e940
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff
changeset
|
147 repchr = line[4] |
f6f0f1e5e940
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff
changeset
|
148 repstart = int(line[5]) |
f6f0f1e5e940
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff
changeset
|
149 repend = int(line[6]) |
f6f0f1e5e940
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff
changeset
|
150 fout.write(str(repchr) + '\t' + str(repstart) + '\t' + str(repend) |
f6f0f1e5e940
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff
changeset
|
151 + '\t' + str(repname) + '\n') |
f6f0f1e5e940
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff
changeset
|
152 if repname in rep_chr: |
f6f0f1e5e940
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff
changeset
|
153 rep_chr[repname].append(repchr) |
f6f0f1e5e940
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff
changeset
|
154 rep_start[repname].append(int(repstart)) |
f6f0f1e5e940
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff
changeset
|
155 rep_end[repname].append(int(repend)) |
f6f0f1e5e940
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff
changeset
|
156 else: |
f6f0f1e5e940
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff
changeset
|
157 rep_chr[repname] = [repchr] |
f6f0f1e5e940
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff
changeset
|
158 rep_start[repname] = [int(repstart)] |
f6f0f1e5e940
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff
changeset
|
159 rep_end[repname] = [int(repend)] |
f6f0f1e5e940
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/repenrich commit 61e203df0be5ed877ff92b917c7cde6eeeab8310
artbio
parents:
diff
changeset
|
160 x += 1 |
if is_bed == "TRUE":
    repeat_elements = []
    fout = open(os.path.realpath(setup_folder + os.path.sep + 'repnames.bed'),
                'w')
    fin = open(os.path.realpath(annotation_file), 'r')
    rep_chr = {}
    rep_start = {}
    rep_end = {}
    x = 0
    for line in fin:
        line = line.strip('\n')
        line = line.split('\t')
        line3 = line[3].replace("(", "_").replace(")", "_").replace("/", "_")
        repname = line3
        if repname not in repeat_elements:
            repeat_elements.append(repname)
        repchr = line[0]
        repstart = int(line[1])
        repend = int(line[2])
        fout.write(str(repchr) + '\t' + str(repstart) + '\t' +
                   str(repend) + '\t' + str(repname) + '\n')
        # if rep_chr.has_key(repname):
        if repname in rep_chr:
            rep_chr[repname].append(repchr)
            rep_start[repname].append(int(repstart))
            rep_end[repname].append(int(repend))
        else:
            rep_chr[repname] = [repchr]
            rep_start[repname] = [int(repstart)]
            rep_end[repname] = [int(repend)]

fin.close()
fout.close()
repeat_elements = sorted(repeat_elements)
print("Writing a key for all repeats...")
# write to fout the key that pairs each repeat type, in sorted order, with
# its integer index:
fout = open(os.path.realpath(setup_folder + os.path.sep +
                             'repgenomes_key.txt'), 'w')
x = 0
for repeat in repeat_elements:
    # print >> fout, str(repeat) + '\t' + str(x)
    fout.write(str(repeat) + '\t' + str(x) + '\n')
    x += 1
fout.close()
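The grouping and key-writing steps above can be sketched in isolation as follows. This is a minimal illustration, not the tool's own code: the helper names `sanitize_repeat_name`, `collect_repeats` and `build_key` are hypothetical, the input is a made-up three-line BED snippet held in memory, and nothing is written to disk.

```python
import io


def sanitize_repeat_name(name):
    """Replace '(', ')' and '/' with '_', as the listing does for repeat names."""
    return name.replace("(", "_").replace(")", "_").replace("/", "_")


def collect_repeats(bed_lines):
    """Group tab-separated BED intervals by sanitized repeat name (column 4)."""
    rep_chr, rep_start, rep_end = {}, {}, {}
    for raw in bed_lines:
        fields = raw.strip("\n").split("\t")
        repname = sanitize_repeat_name(fields[3])
        rep_chr.setdefault(repname, []).append(fields[0])
        rep_start.setdefault(repname, []).append(int(fields[1]))
        rep_end.setdefault(repname, []).append(int(fields[2]))
    return rep_chr, rep_start, rep_end


def build_key(repeat_names):
    """Map each repeat name to its index in sorted order."""
    return {name: index for index, name in enumerate(sorted(repeat_names))}


# Toy input standing in for an annotation file (coordinates are made up).
toy_bed = io.StringIO(
    "chr1\t100\t200\t(CA)n\n"
    "chr2\t50\t80\tL1/LINE\n"
    "chr1\t300\t450\t(CA)n\n"
)
rep_chr, rep_start, rep_end = collect_repeats(toy_bed)
key = build_key(rep_chr)
```

Using `dict.setdefault` replaces the explicit `if repname in rep_chr ... else` branches; the resulting dictionaries should have the same contents as those built in the listing.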
##############################################################################
# generate spacer for pseudogenomes
spacer = "N" * gapl

# save file with number of fragments processed per repname
print("Saving number of fragments processed per repname to "
      + nfragmentsfile1)
fout1 = open(os.path.realpath(nfragmentsfile1), "w")
for repname in rep_chr.keys():
    rep_chr_current = rep_chr[repname]
    # print >>fout1, str(len(rep_chr[repname])) + "\t" + repname
    fout1.write(str(len(rep_chr[repname])) + "\t" + repname + '\n')
fout1.close()

# generate metagenomes and save them to FASTA files
k = 1
nrepgenomes = len(rep_chr.keys())
for repname in rep_chr.keys():
    metagenome = ""
    newname = repname.replace("(", "_").replace(")", "_").replace("/", "_")
    print("processing repgenome " + newname + ".fa" + " (" + str(k)
          + " of " + str(nrepgenomes) + ")")
    rep_chr_current = rep_chr[repname]
    rep_start_current = rep_start[repname]
    rep_end_current = rep_end[repname]
    print("-------> " + str(len(rep_chr[repname])) + " fragments")
    for i in range(len(rep_chr[repname])):
        try:
            chr = rep_chr_current[i]
            rstart = max(rep_start_current[i] - flankingl, 0)
            rend = min(rep_end_current[i] + flankingl, lgenome[chr]-1)
            metagenome = metagenome + spacer + genome[chr][rstart:(rend+1)]
        except KeyError:
            print("Unrecognised Chromosome: "+chr)
            pass
    # Convert metagenome to SeqRecord object (required by SeqIO.write)
    record = SeqRecord(Seq(metagenome, IUPAC.unambiguous_dna), id="repname",
                       name="", description="")
    print("saving repgenome " + newname + ".fa" + " (" + str(k) + " of "
          + str(nrepgenomes) + ")")
    fastafilename = os.path.realpath(setup_folder + os.path.sep
                                     + newname + ".fa")
    SeqIO.write(record, fastafilename, "fasta")
    print("indexing repgenome " + newname + ".fa" + " (" +
          str(k) + " of " + str(nrepgenomes) + ")")
    command = shlex.split('bowtie-build -f ' + fastafilename + ' ' +
                          setup_folder + os.path.sep + newname)
    p = subprocess.Popen(command).communicate()
    k += 1

print("... Done")
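The flank clamping and spacer concatenation used when building each repgenome can be illustrated with plain strings. This is a hedged sketch rather than the script's implementation: `build_pseudogenome` is a hypothetical helper, the genome is a made-up dictionary of Python strings instead of Biopython sequences, and the chromosome length is taken from `len(sequence)` on the assumption that `lgenome[chr]` holds that value in the listing.

```python
def build_pseudogenome(genome, fragments, flank, gap):
    """Concatenate flanked fragments, each preceded by a spacer of N's."""
    spacer = "N" * gap
    pieces = []
    for chrom, start, end in fragments:
        if chrom not in genome:
            # As in the listing, unknown chromosomes are reported and skipped.
            print("Unrecognised Chromosome: " + chrom)
            continue
        sequence = genome[chrom]
        # Extend the interval by the flank, clamped to the chromosome bounds.
        rstart = max(start - flank, 0)
        rend = min(end + flank, len(sequence) - 1)
        # The end coordinate is inclusive, hence the + 1 in the slice.
        pieces.append(spacer + sequence[rstart:rend + 1])
    return "".join(pieces)


toy_genome = {"chr1": "ACGTACGTAC"}  # made-up 10-base sequence
toy_fragments = [("chr1", 2, 4), ("chrUn", 0, 3), ("chr1", 8, 9)]
pseudogenome = build_pseudogenome(toy_genome, toy_fragments, flank=1, gap=3)
```

Collecting the pieces in a list and joining once avoids repeated string concatenation, which may matter for repeat families with many fragments; the output should match what the incremental `metagenome + spacer + ...` form would produce for the same inputs.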