annotate fsd_beforevsafter.py @ 0:6716b1cddf3e draft

planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
author mheinzl
date Thu, 10 May 2018 07:29:24 -0400
parents
children e8115b71edbd
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
1 #!/usr/bin/env python
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
2
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
3 # Family size distribution of DCS from various steps of the Galaxy pipeline
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
4 #
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
5 # Author: Monika Heinzl, Johannes-Kepler University Linz (Austria)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
6 # Contact: monika.heinzl@edumail.at
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
7 #
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
8 # Takes a TXT file with tags of reads that were aligned to certain regions of the reference genome (optional),
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
9 # a TABULAR file with tags before the alignment to the SSCS, a FASTA file with reads that were part of the DCS and
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
10 # a FASTA file with tags after trimming as input (optional).
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
11 # The program produces a plot which shows the distribution of family sizes of the DCS from the input files and
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
12 # a CSV file with the data of the plot.
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
13
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
14 # USAGE: python FSD before vs after_no_refF1.3_FINAL.py --inputFile_SSCS filenameSSCS --makeDCS filenameMakeDCS --afterTrimming filenameAfterTrimming -- alignedTags filenameTagsRefGenome
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
15 # --sep "characterWhichSeparatesCSVFile" --output_csv outptufile_name_csv --output_pdf outptufile_name_pdf
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
16
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
17
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
18 import numpy
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
19 import matplotlib.pyplot as plt
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
20 from collections import Counter
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
21 from Bio import SeqIO
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
22 import argparse
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
23 import sys
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
24 import os
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
25 from matplotlib.backends.backend_pdf import PdfPages
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
26
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
27 def readFileReferenceFree(file, delim):
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
28 with open(file, 'r') as dest_f:
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
29 data_array = numpy.genfromtxt(dest_f, skip_header=0, delimiter=delim, comments='#', dtype='string')
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
30 return(data_array)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
31
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
32 def readFasta(file):
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
33 tag_consensus = []
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
34 fs_consensus = []
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
35 with open(file, "r") as consFile:
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
36 for record in SeqIO.parse(consFile, "fasta"):
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
37 tag_consensus.append(record.id)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
38 line = record.description
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
39 a, b = line.split(" ")
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
40 fs1, fs2 = b.split("-")
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
41 fs_consensus.extend([fs1,fs2])
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
42 fs_consensus = numpy.array(fs_consensus).astype(int)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
43 return(tag_consensus, fs_consensus)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
44
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
45 def make_argparser():
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
46 parser = argparse.ArgumentParser(description='Analysis of read loss in duplex sequencing data')
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
47 parser.add_argument('--inputFile_SSCS',
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
48 help='Tabular File with three columns: ab or ba, tag and family size.')
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
49 parser.add_argument('--makeDCS',
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
50 help='FASTA File with information about tag and family size in the header.')
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
51 parser.add_argument('--afterTrimming',default=None,
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
52 help='FASTA File with information about tag and family size in the header.')
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
53 parser.add_argument('--alignedTags',default=None,
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
54 help=' TXT file with tags aligned to the reference genome and family size.')
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
55 parser.add_argument('--output_csv', default="data.csv", type=str,
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
56 help='Name of the pdf and csv file.')
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
57 parser.add_argument('--output_pdf', default="data.pdf", type=str,
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
58 help='Name of the pdf and csv file.')
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
59 parser.add_argument('--sep', default=",",
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
60 help='Separator in the csv file.')
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
61 return parser
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
62
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
63 def compare_read_families_read_loss(argv):
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
64 parser = make_argparser()
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
65 args = parser.parse_args(argv[1:])
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
66
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
67 SSCS_file = args.inputFile_SSCS
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
68 makeConsensus = args.makeDCS
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
69 afterTrimming = args.afterTrimming
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
70 ref_genome = args.alignedTags
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
71 title_file = args.output_csv
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
72 title_file2 = args.output_pdf
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
73 sep = args.sep
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
74
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
75 if type(sep) is not str or len(sep)>1:
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
76 print("Error: --sep must be a single character.")
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
77 exit(4)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
78
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
79 with open(title_file, "w") as output_file, PdfPages(title_file2) as pdf:
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
80 ### PLOT ###
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
81 plt.rc('figure', figsize=(11.69, 8.27)) # A4 format
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
82 plt.rcParams['axes.facecolor'] = "E0E0E0" # grey background color
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
83 plt.rcParams['xtick.labelsize'] = 12
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
84 plt.rcParams['ytick.labelsize'] = 12
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
85 plt.rcParams['patch.edgecolor'] = "black"
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
86 fig = plt.figure()
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
87 plt.subplots_adjust(bottom=0.3)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
88
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
89 list1 = []
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
90 colors = []
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
91 labels = []
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
92
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
93 ### data with tags of SSCS
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
94 data_array = readFileReferenceFree(SSCS_file, "\t")
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
95 seq = numpy.array(data_array[:, 1])
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
96 tags = numpy.array(data_array[:, 2])
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
97 quant = numpy.array(data_array[:, 0]).astype(int)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
98
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
99 # split data with all tags of SSCS after ab and ba strands
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
100 all_ab = seq[numpy.where(tags == "ab")[0]]
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
101 all_ba = seq[numpy.where(tags == "ba")[0]]
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
102 quant_ab_sscs = quant[numpy.where(tags == "ab")[0]]
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
103 quant_ba_sscs = quant[numpy.where(tags == "ba")[0]]
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
104
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
105 seqDic_ab = dict(zip(all_ab, quant_ab_sscs))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
106 seqDic_ba = dict(zip(all_ba, quant_ba_sscs))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
107
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
108 ### get tags of the SSCS which form a DCS
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
109 # group large family sizes
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
110 bigFamilies = numpy.where(quant > 20)[0]
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
111 quant[bigFamilies] = 22
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
112 maximumX = numpy.amax(quant)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
113
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
114 # find all unique tags and get the indices for ALL tags (ab AND ba)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
115 u, index_unique, c = numpy.unique(numpy.array(seq), return_counts=True, return_index=True)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
116 d = u[c > 1]
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
117
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
118 # get family sizes, tag for the duplicates
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
119 duplTags_double = quant[numpy.in1d(seq, d)]
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
120 list1.append(duplTags_double)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
121 colors.append("#0000FF")
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
122 labels.append("before alignment\nto SSCS")
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
123
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
124 duplTags = duplTags_double[0::2] # ab of DCS
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
125 duplTagsBA = duplTags_double[1::2] # ba of DCS
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
126
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
127 d2 = d[(duplTags >= 3) & (duplTagsBA >= 3)] # ab and ba FS>=3
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
128
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
129 # all SSCSs FS>=3
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
130 seq_unique, seqUnique_index = numpy.unique(seq, return_index=True)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
131 seq_unique_FS = quant[seqUnique_index]
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
132 seq_unique_FS3 = seq_unique_FS[seq_unique_FS >= 3]
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
133
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
134 legend1 = "\ntotal nr. of tags (unique, FS>=1):\nDCS (before alignment to SSCS, FS>=1):\ntotal nr. of tags (unique, FS>=3):\nDCS (before alignment to SSCS, FS>=3):"
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
135 legend2 = "total numbers * \n{:,}\n{:,}\n{:,}\n{:,}".format(len(seq_unique_FS), len(duplTags),
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
136 len(seq_unique_FS3), len(d2))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
137 plt.text(0.55, 0.14, legend1, size=11, transform=plt.gcf().transFigure)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
138 plt.text(0.88, 0.14, legend2, size=11, transform=plt.gcf().transFigure)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
139
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
140 ## data make DCS
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
141 tag_consensus, fs_consensus = readFasta(makeConsensus)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
142 ### group large family sizes in the plot of fasta files
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
143 bigFamilies = numpy.where(fs_consensus > 20)[0]
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
144 fs_consensus[bigFamilies] = 22
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
145 list1.append(fs_consensus)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
146 colors.append("#298A08")
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
147 labels.append("make DCS")
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
148 legend3 = "make DCS:"
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
149 legend4 = "{:,}".format(len(tag_consensus))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
150 plt.text(0.55, 0.11, legend3, size=11, transform=plt.gcf().transFigure)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
151 plt.text(0.88, 0.11, legend4, size=11, transform=plt.gcf().transFigure)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
152
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
153 ### data after trimming
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
154 if afterTrimming != str(None):
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
155 tag_trimming, fs_trimming = readFasta(afterTrimming)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
156 bigFamilies = numpy.where(fs_trimming > 20)[0]
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
157 fs_trimming[bigFamilies] = 22
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
158 list1.append(fs_trimming)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
159 colors.append("#DF0101")
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
160 labels.append("after trimming")
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
161 legend5 = "after trimming:"
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
162 legend6 = "{:,}".format(len(tag_trimming))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
163 plt.text(0.55, 0.09, legend5, size=11, transform=plt.gcf().transFigure)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
164 plt.text(0.88, 0.09, legend6, size=11, transform=plt.gcf().transFigure)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
165
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
166 ### data of tags aligned to reference genome
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
167 if ref_genome != str(None):
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
168 mut_array = readFileReferenceFree(ref_genome, " ")
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
169 ### use only unique tags that were alignment to the reference genome
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
170 seq_mut, seqMut_index = numpy.unique(numpy.array(mut_array[:, 1]), return_index=True)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
171
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
172 # get family sizes
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
173 quant_ab = []
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
174 for i in seq_mut:
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
175 quant_ab.append(seqDic_ab.get(i))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
176
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
177 quant_ba = []
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
178 for i in seq_mut:
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
179 quant_ba.append(seqDic_ba.get(i))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
180
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
181 quant_ab_ref = numpy.array(quant_ab)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
182 quant_ba_ref = numpy.array(quant_ba)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
183 quant_all_ref = numpy.concatenate((quant_ab_ref, quant_ba_ref))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
184 bigFamilies = numpy.where(quant_all_ref > 20)[0] # group large family sizes
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
185 quant_all_ref[bigFamilies] = 22
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
186 list1.append(quant_all_ref)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
187 colors.append("#04cec7")
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
188 labels.append("after alignment\nto reference genome")
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
189 legend7 = "after alignment to reference genome:"
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
190 length_DCS_ref = len(quant_ba_ref) # count of duplex tags that were aligned to reference genome
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
191 legend8 = "{:,}".format(length_DCS_ref)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
192 plt.text(0.55, 0.07, legend7, size=11, transform=plt.gcf().transFigure)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
193 plt.text(0.88, 0.07, legend8, size=11, transform=plt.gcf().transFigure)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
194
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
195
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
196 counts = plt.hist(list1, bins=range(-1, maximumX + 1), stacked=False, label=labels, color=colors,
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
197 align="left", alpha=1, edgecolor="black", linewidth=1)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
198 ticks = numpy.arange(0, maximumX, 1)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
199 ticks1 = map(str, ticks)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
200 ticks1[len(ticks1) - 1] = ">20"
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
201 plt.xticks(numpy.array(ticks), ticks1)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
202 if ref_genome != str(None):
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
203 count = numpy.array(
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
204 [v for k, v in sorted(Counter(quant_ab_ref).iteritems())]) # count all family sizes from all ab strands
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
205
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
206 legend = "max. family size =\nabsolute frequency=\nrelative frequency=\n\ntotal nr. of reads (before)"
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
207 plt.text(0.1, 0.105, legend, size=11, transform=plt.gcf().transFigure)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
208
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
209 legend = "AB\n{}\n{}\n{:.5f}\n\n{:,}" \
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
210 .format(max(quant_ab_ref), count[len(count) - 1], float(count[len(count) - 1]) / sum(count),
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
211 sum(numpy.array(data_array[:, 0]).astype(int)))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
212 plt.text(0.3, 0.105, legend, size=11, transform=plt.gcf().transFigure)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
213
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
214 count2 = numpy.array(
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
215 [v for k, v in sorted(Counter(quant_ba_ref).iteritems())]) # count all family sizes from all ba strands
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
216 legend = "BA\n{}\n{}\n{:.5f}" \
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
217 .format(max(quant_ba_ref), count2[len(count2) - 1], float(count2[len(count2) - 1]) / sum(count2))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
218 plt.text(0.4, 0.15, legend, size=11, transform=plt.gcf().transFigure)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
219
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
220 legend4 = "* In the plot, the family sizes of ab and ba strands and of both duplex tags were used.\nWhereas the total numbers indicate only the single count of the formed duplex tags."
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
221 plt.text(0.1, 0.02, legend4, size=11, transform=plt.gcf().transFigure)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
222
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
223 plt.legend(loc='upper right', fontsize=14, bbox_to_anchor=(0.9, 1), frameon=True)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
224 plt.title("Family Size Distribution of Tags from various Steps of the Galaxy Pipeline", fontsize=14)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
225 plt.xlabel("No. of Family Members", fontsize=12)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
226 plt.ylabel("Absolute Frequency", fontsize=12)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
227 plt.grid(b=True, which="major", color="#424242", linestyle=":")
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
228 plt.margins(0.01, None)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
229
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
230 pdf.savefig(fig, bbox_inch="tight")
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
231 plt.close()
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
232
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
233 # write information about plot into a csv file
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
234 output_file.write("Dataset:{}{}\n".format(sep, SSCS_file))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
235 if ref_genome != str(None):
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
236 output_file.write("{}AB{}BA\n".format(sep, sep))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
237 output_file.write("max. family size:{}{}{}{}\n".format(sep, max(quant_ab_ref), sep, max(quant_ba_ref)))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
238 output_file.write(
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
239 "absolute frequency:{}{}{}{}\n".format(sep, count[len(count) - 1], sep, count2[len(count2) - 1]))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
240 output_file.write(
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
241 "relative frequency:{}{:.3f}{}{:.3f}\n\n".format(sep, float(count[len(count) - 1]) / sum(count), sep,
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
242 float(count2[len(count2) - 1]) / sum(count2)))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
243
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
244 output_file.write("\n\ntotal nr. of reads{}{}\n".format(sep, sum(numpy.array(data_array[:, 0]).astype(int))))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
245 output_file.write("\n\nValues from family size distribution\n")
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
246
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
247 if afterTrimming == str(None) and ref_genome == str(None):
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
248 if afterTrimming == str(None):
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
249 output_file.write(
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
250 "{}before alignment to SSCS{}make DCS\n".format(sep, sep))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
251 elif ref_genome == str(None):
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
252 output_file.write(
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
253 "{}before alignment to SSCS{}make DCS\n".format(sep, sep))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
254
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
255 for fs, sscs, dcs in zip(counts[1][2:len(counts[1])], counts[0][0][2:len(counts[0][0])],
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
256 counts[0][1][2:len(counts[0][1])]):
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
257 if fs == 21:
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
258 fs = ">20"
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
259 else:
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
260 fs = "={}".format(fs)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
261 output_file.write(
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
262 "FS{}{}{}{}{}\n".format(fs, sep, int(sscs), sep, int(dcs)))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
263 output_file.write(
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
264 "sum{}{}{}{}\n".format(sep, int(sum(counts[0][0])), sep, int(sum(counts[0][1]))))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
265
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
266 elif afterTrimming == str(None) or ref_genome == str(None):
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
267 if afterTrimming == str(None):
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
268 output_file.write(
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
269 "{}before alignment to SSCS{}make DCS{}after alignment to reference genome\n".format(sep, sep, sep))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
270 elif ref_genome == str(None):
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
271 output_file.write(
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
272 "{}before alignment to SSCS{}make DCS{}after trimming\n".format(sep, sep, sep))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
273
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
274 for fs, sscs, dcs, reference in zip(counts[1][2:len(counts[1])], counts[0][0][2:len(counts[0][0])],
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
275 counts[0][1][2:len(counts[0][1])],counts[0][2][2:len(counts[0][2])]):
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
276 if fs == 21:
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
277 fs = ">20"
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
278 else:
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
279 fs = "={}".format(fs)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
280 output_file.write(
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
281 "FS{}{}{}{}{}{}{}\n".format(fs, sep, int(sscs), sep, int(dcs), sep, int(reference)))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
282 output_file.write(
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
283 "sum{}{}{}{}{}{}\n".format(sep, int(sum(counts[0][0])), sep, int(sum(counts[0][1])), sep, int(sum(counts[0][2]))))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
284
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
285 else:
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
286 output_file.write(
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
287 "{}before alignment to SSCS{}make DCS{}after trimming{}after alignment to reference genome\n".format(
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
288 sep, sep, sep, sep))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
289 for fs, sscs, dcs, trim, reference in zip(counts[1][2:len(counts[1])], counts[0][0][2:len(counts[0][0])],
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
290 counts[0][1][2:len(counts[0][1])],
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
291 counts[0][2][2:len(counts[0][2])],
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
292 counts[0][3][2:len(counts[0][3])]):
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
293 if fs == 21:
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
294 fs = ">20"
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
295 else:
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
296 fs = "={}".format(fs)
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
297 output_file.write(
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
298 "FS{}{}{}{}{}{}{}{}{}\n".format(fs, sep, int(sscs), sep, int(dcs), sep, int(trim), sep,
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
299 int(reference)))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
300 output_file.write(
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
301 "sum{}{}{}{}{}{}{}{}\n".format(sep, int(sum(counts[0][0])), sep, int(sum(counts[0][1])), sep,
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
302 int(sum(counts[0][2])), sep, int(sum(counts[0][3]))))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
303
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
304 output_file.write("\n\nIn the plot, the family sizes of ab and ba strands and of both duplex tags were used.\nWhereas the total numbers indicate only the single count of the formed duplex tags.\n")
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
305 output_file.write("total nr. of tags (unique, FS>=1){}{}\n".format(sep, len(seq_unique_FS)))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
306 output_file.write("DCS (before alignment to SSCS, FS>=1){}{}\n".format(sep, len(duplTags)))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
307 output_file.write("total nr. of tags (unique, FS>=3){}{}\n".format(sep, len(seq_unique_FS3)))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
308 output_file.write("DCS (before alignment to SSCS, FS>=3){}{}\n".format(sep, len(d2)))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
309 output_file.write("make DCS{}{}\n".format(sep, len(tag_consensus)))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
310 if afterTrimming != str(None):
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
311 output_file.write("after trimming{}{}\n".format(sep, len(tag_trimming)))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
312 if ref_genome != str(None):
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
313 output_file.write("after alignment to reference genome{}{}\n".format(sep, length_DCS_ref))
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
314
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
315 print("Files successfully created!")
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
316
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
317 if __name__ == '__main__':
6716b1cddf3e planemo upload for repository https://github.com/monikaheinzl/galaxyProject/tree/master/tools/fsd_beforevsafter commit 6055f8c5c052f528ff85fb5e0d43b4500830637a
mheinzl
parents:
diff changeset
318 sys.exit(compare_read_families_read_loss(sys.argv))