# HG changeset patch # User jetbrains # Date 1542547227 18000 # Node ID 5b99943c46274756e023114b70c49ef7fe3a262e # Parent dfb1e66235c58c9de050f08a1d2a223b14f9e9ec Span version https://github.com/JetBrains-Research/galaxy-applications/commit/cbbba255d66a4775cc35caf5cb85665396fdcd2a diff -r dfb1e66235c5 -r 5b99943c4627 README.md --- a/README.md Thu Nov 15 11:30:01 2018 -0500 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,3 +0,0 @@ -Release version -=============== -Release is just a `span` snapshot from [https://github.com/JetBrains-Research/galaxy-applications](https://github.com/JetBrains-Research/galaxy-applications) \ No newline at end of file diff -r dfb1e66235c5 -r 5b99943c4627 span.xml --- a/span.xml Thu Nov 15 11:30:01 2018 -0500 +++ b/span.xml Sun Nov 18 08:20:27 2018 -0500 @@ -1,8 +1,7 @@ - ChIP-Seq analysis + Semi-supervised Peak Analyzer for ChIP-Seq data package_span_jar - @@ -10,31 +9,58 @@ +#import re +#set treatment_identifier = re.sub('[^\w\-\.]', '_', str($treatment_file.element_identifier)) +#set genome_identifier = re.sub('[^\w\-\.]', '_', str($genome_file.element_identifier)) + +#if $control.control_selector + #set control_identifier = re.sub('[^\w\-\.]', '_', str($control.control_file.element_identifier)) +#end if + #if str($action.action_selector) == "model" #if $control.control_selector - span_wrapper.py model with_control "${genome}" "${treatment_file}" "${bin}" "${action.model_file}" "${control.control_file}" + span_wrapper.py model with_control + "${genome_identifier}" "${genome_file}" + "${treatment_identifier}" "${treatment_file}" + "${bin}" "${action.model_file}" + "${control_identifier}" "${control.control_file}" #else - span_wrapper.py model without_control "${genome}" "${treatment_file}" "${bin}" "${action.model_file}" + span_wrapper.py model without_control + "${genome_identifier}" "${genome_file}" + "${treatment_identifier}" "${treatment_file}" + "${bin}" "${action.model_file}" #end if #else #if $control.control_selector - 
span_wrapper.py peaks with_control "${genome}" "${treatment_file}" "${bin}" "${action.model_file}" "${control.control_file}" "${fdr}" "${gap}" "${action.peaks_file}" + span_wrapper.py peaks with_control + "${genome_identifier}" "${genome_file}" + "${treatment_identifier}" "${treatment_file}" + "${bin}" "${action.model_file}" + "${control_identifier}" "${control.control_file}" + "${action.fdr}" "${action.gap}" "${action.peaks_file}" #else - span_wrapper.py peaks without_control "${genome}" "${treatment_file}" "${bin}" "${action.model_file}" "${fdr}" "${gap}" "${action.peaks_file}" + span_wrapper.py peaks without_control + "${genome_identifier}" "${genome_file}" + "${treatment_identifier}" "${treatment_file}" + "${bin}" "${action.model_file}" + "${action.fdr}" "${action.gap}" "${action.peaks_file}" #end if #end if - + description="Treatment BAM reads to process" argument="--treatment" + help="Treatment BAM reads to process"/> + + description="Control BAM reads to process" argument="--control" + help="Control BAM reads to process"/> @@ -44,26 +70,107 @@ - + - - - - + + + + - + - - + + action['action_selector'] == "peaks" + - - SPAN Semi-supervised Peak Analyzer is a tool for analyzing ChIP-seq data. - Details: http://artyomovlab.wustl.edu/aging/span.html - + * **Required.** ChIP-seq treatment file. bam, bed or .bed.gz file; If multiple files are given, treated as replicates. + +*--chrom.sizes, --cs * **Required.** Chromosome sizes path, can be downloaded at http://hgdownload.cse.ucsc.edu/goldenPath//bigZips/.chrom.sizes + +*-c, --control * Control file. bam, bed or bed.gz file; Single control file or separate file per each treatment file required. 
+ +*--fragment * Fragment size, read length if not given + +*-b, --bin * Bin size + +*-f, --fdr * Fdr value + +*-g, --gap * Gap size to merge peaks + +*-p, --peaks * Path to result peaks file in ENCODE broadPeak (BED 6+3) format + + +----- + +**Outputs** + +This tool produces a SPAN binary model file and/or peaks in ENCODE broadPeak (BED 6+3) format. + +Peak file columns contain the following data: + +* **1st**: chromosome name +* **2nd**: start position of peak +* **3rd**: end position of peak +* **4th**: name of peak +* **5th**: integer score for display in genome browser (e.g. UCSC) +* **6th**: strand, either "." (=no strand) or "+" or "-" +* **7th**: fold-change +* **8th**: -log10pvalue +* **9th**: -log10qvalue + +----- + +**SPAN workflow** + +* Convert raw reads to tags using *FRAGMENT* parameter. +* Compute coverage for all genome tiled into bins of *BIN* base pairs. +* Fit 3-state hidden Markov model that classifies bins as ZERO states with no coverage, LOW states of non-specific binding, and HIGH states of the specific binding. +* Compute posterior HIGH state probability of each bin. +* Trained model is saved into *.span* binary format. +* Peaks are computed using trained model and *FDR* and *GAP* parameters. + +------ + +**Citation** + +If you use this tool in Galaxy, please cite XXX, et al. 
*In preparation.* + +----- + +**More Information** + +* Project home page: https://research.jetbrains.org/groups/biolabs/tools/span-peak-analyzer +* Study cases: https://artyomovlab.wustl.edu/aging + +]]> diff -r dfb1e66235c5 -r 5b99943c4627 span_wrapper.py --- a/span_wrapper.py Thu Nov 15 11:30:01 2018 -0500 +++ b/span_wrapper.py Sun Nov 18 08:20:27 2018 -0500 @@ -1,67 +1,135 @@ #!/usr/bin/env python import os +import shutil +import subprocess import sys -import subprocess + argv = sys.argv[1:] print 'Arguments {0}'.format(argv) SPAN_JAR = os.environ.get("SPAN_JAR") -# span.jar from Docker container -# SPAN_JAR = "/root/span.jar" print 'Using SPAN Peak Analyzer distributive file {0}'.format(SPAN_JAR) -# #if $action.action_selector -# #if str($control.control_selector) == "with_control" -# span_wrapper.py model with_control "${genome}" "${treatment_file}" "${bin}" "${action.model_file}" "${control.control_file}" +# #if str($action.action_selector) == "model" +# #if $control.control_selector +# span_wrapper.py model with_control +# "${genome_identifier}" "${genome_file}" +# "${treatment_identifier}" "${treatment_file}" +# "${bin}" "${action.model_file}" +# "${control_identifier}" "${control.control_file}" # #else -# span_wrapper.py model without_control "${genome}" "${treatment_file}" "${bin}" "${action.model_file}" +# span_wrapper.py model without_control +# "${genome_identifier}" "${genome_file}" +# "${treatment_identifier}" "${treatment_file}" +# "${bin}" "${action.model_file}" # #end if # #else # #if $control.control_selector -# span_wrapper.py peaks with_control "${genome}" "${treatment_file}" "${bin}" "${action.model_file}" "${control.control_file}" "${fdr}" "${gap}" "${action.peaks_file}" +# span_wrapper.py peaks with_control +# "${genome_identifier}" "${genome_file}" +# "${treatment_identifier}" "${treatment_file}" +# "${bin}" "${action.model_file}" +# "${control_identifier}" "${control.control_file}" +# "${fdr}" "${gap}" "${action.peaks_file}" # #else -# 
span_wrapper.py peaks without_control "${genome}" "${treatment_file}" "${bin}" "${action.model_file}" "${fdr}" "${gap}" "${action.peaks_file}" +# span_wrapper.py peaks without_control +# "${genome_identifier}" "${genome_file}" +# "${treatment_identifier}" "${treatment_file}" +# "${bin}" "${action.model_file}" +# "${fdr}" "${gap}" "${action.peaks_file}" # #end if # #end if -# See http://artyomovlab.wustl.edu/aging/span.html for command line options +# See https://research.jetbrains.org/groups/biolabs/tools/span-peak-analyzer for command line options action = argv[0] control = argv[1] + +working_dir = os.path.abspath('.') +print 'WORKING DIRECTORY: {}'.format(working_dir) + + +def link(name, f): + """SPAN uses the file extension to detect the input type, so the original names are needed instead of Galaxy .dat files.""" + result = os.path.join(working_dir, name) + os.symlink(f, result) + return result + + if action == 'model': if control == 'with_control': - (chrom_sizes, treatment_file, bin, model_file, control_file) = argv[2:] + (chrom_sizes, chrom_sizes_file, + treatment, treatment_file, + bin, model_file, + control, control_file) = argv[2:] cmd = 'java -jar {} analyze --chrom.sizes {} --treatment {} --control {} --bin {}'.format( - SPAN_JAR, chrom_sizes, treatment_file, control_file, bin + SPAN_JAR, + link(chrom_sizes, chrom_sizes_file), + link(treatment, treatment_file), + link(control, control_file), + bin ) - print "MODEL FILE" + model_file elif control == 'without_control': - (chrom_sizes, treatment_file, bin, model_file) = argv[2:] + (chrom_sizes, chrom_sizes_file, + treatment, treatment_file, + bin, model_file) = argv[2:] cmd = 'java -jar {} analyze --chrom.sizes {} --treatment {} --bin {}'.format( - SPAN_JAR, argv[2], argv[3], argv[4] + SPAN_JAR, + link(chrom_sizes, chrom_sizes_file), + link(treatment, treatment_file), + bin ) - print "MODEL FILE" + model_file else: raise Exception("Unknown control option {}".format(control)) elif action == "peaks": if control ==
'with_control': - (chrom_sizes, treatment_file, bin, model_file, control_file, fdr, gap, peaks_file) = argv[2:] + (chrom_sizes, chrom_sizes_file, + treatment, treatment_file, + bin, model_file, + control, control_file, + fdr, gap, peaks_file) = argv[2:] cmd = 'java -jar {} analyze --chrom.sizes {} --treatment {} --control {} --bin {} --fdr {} --gap {} --peaks {}'.format( - SPAN_JAR, chrom_sizes, treatment_file, control_file, bin, fdr, gap, peaks_file + SPAN_JAR, + link(chrom_sizes, chrom_sizes_file), + link(treatment, treatment_file), + link(control, control_file), + bin, fdr, gap, + os.path.join(working_dir, peaks_file) ) - print "MODEL FILE" + model_file elif control == 'without_control': - (chrom_sizes, treatment_file, bin, model_file, fdr, gap, peaks_file) = argv[2:] + (chrom_sizes, chrom_sizes_file, + treatment, treatment_file, + bin, model_file, + fdr, gap, peaks_file) = argv[2:] cmd = 'java -jar {} analyze --chrom.sizes {} --treatment {} --bin {} --fdr {} --gap {} --peaks {}'.format( - SPAN_JAR, chrom_sizes, treatment_file, bin, fdr, gap, peaks_file + SPAN_JAR, + link(chrom_sizes, chrom_sizes_file), + link(treatment, treatment_file), + bin, fdr, gap, + os.path.join(working_dir, peaks_file) ) - print "MODEL FILE" + model_file else: raise Exception("Unknown control option {}".format(control)) else: raise Exception("Unknown action command {}".format(action)) -print 'Launching SPAN: {0}'.format(cmd) +print 'Launching SPAN: {}'.format(cmd) +print 'Model file: {}'.format(model_file) +try: + print 'Peaks file: {}'.format(peaks_file) +except NameError: + pass + subprocess.check_call(cmd, cwd=None, shell=True) + +# Move model to the working dir under the given name +fit_dir = os.path.join(working_dir, 'fit') +model_original = os.path.join(fit_dir, os.listdir(fit_dir)[0]) +shutil.move(model_original, os.path.join(working_dir, model_file)) + +# Move log file +logs_dir = os.path.join(working_dir, 'logs') +log_original = os.path.join(logs_dir, os.listdir(logs_dir)[0])
+shutil.move(log_original, os.path.join(working_dir, "span.log"))
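
Reviewer note: the sketch below restates the two ideas this changeset introduces, namely sanitizing Galaxy element identifiers and symlinking `.dat` datasets under names that keep their extension, and pairs them with a command built as an argument list rather than a shell string. It is not part of the patch. The helper names `sanitize_identifier` and `build_model_command` are hypothetical, `link` takes the working directory as an explicit parameter (unlike the wrapper's version), and only the `analyze` flags that already appear in the wrapper are used.

```python
import os
import re


def sanitize_identifier(name):
    """Replace every character that is not a word character, '-' or '.' with '_'.

    Mirrors the re.sub call added to the Cheetah template in span.xml.
    """
    return re.sub(r'[^\w\-.]', '_', str(name))


def link(working_dir, name, path):
    """Symlink a dataset under a name that keeps its extension.

    Same idea as the wrapper's link(), but with working_dir passed in
    explicitly (an assumption made here to keep the sketch self-contained).
    """
    result = os.path.join(working_dir, name)
    os.symlink(path, result)
    return result


def build_model_command(span_jar, chrom_sizes, treatment, bin_size, control=None):
    """Return the 'analyze' invocation as a list, so no shell quoting is needed.

    Hypothetical helper; the patch itself formats a single string and
    runs it with shell=True.
    """
    cmd = ['java', '-jar', span_jar, 'analyze',
           '--chrom.sizes', chrom_sizes,
           '--treatment', treatment]
    if control is not None:
        cmd += ['--control', control]
    cmd += ['--bin', str(bin_size)]
    return cmd
```

A list built this way could be passed to `subprocess.check_call(cmd)` without `shell=True`, which would avoid quoting problems if a path ever contained spaces; whether that change is wanted here is for the patch author to decide.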