changeset 0:47ec9c6f44b8 draft
planemo upload for repository https://github.com/pjbriggs/Amplicon_analysis-galaxy commit b63924933a03255872077beb4d0fde49d77afa92
author      pjbriggs
date        Thu, 09 Nov 2017 10:13:29 -0500
parents     (none)
children    1c1902e12caf
files       README.rst amplicon_analysis_pipeline.py amplicon_analysis_pipeline.xml install_tool_deps.sh static/images/Pipeline_description_Fig1.png static/images/Pipeline_description_Fig2.png static/images/Pipeline_description_Fig3.png
diffstat    7 files changed, 1768 insertions(+), 0 deletions(-)
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/README.rst Thu Nov 09 10:13:29 2017 -0500
@@ -0,0 +1,249 @@

Amplicon_analysis-galaxy
========================

A Galaxy tool wrapper for Mauro Tutino's ``Amplicon_analysis`` pipeline
script at https://github.com/MTutino/Amplicon_analysis

The pipeline can analyse paired-end 16S rRNA data from Illumina MiSeq
(Casava >= 1.8) and performs the following operations:

 * QC and clean up of input data
 * Removal of singletons and chimeras and building of OTU table
   and phylogenetic tree
 * Beta and alpha diversity analysis

Usage documentation
===================

Usage of the tool (including required inputs) is documented within
the ``help`` section of the tool XML.

Installing the tool in a Galaxy instance
========================================

The following sections describe how to install the tool files,
dependencies and reference data, and how to configure the Galaxy
instance to detect the dependencies and reference data correctly
at run time.

1. Install the dependencies
---------------------------

The ``install_tool_deps.sh`` script can be used to fetch and install the
dependencies locally, for example::

    install_tool_deps.sh /path/to/local_tool_dependencies

This can take some time to complete. When finished it should have
created a set of directories containing the dependencies under the
specified top level directory.

2. Install the tool files
-------------------------

The core tool is hosted on the Galaxy toolshed, so it can be installed
directly from there (this is the recommended route):

 * https://toolshed.g2.bx.psu.edu/view/pjbriggs/amplicon_analysis_pipeline/

Alternatively it can be installed manually; in this case there are two
files to install:

 * ``amplicon_analysis_pipeline.xml`` (the Galaxy tool definition)
 * ``amplicon_analysis_pipeline.py`` (the Python wrapper script)

Put these in a directory that is visible to Galaxy (e.g. a
``tools/Amplicon_analysis/`` folder), and modify the ``tool_conf.xml``
file to tell Galaxy to offer the tool by adding the line e.g.::

    <tool file="Amplicon_analysis/amplicon_analysis_pipeline.xml" />
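
For context, here is a minimal sketch of how that entry might sit
within a complete ``tool_conf.xml`` (the section id and name are
arbitrary examples, not requirements of the tool)::

    <?xml version="1.0"?>
    <toolbox>
      <section id="metagenomics" name="Metagenomics">
        <tool file="Amplicon_analysis/amplicon_analysis_pipeline.xml" />
      </section>
    </toolbox>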

3. Install the reference data
-----------------------------

The script ``References.sh`` from the pipeline package at
https://github.com/MTutino/Amplicon_analysis can be run to install
the reference data, for example::

    cd /path/to/pipeline/data
    wget https://github.com/MTutino/Amplicon_analysis/raw/master/References.sh
    /bin/bash ./References.sh

will install the data in ``/path/to/pipeline/data``.

**NB** The final amount of data downloaded and uncompressed will be
around 6GB.

4. Configure dependencies and reference data in Galaxy
------------------------------------------------------

The final steps are to make your Galaxy installation aware of the
tool dependencies and reference data, so it can locate them both when
the tool is run.

To target the tool dependencies installed previously, add the
following lines to the ``dependency_resolvers_conf.xml`` file in the
Galaxy ``config`` directory::

    <dependency_resolvers>
    ...
      <galaxy_packages base_path="/path/to/local_tool_dependencies" />
      <galaxy_packages base_path="/path/to/local_tool_dependencies" versionless="true" />
    ...
    </dependency_resolvers>

(NB it is recommended to place these *before* the ``<conda ... />``
resolvers.)

(If you're not familiar with dependency resolvers in Galaxy then
see the documentation at
https://docs.galaxyproject.org/en/master/admin/dependency_resolvers.html
for more details.)

The tool locates the reference data via an environment variable called
``AMPLICON_ANALYSIS_REF_DATA_PATH``, which needs to be set to the parent
directory where the reference data has been installed.

There are various ways to do this, depending on how your Galaxy
installation is configured:

 * **For local instances:** add a line to set it in the
   ``config/local_env.sh`` file of your Galaxy installation, e.g.::

       export AMPLICON_ANALYSIS_REF_DATA_PATH=/path/to/pipeline/data

 * **For production instances:** set the value in the ``job_conf.xml``
   configuration file, e.g.::

       <destination id="amplicon_analysis">
         <env id="AMPLICON_ANALYSIS_REF_DATA_PATH">/path/to/pipeline/data</env>
       </destination>

   and then specify that the pipeline tool uses this destination::

       <tool id="amplicon_analysis_pipeline" destination="amplicon_analysis"/>

   (For more about job destinations see the Galaxy documentation at
   https://galaxyproject.org/admin/config/jobs/#job-destinations)
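
Putting the ``job_conf.xml`` fragments above together, the relevant
parts of the file might look like the following sketch (the ``local``
runner plugin and default destination shown here are illustrative
assumptions; adapt them to your own job configuration)::

    <?xml version="1.0"?>
    <job_conf>
        <plugins>
            <plugin id="local" type="runner"
                    load="galaxy.jobs.runners.local:LocalJobRunner"/>
        </plugins>
        <destinations default="local">
            <destination id="local" runner="local"/>
            <destination id="amplicon_analysis" runner="local">
                <env id="AMPLICON_ANALYSIS_REF_DATA_PATH">/path/to/pipeline/data</env>
            </destination>
        </destinations>
        <tools>
            <tool id="amplicon_analysis_pipeline" destination="amplicon_analysis"/>
        </tools>
    </job_conf>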

5. Enable rendering of HTML outputs from pipeline
-------------------------------------------------

To ensure that HTML outputs are displayed correctly in Galaxy
(for example the Vsearch OTU table heatmaps), Galaxy needs to be
configured not to sanitize the outputs from the ``Amplicon_analysis``
tool.

Either:

 * **For local instances:** set ``sanitize_all_html = False`` in
   ``config/galaxy.ini`` (nb don't do this on production servers or
   public instances!); or

 * **For production instances:** add the ``Amplicon_analysis`` tool
   to the display whitelist in the Galaxy instance:

   - Set ``sanitize_whitelist_file = config/whitelist.txt`` in
     ``config/galaxy.ini`` and restart Galaxy;
   - Go to ``Admin>Manage Display Whitelist``, check the box for
     ``Amplicon_analysis`` (hint: use your browser's 'find-in-page'
     search function to help locate it) and click on
     ``Submit new whitelist`` to update the settings.

Additional details
==================

Some other things to be aware of:

 * Note that using the Silva database requires a minimum of 18GB RAM

Known problems
==============

 * Only the ``VSEARCH`` pipeline in Mauro's script is currently
   available via the Galaxy tool; the ``USEARCH`` and ``QIIME``
   pipelines have yet to be implemented.
 * The images in the tool help section are not visible if the
   tool has been installed locally, or if it has been installed in
   a Galaxy instance which is served from a subdirectory.

   These are both problems with Galaxy and not the tool, see
   https://github.com/galaxyproject/galaxy/issues/4490 and
   https://github.com/galaxyproject/galaxy/issues/1676

Appendix: availability of tool dependencies
===========================================

The tool takes its dependencies from the underlying pipeline script (see
https://github.com/MTutino/Amplicon_analysis/blob/master/README.md
for details).

As noted above, currently the ``install_tool_deps.sh`` script can be
used to manually install the dependencies for a local tool install.

In principle these should also be available if the tool were installed
from a toolshed. However it would be preferable in this case to get as
many of the dependencies as possible via the ``conda`` dependency
resolver.

The following are known to be available via conda, with the required
version:

 - cutadapt 1.8.1
 - sickle-trim 1.33
 - bioawk 1.0
 - fastqc 0.11.3
 - R 3.2.0

Some dependencies are available but with the "wrong" versions:

 - spades (need 3.5.0)
 - qiime (need 1.8.0)
 - blast (need 2.2.26)
 - vsearch (need 1.1.3)

The following dependencies are currently unavailable:

 - fasta_number (need 02jun2015)
 - fasta-splitter (need 0.2.4)
 - rdp_classifier (need 2.2)
 - microbiomeutil (need r20110519)

(NB usearch 6.1.544 and 8.0.1623 are special cases which must be
handled outside of Galaxy's dependency management systems.)

History
=======

========== ======================================================================
Version    Changes
---------- ----------------------------------------------------------------------
1.1.0      First official version on Galaxy toolshed.
1.0.6      Expand inline documentation to provide detailed usage guidance.
1.0.5      Updates including:

           - Capture read counts from quality control as new output dataset
           - Capture FastQC per-base quality boxplots for each sample as
             new output dataset
           - Add support for -l option (sliding window length for trimming)
           - Default for -L set to "200"
1.0.4      Various updates:

           - Additional outputs are captured when a "Categories" file is
             supplied (alpha diversity rarefaction curves and boxplots)
           - Sample names derived from Fastqs in a collection of pairs
             are trimmed to SAMPLE_S* (for Illumina-style Fastq filenames)
           - Input Fastqs can now be of more general ``fastq`` type
           - Log file outputs are captured in new output dataset
           - User can specify a "title" for the job which is copied into
             the dataset names (to distinguish outputs from different runs)
           - Improved detection and reporting of problems with input
             Metatable
1.0.3      Take the sample names from the collection dataset names when
           using collection as input (this is now the default input mode);
           collect additional output dataset; disable ``usearch``-based
           pipelines (i.e. ``UPARSE`` and ``QIIME``).
1.0.2      Enable support for FASTQs supplied via dataset collections and
           fix some broken output datasets.
1.0.1      Initial version
========== ======================================================================
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/amplicon_analysis_pipeline.py Thu Nov 09 10:13:29 2017 -0500
@@ -0,0 +1,329 @@

#!/usr/bin/env python
#
# Wrapper script to run Amplicon_analysis_pipeline.sh
# from Galaxy tool

import sys
import os
import argparse
import subprocess
import glob

class PipelineCmd(object):
    # Helper class for building up a command line incrementally
    def __init__(self,cmd):
        self.cmd = [str(cmd)]
    def add_args(self,*args):
        for arg in args:
            self.cmd.append(str(arg))
    def __repr__(self):
        return ' '.join([str(arg) for arg in self.cmd])

def ahref(target,name=None,type=None):
    # Construct an HTML anchor (link) element as a string
    if name is None:
        name = os.path.basename(target)
    ahref = "<a href='%s'" % target
    if type is not None:
        ahref += " type='%s'" % type
    ahref += ">%s</a>" % name
    return ahref

def check_errors():
    # Errors in Amplicon_analysis_pipeline.log
    with open('Amplicon_analysis_pipeline.log','r') as pipeline_log:
        log = pipeline_log.read()
        if "Names in the first column of Metatable.txt and in the second column of Final_name.txt do not match" in log:
            print_error("""*** Sample IDs don't match dataset names ***

The sample IDs (first column of the Metatable file) don't match the
supplied sample names for the input Fastq pairs.
""")
    # Errors in pipeline output
    with open('pipeline.log','r') as pipeline_log:
        log = pipeline_log.read()
        if "Errors and/or warnings detected in mapping file" in log:
            with open("Metatable_log/Metatable.log","r") as metatable_log:
                # Echo the Metatable log file to the tool log
                print_error("""*** Error in Metatable mapping file ***

%s""" % metatable_log.read())
        elif "No header line was found in mapping file" in log:
            # Report error to the tool log
            print_error("""*** No header in Metatable mapping file ***

Check you've specified the correct file as the input Metatable""")

def print_error(message):
    # Write message to stderr inside a box of asterisks
    width = max([len(line) for line in message.split('\n')]) + 4
    sys.stderr.write("\n%s\n" % ('*'*width))
    for line in message.split('\n'):
        sys.stderr.write("* %s%s *\n" % (line,' '*(width-len(line)-4)))
    sys.stderr.write("%s\n\n" % ('*'*width))

def clean_up_name(sample):
    # Remove trailing "_L[0-9]+_001" from Fastq
    # pair names
    split_name = sample.split('_')
    if split_name[-1] == "001":
        split_name = split_name[:-1]
    if split_name[-1].startswith('L'):
        try:
            int(split_name[-1][1:])
            split_name = split_name[:-1]
        except ValueError:
            pass
    return '_'.join(split_name)

def list_outputs(filen=None):
    # List the output directory contents
    # If filen is specified then will be the filename to
    # write to, otherwise write to stdout
    if filen is not None:
        fp = open(filen,'w')
    else:
        fp = sys.stdout
    results_dir = os.path.abspath("RESULTS")
    fp.write("Listing contents of output dir %s:\n" % results_dir)
    ix = 0
    for d,dirs,files in os.walk(results_dir):
        ix += 1
        fp.write("-- %d: %s\n" % (ix,
                                  os.path.relpath(d,results_dir)))
        for f in files:
            ix += 1
            fp.write("---- %d: %s\n" % (ix,
                                        os.path.relpath(os.path.join(d,f),
                                                        results_dir)))
    # Close output file
    if filen is not None:
        fp.close()

if __name__ == "__main__":
    # Command line
    print "Amplicon analysis: starting"
    p = argparse.ArgumentParser()
    p.add_argument("metatable",
                   metavar="METATABLE_FILE",
                   help="Metatable.txt file")
    p.add_argument("fastq_pairs",
                   metavar="SAMPLE_NAME FQ_R1 FQ_R2",
                   nargs="+",
                   default=list(),
                   help="Triplets of SAMPLE_NAME followed by "
                   "a R1/R2 FASTQ file pair")
    p.add_argument("-g",dest="forward_pcr_primer")
    p.add_argument("-G",dest="reverse_pcr_primer")
    p.add_argument("-q",dest="trimming_threshold")
    p.add_argument("-O",dest="minimum_overlap")
    p.add_argument("-L",dest="minimum_length")
    p.add_argument("-l",dest="sliding_window_length")
    p.add_argument("-P",dest="pipeline",
                   choices=["vsearch","uparse","qiime"],
                   type=str.lower,
                   default="vsearch")
    p.add_argument("-S",dest="use_silva",action="store_true")
    p.add_argument("-r",dest="reference_data_path")
    p.add_argument("-c",dest="categories_file")
    args = p.parse_args()
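
    # For reference, the Galaxy tool invokes this wrapper along the
    # lines of the following sketch (the sample name and file paths
    # here are hypothetical):
    #
    #     python amplicon_analysis_pipeline.py \
    #         -P vsearch -r /path/to/pipeline/data \
    #         Metatable.txt SAMPLE1 sample1_R1.fastq sample1_R2.fastq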
p.add_argument("-q",dest="trimming_threshold") + p.add_argument("-O",dest="minimum_overlap") + p.add_argument("-L",dest="minimum_length") + p.add_argument("-l",dest="sliding_window_length") + p.add_argument("-P",dest="pipeline", + choices=["vsearch","uparse","qiime"], + type=str.lower, + default="vsearch") + p.add_argument("-S",dest="use_silva",action="store_true") + p.add_argument("-r",dest="reference_data_path") + p.add_argument("-c",dest="categories_file") + args = p.parse_args() + + # Build the environment for running the pipeline + print "Amplicon analysis: building the environment" + metatable_file = os.path.abspath(args.metatable) + os.symlink(metatable_file,"Metatable.txt") + print "-- made symlink to Metatable.txt" + + # Link to Categories.txt file (if provided) + if args.categories_file is not None: + categories_file = os.path.abspath(args.categories_file) + os.symlink(categories_file,"Categories.txt") + print "-- made symlink to Categories.txt" + + # Link to FASTQs and construct Final_name.txt file + sample_names = [] + with open("Final_name.txt",'w') as final_name: + fastqs = iter(args.fastq_pairs) + for sample_name,fqr1,fqr2 in zip(fastqs,fastqs,fastqs): + sample_name = clean_up_name(sample_name) + r1 = "%s_R1_.fastq" % sample_name + r2 = "%s_R2_.fastq" % sample_name + os.symlink(fqr1,r1) + os.symlink(fqr2,r2) + final_name.write("%s\n" % '\t'.join((r1,sample_name))) + final_name.write("%s\n" % '\t'.join((r2,sample_name))) + sample_names.append(sample_name) + + # Construct the pipeline command + print "Amplicon analysis: constructing pipeline command" + pipeline = PipelineCmd("Amplicon_analysis_pipeline.sh") + if args.forward_pcr_primer: + pipeline.add_args("-g",args.forward_pcr_primer) + if args.reverse_pcr_primer: + pipeline.add_args("-G",args.reverse_pcr_primer) + if args.trimming_threshold: + pipeline.add_args("-q",args.trimming_threshold) + if args.minimum_overlap: + pipeline.add_args("-O",args.minimum_overlap) + if args.minimum_length: + pipeline.add_args("-L",args.minimum_length) + if args.sliding_window_length: + pipeline.add_args("-l",args.sliding_window_length) + if args.reference_data_path: + pipeline.add_args("-r",args.reference_data_path) + pipeline.add_args("-P",args.pipeline) + if args.use_silva: + pipeline.add_args("-S") + + # Echo the pipeline command to stdout + print "Running %s" % pipeline + + # Run the pipeline + with open("pipeline.log","w") as pipeline_out: + try: + subprocess.check_call(pipeline.cmd, + stdout=pipeline_out, + stderr=subprocess.STDOUT) + exit_code = 0 + print "Pipeline completed ok" + except subprocess.CalledProcessError as ex: + # Non-zero exit status + sys.stderr.write("Pipeline failed: exit code %s\n" % + ex.returncode) + exit_code = ex.returncode + except Exception as ex: + # Some other problem + sys.stderr.write("Unexpected error: %s\n" % str(ex)) + + # Write out the list of outputs + outputs_file = "Pipeline_outputs.txt" + list_outputs(outputs_file) + + # Check for log file + log_file = "Amplicon_analysis_pipeline.log" + if os.path.exists(log_file): + print "Found log file: %s" % log_file + if exit_code == 0: + # Create an HTML file to link to log files etc + # NB the paths to the files should be correct once + # copied by Galaxy on job completion + with open("pipeline_outputs.html","w") as html_out: + html_out.write("""<html> +<head> +<title>Amplicon analysis pipeline: log files</title> +<head> +<body> +<h1>Amplicon analysis pipeline: log files</h1> +<ul> +""") + html_out.write( + "<li>%s</li>\n" % + 
ahref("Amplicon_analysis_pipeline.log", + type="text/plain")) + html_out.write( + "<li>%s</li>\n" % + ahref("pipeline.log",type="text/plain")) + html_out.write( + "<li>%s</li>\n" % + ahref("Pipeline_outputs.txt", + type="text/plain")) + html_out.write( + "<li>%s</li>\n" % + ahref("Metatable.html")) + html_out.write("""<ul> +</body> +</html> +""") + else: + # Check for known error messages + check_errors() + # Write pipeline stdout to tool stderr + sys.stderr.write("\nOutput from pipeline:\n") + with open("pipeline.log",'r') as log: + sys.stderr.write("%s" % log.read()) + # Write log file contents to tool log + print "\nAmplicon_analysis_pipeline.log:" + with open(log_file,'r') as log: + print "%s" % log.read() + else: + sys.stderr.write("ERROR missing log file \"%s\"\n" % + log_file) + + # Handle FastQC boxplots + print "Amplicon analysis: collating per base quality boxplots" + with open("fastqc_quality_boxplots.html","w") as quality_boxplots: + # PHRED value for trimming + phred_score = 20 + if args.trimming_threshold is not None: + phred_score = args.trimming_threshold + # Write header for HTML output file + quality_boxplots.write("""<html> +<head> +<title>Amplicon analysis pipeline: Per-base Quality Boxplots (FastQC)</title> +<head> +<body> +<h1>Amplicon analysis pipeline: Per-base Quality Boxplots (FastQC)</h1> +""") + # Look for raw and trimmed FastQC output for each sample + for sample_name in sample_names: + fastqc_dir = os.path.join(sample_name,"FastQC") + quality_boxplots.write("<h2>%s</h2>" % sample_name) + for d in ("Raw","cutdapt_sickle/Q%s" % phred_score): + quality_boxplots.write("<h3>%s</h3>" % d) + fastqc_html_files = glob.glob( + os.path.join(fastqc_dir,d,"*_fastqc.html")) + if not fastqc_html_files: + quality_boxplots.write("<p>No FastQC outputs found</p>") + continue + # Pull out the per-base quality boxplots + for f in fastqc_html_files: + boxplot = None + with open(f) as fp: + for line in fp.read().split(">"): + try: + line.index("alt=\"Per base quality graph\"") + boxplot = line + ">" + break + except ValueError: + pass + if boxplot is None: + boxplot = "Missing plot" + quality_boxplots.write("<h4>%s</h4><p>%s</p>" % + (os.path.basename(f), + boxplot)) + quality_boxplots.write("""</body> +</html> +""") + + # Handle additional output when categories file was supplied + if args.categories_file is not None: + # Alpha diversity boxplots + print "Amplicon analysis: indexing alpha diversity boxplots" + boxplots_dir = os.path.abspath( + os.path.join("RESULTS", + "%s_%s" % (args.pipeline.title(), + ("gg" if not args.use_silva + else "silva")), + "Alpha_diversity", + "Alpha_diversity_boxplot", + "Categories_shannon")) + print "Amplicon analysis: gathering PDFs from %s" % boxplots_dir + boxplot_pdfs = [os.path.basename(pdf) + for pdf in + sorted(glob.glob( + os.path.join(boxplots_dir,"*.pdf")))] + with open("alpha_diversity_boxplots.html","w") as boxplots_out: + boxplots_out.write("""<html> +<head> +<title>Amplicon analysis pipeline: Alpha Diversity Boxplots (Shannon)</title> +<head> +<body> +<h1>Amplicon analysis pipeline: Alpha Diversity Boxplots (Shannon)</h1> +""") + boxplots_out.write("<ul>\n") + for pdf in boxplot_pdfs: + boxplots_out.write("<li>%s</li>\n" % ahref(pdf)) + boxplots_out.write("<ul>\n") + boxplots_out.write("""</body> +</html> +""") + + # Finish + print "Amplicon analysis: finishing, exit code: %s" % exit_code + sys.exit(exit_code)
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/amplicon_analysis_pipeline.xml Thu Nov 09 10:13:29 2017 -0500
@@ -0,0 +1,484 @@

<tool id="amplicon_analysis_pipeline" name="Amplicon Analysis Pipeline" version="1.0.6">
  <description>analyse 16S rRNA data from Illumina MiSeq paired-end reads</description>
  <requirements>
    <requirement type="package" version="1.1">amplicon_analysis_pipeline</requirement>
    <requirement type="package" version="1.11">cutadapt</requirement>
    <requirement type="package" version="1.33">sickle</requirement>
    <requirement type="package" version="27-08-2013">bioawk</requirement>
    <requirement type="package" version="2.8.1">pandaseq</requirement>
    <requirement type="package" version="3.5.0">spades</requirement>
    <requirement type="package" version="0.11.3">fastqc</requirement>
    <requirement type="package" version="1.8.0">qiime</requirement>
    <requirement type="package" version="2.2.26">blast</requirement>
    <requirement type="package" version="0.2.4">fasta-splitter</requirement>
    <requirement type="package" version="2.2">rdp-classifier</requirement>
    <requirement type="package" version="3.2.0">R</requirement>
    <requirement type="package" version="1.1.3">vsearch</requirement>
    <requirement type="package" version="2010-04-29">microbiomeutil</requirement>
    <requirement type="package">fasta_number</requirement>
  </requirements>
  <stdio>
    <exit_code range="1:" />
  </stdio>
  <command><![CDATA[
    ## Set the reference database name
    #if $reference_database == ""
    #set reference_database_name = "gg"
    #else
    #set reference_database_name = "silva"
    #end if

    ## Run the amplicon analysis pipeline wrapper
    python $__tool_directory__/amplicon_analysis_pipeline.py
    ## Set options
    #if str( $forward_pcr_primer ) != ""
    -g "$forward_pcr_primer"
    #end if
    #if str( $reverse_pcr_primer ) != ""
    -G "$reverse_pcr_primer"
    #end if
    #if str( $trimming_threshold ) != ""
    -q $trimming_threshold
    #end if
    #if str( $sliding_window_length ) != ""
    -l $sliding_window_length
    #end if
    #if str( $minimum_overlap ) != ""
    -O $minimum_overlap
    #end if
    #if str( $minimum_length ) != ""
    -L $minimum_length
    #end if
    -P $pipeline
    -r \$AMPLICON_ANALYSIS_REF_DATA_PATH
    #if str( $reference_database ) != ""
    "${reference_database}"
    #end if
    #if str($categories_file_in) != 'None'
    -c "${categories_file_in}"
    #end if
    ## Input files
    "${metatable_file_in}"
    ## FASTQ pairs (collection elements expose forward/reverse;
    ## repeat elements expose the fastq_r1/fastq_r2 params)
    #if str($input_type.pairs_or_collection) == "collection"
    #for $fq_pair in $input_type.fastq_collection
    "${fq_pair.name}" "${fq_pair.forward}" "${fq_pair.reverse}"
    #end for
    #else
    #for $fq_pair in $input_type.fastq_pairs
    "${fq_pair.name}" "${fq_pair.fastq_r1}" "${fq_pair.fastq_r2}"
    #end for
    #end if
    &&

    ## Collect outputs
    cp Metatable_log/Metatable_mod.txt "${metatable_mod}" &&
    cp ${pipeline}_OTU_tables/multiplexed_linearized_dereplicated_mc2_repset_nonchimeras_tax_OTU_table.biom "${tax_otu_table_biom_file}" &&
    cp ${pipeline}_OTU_tables/otus.tre "${otus_tre_file}" &&
    cp RESULTS/${pipeline}_${reference_database_name}/OTUs_count.txt "${otus_count_file}" &&
    cp RESULTS/${pipeline}_${reference_database_name}/table_summary.txt "${table_summary_file}" &&
    cp Multiplexed_files/${pipeline}_pipeline/multiplexed_linearized_dereplicated_mc2_repset_nonchimeras_OTUs.fasta "${dereplicated_nonchimera_otus_fasta}" &&
    cp QUALITY_CONTROL/Reads_count.txt "$read_counts_out" &&
    cp fastqc_quality_boxplots.html "${fastqc_quality_boxplots_html}" &&

    ## HTML outputs

    ## OTU table
    mkdir $heatmap_otu_table_html.files_path &&
    cp -r RESULTS/${pipeline}_${reference_database_name}/Heatmap/js $heatmap_otu_table_html.files_path &&
    cp RESULTS/${pipeline}_${reference_database_name}/Heatmap/otu_table.html "${heatmap_otu_table_html}" &&

    ## Phylum genus barcharts
    mkdir $phylum_genus_dist_barcharts_html.files_path &&
    cp -r RESULTS/${pipeline}_${reference_database_name}/phylum_genus_charts/charts $phylum_genus_dist_barcharts_html.files_path &&
    cp -r RESULTS/${pipeline}_${reference_database_name}/phylum_genus_charts/raw_data $phylum_genus_dist_barcharts_html.files_path &&
    cp RESULTS/${pipeline}_${reference_database_name}/phylum_genus_charts/bar_charts.html "${phylum_genus_dist_barcharts_html}" &&

    ## Beta diversity weighted 2d plots
    mkdir $beta_div_even_weighted_2d_plots.files_path &&
    cp -r RESULTS/${pipeline}_${reference_database_name}/beta_div_even/weighted_2d_plot/* $beta_div_even_weighted_2d_plots.files_path &&
    cp RESULTS/${pipeline}_${reference_database_name}/beta_div_even/weighted_2d_plot/weighted_unifrac_pc_2D_PCoA_plots.html "${beta_div_even_weighted_2d_plots}" &&

    ## Beta diversity unweighted 2d plots
    mkdir $beta_div_even_unweighted_2d_plots.files_path &&
    cp -r RESULTS/${pipeline}_${reference_database_name}/beta_div_even/unweighted_2d_plot/* $beta_div_even_unweighted_2d_plots.files_path &&
    cp RESULTS/${pipeline}_${reference_database_name}/beta_div_even/unweighted_2d_plot/unweighted_unifrac_pc_2D_PCoA_plots.html "${beta_div_even_unweighted_2d_plots}" &&

    ## Alpha diversity rarefaction plots
    mkdir $alpha_div_rarefaction_plots.files_path &&
    cp RESULTS/${pipeline}_${reference_database_name}/Alpha_diversity/rarefaction_curves/rarefaction_plots.html $alpha_div_rarefaction_plots &&
    cp -r RESULTS/${pipeline}_${reference_database_name}/Alpha_diversity/rarefaction_curves/average_plots $alpha_div_rarefaction_plots.files_path &&

    ## Categories data
    #if str($categories_file_in) != 'None'
    ## Alpha diversity boxplots
    mkdir $alpha_div_boxplots.files_path &&
    cp alpha_diversity_boxplots.html "$alpha_div_boxplots" &&
    cp RESULTS/${pipeline}_${reference_database_name}/Alpha_diversity/Alpha_diversity_boxplot/Categories_shannon/*.pdf $alpha_div_boxplots.files_path &&
    #end if

    ## Pipeline outputs (log files etc)
    mkdir $log_files.files_path &&
    cp Amplicon_analysis_pipeline.log $log_files.files_path &&
    cp pipeline.log $log_files.files_path &&
    cp Pipeline_outputs.txt $log_files.files_path &&
    cp Metatable_log/Metatable.html $log_files.files_path &&
    cp pipeline_outputs.html "$log_files"
  ]]></command>
  <inputs>
    <param name="title" type="text" value="test" size="25"
           label="Title" help="Optional text that will be added to the output dataset names" />
    <param type="data" name="metatable_file_in" format="tabular"
           label="Input Metatable.txt file" />
    <param type="data" name="categories_file_in" format="txt"
           label="Input Categories.txt file" optional="true"
           help="(optional)" />
    <conditional name="input_type">
      <param name="pairs_or_collection" type="select"
             label="Input FASTQ type">
        <option value="pairs_of_files">Pairs of datasets</option>
        <option value="collection" selected="true">Dataset pairs in a collection</option>
      </param>
      <when value="collection">
        <param name="fastq_collection" type="data_collection"
               format="fastqsanger,fastq" collection_type="list:paired"
               label="Collection of FASTQ forward and reverse (R1/R2) pairs"
               help="Each FASTQ pair will be treated as one sample; the name of each sample will be taken from the first column of the Metatable file" />
      </when>
      <when value="pairs_of_files">
        <repeat name="fastq_pairs" title="Input fastq pairs" min="1">
          <param type="text" name="name" value=""
                 label="Final name for FASTQ pair" />
          <param type="data" name="fastq_r1" format="fastqsanger,fastq"
                 label="FASTQ with forward reads (R1)" />
          <param type="data" name="fastq_r2" format="fastqsanger,fastq"
                 label="FASTQ with reverse reads (R2)" />
        </repeat>
      </when>
    </conditional>
    <param type="text" name="forward_pcr_primer" value=""
           label="Forward PCR primer sequence"
           help="Optional; must not include barcode or adapter sequence (-g)" />
    <param type="text" name="reverse_pcr_primer" value=""
           label="Reverse PCR primer sequence"
           help="Optional; must not include barcode or adapter sequence (-G)" />
    <param type="integer" name="trimming_threshold" value="20"
           label="Threshold quality below which read will be trimmed"
           help="Phred score; default is 20 (-q)" />
    <param type="integer" name="minimum_overlap" value="10"
           label="Minimum overlap in bp between forward and reverse reads"
           help="Default is 10 (-O)" />
    <param type="integer" name="minimum_length" value="200"
           label="Minimum length in bp to keep sequence after overlapping"
           help="Default is 200 (-L)" />
    <param type="integer" name="sliding_window_length" value="10"
           label="Minimum length in bp to retain a read after trimming"
           help="Supplied to Sickle; default is 10 (-l)" />
    <param type="select" name="pipeline"
           label="Pipeline to use for analysis">
      <option value="Vsearch" selected="true" >Vsearch</option>
      <!--
      Remove the QIIME and Uparse options for now
      <option value="QIIME">QIIME</option>
      <option value="Uparse">Uparse</option>
      -->
    </param>
    <param type="select" name="reference_database"
           label="Reference database">
      <option value="" selected="true">GreenGenes</option>
      <option value="-S">Silva</option>
    </param>
  </inputs>
  <outputs>
    <data format="tabular" name="metatable_mod"
          label="${tool.name}:${title} Metatable_mod.txt" />
    <data format="tabular" name="read_counts_out"
          label="${tool.name} (${pipeline}):${title} read counts" />
    <data format="biom" name="tax_otu_table_biom_file"
          label="${tool.name} (${pipeline}):${title} tax OTU table (biom format)" />
    <data format="tabular" name="otus_tre_file"
          label="${tool.name} (${pipeline}):${title} otus.tre" />
    <data format="html" name="phylum_genus_dist_barcharts_html"
          label="${tool.name} (${pipeline}):${title} phylum genus dist barcharts HTML" />
    <data format="tabular" name="otus_count_file"
          label="${tool.name} (${pipeline}):${title} OTUs count file" />
    <data format="tabular" name="table_summary_file"
          label="${tool.name} (${pipeline}):${title} table summary file" />
    <data format="fasta" name="dereplicated_nonchimera_otus_fasta"
          label="${tool.name} (${pipeline}):${title} multiplexed linearized dereplicated mc2 repset nonchimeras OTUs FASTA" />
    <data format="html" name="fastqc_quality_boxplots_html"
          label="${tool.name} (${pipeline}):${title} FastQC per-base quality boxplots HTML" />
    <data format="html" name="heatmap_otu_table_html"
          label="${tool.name} (${pipeline}):${title} heatmap OTU table HTML" />
    <data format="html" name="beta_div_even_weighted_2d_plots"
          label="${tool.name} (${pipeline}):${title} beta diversity weighted 2D plots HTML" />
    <data format="html" name="beta_div_even_unweighted_2d_plots"
          label="${tool.name} (${pipeline}):${title} beta diversity unweighted 2D plots HTML" />
    <data format="html" name="alpha_div_rarefaction_plots"
          label="${tool.name} (${pipeline}):${title} alpha diversity rarefaction plots HTML" />
    <data format="html" name="alpha_div_boxplots"
          label="${tool.name} (${pipeline}):${title} alpha diversity boxplots">
      <filter>categories_file_in is not None</filter>
    </data>
    <data format="html" name="log_files"
          label="${tool.name} (${pipeline}):${title} log files" />
  </outputs>
  <tests>
  </tests>
  <help><![CDATA[

What it does
------------

This pipeline has been designed for the analysis of 16S rRNA data from
Illumina MiSeq (Casava >= 1.8) paired-end reads.

Usage
-----

1. Preparation of the mapping file and format of unique sample id
*****************************************************************

Before using the amplicon analysis pipeline it is necessary to follow
the steps below, to avoid analysis failures and to ensure samples are
labelled appropriately. Sample names for the labelling are derived
from the fastq file names generated by the sequencing. The labels will
include everything between the beginning of the name and the sample
number (from C11 to S19 in Fig. 1).

.. image:: Pipeline_description_Fig1.png
   :height: 46
   :width: 382

**Figure 1**

If analysing 16S data from multiple runs:

The samples from different runs may have identical IDs. For example,
when sequencing the same samples twice, these could by chance be at
the same position in both runs. This would cause the fastq files to
have exactly the same IDs (Fig. 2).

.. image:: Pipeline_description_Fig2.png
   :height: 100
   :width: 463

**Figure 2**

In case of identical sample IDs the pipeline will fail to run and
generate an error at the beginning of the analysis.

To avoid having to change the file names, before uploading the files,
ensure that the sample IDs are not repeated.

2. To upload the file
*********************

Click on **Get Data/Upload File** from the Galaxy tool panel on the
left hand side.

From the pop-up window, choose how to upload the file. The
**Choose local file** option can be used for files up to 4GB. Fastq files
from Illumina MiSeq will rarely be bigger than 4GB and this option is
recommended.

After choosing the files click **Start** to begin the upload. The window can
now be closed and the files will be uploaded onto the Galaxy server. You
will see the progress on the ``HISTORY`` panel on the right
side of the screen. The colour will change from grey (queuing), to yellow
(uploading) and finally green (uploaded).

Once all the files are uploaded, click on the operations on multiple
datasets icon and select the fastq files that need to be analysed.
Click on the tab **For all selected...** and on the option
**Build List of Dataset pairs** (Fig. 3).

.. image:: Pipeline_description_Fig3.png
   :height: 247
   :width: 586

**Figure 3**

Change the filter parameter ``_1`` and ``_2`` to be ``_R1`` and ``_R2``.
The fastq files forward R1 and reverse R2 should now appear in the
corresponding columns.

Select **Autopair**. This creates a collection of paired fastq files for
the forward and reverse reads for each sample. The names of the pairs are
the ones used by the pipeline. You are free to change the names at this
point as long as they are the same as those used in the Metatable file
(see section 3).

Name the collection and click on **create list**. This reduces the time
required to input the forward and reverse reads for each individual sample.

3. Create the Metatable files
*****************************

Metatable.txt
~~~~~~~~~~~~~

Click on the list of pairs you just created to see the names of the
individual pairs. The pair names are the ones used by the pipeline;
therefore these are the names that must be used in the Metatable file.

The Metatable file has to be in QIIME format. You can find a description
of it on the QIIME website: http://qiime.org/documentation/file_formats.html

EXAMPLE::

    #SampleID  BarcodeSequence   LinkerPrimerSequence  Disease  Gender  Description
    Mock-RUN1  TAAGGCGAGCGTAAGA                        PsA      Male    Control
    Mock-RUN2  CGTACTAGGCGTAAGA                        PsA      Male    Control
    Mock-RUN3  AGGCAGAAGCGTAAGA                        PsC      Female  Control

Briefly: the column ``LinkerPrimerSequence`` is empty but it cannot be
deleted. The header is very important. ``#SampleID``,
``BarcodeSequence``, ``LinkerPrimerSequence`` and ``Description`` are
mandatory. Between ``LinkerPrimerSequence`` and ``Description`` you can
add as many columns as you want. For every column a PCoA plot will be
created (see **Results** section). You can create this file in Excel
and it will have to be saved as ``Text(Tab delimited)``.

During the analysis the Metatable.txt will be checked to ensure that the
file has the correct format. If necessary, this will be modified and will
be available as Metatable_mod.txt in the history panel. If you are
going to use the metatable file for any other statistical analyses,
remember to use the ``Metatable_mod.txt`` one, otherwise the sample
names might not match!

Categories.txt (optional)
~~~~~~~~~~~~~~~~~~~~~~~~~

This file is required if you want to get box plots for comparison of
alpha diversity indices (see **Results** section). The file is a list
(without header and IN ONE COLUMN) of categories present in the
Metatable.txt file. THE NAMES YOU ARE USING HAVE TO BE THE SAME AS THE
ONES USED IN THE METATABLE.TXT. You can create this file in Excel and
it will have to be saved as ``Text(Tab delimited)``.

EXAMPLE::

    Disease
    Gender

Metatable and categories files can be uploaded using Get Data as done
with the fastq files.

4. Analysis
***********

Under **Amplicon_Analysis_Pipeline**

 * **Title** Name to distinguish between the runs. It will be shown at
   the beginning of each output file name.

 * **Input Metatable.txt file** Select the Metatable.txt file related to
   this analysis.

 * **Input Categories.txt file (Optional)** Select the Categories.txt file
   related to this analysis.

 * **Input FASTQ type** Select *Dataset pairs in a collection* and then
   the collection of pairs you created earlier.

 * **Forward/Reverse PCR primer sequence** If the PCR primer sequences
   have not been removed from the MiSeq during the fastq creation, they
   have to be removed before the analysis. Insert the PCR primer sequence
   in the corresponding field. DO NOT include any barcode or adapter
   sequence. If the PCR primers have already been trimmed by the MiSeq,
   and you include the sequence in this field, this will lead to an error.
   Only include the sequences if they are still present in the fastq files.

 * **Threshold quality below which reads will be trimmed** Choose the
   Phred score used by Sickle to trim the reads at the 3’ end.

 * **Minimum length to retain a read after trimming** If the read length
   after trimming is shorter than a user defined length, the read, along
   with the corresponding read pair, will be discarded.

 * **Minimum overlap in bp between forward and reverse reads** Choose the
   minimum basepair overlap used by Pandaseq to assemble the reads.
   Default is 10.

 * **Minimum length in bp to keep a sequence after overlapping** Choose the
   minimum sequence length used by Pandaseq to keep a sequence after the
   overlapping. This depends on the expected amplicon length; for example,
   a value of ~380 has been used for V3-V4 16S sequencing (expected length
   ~440bp). The tool default is 200.

 * **Pipeline to use for analysis** Choose the pipeline to use for OTU
   clustering and chimera removal. The Galaxy tool currently supports
   ``Vsearch`` only. ``Uparse`` and ``QIIME`` are planned to be added
   shortly (the tools are already available for the stand-alone pipeline).

 * **Reference database** Choose between the ``GreenGenes`` and ``Silva``
   databases for taxa assignment.

Click on **Execute** to start the analysis.

5. Results
**********

Results are entirely generated using QIIME scripts. The results will
appear in the History panel when the analysis is completed.

 * **Vsearch_tax_OTU_table (biom format)** The OTU table in BIOM format
   (http://biom-format.org/)

 * **Vsearch_OTUs.tree** Phylogenetic tree constructed using
   ``make_phylogeny.py`` (fasttree) QIIME script
   (http://qiime.org/scripts/make_phylogeny.html)

 * **Vsearch_phylum_genus_dist_barcharts_HTML** HTML file with bar
   charts at Phylum, Genus and Species level
   (http://qiime.org/scripts/summarize_taxa.html and
   http://qiime.org/scripts/plot_taxa_summary.html)

 * **Vsearch_OTUs_count_file** Summary of OTU counts per sample
   (http://biom-format.org/documentation/summarizing_biom_tables.html)

 * **Vsearch_table_summary_file** Summary of sequences counts per sample
   (http://biom-format.org/documentation/summarizing_biom_tables.html)

 * **Vsearch_multiplexed_linearized_dereplicated_mc2_repset_nonchimeras_OTUs.fasta**
   Fasta file with OTU sequences

 * **Vsearch_heatmap_OTU_table_HTML** Interactive OTU heatmap
   (http://qiime.org/1.8.0/scripts/make_otu_heatmap_html.html)

 * **Vsearch_beta_diversity_weighted_2D_plots_HTML** PCoA plots in HTML
   format using the weighted Unifrac distance measure. Samples are grouped
   by the column names present in the Metatable file. The samples are
   firstly rarefied to the minimum sequencing depth
   (http://qiime.org/scripts/beta_diversity_through_plots.html)

 * **Vsearch_beta_diversity_unweighted_2D_plots_HTML** PCoA plots in HTML
   format using the unweighted Unifrac distance measure. Samples are grouped
   by the column names present in the Metatable file. The samples are
   firstly rarefied to the minimum sequencing depth
   (http://qiime.org/scripts/beta_diversity_through_plots.html)

Code availability
-----------------

**Code is available at** https://github.com/MTutino/Amplicon_analysis

Credits
-------

Pipeline author: Mauro Tutino

Galaxy tool: Peter Briggs

  ]]></help>
  <citations>
    <citation type="bibtex">
      @misc{githubAmplicon_analysis,
      author = {Tutino, Mauro},
      year = {2017},
      title = {Amplicon Analysis Pipeline},
      publisher = {GitHub},
      journal = {GitHub repository},
      url = {https://github.com/MTutino/Amplicon_analysis},
}</citation>
  </citations>
</tool>
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/install_tool_deps.sh Thu Nov 09 10:13:29 2017 -0500
@@ -0,0 +1,706 @@

#!/bin/bash -e
#
# Install the tool dependencies for Amplicon_analysis_pipeline.sh for
# testing from command line
#
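# Example usage (the target directory is an arbitrary choice, and is
# the same "local_tool_dependencies" directory referenced in the
# README):
#
#     ./install_tool_deps.sh /path/to/local_tool_dependencies
#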
function install_python_package() {
    echo Installing $2 $3 from $4 under $1
    local install_dir=$1
    local install_dirs="$install_dir $install_dir/bin $install_dir/lib/python2.7/site-packages"
    for d in $install_dirs ; do
        if [ ! -d $d ] ; then
            mkdir -p $d
        fi
    done
    wd=$(mktemp -d)
    echo Moving to $wd
    pushd $wd
    wget -q $4
    if [ ! -f "$(basename $4)" ] ; then
        echo "No archive $(basename $4)"
        exit 1
    fi
    tar xzf $(basename $4)
    if [ ! -d "$5" ] ; then
        echo "No directory $5"
        exit 1
    fi
    cd $5
    /bin/bash <<EOF
export PYTHONPATH=$install_dir:$PYTHONPATH && \
export PYTHONPATH=$install_dir/lib/python2.7/site-packages:$PYTHONPATH && \
python setup.py install --prefix=$install_dir --install-scripts=$install_dir/bin --install-lib=$install_dir/lib/python2.7/site-packages >>$INSTALL_DIR/INSTALLATION.log 2>&1
EOF
    popd
    rm -rf $wd/*
    rmdir $wd
}
function install_amplicon_analysis_pipeline_1_1() {
    install_amplicon_analysis_pipeline $1 1.1
}
function install_amplicon_analysis_pipeline_1_0() {
    install_amplicon_analysis_pipeline $1 1.0
}
function install_amplicon_analysis_pipeline() {
    version=$2
    echo Installing Amplicon_analysis $version
    install_dir=$1/amplicon_analysis_pipeline/$version
    if [ -f $install_dir/env.sh ] ; then
        return
    fi
    mkdir -p $install_dir
    echo Moving to $install_dir
    pushd $install_dir
    wget -q https://github.com/MTutino/Amplicon_analysis/archive/v${version}.tar.gz
    tar zxf v${version}.tar.gz
    mv Amplicon_analysis-${version} Amplicon_analysis
    rm -rf v${version}.tar.gz
    popd
    # Make setup file
    cat > $install_dir/env.sh <<EOF
#!/bin/sh
# Source this to setup Amplicon_analysis/$version
echo Setting up Amplicon analysis pipeline $version
export PATH=$install_dir/Amplicon_analysis:\$PATH
## AMPLICON_ANALYSIS_REF_DATA_PATH should be set in
## config/local_env.sh or in the job_conf.xml file
## - see the README
##export AMPLICON_ANALYSIS_REF_DATA_PATH=
#
EOF
}
function install_amplicon_analysis_pipeline_1_0_patched() {
    version="1.0-patched"
    echo Installing Amplicon_analysis $version
    install_dir=$1/amplicon_analysis_pipeline/$version
    if [ -f $install_dir/env.sh ] ; then
        return
    fi
    mkdir -p $install_dir
    echo Moving to $install_dir
    pushd $install_dir
    # Clone and patch analysis pipeline scripts
    git clone https://github.com/pjbriggs/Amplicon_analysis.git
    cd Amplicon_analysis
    git checkout -b $version
    branches=
    if [ ! -z "$branches" ] ; then
        for branch in $branches ; do
            git checkout -b $branch origin/$branch
            git checkout $version
            git merge -m "Merge $branch into $version" $branch
        done
    fi
    cd ..
    popd
    # Make setup file
    cat > $install_dir/env.sh <<EOF
#!/bin/sh
# Source this to setup Amplicon_analysis/$version
echo Setting up Amplicon analysis pipeline $version
export PATH=$install_dir/Amplicon_analysis:\$PATH
## AMPLICON_ANALYSIS_REF_DATA_PATH should be set in
## config/local_env.sh or in the job_conf.xml file
## - see the README
##export AMPLICON_ANALYSIS_REF_DATA_PATH=
#
EOF
}
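#
# Each installer below unpacks its tool under TOP_DIR/<name>/<version>/
# and writes an env.sh file alongside it; this is the directory layout
# expected by Galaxy's "galaxy_packages" dependency resolver (see the
# README), e.g.:
#
#     TOP_DIR/cutadapt/1.11/env.sh
#     TOP_DIR/sickle/1.33/env.sh
#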
function install_cutadapt_1_11() {
    echo Installing cutadapt 1.11
    INSTALL_DIR=$1/cutadapt/1.11
    if [ -f $INSTALL_DIR/env.sh ] ; then
        return
    fi
    mkdir -p $INSTALL_DIR
    install_python_package $INSTALL_DIR cutadapt 1.11 \
        https://pypi.python.org/packages/47/bf/9045e90dac084a90aa2bb72c7d5aadefaea96a5776f445f5b5d9a7a2c78b/cutadapt-1.11.tar.gz \
        cutadapt-1.11
    # Make setup file
    cat > $INSTALL_DIR/env.sh <<EOF
#!/bin/sh
# Source this to setup cutadapt/1.11
echo Setting up cutadapt 1.11
#if [ -f $1/python/2.7.10/env.sh ] ; then
#   . $1/python/2.7.10/env.sh
#fi
export PATH=$INSTALL_DIR/bin:\$PATH
export PYTHONPATH=$INSTALL_DIR:\$PYTHONPATH
export PYTHONPATH=$INSTALL_DIR/lib:\$PYTHONPATH
export PYTHONPATH=$INSTALL_DIR/lib/python2.7:\$PYTHONPATH
export PYTHONPATH=$INSTALL_DIR/lib/python2.7/site-packages:\$PYTHONPATH
#
EOF
}
function install_sickle_1_33() {
    echo Installing sickle 1.33
    INSTALL_DIR=$1/sickle/1.33
    if [ -f $INSTALL_DIR/env.sh ] ; then
        return
    fi
    mkdir -p $INSTALL_DIR
    mkdir -p $INSTALL_DIR/bin
    wd=$(mktemp -d)
    echo Moving to $wd
    pushd $wd
    wget -q https://github.com/najoshi/sickle/archive/v1.33.tar.gz
    tar zxf v1.33.tar.gz
    cd sickle-1.33
    make >$INSTALL_DIR/INSTALLATION.log 2>&1
    mv sickle $INSTALL_DIR/bin
    popd
    rm -rf $wd/*
    rmdir $wd
    # Make setup file
    cat > $INSTALL_DIR/env.sh <<EOF
#!/bin/sh
# Source this to setup sickle/1.33
echo Setting up sickle 1.33
export PATH=$INSTALL_DIR/bin:\$PATH
#
EOF
}
function install_bioawk_27_08_2013() {
    echo Installing bioawk 27-08-2013
    INSTALL_DIR=$1/bioawk/27-08-2013
    if [ -f $INSTALL_DIR/env.sh ] ; then
        return
    fi
    mkdir -p $INSTALL_DIR
    mkdir -p $INSTALL_DIR/bin
    wd=$(mktemp -d)
    echo Moving to $wd
    pushd $wd
    wget -q https://github.com/lh3/bioawk/archive/v1.0.tar.gz
    tar zxf v1.0.tar.gz
    cd bioawk-1.0
    make >$INSTALL_DIR/INSTALLATION.log 2>&1
    mv bioawk $INSTALL_DIR/bin
    mv maketab $INSTALL_DIR/bin
    popd
    rm -rf $wd/*
    rmdir $wd
    # Make setup file
    cat > $INSTALL_DIR/env.sh <<EOF
#!/bin/sh
# Source this to setup bioawk/27-08-2013
echo Setting up bioawk 27-08-2013
export PATH=$INSTALL_DIR/bin:\$PATH
#
EOF
}
function install_pandaseq_2_8_1() {
    # Taken from https://github.com/fls-bioinformatics-core/galaxy-tools/blob/master/local_dependency_installers/pandaseq.sh
    echo Installing pandaseq 2.8.1
    local install_dir=$1/pandaseq/2.8.1
    if [ -f $install_dir/env.sh ] ; then
        return
    fi
    mkdir -p $install_dir
    local wd=$(mktemp -d)
    echo Moving to $wd
    pushd $wd
    wget -q https://github.com/neufeld/pandaseq/archive/v2.8.1.tar.gz
    tar xzf v2.8.1.tar.gz
    cd pandaseq-2.8.1
    ./autogen.sh >$install_dir/INSTALLATION.log 2>&1
    ./configure --prefix=$install_dir >>$install_dir/INSTALLATION.log 2>&1
    make >>$install_dir/INSTALLATION.log 2>&1
    make install >>$install_dir/INSTALLATION.log 2>&1
    popd
    rm -rf $wd/*
    rmdir $wd
    # Make setup file
    cat > $1/pandaseq/2.8.1/env.sh <<EOF
#!/bin/sh
# Source this to setup pandaseq/2.8.1
echo Setting up pandaseq 2.8.1
export PATH=$install_dir/bin:\$PATH
export LD_LIBRARY_PATH=$install_dir/lib:\$LD_LIBRARY_PATH
#
EOF
}
function install_spades_3_5_0() {
    # See http://spades.bioinf.spbau.ru/release3.5.0/manual.html
    echo Installing spades 3.5.0
    local install_dir=$1/spades/3.5.0
    if [ -f $install_dir/env.sh ] ; then
        return
    fi
    mkdir -p $install_dir
    local wd=$(mktemp -d)
    echo Moving to $wd
    pushd $wd
    wget -q http://spades.bioinf.spbau.ru/release3.5.0/SPAdes-3.5.0-Linux.tar.gz
    tar zxf SPAdes-3.5.0-Linux.tar.gz
    cd SPAdes-3.5.0-Linux
    mv bin $install_dir
    mv share $install_dir
    popd
    rm -rf $wd/*
    rmdir $wd
    # Make setup file
    cat > $1/spades/3.5.0/env.sh <<EOF
#!/bin/sh
# Source this to setup spades/3.5.0
echo Setting up spades 3.5.0
export PATH=$install_dir/bin:\$PATH
#
EOF
}
function install_fastqc_0_11_3() {
    echo Installing fastqc 0.11.3
    local install_dir=$1/fastqc/0.11.3
    if [ -f $install_dir/env.sh ] ; then
        return
    fi
    mkdir -p $install_dir
    local wd=$(mktemp -d)
    echo Moving to $wd
    pushd $wd
    wget -q http://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.3.zip
    unzip -qq fastqc_v0.11.3.zip
    cd FastQC
    chmod 0755 fastqc
    mv * $install_dir
    popd
    rm -rf $wd/*
    rmdir $wd
    # Make setup file
    cat > $1/fastqc/0.11.3/env.sh <<EOF
#!/bin/sh
# Source this to setup fastqc/0.11.3
echo Setting up fastqc 0.11.3
export PATH=$install_dir:\$PATH
#
EOF
}
function install_qiime_1_8_0() {
    # See http://qiime.org/1.8.0/install/install.html
    echo Installing qiime 1.8.0
    INSTALL_DIR=$1/qiime/1.8.0
    if [ -f $INSTALL_DIR/env.sh ] ; then
        return
    fi
    mkdir -p $INSTALL_DIR
    # Atlas 3.10 (precompiled)
    # NB this stolen from galaxyproject/iuc-tools
    local wd=$(mktemp -d)
    echo Moving to $wd
    pushd $wd
    wget -q https://depot.galaxyproject.org/software/atlas/atlas_3.10.2_linux_x64.tar.gz
    tar zxvf atlas_3.10.2_linux_x64.tar.gz
    mv lib $INSTALL_DIR
    command -v gfortran || return 0
    BUNDLED_LGF_CANON=$INSTALL_DIR/lib/libgfortran.so.3.0.0
    BUNDLED_LGF_VERS=`objdump -p $BUNDLED_LGF_CANON | grep GFORTRAN_1 | sed -r 's/.*GFORTRAN_1\.([0-9])+/\1/' | sort -n | tail -1`
    echo 'program test; end program test' > test.f90
    gfortran -o test test.f90
    LGF=`ldd test | grep libgfortran | awk '{print $3}'`
    LGF_CANON=`readlink -f $LGF`
    LGF_VERS=`objdump -p $LGF_CANON | grep GFORTRAN_1 | sed -r 's/.*GFORTRAN_1\.([0-9])+/\1/' | sort -n | tail -1`
    if [ $LGF_VERS -gt $BUNDLED_LGF_VERS ]; then
        cp -p $BUNDLED_LGF_CANON ${BUNDLED_LGF_CANON}.bundled
        cp -p $LGF_CANON $BUNDLED_LGF_CANON
    fi
    popd
    rm -rf $wd/*
    rmdir $wd
    # Atlas 3.10 (build from source)
    # NB this stolen from galaxyproject/iuc-tools
    ##local wd=$(mktemp -d)
    ##echo Moving to $wd
    ##pushd $wd
    ##wget -q https://depot.galaxyproject.org/software/atlas/atlas_3.10.2+gx0_src_all.tar.bz2
    ##wget -q https://depot.galaxyproject.org/software/lapack/lapack_3.5.0_src_all.tar.gz
    ##wget -q https://depot.galaxyproject.org/software/atlas/atlas_patch-blas-lapack-1.0_src_all.diff
    ##wget -q https://depot.galaxyproject.org/software/atlas/atlas_patch-shared-lib-1.0_src_all.diff
    ##wget -q https://depot.galaxyproject.org/software/atlas/atlas_patch-cpu-throttle-1.0_src_all.diff
    ##tar -jxvf atlas_3.10.2+gx0_src_all.tar.bz2
    ##cd ATLAS
    ##mkdir build
    ##patch -p1 < ../atlas_patch-blas-lapack-1.0_src_all.diff
    ##patch -p1 < ../atlas_patch-shared-lib-1.0_src_all.diff
    ##patch -p1 < ../atlas_patch-cpu-throttle-1.0_src_all.diff
    ##cd build
    ##../configure --prefix="$INSTALL_DIR" -D c -DWALL -b 64 -Fa alg '-fPIC' --with-netlib-lapack-tarfile=../../lapack_3.5.0_src_all.tar.gz -v 2 -t 0 -Si cputhrchk 0
    ##make
    ##make install
    ##popd
    ##rm -rf $wd/*
    ##rmdir $wd
    export ATLAS_LIB_DIR=$INSTALL_DIR/lib
    export ATLAS_INCLUDE_DIR=$INSTALL_DIR/include
    export ATLAS_BLAS_LIB_DIR=$INSTALL_DIR/lib/atlas
    export ATLAS_LAPACK_LIB_DIR=$INSTALL_DIR/lib/atlas
    export ATLAS_ROOT_PATH=$INSTALL_DIR
    export LD_LIBRARY_PATH=$INSTALL_DIR/lib:$LD_LIBRARY_PATH
    export LD_LIBRARY_PATH=$INSTALL_DIR/lib/atlas:$LD_LIBRARY_PATH
    # Numpy 1.7.1
    local wd=$(mktemp -d)
    echo Moving to $wd
    pushd $wd
    wget -q https://depot.galaxyproject.org/software/numpy/numpy_1.7_src_all.tar.gz
    tar -zxvf numpy_1.7_src_all.tar.gz
    cd numpy-1.7.1
    cat > site.cfg <<EOF
[DEFAULT]
library_dirs = $ATLAS_LIB_DIR
include_dirs = $ATLAS_INCLUDE_DIR
[blas_opt]
libraries = blas, atlas
[lapack_opt]
libraries = lapack, atlas
EOF
    export PYTHONPATH=$PYTHONPATH:$INSTALL_DIR/lib/python2.7
    export ATLAS=$ATLAS_ROOT_PATH
    python setup.py install --install-lib $INSTALL_DIR/lib/python2.7 --install-scripts $INSTALL_DIR/bin
    popd
    rm -rf $wd/*
    rmdir $wd
    # Python packages
    ##install_python_package $INSTALL_DIR numpy 1.7.1 \
    ##    https://pypi.python.org/packages/84/fb/5e9dfeeb5d8909d659e6892c97c9aa66d3798fad50e1d3d66b3c614a9c35/numpy-1.7.1.tar.gz \
    ##    numpy-1.7.1
    install_python_package $INSTALL_DIR matplotlib 1.3.1 \
        https://pypi.python.org/packages/d4/d0/17f17792a4d50994397052220dbe3ac9850ecbde0297b7572933fa4a5c98/matplotlib-1.3.1.tar.gz \
        matplotlib-1.3.1
    install_python_package $INSTALL_DIR qiime 1.8.0 \
        https://github.com/biocore/qiime/archive/1.8.0.tar.gz \
        qiime-1.8.0
    install_python_package $INSTALL_DIR pycogent 1.5.3 \
        https://pypi.python.org/packages/1f/9f/c6f6afe09a3d62a6e809c7745413ffff0f1e8e04d88ab7b56faedf31fe28/cogent-1.5.3.tgz \
        cogent-1.5.3
    install_python_package $INSTALL_DIR pyqi 0.3.1 \
        https://pypi.python.org/packages/60/f0/a7392f5f5caf59a50ccaddbb35a458514953512b7dd6053567cb02849c6e/pyqi-0.3.1.tar.gz \
        pyqi-0.3.1
    install_python_package $INSTALL_DIR biom-format 1.3.1 \
        https://pypi.python.org/packages/98/3b/4e80a9a5c4a3c6764aa8c0c994973e7df71eee02fc6b8cc6e1d06a64ab7e/biom-format-1.3.1.tar.gz \
        biom-format-1.3.1
    install_python_package $INSTALL_DIR qcli 0.1.0 \
        https://pypi.python.org/packages/9a/9a/9c634aed339a5f063e0c954ae439d03b33a7159aa50c6f21034fe2d48fe8/qcli-0.1.0.tar.gz \
        qcli-0.1.0
    install_python_package $INSTALL_DIR pynast 1.2.2 \
        https://pypi.python.org/packages/a0/82/f381ff91afd7a2d92e74c7790823e256d87d5cd0a98c12eaac3d3ec64b8f/pynast-1.2.2.tar.gz \
        pynast-1.2.2
    install_python_package $INSTALL_DIR emperor 0.9.3 \
        https://pypi.python.org/packages/cd/f1/5d502a16a348efe1af7a8d4f41b639c9a165bca0b2f9db36bce89ad1ab40/emperor-0.9.3.tar.gz \
        emperor-0.9.3
    # Update the acceptable Python version
    sed -i 's/acceptable_version = (2,7,3)/acceptable_version = (2,7,6)/g' $INSTALL_DIR/bin/print_qiime_config.py
    # Non-Python dependencies
    local wd=$(mktemp -d)
    echo Moving to $wd
    pushd $wd
    wget -q http://www.microbesonline.org/fasttree/FastTree
    chmod 0755 FastTree
    mv FastTree $INSTALL_DIR/bin
    # Config file
    sed -i 's,qiime_scripts_dir,qiime_scripts_dir\t'"$INSTALL_DIR\/bin"',g' $INSTALL_DIR/lib/python2.7/site-packages/qiime/support_files/qiime_config
    popd
    rm -rf $wd/*
    rmdir $wd
    # Make setup file
    cat > $INSTALL_DIR/env.sh <<EOF
#!/bin/sh
# Source this to setup qiime/1.8.0
echo Setting up qiime 1.8.0
#if [ -f $1/python/2.7.10/env.sh ] ; then
#   . $1/python/2.7.10/env.sh
#fi
export QIIME_CONFIG_FP=$INSTALL_DIR/lib/python2.7/site-packages/qiime/support_files/qiime_config
export PATH=$INSTALL_DIR/bin:\$PATH
export PYTHONPATH=$INSTALL_DIR:\$PYTHONPATH
export PYTHONPATH=$INSTALL_DIR/lib:\$PYTHONPATH
export PYTHONPATH=$INSTALL_DIR/lib/python2.7:\$PYTHONPATH
export PYTHONPATH=$INSTALL_DIR/lib/python2.7/site-packages:\$PYTHONPATH
export LD_LIBRARY_PATH=$ATLAS_LIB_DIR:\$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$ATLAS_LIB_DIR/atlas:\$LD_LIBRARY_PATH
#
EOF
}
function install_vsearch_1_1_3() {
    echo Installing vsearch 1.1.3
    local install_dir=$1/vsearch/1.1.3
    if [ -f $install_dir/env.sh ] ; then
        return
    fi
    mkdir -p $install_dir/bin
    local wd=$(mktemp -d)
    echo Moving to $wd
    pushd $wd
    wget -q https://github.com/torognes/vsearch/releases/download/v1.1.3/vsearch-1.1.3-linux-x86_64
    chmod 0755 vsearch-1.1.3-linux-x86_64
    mv vsearch-1.1.3-linux-x86_64 $install_dir/bin/vsearch
    ln -s $install_dir/bin/vsearch $install_dir/bin/vsearch113
    popd
    # Clean up
    rm -rf $wd/*
    rmdir $wd
    # Make setup file
    cat > $install_dir/env.sh <<EOF
#!/bin/sh
# Source this to setup vsearch/1.1.3
echo Setting up vsearch 1.1.3
export PATH=$install_dir/bin:\$PATH
#
EOF
}
function install_microbiomeutil_2010_04_29() {
    # Provides ChimeraSlayer
    echo Installing microbiomeutil 2010-04-29
    local install_dir=$1/microbiomeutil/2010-04-29
    if [ -f $install_dir/env.sh ] ; then
        return
    fi
    mkdir -p $install_dir
    local wd=$(mktemp -d)
    echo Moving to $wd
    pushd $wd
    wget -q https://sourceforge.net/projects/microbiomeutil/files/__OLD_VERSIONS/microbiomeutil_2010-04-29.tar.gz
    tar zxf microbiomeutil_2010-04-29.tar.gz
    cd microbiomeutil_2010-04-29
    make >$install_dir/INSTALLATION.log 2>&1
    mv * $install_dir
    popd
    # Clean up
    rm -rf $wd/*
    rmdir $wd
    # Make setup file
    cat > $install_dir/env.sh <<EOF
#!/bin/sh
# Source this to setup microbiomeutil/2010-04-29
echo Setting up microbiomeutil 2010-04-29
export PATH=$install_dir/ChimeraSlayer:\$PATH
#
EOF
}
function install_blast_2_2_26() {
    echo Installing blast 2.2.26
    local install_dir=$1/blast/2.2.26
    if [ -f $install_dir/env.sh ] ; then
        return
    fi
    mkdir -p $install_dir
    local wd=$(mktemp -d)
    echo Moving to $wd
    pushd $wd
    wget -q ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy/2.2.26/blast-2.2.26-x64-linux.tar.gz
    tar zxf blast-2.2.26-x64-linux.tar.gz
    cd blast-2.2.26
    mv * $install_dir
    popd
    # Clean up
    rm -rf $wd/*
    rmdir $wd
    # Make setup file
    cat > $install_dir/env.sh <<EOF
#!/bin/sh
# Source this to setup blast/2.2.26
echo Setting up blast 2.2.26
export PATH=$install_dir/bin:\$PATH
#
EOF
}
function install_fasta_number() {
    # See http://drive5.com/python/fasta_number_py.html
    echo Installing fasta_number
    # Install to "default" version i.e. essentially a versionless
    # installation (see Galaxy dependency resolver docs)
    local install_dir=$1/fasta_number
    local wd=$(mktemp -d)
    echo Moving to $wd
    pushd $wd
    # Download and use MD5 as local version
    wget -q http://drive5.com/python/python_scripts.tar.gz
    local version=$(md5sum python_scripts.tar.gz | cut -d" " -f1)
    # Check for existing installation
    local default_dir=$install_dir/default
    install_dir=$install_dir/$version
    if [ -f $install_dir/env.sh ] ; then
        return
    fi
    # Install scripts and make 'default' link
    mkdir -p $install_dir/bin
    mkdir -p $install_dir/lib
    tar zxf python_scripts.tar.gz
    mv fasta_number.py $install_dir/bin
    mv die.py $install_dir/lib
    ln -s $version $default_dir
    popd
    # Clean up
    rm -rf $wd/*
    rmdir $wd
    # Make setup file
    cat > $install_dir/env.sh <<EOF
#!/bin/sh
# Source this to setup fasta_number/$version
echo Setting up fasta_number $version
export PATH=$install_dir/bin:\$PATH
export PYTHONPATH=$install_dir/lib:\$PYTHONPATH
#
EOF
}
function install_fasta_splitter_0_2_4() {
    echo Installing fasta-splitter 0.2.4
    local install_dir=$1/fasta-splitter/0.2.4
    if [ -f $install_dir/env.sh ] ; then
        return
    fi
    mkdir -p $install_dir/bin
    local wd=$(mktemp -d)
    echo Moving to $wd
    pushd $wd
    # Install Perl packages using cpanm
    mkdir -p $install_dir/lib/perl5
    wget -q -L https://cpanmin.us/ -O cpanm
    chmod +x cpanm
    for package in "File::Util" ; do
        /bin/bash <<EOF
export PATH=$install_dir/bin:$PATH PERL5LIB=$install_dir/lib/perl5:$PERL5LIB && \
./cpanm -l $install_dir $package >>$install_dir/INSTALLATION.log
EOF
    done
    # Install fasta-splitter
    wget -q http://kirill-kryukov.com/study/tools/fasta-splitter/files/fasta-splitter-0.2.4.zip
    unzip -qq fasta-splitter-0.2.4.zip
    chmod 0755 fasta-splitter.pl
    mv fasta-splitter.pl $install_dir/bin
    popd
    # Clean up
    rm -rf $wd/*
    rmdir $wd
    # Make setup file
    cat > $install_dir/env.sh <<EOF
#!/bin/sh
# Source this to setup fasta-splitter/0.2.4
echo Setting up fasta-splitter 0.2.4
export PATH=$install_dir/bin:\$PATH
export PERL5LIB=$install_dir/lib/perl5:\$PERL5LIB
#
EOF
}
function install_rdp_classifier_2_2() {
    echo Installing rdp-classifier 2.2
    local install_dir=$1/rdp-classifier/2.2
    if [ -f $install_dir/env.sh ] ; then
        return
    fi
    mkdir -p $install_dir
    local wd=$(mktemp -d)
    echo Moving to $wd
    pushd $wd
    wget -q https://sourceforge.net/projects/rdp-classifier/files/rdp-classifier/rdp_classifier_2.2.zip
    unzip -qq rdp_classifier_2.2.zip
    cd rdp_classifier_2.2
    mv * $install_dir
    popd
    # Clean up
    rm -rf $wd/*
    rmdir $wd
    # Make setup file
    cat > $install_dir/env.sh <<EOF
#!/bin/sh
# Source this to setup rdp-classifier/2.2
echo Setting up RDP classifier 2.2
export RDP_JAR_PATH=$install_dir/rdp_classifier-2.2.jar
#
EOF
}
function install_R_3_2_0() {
    # Adapted from https://github.com/fls-bioinformatics-core/galaxy-tools/blob/master/local_dependency_installers/R.sh
    echo Installing R 3.2.0
    local install_dir=$1/R/3.2.0
    if [ -f $install_dir/env.sh ] ; then
        return
    fi
    mkdir -p $install_dir
    local wd=$(mktemp -d)
    echo Moving to $wd
    pushd $wd
    wget -q http://cran.r-project.org/src/base/R-3/R-3.2.0.tar.gz
    tar xzf R-3.2.0.tar.gz
    cd R-3.2.0
    ./configure --prefix=$install_dir
    make
    make install
    popd
    # Clean up
    rm -rf $wd/*
    rmdir $wd
    # Make setup file
    cat > $install_dir/env.sh <<EOF
#!/bin/sh
# Source this to setup R/3.2.0
echo Setting up R 3.2.0
export PATH=$install_dir/bin:\$PATH
export TCL_LIBRARY=$install_dir/lib/libtcl8.4.so
export TK_LIBRARY=$install_dir/lib/libtk8.4.so
#
EOF
}
function install_uc2otutab() {
    # See http://drive5.com/python/uc2otutab_py.html
    echo Installing uc2otutab
    # Install to "default" version i.e. essentially a versionless
    # installation (see Galaxy dependency resolver docs)
    local install_dir=$1/uc2otutab/default
    if [ -f $install_dir/env.sh ] ; then
        return
    fi
    mkdir -p $install_dir/bin
    local wd=$(mktemp -d)
    echo Moving to $wd
    pushd $wd
    wget -q http://drive5.com/python/python_scripts.tar.gz
    tar zxf python_scripts.tar.gz
    mv die.py fasta.py progress.py uc.py $install_dir/bin
    echo "#!/usr/bin/env python" >$install_dir/bin/uc2otutab.py
    cat uc2otutab.py >>$install_dir/bin/uc2otutab.py
    chmod +x $install_dir/bin/uc2otutab.py
    popd
    # Clean up
    rm -rf $wd/*
    rmdir $wd
    # Make setup file
    cat > $install_dir/env.sh <<EOF
#!/bin/sh
# Source this to setup uc2otutab/default
echo Setting up uc2otutab \(default\)
export PATH=$install_dir/bin:\$PATH
#
EOF
}
##########################################################
# Main script starts here
##########################################################
# Fetch top-level installation directory from command line
TOP_DIR=$1
if [ -z "$TOP_DIR" ] ; then
    echo Usage: $(basename $0) DIR
    exit 1
fi
if [ -z "$(echo $TOP_DIR | grep ^/)" ] ; then
    TOP_DIR=$(pwd)/$TOP_DIR
fi
if [ ! -d "$TOP_DIR" ] ; then
    mkdir -p $TOP_DIR
fi
# Install dependencies
install_amplicon_analysis_pipeline_1_1 $TOP_DIR
install_cutadapt_1_11 $TOP_DIR
install_sickle_1_33 $TOP_DIR
install_bioawk_27_08_2013 $TOP_DIR
install_pandaseq_2_8_1 $TOP_DIR
install_spades_3_5_0 $TOP_DIR
install_fastqc_0_11_3 $TOP_DIR
install_qiime_1_8_0 $TOP_DIR
install_vsearch_1_1_3 $TOP_DIR
install_microbiomeutil_2010_04_29 $TOP_DIR
install_blast_2_2_26 $TOP_DIR
install_fasta_number $TOP_DIR
install_fasta_splitter_0_2_4 $TOP_DIR
install_rdp_classifier_2_2 $TOP_DIR
install_R_3_2_0 $TOP_DIR
install_uc2otutab $TOP_DIR
##
#