view load.xml @ 0:474bbc45ddd9 draft
"planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/ampvis2 commit 7c0ecbffdb5e993f5af7e3b52c424c2761fb91d3"
author | iuc |
---|---|
date | Mon, 04 Apr 2022 10:24:51 +0000 |
parents | |
children | 8d77d277996e |
<tool id="ampvis2_load" name="ampvis2 load" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@" license="MIT">
    <description></description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="header"/>
    <command detect_errors="exit_code"><![CDATA[
        #if $otutable.ext.startswith("biom")
            ln -s '$otutable' otutable.biom &&
        #else
            ln -s '$otutable' otutable.tsv &&
        #end if
        #if $taxonomy
            ln -s '$taxonomy' taxonomy.tsv &&
        #end if
        Rscript '$rscript'
    ]]></command>
    <configfiles>
        <configfile name="rscript"><![CDATA[
            library(ampvis2, quietly = TRUE)
            library(readr, quietly = TRUE)

            ## 'manually' load metadata treating all columns as character
            ## giving colClasses to amp_load seems not possible
            #if $metadata
                metadata <- read.table("$metadata", header = TRUE, sep = "\t", colClasses = "character")
            #end if
            data <- amp_load(
                #if $otutable.ext.startswith("biom")
                    otutable = "otutable.biom",
                #else
                    otutable = "otutable.tsv",
                #end if
                #if $metadata
                    metadata = metadata,
                #end if
                #if $taxonomy
                    taxonomy = "taxonomy.tsv",
                #end if
                #if $fasta
                    fasta = "$fasta",
                #end if
                #if $tree
                    tree = "$tree",
                #end if
                pruneSingletons = $pruneSingletons
            )

            ## try to guess column types with readr::type_convert
            #if $guess_column_types
                data\$metadata <- readr::type_convert(data\$metadata, guess_integer=TRUE)
            #end if
            saveRDS(data, "$ampvis")

            ## write metadata list for biom input or if metadata is given
            #if "metadata" in $write_lists
                @SAVE_METADATA_LIST@
            #end if
            #if "tax" in $write_lists
                @SAVE_TAX_LIST@
            #end if

            ## print overview of the data to stdout
            data
        ]]></configfile>
    </configfiles>
    <inputs>
        <param argument="otutable" type="data" format="tabular,biom1,biom2" label="OTU table"/>
        <param argument="metadata" type="data" format="tsv" optional="true" label="Sample metadata">
            <validator type="expression" message="Table must have at least 1 column"><![CDATA[value.metadata.columns > 0]]></validator>
            <!-- TODO in future versions this might change https://github.com/MadsAlbertsen/ampvis2/pull/134 if so,
            then also adapt help text and test data -->
            <validator type="expression" message="First column must be named SampleID"><![CDATA[value.metadata.column_names[0] == "SampleID"]]></validator>
        </param>
        <param name="guess_column_types" type="boolean" checked="true" label="Guess metadata column types" help="See help"/>
        <param argument="taxonomy" type="data" format="tabular" optional="true" label="Taxonomy table"/>
        <param argument="fasta" type="data" format="fasta" optional="true" label="Fasta file"/>
        <param argument="tree" type="data" format="newick" optional="true" label="Phylogenetic tree"/>
        <param argument="pruneSingletons" type="boolean" truevalue="TRUE" falsevalue="FALSE" checked="false" label="Remove singleton OTUs"/>
        <param name="write_lists" type="select" optional="true" multiple="true" label="Output list data sets" help="Needed by most downstream tools. Select if the inputs contain taxonomic / metadata information.">
            <option value="tax" selected="true">Taxonomy list</option>
            <option value="metadata" selected="true">Metadata list</option>
        </param>
    </inputs>
    <outputs>
        <data name="ampvis" format="ampvis2"/>
        <data name="metadata_list_out" format="tabular" label="${tool.name} on ${on_string}: metadata list">
            <filter>write_lists and "metadata" in write_lists</filter>
        </data>
        <data name="taxonomy_list_out" format="tabular" label="${tool.name} on ${on_string}: taxonomy list">
            <filter>write_lists and "tax" in write_lists</filter>
        </data>
    </outputs>
    <tests>
        <!-- load otu table + metadata + taxonomy -->
        <test expect_num_outputs="3">
            <param name="otutable" value="AalborgWWTPs.otu.csv"/>
            <param name="metadata" value="AalborgWWTPs.tsv" ftype="tsv"/>
            <param name="taxonomy" value="AalborgWWTPs.tax"/>
            <output name="ampvis" value="AalborgWWTPs.rds" ftype="ampvis2" compare="sim_size"/>
            <output name="metadata_list_out" value="AalborgWWTPs-metadata.list"/>
            <output name="taxonomy_list_out" value="AalborgWWTPs-taxonomy.list"/>
            <assert_stdout>
                <has_text text="575.79"/>
                <has_text text="SampleID, Plant, Date, Year, Period"/>
                <has_text text="200(100%) 194(97%) 177(88.5%) 170(85%) 152(76%) 113(56.5%) 2(1%)"/>
            </assert_stdout>
        </test>
        <!-- load otu table + metadata + taxonomy + tree + fasta -->
        <test expect_num_outputs="3">
            <param name="otutable" value="AalborgWWTPs.otu.csv"/>
            <param name="metadata" value="AalborgWWTPs.tsv" ftype="tsv"/>
            <param name="taxonomy" value="AalborgWWTPs.tax"/>
            <param name="fasta" value="AalborgWWTPs.fa" ftype="fasta"/>
            <param name="tree" value="AalborgWWTPs.nwk" ftype="newick"/>
            <output name="ampvis" value="AalborgWWTPs-complete.rds" ftype="ampvis2" compare="sim_size"/>
            <output name="metadata_list_out" value="AalborgWWTPs-metadata.list"/>
            <output name="taxonomy_list_out" value="AalborgWWTPs-taxonomy.list"/>
            <assert_stdout>
                <has_text text="575.79"/>
                <has_text text="SampleID, Plant, Date, Year, Period"/>
                <has_text text="200(100%) 194(97%) 177(88.5%) 170(85%) 152(76%) 113(56.5%) 2(1%)"/>
            </assert_stdout>
        </test>
        <!-- test biom 1/2 input (taken from https://github.com/biocore/biom-format/tree/master/examples)
             metadata seems not to be loaded from a biom file https://github.com/MadsAlbertsen/ampvis2/issues/129
             taxonomy is loaded from all but 1 -->
        <test>
            <param name="otutable" value="rich-dense.biom" ftype="biom1"/>
            <output name="ampvis" ftype="ampvis2">
                <assert_contents>
                    <has_size value="748"/>
                </assert_contents>
            </output>
            <assert_stdout>
                <has_text text="4.5"/>
                <has_text text="SampleID, BarcodeSequence, LinkerPrimerSequence, BODY_SITE, Description"/>
                <has_text text="5(100%) 5(100%) 5(100%) 5(100%) 5(100%) 5(100%) 1(20%)"/>
            </assert_stdout>
        </test>
        <test>
            <param name="otutable" value="rich-sparse.biom" ftype="biom1"/>
            <output name="ampvis" ftype="ampvis2">
                <assert_contents>
                    <has_size value="751"/>
                </assert_contents>
            </output>
            <assert_stdout>
                <has_text text="4.5"/>
                <has_text text="SampleID, BarcodeSequence, LinkerPrimerSequence, BODY_SITE, Description"/>
                <has_text text="5(100%) 5(100%) 5(100%) 5(100%) 5(100%) 5(100%) 1(20%)"/>
            </assert_stdout>
        </test>
        <test>
            <param name="otutable" value="min_sparse_otu_table_hdf5.biom" ftype="biom2"/>
            <output name="ampvis" ftype="ampvis2">
                <assert_contents>
                    <has_size value="395"/>
                </assert_contents>
            </output>
            <assert_stdout>
                <has_text text="4.5"/>
                <!-- input file seems to lack metadata;
                     check that no metadata & taxonomy is loaded (ampvis2 adds dummy metadata) -->
                <has_text text="SampleID, DummyVariable"/>
                <has_text text="0(0%) 0(0%) 0(0%) 0(0%) 0(0%) 0(0%) 0(0%)"/>
            </assert_stdout>
        </test>
        <test>
            <param name="otutable" value="rich_sparse_otu_table_hdf5.biom" ftype="biom2"/>
            <output name="ampvis" ftype="ampvis2">
                <assert_contents>
                    <has_size value="753"/>
                </assert_contents>
            </output>
            <assert_stdout>
                <has_text text="4.5"/>
                <has_text text="SampleID, BODY_SITE, BarcodeSequence, Description, LinkerPrimerSequence"/>
                <has_text text="5(100%) 5(100%) 5(100%) 5(100%) 5(100%) 5(100%) 1(20%)"/>
            </assert_stdout>
        </test>
    </tests>
    <help><![CDATA[
What it does
============

This tool reads an OTU-table and corresponding sample metadata, and returns an RDS data set for use in all ampvis2 tools. It is therefore required to load data with this tool before any other ampvis2 tools can be used.

The Galaxy tool calls the `amp_load <https://madsalbertsen.github.io/ampvis2/reference/amp_load.html>`_ function of the ampvis2 package. This function validates and corrects the provided data frames in different ways to make them suitable for the rest of the ampvis2 tools. It is important that the provided data sets match the requirements described below for the tool to work properly.

Input
=====

**The OTU-table** contains information about the OTUs, their read counts in each sample, and optionally their assigned taxonomy.
The OTU table can be given as

- Tabular data set
- BIOM version (1 and 2)

Metadata and taxonomy in the tabular or BIOM files that are given via the ``OTU table`` parameter are overwritten by data given via the ``Sample metadata`` or ``Taxonomy table`` parameters.

If given in tabular format the provided OTU-table must be a table with the following requirements:

- The rows are OTU IDs and the columns are samples.
- The OTU IDs are expected to be in a column called "OTU", "ASV", or "#OTU ID".
- The column names of the table are the sample IDs, exactly matching those in the metadata.
- The last 7 columns are optionally the corresponding taxonomy assigned to the OTUs, named "Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species".

Generally avoid special characters and spaces in row- and column names.

The OTU table can also contain the taxonomic information in additional columns: Kingdom, Phylum, Class, Order, Family, Genus.

Check `here <https://biom-format.org/>`_ for information on the BIOM formats.

**The metadata** contains additional information about the samples, for example where each sample was taken, date, pH, treatment etc., which is used to compare and group the samples during analysis. The amount of information in the metadata is unlimited: it can contain any number of columns (variables). However, there are a few requirements:

- The sample IDs must be in the first column and the column must be named ``SampleID``. These sample IDs must match exactly those in the OTU-table. Any unmatched samples between the otutable and metadata will be removed with a warning.
- Generally avoid special characters and spaces in row- and column names.

By default the data types of metadata columns are guessed with ``readr::type_convert``. The guessed column types can be seen in the last (4th) column of the ``metadata list`` output and also in the stdout of the tool. Guessing of data types can be disabled using the parameter ``Guess metadata column types``.
If disabled, metadata from a separate tabular input is treated as character data, and metadata loaded from BIOM files is used as is. Metadata types can be set manually using the tool ``ampvis2: set metadata``. Dates should be given in the format ``YYYY-MM-DD`` (Y: year, M: month, D: day).

In addition to the RDS data set a metadata (resp. taxonomy) list data set is returned if metadata (resp. taxonomic information) is given to this tool. It contains restructured metadata (resp. taxonomic information) that is used in downstream ampvis2 Galaxy tools in order to select metadata / metadata values (resp. taxonomic levels).

**Taxonomy** is a tabular data set with 7 columns and one row per ASV/OTU:

- the 1st column is identical to the 1st column of the OTU table parameter
- the remaining columns contain data for Kingdom, Phylum, Class, Order, Family, Genus

Note that the taxonomic information can also be embedded in the OTU table.

**Tree** a tree with branch lengths in Newick format. This is needed / useful only if the data is used as input of ``ampvis2: ordination plot`` for the ordination methods NMDS / MMDS with (un)weighted UniFrac distances. Note that the loaded tree is also filtered by the ``ampvis2: subset ...`` tools.

**Fasta** a fasta file containing the sequences of the OTUs. Note that this information is only used in ``ampvis2: export fasta``. If the OTU table is modified by ``ampvis2: mergereplicates`` or the ``ampvis2: subset ...`` tools this might be useful to obtain a filtered list of sequences.

Output
======

**RDS** The main output of the tool is an RDS data set that contains the R representation of the ampvis2 object containing the provided data (OTU table, metadata, taxonomy, phylogenetic tree, and fasta).

**List files** Summarize the metadata and taxonomy information:

- the taxonomy list file lists all taxa in a 1 column tabular data set
- the metadata list file lists the metadata variables (column 1), the corresponding available metadata values (column 2), whether the variable is the SampleID (column 3), and the data type of the corresponding metadata variable (column 4)

These files are auxiliary files that are needed in downstream ``ampvis2`` Galaxy tools to allow selecting metadata and taxonomy. They are not passed to the underlying R functions.

Note that, if no taxonomy (or metadata) is given, then the underlying ``ampvis2`` R function adds dummy taxonomy (resp. metadata). In this case the output of the list data sets can be disabled with the ``Output list data sets`` parameter.
    ]]></help>
    <expand macro="citations"/>
</tool>