view manipulate.xml @ 14:c4209ea387d4 draft default tip
planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/anndata/ commit c1d84c1850c53deccc384de3960d2cec37bb2869
author | iuc |
---|---|
date | Fri, 08 Nov 2024 21:59:34 +0000 |
parents | 7e8c677a7b71 |
children |
<tool id="anndata_manipulate" name="Manipulate AnnData" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@">
    <description>object</description>
    <macros>
        <import>macros.xml</import>
        <xml name="param_join">
            <param name="join" type="select" label="The connecting string between name and integer">
                <option value="-">-</option>
                <option value="_">_</option>
                <option value=" "> </option>
                <option value="/">/</option>
            </param>
        </xml>
    </macros>
    <expand macro="bio_tools"/>
    <expand macro="requirements"/>
    <expand macro="version_command"/>
    <command detect_errors="exit_code"><![CDATA[
@CMD@
    ]]></command>
    <configfiles>
        <configfile name="script_file"><![CDATA[
@CMD_imports@

adata = ad.read_h5ad('$input')

#if $manipulate.function == 'concatenate'
    #for i, filepath in enumerate($manipulate.other_adatas)
adata_$i = ad.read_h5ad('$filepath')
    #end for
adata = adata.concatenate(
    #for i, filepath in enumerate($manipulate.other_adatas)
    adata_$i,
    #end for
    join='$manipulate.join',
    #if str($manipulate.index_unique) != ''
    index_unique='$manipulate.index_unique',
    #else
    index_unique=None,
    #end if
    batch_key='$manipulate.batch_key')

#else if $manipulate.function == 'var_names_make_unique'
adata.var_names_make_unique(join='$manipulate.join')

#else if $manipulate.function == 'obs_names_make_unique'
adata.obs_names_make_unique(join='$manipulate.join')

#else if $manipulate.function == 'rename_categories'
    #set $categories = [x.strip() for x in str($manipulate.categories).split(',')]
adata.rename_categories(
    key='$manipulate.key',
    categories=$categories)

#else if $manipulate.function == 'remove_keys'
    #if $manipulate.obs_keys
        #set $keys = [x.strip() for x in str($manipulate.obs_keys).split(',')]
adata.obs = adata.obs.drop(columns=$keys)
    #end if
    #if $manipulate.var_keys
        #set $keys = [x.strip() for x in str($manipulate.var_keys).split(',')]
adata.var = adata.var.drop(columns=$keys)
    #end if

#else if $manipulate.function == 'flag_genes'
## adapted from anndata operations
    #for $flag in $manipulate.gene_flags
k_cat = adata.var_names.str.startswith('${flag.startswith}')
if k_cat.sum() > 0:
    adata.var['${flag.col_name}'] = k_cat
else:
    print('No genes starting with ${flag.startswith} found.')
    #end for

#else if $manipulate.function == 'strings_to_categoricals'
adata.strings_to_categoricals()

#else if $manipulate.function == 'transpose'
adata = adata.transpose()

#else if $manipulate.function == 'add_annotation'
import pandas as pd
extra_annot_t = pd.read_csv('$manipulate.new_annot', sep='\t').reset_index(drop=True)
    #if $manipulate.var_obs == 'var'
var_index = adata.var_names
var = pd.concat([adata.var.reset_index(drop=True), extra_annot_t], axis=1)
var.index = var_index
adata.var = var
    #else if $manipulate.var_obs == 'obs'
obs_index = adata.obs.index
obs = pd.concat([adata.obs.reset_index(drop=True), extra_annot_t], axis=1)
obs.index = obs_index
adata.obs = obs
    #end if

#else if $manipulate.function == 'split_on_obs'
import os
res_dir = "output_split"
os.makedirs(res_dir, exist_ok=True)
for s,field_value in enumerate(adata.obs["${manipulate.key}"].unique()):
    ad_s = adata[adata.obs["${manipulate.key}"] == field_value]
    ad_s.write(f"{res_dir}/${manipulate.key}_{s}.h5ad", compression='gzip')

#else if $manipulate.function == 'filter'
    #if $manipulate.filter.filter == 'key'
        #if $manipulate.var_obs == 'var'
filtered = adata.var['$manipulate.filter.key']
        #else if $manipulate.var_obs == 'obs'
filtered = adata.obs['$manipulate.filter.key']
        #end if

        #if $manipulate.filter.filter_key.type == 'number'
            #if $manipulate.filter.filter_key.filter == 'equal'
filtered = filtered == $manipulate.filter.filter_key.value
            #else if $manipulate.filter.filter_key.filter == 'not_equal'
filtered = filtered != $manipulate.filter.filter_key.value
            #else if $manipulate.filter.filter_key.filter == 'less'
filtered = filtered < $manipulate.filter.filter_key.value
            #else if $manipulate.filter.filter_key.filter == 'less_or_equal'
filtered = filtered <= $manipulate.filter.filter_key.value
            #else if $manipulate.filter.filter_key.filter == 'greater'
filtered = filtered > $manipulate.filter.filter_key.value
            #else if $manipulate.filter.filter_key.filter == 'greater_or_equal'
filtered = filtered >= $manipulate.filter.filter_key.value
            #end if
        #else if $manipulate.filter.filter_key.type == 'text'
            #if $manipulate.filter.filter_key.filter == 'equal'
filtered = filtered == '$manipulate.filter.filter_key.value'
            #else
filtered = filtered != '$manipulate.filter.filter_key.value'
            #end if
        #else if $manipulate.filter.filter_key.type == 'boolean'
filtered = filtered == $manipulate.filter.filter_key.value
        #end if

    #else if $manipulate.filter.filter == 'index'
        #if str($manipulate.filter.index.format) == 'file'
with open('$manipulate.filter.index.file', 'r') as filter_f:
    filters = [str(x.strip()) for x in filter_f.readlines()]
filtered = filters
        #else
            #set $filters = [str(x.strip()) for x in $manipulate.filter.index.text.split(',')]
filtered = $filters
        #end if
    #end if
print(filtered)

    #if $manipulate.var_obs == 'var'
adata = adata[:,filtered]
    #else if $manipulate.var_obs == 'obs'
adata = adata[filtered, :]
    #end if

#else if $manipulate.function == 'save_raw'
adata.raw = adata

#end if

#if $manipulate.function != 'split_on_obs'
adata.write('anndata.h5ad', compression='gzip')
print(adata)
#end if
        ]]></configfile>
    </configfiles>
    <inputs>
        <param name="input" type="data" format="h5ad" label="Annotated data matrix"/>
        <conditional name="manipulate">
            <param name="function" type="select" label="Function to manipulate the object">
                <option value="concatenate">Concatenate along the observations axis</option>
                <option value="obs_names_make_unique">Makes the obs index unique by appending '1', '2', etc</option>
                <option value="var_names_make_unique">Makes the var index unique by appending '1', '2', etc</option>
                <option value="rename_categories">Rename categories of annotation</option>
                <option value="remove_keys">Remove keys from obs or var annotations</option>
                <option value="flag_genes">Flag genes that start with a pattern</option><!--adapted from EBI anndata operations tool -->
                <option value="strings_to_categoricals">Transform string annotations to categoricals</option>
                <option value="transpose">Transpose the data matrix, leaving observations and variables interchanged</option>
                <option value="add_annotation">Add new annotation(s) for observations or variables</option>
                <option value="split_on_obs">Split the AnnData object into multiple AnnData objects based on the values of a given obs key</option><!--adapted from EBI anndata operations tool-->
                <option value="filter">Filter observations or variables</option>
                <option value="save_raw">Freeze the current state into the 'raw' attribute</option>
            </param>
            <when value="concatenate">
                <param name="other_adatas" type="data" format="h5ad" multiple="true" label="Annotated data matrix to add"/>
                <param name="join" type="select" label="Join method">
                    <option value="inner">Intersection of variables</option>
                    <option value="outer">Union of variables</option>
                </param>
                <param name="batch_key" type="text" value="batch" label="Key to add the batch annotation to obs"/>
                <param name="index_unique" type="select" label="Separator to join the existing index names with the batch category" help="Leave it empty to keep existing indices">
                    <option value="-">-</option>
                    <option value="_">_</option>
                    <option value=" "> </option>
                    <option value="/">/</option>
                </param>
            </when>
            <when value="obs_names_make_unique">
                <expand macro="param_join"/>
            </when>
            <when value="var_names_make_unique">
                <expand macro="param_join"/>
            </when>
            <when value="rename_categories">
                <param name="key" type="text" value="" label="Key for observations or variables annotation" help="Annotation key in obs or var"/>
                <param name="categories" type="text" value="" label="Comma-separated list of new categories" help="It should be the same number as the old categories"/>
            </when>
            <when value="remove_keys">
                <param name="obs_keys" type="text" value="" optional="true" label="Keys/fields to remove from observations (obs)">
                    <expand macro="sanitize_query"/>
                </param>
                <param name="var_keys" type="text" value="" optional="true" label="Keys/fields to remove from variables (var)">
                    <expand macro="sanitize_query"/>
                </param>
            </when>
            <when value="flag_genes">
                <repeat name="gene_flags" title="Flag genes that start with these names">
                    <param name="startswith" type="text" label="Text that you expect the genes to be flagged to start with" help="For example, 'MT-' for mito genes">
                        <sanitizer invalid_char="">
                            <valid initial="string.ascii_letters,string.digits,string.punctuation">
                                <remove value="'" />
                            </valid>
                        </sanitizer>
                    </param>
                    <param name="col_name" type="text" label="Name of the column in var where this boolean flag is stored" help="For example, name this column as 'mito' for mitochondrial genes."/>
                </repeat>
            </when>
            <when value="strings_to_categoricals"></when>
            <when value="transpose"></when>
            <when value="add_annotation">
                <param name="var_obs" type="select" label="What to annotate?">
                    <option value="var">Variables (var)</option>
                    <option value="obs">Observations (obs)</option>
                </param>
                <param name="new_annot" type="data" format="tabular" label="Table with new annotations" help="The new table should have the same number of rows and the same order as obs or var. The key names should be in the header (1st line)"/>
            </when>
            <when value="split_on_obs">
                <param name="key" type="text" label="The obs key to split on" help="For example, if you want to split on cluster annotation, you can use the key 'louvain'. The output will be a collection of anndata objects">
                    <sanitizer invalid_char="">
                        <valid initial="string.ascii_letters,string.digits,string.punctuation">
                            <remove value="'" />
                        </valid>
                    </sanitizer>
                </param>
            </when>
            <when value="filter">
                <param name="var_obs" type="select" label="What to filter?">
                    <option value="var">Variables (var)</option>
                    <option value="obs">Observations (obs)</option>
                </param>
                <conditional name="filter">
                    <param name="filter" type="select" label="Type of filtering?">
                        <option value="key">By key (column) values</option>
                        <option value="index">By index (row)</option>
                    </param>
                    <when value="key">
                        <param name="key" type="text" value="n_genes" label="Key to filter"/>
                        <conditional name="filter_key">
                            <param name="type" type="select" label="Type of value to filter">
                                <option value="number">Number</option>
                                <option value="text">Text</option>
                                <option value="boolean">Boolean</option>
                            </param>
                            <when value="number">
                                <param name="filter" type="select" label="Filter">
                                    <option value="equal">equal to</option>
                                    <option value="not_equal">not equal to</option>
                                    <option value="less">less than</option>
                                    <option value="less_or_equal">less than or equal to</option>
                                    <option value="greater">greater than</option>
                                    <option value="greater_or_equal">greater than or equal to</option>
                                </param>
                                <param name="value" type="float" value="2500" label="Value"/>
                            </when>
                            <when value="text">
                                <param name="filter" type="select" label="Filter">
                                    <option value="equal">equal to</option>
                                    <option value="not_equal">not equal to</option>
                                </param>
                                <param name="value" type="text" value="2500" label="Value"/>
                            </when>
                            <when value="boolean">
                                <param name="value" type="boolean" truevalue="True" falsevalue="False" checked="true" label="Value to keep"/>
                            </when>
                        </conditional>
                    </when>
                    <when value="index">
                        <conditional name="index">
                            <param name="format" type="select" label="Format for the filter by index">
                                <option value="file">File</option>
                                <option value="text" selected="true">Text</option>
                            </param>
                            <when value="text">
                                <param name="text" type="text" value="" label="List of index to keep" help="Indexes separated by a comma"/>
                            </when>
                            <when value="file">
                                <param name="file" type="data" format="txt" label="File with the list of index to keep" help="One index per line"/>
                            </when>
                        </conditional>
                    </when>
                </conditional>
            </when>
            <when value="save_raw"></when>
        </conditional>
    </inputs>
    <outputs>
        <data name="anndata" format="h5ad" from_work_dir="anndata.h5ad" label="${tool.name} (${manipulate.function}) on ${on_string}">
            <filter>manipulate['function'] != 'split_on_obs'</filter>
        </data>
        <collection name="output_h5ad_split" type="list" label="${tool.name} (${manipulate.function}) on ${on_string} Collection">
            <discover_datasets pattern="(?P&lt;designation&gt;.+)\.h5" directory="output_split" format="h5ad" visible="true"/>
            <filter>manipulate['function'] == 'split_on_obs'</filter>
        </collection>
    </outputs>
    <tests>
        <test expect_num_outputs="1">
            <!-- test 1 -->
            <param name="input" value="import.csv.h5ad"/>
            <conditional name="manipulate">
                <param name="function" value="concatenate"/>
                <param name="other_adatas" value="import.csv.h5ad"/>
                <param name="join" value="inner"/>
                <param name="batch_key" value="batch"/>
                <param name="index_unique" value="-"/>
            </conditional>
            <assert_stdout>
                <has_text_matching expression="adata_0"/>
                <has_text_matching expression="adata.concatenate"/>
                <has_text_matching expression="join='inner'"/>
                <has_text_matching expression="index_unique='-'"/>
                <has_text_matching expression="batch_key='batch'"/>
                <has_text_matching expression="6 × 2"/>
            </assert_stdout>
            <output name="anndata" ftype="h5ad">
                <assert_contents>
                    <has_h5_keys keys="obs/batch"/>
                </assert_contents>
            </output>
        </test>
        <test expect_num_outputs="1">
            <!-- test 2 -->
            <param name="input" value="krumsiek11.h5ad"/>
            <conditional name="manipulate">
                <param name="function" value="obs_names_make_unique"/>
                <param name="join" value="-"/>
            </conditional>
            <assert_stdout>
                <has_text_matching expression="adata.obs_names_make_unique\(join='-'\)"/>
                <has_text_matching expression="500 × 11"/>
            </assert_stdout>
            <output name="anndata" ftype="h5ad">
                <assert_contents>
                    <has_h5_keys keys="obs/cell_type"/>
                    <has_h5_keys keys="uns/highlights"/>
                    <has_h5_keys keys="uns/iroot"/>
                </assert_contents>
            </output>
        </test>
        <test expect_num_outputs="1">
            <!-- test 3 -->
            <param name="input" value="krumsiek11.h5ad"/>
            <conditional name="manipulate">
                <param name="function" value="var_names_make_unique"/>
                <param name="join" value="-"/>
            </conditional>
            <assert_stdout>
                <has_text_matching expression="adata.var_names_make_unique\(join='-'\)"/>
                <has_text_matching expression="500 × 11"/>
            </assert_stdout>
            <output name="anndata" ftype="h5ad">
                <assert_contents>
                    <has_h5_keys keys="obs/cell_type"/>
                    <has_h5_keys keys="uns/highlights"/>
                    <has_h5_keys keys="uns/iroot"/>
                </assert_contents>
            </output>
        </test>
        <test expect_num_outputs="1">
            <!-- test 4 -->
            <param name="input" value="krumsiek11.h5ad"/>
            <conditional name="manipulate">
                <param name="function" value="rename_categories"/>
                <param name="key" value="cell_type"/>
                <param name="categories" value="ery, mk, mo, progenitor"/>
            </conditional>
            <assert_stdout>
                <has_text_matching expression="adata.rename_categories"/>
                <has_text_matching expression="key='cell_type'"/>
                <has_text_matching expression="categories=\['ery', 'mk', 'mo', 'progenitor'\]"/>
                <has_text_matching expression="500 × 11"/>
            </assert_stdout>
            <output name="anndata" ftype="h5ad">
                <assert_contents>
                    <has_h5_keys keys="obs/cell_type"/>
                    <has_h5_keys keys="uns/highlights"/>
                    <has_h5_keys keys="uns/iroot"/>
                </assert_contents>
            </output>
        </test>
        <test expect_num_outputs="1">
            <!-- test 5 -->
            <param name="input" value="krumsiek11.h5ad"/>
            <conditional name="manipulate">
                <param name="function" value="strings_to_categoricals"/>
            </conditional>
            <assert_stdout>
                <has_text_matching expression="adata.strings_to_categoricals"/>
                <has_text_matching expression="500 × 11"/>
            </assert_stdout>
            <output name="anndata" ftype="h5ad">
                <assert_contents>
                    <has_h5_keys keys="obs/cell_type"/>
                    <has_h5_keys keys="uns/highlights"/>
                    <has_h5_keys keys="uns/iroot"/>
                </assert_contents>
            </output>
        </test>
        <test expect_num_outputs="1">
            <!-- test 6 -->
            <param name="input" value="krumsiek11.h5ad"/>
            <conditional name="manipulate">
                <param name="function" value="transpose"/>
            </conditional>
            <assert_stdout>
                <has_text_matching expression="adata.transpose"/>
                <has_text_matching expression="11 × 500"/>
            </assert_stdout>
            <output name="anndata" ftype="h5ad">
                <assert_contents>
                    <has_h5_keys keys="var/cell_type"/>
                    <has_h5_keys keys="uns/highlights"/>
                    <has_h5_keys keys="uns/iroot"/>
                </assert_contents>
            </output>
        </test>
        <test expect_num_outputs="1">
            <!-- test 7 -->
            <param name="input" value="krumsiek11.h5ad"/>
            <conditional name="manipulate">
                <param name="function" value="add_annotation"/>
                <param name="var_obs" value="var"/>
                <param name="new_annot" value="var_add_annotation.tabular"/>
            </conditional>
            <assert_stdout>
                <has_text_matching expression="500 × 11"/>
            </assert_stdout>
            <output name="anndata" ftype="h5ad">
                <assert_contents>
                    <has_h5_keys keys="obs/cell_type"/>
                    <has_h5_keys keys="var/annot1"/>
                    <has_h5_keys keys="var/annot2"/>
                    <has_h5_keys keys="uns/highlights"/>
                    <has_h5_keys keys="uns/iroot"/>
                </assert_contents>
            </output>
        </test>
        <test expect_num_outputs="1">
            <!-- test 8 -->
            <param name="input" value="krumsiek11.h5ad"/>
            <conditional name="manipulate">
                <param name="function" value="add_annotation"/>
                <param name="var_obs" value="obs"/>
                <param name="new_annot" value="obs_add_annotation.tabular"/>
            </conditional>
            <assert_stdout>
                <has_text_matching expression="500 × 11"/>
            </assert_stdout>
            <output name="anndata" ftype="h5ad">
                <assert_contents>
                    <has_h5_keys keys="obs/cell_type"/>
                    <has_h5_keys keys="obs/annot1"/>
                    <has_h5_keys keys="obs/annot2"/>
                    <has_h5_keys keys="uns/highlights"/>
                    <has_h5_keys keys="uns/iroot"/>
                </assert_contents>
            </output>
        </test>
        <test expect_num_outputs="1">
            <!-- test 9 -->
            <param name="input" value="krumsiek11.h5ad"/>
            <conditional name="manipulate">
                <param name="function" value="filter"/>
                <param name="var_obs" value="var"/>
                <conditional name="filter">
                    <param name="filter" value="index"/>
                    <conditional name="index">
                        <param name="format" value="text"/>
                        <param name="text" value="Gata2,EKLF"/>
                    </conditional>
                </conditional>
            </conditional>
            <assert_stdout>
                <has_text_matching expression="500 × 2"/>
            </assert_stdout>
            <output name="anndata" ftype="h5ad">
                <assert_contents>
                    <has_h5_keys keys="obs/cell_type"/>
                    <has_h5_keys keys="uns/highlights"/>
                    <has_h5_keys keys="uns/iroot"/>
                </assert_contents>
            </output>
        </test>
        <test expect_num_outputs="1">
            <!-- test 10 -->
            <param name="input" value="krumsiek11.h5ad"/>
            <conditional name="manipulate">
                <param name="function" value="filter"/>
                <param name="var_obs" value="obs"/>
                <conditional name="filter">
                    <param name="filter" value="key"/>
                    <param name="key" value="cell_type"/>
                    <conditional name="filter_key">
                        <param name="type" value="text"/>
                        <param name="filter" value="equal"/>
                        <param name="value" value="progenitor"/>
                    </conditional>
                </conditional>
            </conditional>
            <assert_stdout>
                <has_text_matching expression="260 × 11"/>
            </assert_stdout>
            <output name="anndata" ftype="h5ad">
                <assert_contents>
                    <has_h5_keys keys="obs/cell_type"/>
                    <has_h5_keys keys="uns/highlights"/>
                    <has_h5_keys keys="uns/iroot"/>
                </assert_contents>
            </output>
        </test>
        <test expect_num_outputs="1">
            <!-- test 11 -->
            <param name="input" value="krumsiek11.h5ad"/>
            <conditional name="manipulate">
                <param name="function" value="save_raw"/>
            </conditional>
            <assert_stdout>
                <has_text_matching expression="500 × 11"/>
            </assert_stdout>
            <output name="anndata" ftype="h5ad">
                <assert_contents>
                    <has_h5_keys keys="obs/cell_type"/>
                    <has_h5_keys keys="uns/highlights"/>
                    <has_h5_keys keys="uns/iroot"/>
                </assert_contents>
            </output>
        </test>
        <test expect_num_outputs="1">
            <!-- test 12 remove_keys -->
            <param name="input" value="krumsiek11.h5ad"/>
            <conditional name="manipulate">
                <param name="function" value="remove_keys"/>
                <param name="obs_keys" value="cell_type"/>
            </conditional>
            <assert_stdout>
                <has_text_matching expression="500 × 11"/>
            </assert_stdout>
            <output name="anndata" ftype="h5ad">
                <assert_contents>
                    <has_h5_keys keys="uns/highlights"/>
                    <has_h5_keys keys="uns/iroot"/>
                </assert_contents>
            </output>
        </test>
        <test expect_num_outputs="1">
            <!-- test 13 flag_genes -->
            <param name="input" value="krumsiek11.h5ad"/>
            <conditional name="manipulate">
                <param name="function" value="flag_genes"/>
                <repeat name="gene_flags">
                    <param name="startswith" value="Gata"/>
                    <param name="col_name" value="Gata_TF"/>
                </repeat>
                <repeat name="gene_flags">
                    <param name="startswith" value="Gf"/>
                    <param name="col_name" value="GF"/>
                </repeat>
            </conditional>
            <assert_stdout>
                <has_text_matching expression="500 × 11"/>
            </assert_stdout>
            <output name="anndata" ftype="h5ad">
                <assert_contents>
                    <has_h5_keys keys="var/Gata_TF"/>
                    <has_h5_keys keys="var/GF"/>
                </assert_contents>
            </output>
        </test>
        <test expect_num_outputs="1">
            <!-- test 14 split_on_obs -->
            <param name="input" value="krumsiek11.h5ad"/>
            <conditional name="manipulate">
                <param name="function" value="split_on_obs"/>
                <param name="key" value="cell_type"/>
            </conditional>
            <output_collection name="output_h5ad_split" type="list">
                <element name="cell_type_0">
                    <assert_contents>
                        <has_h5_keys keys="obs/cell_type"/>
                        <has_h5_keys keys="uns/highlights"/>
                        <has_h5_keys keys="uns/iroot"/>
                    </assert_contents>
                </element>
                <element name="cell_type_1">
                    <assert_contents>
                        <has_h5_keys keys="obs/cell_type"/>
                        <has_h5_keys keys="uns/highlights"/>
                        <has_h5_keys keys="uns/iroot"/>
                    </assert_contents>
                </element>
                <element name="cell_type_2">
                    <assert_contents>
                        <has_h5_keys keys="obs/cell_type"/>
                        <has_h5_keys keys="uns/highlights"/>
                        <has_h5_keys keys="uns/iroot"/>
                    </assert_contents>
                </element>
                <element name="cell_type_3">
                    <assert_contents>
                        <has_h5_keys keys="obs/cell_type"/>
                        <has_h5_keys keys="uns/highlights"/>
                        <has_h5_keys keys="uns/iroot"/>
                    </assert_contents>
                </element>
            </output_collection>
        </test>
    </tests>
    <help><![CDATA[
**What it does**

This tool takes an AnnData dataset, manipulates it and returns it.

The possible manipulations are:

- Concatenate along the observations axis (`concatenate method <https://anndata.readthedocs.io/en/latest/generated/anndata.AnnData.concatenate.html>`__)

  The `uns`, `varm` and `obsm` attributes are ignored.

  If you use `join='outer'`, this fills 0s for sparse data when variables are absent in a batch. Use this with care. Dense data is filled with `NaN`.

- Makes the obs index unique by appending '1', '2', etc (`obs_names_make_unique method <https://anndata.readthedocs.io/en/latest/generated/anndata.AnnData.obs_names_make_unique.html>`__)

  The first occurrence of a non-unique value is ignored.

- Makes the var index unique by appending '1', '2', etc (`var_names_make_unique method <https://anndata.readthedocs.io/en/latest/generated/anndata.AnnData.var_names_make_unique.html>`__)

  The first occurrence of a non-unique value is ignored.

- Rename categories of annotation `key` in `obs`, `var` and `uns` (`rename_categories method <https://anndata.readthedocs.io/en/latest/generated/anndata.AnnData.rename_categories.html>`__)

  Besides calling `self.obs[key].cat.categories = categories` - similar for `var` - this also renames categories in unstructured annotation that uses the categorical annotation `key`.

- Remove keys from obs or var annotations

  Helps in cleaning up an AnnData object with many annotations. For example, helps in removing QC metrics calculated during preprocessing or already existing cluster annotations.

- Flag genes that start with a pattern

  Useful for flagging mitochondrial or ribosomal protein genes.

- Transform string annotations to categoricals (`strings_to_categoricals method <https://anndata.readthedocs.io/en/latest/generated/anndata.AnnData.strings_to_categoricals.html>`__)

  Only affects string annotations that lead to fewer categories than the total number of observations.

- Transpose the data matrix, leaving observations and variables interchanged (`transpose method <https://anndata.readthedocs.io/en/latest/generated/anndata.AnnData.transpose.html>`__)

  Data matrix is transposed, observations and variables are interchanged.

- Add annotation for variables or observations

- Split the AnnData object into multiple AnnData objects based on the values of a given obs key

  For example, helps in splitting an AnnData object based on cluster annotation. This function generates a collection with a number of elements equal to the number of categories in the input obs key.

- Filter data variables or observations, by index or key

- Freeze the current state into the 'raw' attribute

@HELP@
    ]]></help>
    <expand macro="citations"/>
</tool>
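<!--
The make-unique options described in the help can be pictured with a small,
pure-Python sketch. It is an illustration only, not anndata's implementation:
make_names_unique is a hypothetical helper, and it ignores the case where a
generated name collides with a name that already exists in the index.

```python
from collections import Counter


def make_names_unique(names, join="-"):
    """Keep the first occurrence of each name; suffix later duplicates.

    Hypothetical helper for illustration; 'join' plays the role of the
    connecting string between the name and the integer.
    """
    seen = Counter()  # how many times each name has been seen so far
    result = []
    for name in names:
        count = seen[name]
        # First occurrence is left untouched, later ones get join + counter.
        result.append(name if count == 0 else f"{name}{join}{count}")
        seen[name] += 1
    return result


print(make_names_unique(["Gata1", "Gata2", "Gata1", "Gata1"]))
# prints ['Gata1', 'Gata2', 'Gata1-1', 'Gata1-2']
```

In the tool itself the renaming is delegated to adata.obs_names_make_unique
or adata.var_names_make_unique with the selected join string.
-->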