datamash-ops.xml @ 4:746e8e4bf929 (draft)
planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/datamash commit 384736acb5d2ae31d796c53031fad0e5da5424e3

author:    iuc
date:      Fri, 01 Jul 2022 16:18:07 +0000
parents:   419027d822d6
children:  4c07ddedc198
<tool id="datamash_ops" name="Datamash" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@">
    <description>(operations on tabular data)</description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="requirements" />
    <expand macro="stdio" />
    <command><![CDATA[
datamash
    $header_in
    $header_out
    $need_sort
    $print_full_line
    $ignore_case
    $narm
    @FIELD_SEPARATOR@
    #if str($grouping) != ''
        --group '$grouping'
    #end if
    #for $oper in $operations
        ${oper.op_name} ${oper.op_column}
    #end for
    < '$in_file'
    > '$out_file'
    ]]></command>
    <expand macro="inputs_outputs">
        <param argument="--group" name="grouping" type="text" label="Group by fields" help="Group consecutive rows with equal values in the chosen fields. If no columns are specified, each operation is performed on the entire input file. Comma-separated list of column indices, e.g. 1,5">
            <sanitizer invalid_char="">
                <valid initial="string.digits">
                    <add value="," />
                </valid>
                <mapping initial="none">
                    <add source=" " target=""/>
                </mapping>
            </sanitizer>
            <validator message="Invalid value in field. Allowed is a comma-separated list of integer values or the empty string" type="regex">(^$)|(^\s*\d+\s*(,\s*\d+\s*)*$)</validator>
        </param>
        <param argument="--sort" name="need_sort" type="boolean" truevalue="--sort" falsevalue="" label="Sort input" help="The input file must be sorted by the grouping columns. Enable this option to automatically sort the input." />
        <param argument="--header-in" type="boolean" truevalue="--header-in" falsevalue="" label="Input file has a header line" />
        <param argument="--header-out" type="boolean" truevalue="--header-out" falsevalue="" label="Print header line" />
        <param argument="--full" name="print_full_line" type="boolean" truevalue="--full" falsevalue="" label="Print all fields from input file" />
        <param argument="--ignore-case" type="boolean" truevalue="--ignore-case" falsevalue="" label="Ignore case when grouping" />
        <param argument="--narm" type="boolean" truevalue="--narm" falsevalue="" label="Skip NA or NaN values" />
        <repeat name="operations" default="1" min="1" title="Operation to perform on each group">
            <param name="op_name" type="select" label="Type">
                <option value="count">Count</option>
                <option value="sum">Sum</option>
                <option value="min">Minimum</option>
                <option value="max">Maximum</option>
                <option value="absmin">Absolute minimum</option>
                <option value="absmax">Absolute maximum</option>
                <option value="mean">Mean</option>
                <option value="pstdev">Population standard deviation</option>
                <option value="sstdev">Sample standard deviation</option>
                <option value="median">Median</option>
                <option value="q1">1st quartile</option>
                <option value="q3">3rd quartile</option>
                <option value="iqr">Inter-quartile range</option>
                <option value="mad">Median absolute deviation</option>
                <option value="pvar">Variance (population)</option>
                <option value="svar">Variance (sample)</option>
                <option value="sskew">Skewness (sample)</option>
                <option value="pskew">Skewness (population)</option>
                <option value="skurt">Kurtosis (sample)</option>
                <option value="pkurt">Kurtosis (population)</option>
                <option value="jarque">Jarque-Bera normality test</option>
                <option value="dpo">D'Agostino-Pearson omnibus normality test</option>
                <option value="mode">Mode</option>
                <option value="antimode">Anti-mode</option>
                <option value="rand">One random value from the group</option>
                <option value="unique">Combine all unique values</option>
                <option value="collapse">Combine all values</option>
                <option value="countunique">Count unique values</option>
            </param>
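            <!-- Note: each repeat appends the pair "op_name op_column" to the datamash call assembled
                 in the command block above; e.g. selecting "sum" on column 3 and "mean" on column 5
                 produces the arguments "sum 3 mean 5". -->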
            <param name="op_column" data_ref="in_file" label="On column" type="data_column" />
        </repeat>
    </expand>
    <tests>
        <test>
            <param name="in_file" value="group_compute_input.txt" ftype="tabular" />
            <param name="grouping" value="2" />
            <param name="header_in" value="true" />
            <param name="header_out" value="true" />
            <param name="need_sort" value="true" />
            <param name="print_full_line" value="false" />
            <param name="ignore_case" value="false" />
            <repeat name="operations">
                <param name="op_name" value="sum" />
                <param name="op_column" value="3" />
            </repeat>
            <output file="group_compute_output.txt" name="out_file" ftype="tabular" />
        </test>
        <test>
            <param name="in_file" value="group_compute_input.txt" ftype="tsv" />
            <param name="grouping" value="2" />
            <param name="header_in" value="true" />
            <param name="header_out" value="true" />
            <param name="need_sort" value="true" />
            <param name="print_full_line" value="false" />
            <param name="ignore_case" value="false" />
            <repeat name="operations">
                <param name="op_name" value="sum" />
                <param name="op_column" value="3" />
            </repeat>
            <output file="group_compute_output.txt" name="out_file" ftype="tsv" />
        </test>
        <test>
            <param name="in_file" value="group_compute_input.csv" ftype="csv" />
            <param name="grouping" value="2" />
            <param name="header_in" value="true" />
            <param name="header_out" value="true" />
            <param name="need_sort" value="true" />
            <param name="print_full_line" value="false" />
            <param name="ignore_case" value="false" />
            <repeat name="operations">
                <param name="op_name" value="sum" />
                <param name="op_column" value="3" />
            </repeat>
            <output name="out_file" ftype="csv">
                <assert_contents>
                    <has_n_lines n="7"/>
                    <has_line line="Arts,1310"/>
                </assert_contents>
            </output>
        </test>
        <test>
            <!-- test a file containing NA and NaN values with the narm option enabled -->
            <param name="in_file" value="na_values_input.tsv" ftype="tabular" />
            <param name="grouping" value="2" />
            <param name="header_in" value="true" />
            <param name="print_full_line" value="false" />
            <param name="need_sort" value="true" />
            <param name="narm" value="true" />
            <repeat name="operations">
                <param name="op_name" value="mean" />
                <param name="op_column" value="3" />
            </repeat>
            <output name="out_file" ftype="tabular">
                <assert_contents>
                    <has_n_lines n="2"/>
                    <has_line_matching expression="DE\t173.5"/>
                    <has_line_matching expression="NL\t177.5"/>
                </assert_contents>
            </output>
        </test>
    </tests>
    <help><![CDATA[
@HELP_HEADER@

**Syntax**

This tool performs common operations (such as sum, count, mean, and standard deviation) on tabular input files. It can also optionally group the input based on given fields.

-----

**Example 1**

- Find the average score in a statistics course for college students, grouped by their college major. The input file has three fields (Name, Major, Score) and a header line::

    Name       Major             Score
    Bryan      Arts              68
    Gabriel    Health-Medicine   100
    Isaiah     Arts              80
    Tysza      Business          92
    Zackery    Engineering       54
    ...        ...

- Grouping the input by the second column (*Major*), sorting the input, and performing the operations **mean** and **sample standard deviation** on the third column (*Score*) gives::

    GroupBy(Major)     mean(Score)   sstdev(Score)
    Arts               68.9474       10.4215
    Business           87.3636       5.18214
    Engineering        66.5385       19.8814
    Health-Medicine    90.6154       9.22441
    Life-Sciences      55.3333       20.606
    Social-Sciences    60.2667       17.2273

Note that the input needs sorting here, since the column used for grouping (*Major*) is not sorted.

This sample file is available at http://www.gnu.org/software/datamash .
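With these settings, the wrapper assembles a datamash command line roughly equivalent to the sketch below (the file name scores.txt is only a placeholder for the sample input)::

    # group by column 2 after sorting, keep the header line, then compute mean and sstdev of column 3
    datamash --sort --header-in --header-out --group 2 mean 3 sstdev 3 < scores.txt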
-----

**Example 2**

- Using the UCSC RefSeq Human Gene Track, available at: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/refGene.txt.gz

- List the number and identifiers of isoforms per gene. The gene identifier is in column 13, the isoform/transcript identifier is in column 2. Grouping by column 13 and performing **count** and **Combine all values** on column 2 gives::

    GroupBy(field-13)   count(field-2)   collapse(field-2)
    A1BG                1                NM_130786
    A1BG-AS1            1                NR_015380
    A1CF                6                NM_001198818,NM_001198819,NM_001198820,NM_014576,NM_138932,NM_138933
    A2M                 1                NM_000014
    A2M-AS1             1                NR_026971
    A2ML1               2                NM_001282424,NM_144670
    ...

- Count how many transcripts are listed for each chromosome and strand. The chromosome is in column 3, the strand in column 4, and the transcript identifiers are in column 2. Grouping by columns **3,4** and performing the operation **count** on column 2 gives::

    GroupBy(field-3)   GroupBy(field-4)   count(field-2)
    chr1               +                  2456
    chr1               -                  2431
    chr2               +                  1599
    chr2               -                  1419
    chr3               +                  1287
    chr3               -                  1249
    ...

@HELP_FOOTER@
    ]]></help>
</tool>