changeset 2:6169ba9ed42a draft
Uploaded
author | testtool |
---|---|
date | Fri, 13 Oct 2017 10:10:32 -0400 |
parents | a3a8499f0f95 |
children | a5a5716e0317 |
files | accuracy.xml |
diffstat | 1 files changed, 56 insertions(+), 0 deletions(-) |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/accuracy.xml Fri Oct 13 10:10:32 2017 -0400
@@ -0,0 +1,56 @@
+<tool id="accuracy" name="accuracy" version="1.0.0">
+ <description>model creation and accuracy estimation</description>
+ <requirements>
+ <requirement type="package" version="6.0_76">r-caret</requirement>
+ </requirements>
+ <command detect_errors="aggressive">
+ Rscript '$__tool_directory__/accuracy.R' '$input' '$p' '$output1' '$output2'
+ </command>
+<inputs>
+ <param format="csv" type="data" name="input" value="" label="Input dataset" help="
+ e.g. iris species table
+Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
+5.1,3.5,1.4,0.2,Iris-setosa
+4.9,3,1.4,0.2,Iris-setosa
+4.7,3.2,1.3,0.2,Iris-setosa
+4.6,3.1,1.5,0.2,Iris-setosa''"/>
+ <param name="p" type="integer" value="0.80" label="Select % of data to training and testing the models"/>
+ </inputs>
+ <outputs>
+ <data format="csv" name="output1" label="dataset_summary.csv" />
+ <data format="csv" name="output2" label="accuracy_summary.csv" />
+ </outputs>
+ <tests>
+ <test>
+ <param name="test">
+ <element name="test-data">
+ <collection type="data">
+ <element format="csv" name="input" label="test-data/input.csv"/>
+ </collection>
+ </element>
+ </param>
+ <output format="csv" name="fit" label="test-data/dataset_summary.csv"/>
+ <output format="csv" name="fit" label="test-data/accuracy_summary.csv"/>
+ </test>
+ </tests>
+ <help>
+Tool allow us to build 5 different models to predict e.g. species from flower measurements.
+In the end we can select the best model for further analysis.
+
+Let’s evaluate 5 different algorithms:
+
+**Linear Discriminant Analysis (LDA)**
+**Classification and Regression Trees (CART).**
+**k-Nearest Neighbors (kNN).**
+**Support Vector Machines (SVM) with a linear kernel.**
+**Random Forest (RF)**
+
+This is a good mixture of simple linear (LDA), nonlinear (CART, kNN) and complex nonlinear methods (SVM, RF).
+We reset the random number seed before reach run to ensure that the evaluation of each algorithm is performed
+using exactly the same data splits. It ensures the results are directly comparable.
+
+</help>
+<citations>
+ <citation>https://CRAN.R-project.org/package=caret</citation>
+</citations>
+</tool>
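
Review note on the changeset above: the `p` parameter is declared with `type="integer"` but defaults to `0.80`, and the test block references its files through `label` attributes and an output name (`fit`) that does not match either declared output. The fragment below is a minimal sketch of how those two sections might look instead. It is not part of this changeset. It assumes that `p` is meant to be a fraction between 0 and 1, and that `input.csv`, `dataset_summary.csv` and `accuracy_summary.csv` sit in the tool's `test-data/` directory, as the labels in the diff suggest. The rewritten `label` text and the `min`/`max` bounds are illustrative choices, not the author's.

```xml
<!-- Sketch only: not part of changeset 6169ba9ed42a. -->
<inputs>
    <param name="input" type="data" format="csv" label="Input dataset"/>
    <!-- Assumption: p is a fraction in [0, 1], so it is declared as float rather than integer. -->
    <param name="p" type="float" value="0.80" min="0" max="1"
           label="Fraction of the data used for training"/>
</inputs>
<tests>
    <test>
        <!-- Assumption: these files live in the tool's test-data/ directory. -->
        <param name="input" value="input.csv"/>
        <param name="p" value="0.80"/>
        <!-- Output names match the <data> elements declared in <outputs>. -->
        <output name="output1" file="dataset_summary.csv"/>
        <output name="output2" file="accuracy_summary.csv"/>
    </test>
</tests>
```

In a Galaxy test, `value` on a data `param` and `file` on an `output` are resolved relative to `test-data/`, so the directory prefix is left out. Whether `accuracy.R` expects a fraction or a percentage for its second argument cannot be determined from this diff, so the default of `0.80` is kept as committed.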