--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/accuracy.xml	Fri Oct 13 10:10:32 2017 -0400
@@ -0,0 +1,56 @@
+<tool id="accuracy" name="accuracy" version="1.0.0">
+    <description>model creation and accuracy estimation</description>
+    <requirements>
+        <requirement type="package" version="6.0_76">r-caret</requirement>
+    </requirements>
+    <command detect_errors="aggressive">
+        Rscript '$__tool_directory__/accuracy.R' '$input' '$p' '$output1' '$output2'
+    </command>
+    <inputs>
+        <param format="csv" type="data" name="input" label="Input dataset" help="
+e.g. the iris species table:
+Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
+5.1,3.5,1.4,0.2,Iris-setosa
+4.9,3,1.4,0.2,Iris-setosa
+4.7,3.2,1.3,0.2,Iris-setosa
+4.6,3.1,1.5,0.2,Iris-setosa"/>
+        <param name="p" type="float" value="0.80" label="Fraction of the data (between 0 and 1) used for training; the rest is used for testing the models"/>
+    </inputs>
+    <outputs>
+        <data format="csv" name="output1" label="dataset_summary.csv" />
+        <data format="csv" name="output2" label="accuracy_summary.csv" />
+    </outputs>
+    <tests>
+        <test>
+            <!-- Test files are resolved relative to the test-data/ directory. -->
+            <param name="input" value="input.csv" ftype="csv"/>
+            <param name="p" value="0.80"/>
+            <output name="output1"
+                    file="dataset_summary.csv"
+                    ftype="csv"/>
+            <output name="output2"
+                    file="accuracy_summary.csv"
+                    ftype="csv"/>
+        </test>
+    </tests>
+  <help>
+This tool builds five different models to predict a class label, e.g. species from flower measurements.
+The most accurate model can then be selected for further analysis.
+
+The following five algorithms are evaluated:
+
+- **Linear Discriminant Analysis (LDA)**
+- **Classification and Regression Trees (CART)**
+- **k-Nearest Neighbors (kNN)**
+- **Support Vector Machines (SVM) with a linear kernel**
+- **Random Forest (RF)**
+
+This is a good mixture of simple linear (LDA), nonlinear (CART, kNN) and complex nonlinear methods (SVM, RF).
+The random number seed is reset before each run so that every algorithm is evaluated
+on exactly the same data splits, which makes the results directly comparable.
+
+</help>
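+    <citations>
+        <citation type="bibtex">@Manual{caret, title = {caret: Classification and Regression Training}, author = {Max Kuhn}, url = {https://CRAN.R-project.org/package=caret}}</citation>
+    </citations>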
+</tool>