<tool id="accuracy" name="accuracy" version="1.0.0">
    <description>model creation and accuracy estimation</description>
    <requirements>
        <requirement type="package" version="6.0_76">r-caret</requirement>
    </requirements>
    <command detect_errors="aggressive">
        Rscript '$__tool_directory__/accuracy.R' '$input' '$p' '$output1' '$output2'
    </command>
    <inputs>
        <param format="csv" type="data" name="input" value="" label="Input dataset" help="
e.g. iris species table
Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa"/>
        <param name="p" type="float" value="0.80" min="0" max="1" label="Fraction of data used for training the models (the remainder is used for testing)"/>
    </inputs>
    <outputs>
        <data format="csv" name="output1" label="dataset_summary.csv" />
        <data format="csv" name="output2" label="accuracy_summary.csv" />
    </outputs>
    <tests>
        <test>
            <param name="input" value="input.csv"/>
            <output name="output1" file="dataset_summary.csv"/>
            <output name="output2" file="accuracy_summary.csv"/>
        </test>
    </tests>
    <help>
This tool builds 5 different models to predict, for example, species from flower measurements,
so that the best model can be selected for further analysis.

The following 5 algorithms are evaluated:

- **Linear Discriminant Analysis (LDA)**
- **Classification and Regression Trees (CART)**
- **k-Nearest Neighbors (kNN)**
- **Support Vector Machines (SVM) with a linear kernel**
- **Random Forest (RF)**

This is a good mixture of simple linear (LDA), nonlinear (CART, kNN) and complex nonlinear methods (SVM, RF).
The random number seed is reset before each run so that every algorithm is evaluated
using exactly the same data splits, which makes the results directly comparable.
    </help>
    <citations>
        <citation>https://CRAN.R-project.org/package=caret</citation>
    </citations>
</tool>