comparison Marea/marea_cluster.xml @ 34:1a97d1537623 draft

Lots of bug fixes
author bimib
date Sat, 26 Oct 2019 07:49:31 -0400
parents abf0bfe01c78
children 94c51690d40c
33:abf0bfe01c78 34:1a97d1537623
1 <tool id="MaREA_cluester" name="Cluster Analysis" version="1.0.7"> 1 <tool id="MaREA_cluester" name="Cluster Analysis" version="1.0.8">
2 <description></description> 2 <description></description>
3 <macros> 3 <macros>
4 <import>marea_macros.xml</import> 4 <import>marea_macros.xml</import>
5 </macros> 5 </macros>
6 <requirements> 6 <requirements>
32 #end if 32 #end if
33 #end if 33 #end if
34 #if $data.clust_type == 'hierarchy': 34 #if $data.clust_type == 'hierarchy':
35 --k_min ${data.k_min} 35 --k_min ${data.k_min}
36 --k_max ${data.k_max} 36 --k_max ${data.k_max}
37 --silhouette ${data.silhouette}
37 #end if 38 #end if
38 ]]> 39 ]]>
39 </command> 40 </command>
40 <inputs> 41 <inputs>
41 <param name="input" argument="--input" type="data" format="tabular, csv, tsv" label="Input dataset" /> 42 <param name="input" argument="--input" type="data" format="tabular, csv, tsv" label="Input dataset" />
65 66
66 </when> 67 </when>
67 </conditional> 68 </conditional>
68 </when> 69 </when>
69 <when value="hierarchy"> 70 <when value="hierarchy">
70 <param name="k_min" argument="--k_min" type="integer" min="2" max="99" value="3" label="Min number of clusters (k) to be tested" /> 71 <param name="k_min" argument="--k_min" type="integer" min="2" max="20" value="2" label="Min number of clusters (k) to be tested" />
71 <param name="k_max" argument="--k_max" type="integer" min="3" max="99" value="5" label="Max number of clusters (k) to be tested" /> 72 <param name="k_max" argument="--k_max" type="integer" min="3" max="20" value="3" label="Max number of clusters (k) to be tested" />
73 <param name="silhouette" argument="--silhouette" type="boolean" value="true" label="Draw the Silhouette plot from k-min to k-max"/>
72 </when> 74 </when>
73 </conditional> 75 </conditional>
74 </inputs> 76 </inputs>
75 77
76 <outputs> 78 <outputs>
85 <![CDATA[ 87 <![CDATA[
86 88
87 What it does 89 What it does
88 ------------- 90 -------------
89 91
92 The tool performs cluster analysis of any dataset using the most widely used algorithms: K-means, agglomerative
93 clustering and DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
94
95 Accepted files are:
96 - Tabular files in which rows represent variables and columns represent observations. The first row contains the observation labels.
97
98
99 Example of input dataset:
100 -------------------------
101
102 +----------+----------+----------+
103 |TCGAA62670|TCGAA62671|TCGAA62672|
104 +==========+==========+==========+
105 | 0.523167 | 0.371355 | 0.925661 |
106 +----------+----------+----------+
107 | 0.568765 | 0.765567 | 0.456789 |
108 +----------+----------+----------+
109 | 0.876545 | 0.768933 | 0.987654 |
110 +----------+----------+----------+
111 | 0.456788 | 0.876543 | 0.876542 |
112 +----------+----------+----------+
113 | 0.876543 | 0.786543 | 0.897654 |
114 +----------+----------+----------+
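The input orientation (one variable per row, one observation per column, labels in the first row) means the file has to be read column-wise before clustering. The following is a minimal stdlib sketch of that transposition, assuming a tab-separated file with no row-label column, exactly like the example above; `read_observations` is a hypothetical helper, not part of the tool.

```python
import csv
import io

# The example dataset above as tab-separated text (assumed layout:
# first row = observation labels, every other row = one variable).
EXAMPLE = (
    "TCGAA62670\tTCGAA62671\tTCGAA62672\n"
    "0.523167\t0.371355\t0.925661\n"
    "0.568765\t0.765567\t0.456789\n"
    "0.876545\t0.768933\t0.987654\n"
    "0.456788\t0.876543\t0.876542\n"
    "0.876543\t0.786543\t0.897654\n"
)


def read_observations(text):
    """Hypothetical helper: return {observation label: feature vector}."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    labels, values = rows[0], rows[1:]
    # Transpose: column j of the file becomes the feature vector of observation j.
    return {
        label: [float(row[j]) for row in values]
        for j, label in enumerate(labels)
    }


observations = read_observations(EXAMPLE)
print(observations["TCGAA62671"])  # [0.371355, 0.765567, 0.768933, 0.876543, 0.786543]
```

Each observation ends up as a point with one coordinate per variable, which is the shape clustering algorithms expect.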
115
116 .
117
118
119 Options:
120 --------
121
122 The following clustering types can be chosen:
123 - K-means. This option requires the number of clusters (k) to be set. Different values of k can be tested.
124 - Agglomerative clustering. Different values of k can be set to cut the resulting dendrogram.
125 - DBSCAN. The DBSCAN method chooses the number of clusters based on parameters that define when a region is considered dense. Two custom parameters may be set: the maximum distance between two samples for one to be considered in the neighborhood of the other, and the number of samples in a neighborhood for a point to be considered a core point.
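To make the two DBSCAN parameters concrete, here is a minimal pure-Python sketch of the procedure. It assumes Euclidean distance and borrows the names `eps` (maximum neighborhood distance) and `min_samples` (core-point threshold) from scikit-learn's `DBSCAN`; the tool's own parameter names and implementation are not shown here. As in scikit-learn, a point counts itself as part of its own neighborhood.

```python
import math


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)

    def neighbors(i):
        # Neighborhood of i: every point within eps of it, including i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n  # None = not yet visited
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels


points = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (5.0, 5.0), (5.0, 5.1), (9.0, 0.0)]
print(dbscan(points, eps=0.2, min_samples=2))  # [0, 0, 0, 1, 1, -1]
```

The number of clusters is never passed in: two dense groups emerge from `eps` and `min_samples`, and the isolated point is labeled as noise.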
126
127 The tool generates:
128 - a tab-separated file reporting the cluster assignment of each observation. If several numbers of clusters have been tested, the assignment with the maximum average silhouette score is reported. If desired, the elbow plot and a silhouette plot for each k are also generated.
129 - a list of items, including: 1) the cluster assignment for each tested number of clusters; 2) the dendrogram in case of agglomerative clustering; 3) the elbow and silhouette plots in case of k-means clustering.
130 - a log file (.txt).
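The selection rule above (keep the assignment with the maximum average silhouette score) and the quantity usually drawn in an elbow plot (within-cluster sum of squares per k) can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the tool's code: it assumes Euclidean distance, candidate assignments already computed for each k, and the common convention that a point in a singleton cluster scores 0. The helper names are hypothetical.

```python
import math


def _group(points, labels):
    # Map each cluster label to the list of its member points.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters


def mean_silhouette(points, labels):
    """Average of s(i) = (b - a) / max(a, b); needs at least 2 clusters."""
    clusters = _group(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster (p adds 0)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def within_cluster_ss(points, labels):
    """Sum of squared distances to each cluster centroid (elbow plot y-axis)."""
    total = 0.0
    for members in _group(points, labels).values():
        centroid = [sum(c) / len(members) for c in zip(*members)]
        total += sum(math.dist(m, centroid) ** 2 for m in members)
    return total


def best_k(points, assignments):
    """assignments: {k: labels}. Return the k with the highest mean silhouette."""
    return max(assignments, key=lambda k: mean_silhouette(points, assignments[k]))


points = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]
assignments = {2: [0, 0, 1, 1, 1, 1], 3: [0, 0, 1, 1, 2, 2]}
print(best_k(points, assignments))  # 3
```

Here k = 3 wins because every point lies much closer to its own pair than to any other pair, whereas k = 2 merges two distant pairs and lowers their silhouette values.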
131
132
133 .. class:: infomark
134
135 **TIP**: This tool was designed to cluster gene expression data, using the RAS scores computed by the tool
136 `MaREA`_ as features.
137
138 .. class:: infomark
139
140 **TIP**: If your data is not TAB delimited, use `Convert delimiters to TAB`_.
141
142
143
144 @REFERENCE@
145
146 .. _MaREA: https://www.biorxiv.org/content/early/2018/01/16/248724
147 .. _Convert delimiters to TAB: https://usegalaxy.org/?tool_id=Convert+characters1&version=1.0.0&__identifer=6t22teyofhj
148
90 149
91 ]]> 150 ]]>
92 </help> 151 </help>
93 <expand macro="citations" /> 152 <expand macro="citations" />
94 </tool> 153 </tool>