comparison feature_help_modal.py @ 8:1aed7d47c5ec draft default tip

planemo upload for repository https://github.com/goeckslab/gleam commit 8112548ac44b7a4769093d76c722c8fcdeaaef54
author goeckslab
date Fri, 25 Jul 2025 19:02:32 +0000
parents a32ff7201629
children
--- feature_help_modal.py@7:f4cb41f458fd
+++ feature_help_modal.py@8:1aed7d47c5ec
 def get_feature_metrics_help_modal() -> str:
 modal_html = """
-<div id="featureMetricsHelpModal" class="modal">
+<div id="metricsHelpModal" class="modal">
 <div class="modal-content">
-<span class="close-feature-metrics">&times;</span>
+<span class="close">&times;</span>
 <h2>Help Guide: Common Model Metrics</h2>
-<div class="metrics-guide" style="max-height:65vh;overflow-y:auto;font-size:1.04em;">
-<h3>1) General Metrics</h3>
-<h4>Classification</h4>
-<p><strong>Accuracy:</strong> The proportion of correct predictions among all predictions. It is calculated as (TP + TN) / (TP + TN + FP + FN). While intuitive, Accuracy can be misleading for imbalanced datasets where one class dominates. For example, in a dataset with 95% negative cases, a model predicting all negatives achieves 95% Accuracy but fails to identify positives.</p>
-<p><strong>AUC (Area Under the Curve):</strong> Specifically, the Area Under the Receiver Operating Characteristic Curve (ROC-AUC) measures a model’s ability to distinguish between classes. It ranges from 0 to 1, where 1 indicates perfect separation and 0.5 suggests random guessing. ROC-AUC is robust for binary and multiclass problems but may be less informative for highly imbalanced datasets.</p>
-<h4>Regression</h4>
-<p><strong>R2 (Coefficient of Determination):</strong> Measures the proportion of variance in the dependent variable explained by the independent variables. It ranges from 0 to 1, with 1 indicating perfect prediction and 0 indicating no explanatory power. Negative values are possible if the model performs worse than a mean-based baseline. R2 is widely used but sensitive to outliers.</p>
-<p><strong>RMSE (Root Mean Squared Error):</strong> The square root of the average squared differences between predicted and actual values. It penalizes larger errors more heavily and is expressed in the same units as the target variable, making it interpretable. Lower RMSE indicates better model performance.</p>
-<p><strong>MAE (Mean Absolute Error):</strong> The average of absolute differences between predicted and actual values. It is less sensitive to outliers than RMSE and provides a straightforward measure of average error magnitude. Lower MAE is better.</p>
+<div class="metrics-guide">
 
-<h3>2) Precision, Recall & Specificity</h3>
-<h4>Classification</h4>
-<p><strong>Precision:</strong> The proportion of positive predictions that are correct, calculated as TP / (TP + FP). High Precision is crucial when false positives are costly, such as in spam email detection, where misclassifying legitimate emails as spam disrupts user experience.</p>
-<p><strong>Recall (Sensitivity):</strong> The proportion of actual positives correctly predicted, calculated as TP / (TP + FN). High Recall is vital when missing positives is risky, such as in disease diagnosis, where failing to identify a sick patient could have severe consequences.</p>
-<p><strong>Specificity:</strong> The true negative rate, calculated as TN / (TN + FP). It measures how well a model identifies negatives, making it valuable in medical testing to minimize false alarms (e.g., incorrectly diagnosing healthy patients as sick).</p>
+<!-- Classification Metrics -->
+<h3>1) Classification Metrics</h3>
 
-<h3>3) Macro, Micro, and Weighted Averages</h3>
-<h4>Classification</h4>
-<p><strong>Macro Precision / Recall / F1:</strong> Computes the metric for each class independently and averages them, treating all classes equally. This is ideal for balanced datasets or when all classes are equally important, such as in multiclass image classification with similar class frequencies.</p>
-<p><strong>Micro Precision / Recall / F1:</strong> Aggregates true positives (TP), false positives (FP), and false negatives (FN) across all classes before computing the metric. It provides a global perspective and is suitable for imbalanced datasets or multilabel problems, as it accounts for class frequency.</p>
-<p><strong>Weighted Precision / Recall / F1:</strong> Averages the metric across classes, weighted by the number of true instances per class. This balances the importance of classes based on their frequency, making it useful for imbalanced datasets where larger classes should have more influence but smaller classes are still considered.</p>
+<p><strong>Accuracy:</strong>
+The proportion of correct predictions over all predictions:<br>
+<code>(TP + TN) / (TP + TN + FP + FN)</code>.
+<em>Use when</em> classes are balanced and you want a single easy‐to‐interpret number.</p>
 
-<h3>4) Average Precision (PR-AUC Variants)</h3>
-<h4>Classification</h4>
-<p><strong>Average Precision:</strong> The Area Under the Precision-Recall Curve (PR-AUC) summarizes the trade-off between Precision and Recall. It is particularly useful for imbalanced datasets, where ROC-AUC may overestimate performance. Average Precision is computed by averaging Precision values at different Recall thresholds, providing a robust measure for ranking tasks or rare class detection.</p>
+<p><strong>Precision:</strong>
+The fraction of positive predictions that are actually positive:<br>
+<code>TP / (TP + FP)</code>.
+<em>Use when</em> false positives are costly (e.g. spam filter—better to miss some spam than flag good mail).</p>
 
-<h3>5) ROC-AUC Variants</h3>
-<h4>Classification</h4>
-<p><strong>ROC-AUC:</strong> The Area Under the Receiver Operating Characteristic Curve plots the true positive rate (Recall) against the false positive rate (1 - Specificity) at various thresholds. It quantifies the model’s ability to separate classes, with higher values indicating better performance.</p>
-<p><strong>Macro ROC-AUC:</strong> Averages the ROC-AUC scores across all classes, treating each class equally. This is suitable for balanced multiclass problems where all classes are of equal importance.</p>
-<p><strong>Micro ROC-AUC:</strong> Computes a single ROC-AUC by aggregating predictions and true labels across all classes. It is effective for multiclass or multilabel problems with class imbalance, as it accounts for the overall prediction distribution.</p>
+<p><strong>Recall (Sensitivity):</strong>
+The fraction of actual positives correctly identified:<br>
+<code>TP / (TP + FN)</code>
+<em>Use when</em> false negatives are costly (e.g. disease screening—don’t miss sick patients).</p>
 
-<h3>6) Confusion Matrix Stats (Per Class)</h3>
-<h4>Classification</h4>
-<p><strong>True Positives (TP):</strong> The number of correct positive predictions for a given class.</p>
-<p><strong>True Negatives (TN):</strong> The number of correct negative predictions for a given class.</p>
-<p><strong>False Positives (FP):</strong> The number of incorrect positive predictions for a given class (false alarms).</p>
-<p><strong>False Negatives (FN):</strong> The number of incorrect negative predictions for a given class (missed detections). These stats are visualized in PyCaret’s confusion matrix plots, aiding class-wise performance analysis.</p>
+<p><strong>F1 Score:</strong>
+The harmonic mean of Precision and Recall:<br>
+<code>2·(Precision·Recall)/(Precision+Recall)</code>
+<em>Use when</em> you need a balance between Precision & Recall on an imbalanced dataset.</p>
 
-<h3>7) Other Useful Metrics</h3>
-<h4>Classification</h4>
-<p><strong>Cohen’s Kappa:</strong> Measures the agreement between predicted and actual labels, adjusted for chance. It ranges from -1 to 1, where 1 indicates perfect agreement, 0 indicates chance-level agreement, and negative values suggest worse-than-chance performance. Kappa is useful for multiclass problems with imbalanced labels.</p>
-<p><strong>Matthews Correlation Coefficient (MCC):</strong> A balanced measure that considers TP, TN, FP, and FN, calculated as (TP * TN - FP * FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)). It ranges from -1 to 1, with 1 being perfect prediction. MCC is particularly effective for imbalanced datasets due to its symmetry across classes.</p>
-<h4>Regression</h4>
-<p><strong>MSE (Mean Squared Error):</strong> The average of squared differences between predicted and actual values. It amplifies larger errors, making it sensitive to outliers. Lower MSE indicates better performance.</p>
-<p><strong>MAPE (Mean Absolute Percentage Error):</strong> The average of absolute percentage differences between predicted and actual values, calculated as (1/n) * Σ(|actual - predicted| / |actual|) * 100. It is useful when relative errors are important but can be unstable if actual values are near zero.</p>
+<p><strong>ROC-AUC (Area Under ROC Curve):</strong>
+Measures ability to distinguish classes across all thresholds.
+Ranges from 0.5 (random) to 1 (perfect).
+<em>Use when</em> you care about ranking positives above negatives.</p>
+
+<p><strong>PR-AUC (Area Under Precision-Recall Curve):</strong>
+Summarizes Precision vs. Recall trade-off.
+More informative than ROC-AUC when positives are rare.
+<em>Use when</em> dealing with highly imbalanced data.</p>
+
+<p><strong>Log Loss:</strong>
+Penalizes confident wrong predictions via negative log-likelihood.
+Lower is better.
+<em>Use when</em> you need well-calibrated probability estimates.</p>
+
+<p><strong>Cohen’s Kappa:</strong>
+Measures agreement between predictions and true labels accounting for chance.
+1 is perfect, 0 is random.
+<em>Use when</em> you want to factor out chance agreement.</p>
+
+<hr>
+
+<!-- Regression Metrics -->
+<h3>2) Regression Metrics</h3>
+
+<p><strong>R² (Coefficient of Determination):</strong>
+Proportion of variance in the target explained by features:<br>
+1 is perfect, 0 means no better than predicting the mean, negative is worse than mean.
+<em>Use when</em> you want a normalized measure of fit.</p>
+
+<p><strong>MAE (Mean Absolute Error):</strong>
+Average absolute difference between predictions and actual values:<br>
+<code>mean(|y_pred − y_true|)</code>
+<em>Use when</em> you need an interpretable “average” error and want to downweight outliers.</p>
+
+<p><strong>RMSE (Root Mean Squared Error):</strong>
+Square root of the average squared errors:<br>
+<code>√mean((y_pred − y_true)²)</code>.
+Penalizes large errors more heavily.
+<em>Use when</em> large deviations are especially undesirable.</p>
+
+<p><strong>MSE (Mean Squared Error):</strong>
+The average squared error:<br>
+<code>mean((y_pred − y_true)²)</code>.
+Similar to RMSE but in squared units; often used in optimization.</p>
+
+<p><strong>RMSLE (Root Mean Squared Log Error):</strong>
+<code>√mean((log(1+y_pred) − log(1+y_true))²)</code>.
+Less sensitive to large differences when both true and predicted are large.
+<em>Use when</em> target spans several orders of magnitude.</p>
+
+<p><strong>MAPE (Mean Absolute Percentage Error):</strong>
+<code>mean(|(y_true − y_pred)/y_true|)·100</code>.
+Expresses error as a percentage.
+<em>Use when</em> relative error matters—but avoid if y_true≈0.</p>
+
 </div>
 </div>
 </div>
 """
+
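The formulas quoted in both versions of the help text can be sanity-checked with scikit-learn, the library underlying PyCaret (which the 7:f4cb41f458fd text references). The sketch below is illustrative only and is not part of feature_help_modal.py; the toy y_true/y_pred/y_prob and r_true/r_pred arrays are invented for the example, and scikit-learn 0.24+ is assumed.

    # Illustrative only: reproduce the metric formulas from the help modal
    # with scikit-learn. The toy arrays below are invented for the example.
    import numpy as np
    from sklearn.metrics import (
        accuracy_score, precision_score, recall_score, f1_score,
        confusion_matrix, roc_auc_score, average_precision_score,
        log_loss, cohen_kappa_score, matthews_corrcoef,
        r2_score, mean_absolute_error, mean_squared_error,
        mean_squared_log_error, mean_absolute_percentage_error,
    )

    # Classification: true labels, hard predictions, predicted probabilities.
    y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
    y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])
    y_prob = np.array([0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3])

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print("Accuracy   ", accuracy_score(y_true, y_pred))        # (TP+TN)/(TP+TN+FP+FN)
    print("Precision  ", precision_score(y_true, y_pred))       # TP/(TP+FP)
    print("Recall     ", recall_score(y_true, y_pred))          # TP/(TP+FN)
    print("Specificity", tn / (tn + fp))                        # TN/(TN+FP)
    print("F1         ", f1_score(y_true, y_pred))              # harmonic mean of P and R
    print("Macro F1   ", f1_score(y_true, y_pred, average="macro"))
    print("Micro F1   ", f1_score(y_true, y_pred, average="micro"))
    print("Weighted F1", f1_score(y_true, y_pred, average="weighted"))
    print("ROC-AUC    ", roc_auc_score(y_true, y_prob))
    print("PR-AUC     ", average_precision_score(y_true, y_prob))  # Average Precision
    print("Log loss   ", log_loss(y_true, y_prob))
    print("Kappa      ", cohen_kappa_score(y_true, y_pred))
    print("MCC        ", matthews_corrcoef(y_true, y_pred))

    # Regression: true targets and predictions.
    r_true = np.array([3.0, 5.0, 2.5, 7.0, 4.5])
    r_pred = np.array([2.8, 5.4, 2.0, 6.5, 5.0])

    mse = mean_squared_error(r_true, r_pred)                     # mean((y_pred - y_true)^2)
    print("R2   ", r2_score(r_true, r_pred))
    print("MAE  ", mean_absolute_error(r_true, r_pred))          # mean(|y_pred - y_true|)
    print("MSE  ", mse)
    print("RMSE ", mse ** 0.5)                                   # sqrt(MSE)
    print("RMSLE", mean_squared_log_error(r_true, r_pred) ** 0.5)
    print("MAPE ", mean_absolute_percentage_error(r_true, r_pred) * 100)  # percent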
 modal_css = """
 <style>
-/* Modal Background & Content */
-#featureMetricsHelpModal.modal {
+.modal {
 display: none;
 position: fixed;
-z-index: 9999;
-left: 0; top: 0;
-width: 100%; height: 100%;
+z-index: 1;
+left: 0;
+top: 0;
+width: 100%;
+height: 100%;
 overflow: auto;
-background-color: rgba(0,0,0,0.45);
+background-color: rgba(0,0,0,0.4);
 }
-#featureMetricsHelpModal .modal-content {
+.modal-content {
 background-color: #fefefe;
-margin: 5% auto;
-padding: 24px 28px 20px 28px;
-border: 1.5px solid #17623b;
-width: 90%;
+margin: 15% auto;
+padding: 20px;
+border: 1px solid #888;
+width: 80%;
 max-width: 800px;
-border-radius: 18px;
-box-shadow: 0 8px 32px rgba(23,98,59,0.20);
 }
-#featureMetricsHelpModal .close-feature-metrics {
-color: #17623b;
+.close {
+color: #aaa;
 float: right;
 font-size: 28px;
 font-weight: bold;
+}
+.close:hover,
+.close:focus {
+color: black;
+text-decoration: none;
 cursor: pointer;
-transition: color 0.2s;
 }
-#featureMetricsHelpModal .close-feature-metrics:hover {
-color: #21895e;
+.metrics-guide h3 {
+margin-top: 20px;
 }
-.metrics-guide h3 { margin-top: 20px; }
-.metrics-guide h4 { margin-top: 12px; color: #17623b; }
-.metrics-guide p { margin: 5px 0 10px 0; }
-.metrics-guide ul { margin: 10px 0 10px 24px; }
+.metrics-guide p {
+margin: 5px 0;
+}
+.metrics-guide ul {
+margin: 10px 0;
+padding-left: 20px;
+}
 </style>
 """
 modal_js = """
 <script>
 document.addEventListener("DOMContentLoaded", function() {
-var modal = document.getElementById("featureMetricsHelpModal");
-var openBtn = document.getElementById("openFeatureMetricsHelp");
-var span = document.getElementsByClassName("close-feature-metrics")[0];
+var modal = document.getElementById("metricsHelpModal");
+var openBtn = document.getElementById("openMetricsHelp");
+var span = document.getElementsByClassName("close")[0];
 if (openBtn && modal) {
 openBtn.onclick = function() {
 modal.style.display = "block";
 };
 }
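In either revision the returned string is inert unless the page that embeds it also renders a trigger button with the id the script queries (openFeatureMetricsHelp in 7:f4cb41f458fd, openMetricsHelp in 8:1aed7d47c5ec); the JS guards with if (openBtn && modal) before wiring the click handler. A minimal embedding sketch, assuming the module is importable as feature_help_modal and with an invented page layout and output path:

    # Hypothetical embedding of the modal into a report page; the layout,
    # button text, and output file name are invented for illustration.
    # The button id must match the one the revision's JS queries
    # ("openMetricsHelp" in 8:1aed7d47c5ec, "openFeatureMetricsHelp" in 7:f4cb41f458fd).
    from feature_help_modal import get_feature_metrics_help_modal

    trigger = '<button id="openMetricsHelp">Help: model metrics</button>'
    page = "<html><body>{}{}</body></html>".format(
        trigger, get_feature_metrics_help_modal()
    )

    with open("report.html", "w") as fh:
        fh.write(page)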