1) General Metrics (Regression and Classification)

Loss (Regression & Classification): Measures the difference between predicted and actual values, optimized during training. Lower is better. For regression, this is often Mean Squared Error (MSE) or Mean Absolute Error (MAE). For classification, it's typically cross-entropy or log loss.
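
A minimal sketch of the two common loss flavors, assuming scikit-learn is available; the arrays are illustrative placeholders, not values from this text.

    import numpy as np
    from sklearn.metrics import mean_squared_error, log_loss

    # Regression loss: Mean Squared Error between actual and predicted values.
    y_true_reg = np.array([3.0, 5.0, 2.5])
    y_pred_reg = np.array([2.5, 5.0, 3.0])
    mse = mean_squared_error(y_true_reg, y_pred_reg)

    # Classification loss: cross-entropy (log loss) on predicted class probabilities.
    y_true_clf = [0, 1, 1]
    y_prob_clf = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]
    ce = log_loss(y_true_clf, y_prob_clf)

    print(f"MSE: {mse:.3f}, cross-entropy: {ce:.3f}")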

2) Regression Metrics

Mean Absolute Error (MAE): Average of absolute differences between predicted and actual values, in the same units as the target. Use for interpretable error measurement when all errors are equally important. Less sensitive to outliers than MSE.

Mean Squared Error (MSE): Average of squared differences between predicted and actual values. Penalizes larger errors more heavily, useful when large deviations are critical. Often used as the loss function in regression.

Root Mean Squared Error (RMSE): Square root of MSE, in the same units as the target. Balances interpretability and sensitivity to large errors. Widely used for regression evaluation.
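
A minimal sketch of MAE, MSE, and RMSE, assuming scikit-learn and NumPy; y_true and y_pred are illustrative placeholders.

    import numpy as np
    from sklearn.metrics import mean_absolute_error, mean_squared_error

    y_true = np.array([100.0, 150.0, 200.0, 250.0])
    y_pred = np.array([110.0, 140.0, 195.0, 300.0])

    mae = mean_absolute_error(y_true, y_pred)   # same units as the target
    mse = mean_squared_error(y_true, y_pred)    # squared units, penalizes large errors
    rmse = np.sqrt(mse)                         # back to target units

    print(f"MAE={mae:.2f}  MSE={mse:.2f}  RMSE={rmse:.2f}")

Note how the single large error (300 vs. 250) pulls RMSE well above MAE, reflecting the heavier penalty on large deviations.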

Mean Absolute Percentage Error (MAPE): Average absolute error as a percentage of actual values. Scale-independent, ideal for comparing relative errors across datasets. Avoid when actual values are near zero.

Root Mean Squared Percentage Error (RMSPE): Square root of the mean squared percentage error. Scale-independent; penalizes larger relative errors more than MAPE. Use for forecasting or when relative accuracy matters.
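
A minimal sketch of MAPE and RMSPE, assuming scikit-learn (0.24 or later) for MAPE; RMSPE has no scikit-learn helper, so it is computed directly with NumPy. The arrays are illustrative.

    import numpy as np
    from sklearn.metrics import mean_absolute_percentage_error

    y_true = np.array([100.0, 200.0, 400.0])
    y_pred = np.array([110.0, 180.0, 440.0])

    mape = mean_absolute_percentage_error(y_true, y_pred)        # returned as a fraction, e.g. 0.10 == 10%
    rmspe = np.sqrt(np.mean(((y_true - y_pred) / y_true) ** 2))  # root of mean squared relative error

    print(f"MAPE={mape:.2%}  RMSPE={rmspe:.2%}")

Both quantities blow up when y_true contains values near zero, which is why the text advises against MAPE in that case.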

R² Score: Proportion of variance in the target explained by the model. Ranges from negative infinity to 1 (perfect prediction). Use to assess model fit; negative values indicate the model performs worse than simply predicting the mean.
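
A minimal sketch of R², assuming scikit-learn; a constant "predict the mean" baseline is included to show where the score turns negative. The data is illustrative.

    import numpy as np
    from sklearn.metrics import r2_score

    y_true = np.array([3.0, 5.0, 7.0, 9.0])
    y_model = np.array([2.8, 5.1, 7.2, 8.7])          # reasonable model -> R² close to 1
    y_baseline = np.full_like(y_true, y_true.mean())  # always predicts the mean -> R² = 0
    y_bad = np.array([9.0, 7.0, 5.0, 3.0])            # worse than the mean -> R² < 0

    print(r2_score(y_true, y_model), r2_score(y_true, y_baseline), r2_score(y_true, y_bad))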

3) Classification Metrics

Accuracy: Proportion of correct predictions among all predictions. Simple but misleading for imbalanced datasets, where high accuracy may hide poor performance on minority classes.

Micro Accuracy: Sums true positives and true negatives across all classes before computing accuracy. Suitable for multiclass or multilabel problems with imbalanced data.

Token Accuracy: Measures how often predicted tokens (e.g., in sequences) match true tokens. Common in NLP tasks like text generation or token classification.
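
A minimal sketch, assuming scikit-learn for plain accuracy; token accuracy is shown as a simple element-wise comparison over illustrative token sequences rather than any dedicated library call.

    import numpy as np
    from sklearn.metrics import accuracy_score

    # Plain accuracy over class labels.
    y_true = [0, 1, 2, 2, 1]
    y_pred = [0, 1, 2, 1, 1]
    acc = accuracy_score(y_true, y_pred)              # 4 of 5 correct -> 0.8

    # Token accuracy: fraction of positions where predicted tokens match true tokens.
    true_tokens = np.array(["the", "cat", "sat", "down"])
    pred_tokens = np.array(["the", "cat", "sat", "up"])
    token_acc = np.mean(true_tokens == pred_tokens)   # 3 of 4 -> 0.75

    print(acc, token_acc)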

Precision: Proportion of positive predictions that are correct (TP / (TP + FP)). Use when false positives are costly, e.g., spam detection.

Recall (Sensitivity): Proportion of actual positives correctly predicted (TP / (TP + FN)). Use when missing positives is risky, e.g., disease detection.

Specificity: True negative rate (TN / (TN + FP)). Measures the ability to identify negatives. Useful in medical testing to avoid false alarms.
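
A minimal sketch of precision, recall, and specificity for the binary case, assuming scikit-learn; specificity has no direct scikit-learn function, so it is derived from the confusion matrix. The labels are illustrative.

    from sklearn.metrics import confusion_matrix, precision_score, recall_score

    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    precision = precision_score(y_true, y_pred)   # TP / (TP + FP)
    recall = recall_score(y_true, y_pred)         # TP / (TP + FN)
    specificity = tn / (tn + fp)                  # TN / (TN + FP)

    print(precision, recall, specificity)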

4) Classification: Macro, Micro, and Weighted Averages

Macro Precision / Recall / F1: Averages the metric across all classes, treating each equally. Best for balanced datasets where all classes are equally important.

Micro Precision / Recall / F1: Aggregates true positives, false positives, and false negatives across all classes before computing. Ideal for imbalanced or multilabel classification.

Weighted Precision / Recall / F1: Averages metrics across classes, weighted by the number of true instances per class. Balances class importance based on frequency.
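
A minimal sketch of the three averaging modes, assuming scikit-learn; the same predictions are scored three ways via the average parameter of precision_recall_fscore_support. The labels are illustrative.

    from sklearn.metrics import precision_recall_fscore_support

    y_true = [0, 0, 0, 0, 1, 1, 2]   # imbalanced: class 0 dominates
    y_pred = [0, 0, 0, 1, 1, 2, 2]

    for avg in ("macro", "micro", "weighted"):
        p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average=avg, zero_division=0)
        print(f"{avg:>8}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")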

5) Classification: Average Precision (PR-AUC Variants)

Average Precision Macro: Precision-Recall AUC averaged equally across classes. Use for balanced multiclass problems.

Average Precision Micro: Global Precision-Recall AUC computed over all instances. Best for imbalanced or multilabel classification.

Average Precision Samples: Precision-Recall AUC averaged across individual samples. Ideal for multilabel tasks where samples have multiple labels.
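
A minimal sketch of the three PR-AUC averages, assuming scikit-learn and a multilabel indicator format (each row may carry several labels); Y_true and Y_score are illustrative.

    import numpy as np
    from sklearn.metrics import average_precision_score

    Y_true = np.array([[1, 0, 1],
                       [0, 1, 0],
                       [1, 1, 0],
                       [0, 0, 1]])
    Y_score = np.array([[0.8, 0.2, 0.6],
                        [0.3, 0.9, 0.1],
                        [0.7, 0.4, 0.2],
                        [0.1, 0.3, 0.9]])

    for avg in ("macro", "micro", "samples"):
        ap = average_precision_score(Y_true, Y_score, average=avg)
        print(f"AP ({avg}): {ap:.3f}")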

6) Classification: ROC-AUC Variants

ROC-AUC: Measures the ability to distinguish between classes. AUC = 1 is perfect; 0.5 is random guessing. Use for binary classification.

Macro ROC-AUC: Averages AUC across all classes equally. Suitable for balanced multiclass problems.

Micro ROC-AUC: Computes AUC from aggregated predictions across all classes. Useful for imbalanced or multilabel settings.
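
A minimal sketch of ROC-AUC, assuming scikit-learn: a plain binary case first, then macro and micro averages over a multilabel indicator matrix. All values are illustrative.

    import numpy as np
    from sklearn.metrics import roc_auc_score

    # Binary ROC-AUC from predicted probabilities for the positive class.
    y_true = [0, 0, 1, 1]
    y_prob = [0.1, 0.4, 0.35, 0.8]
    print("binary:", roc_auc_score(y_true, y_prob))

    # Macro / micro ROC-AUC over a multilabel indicator matrix.
    Y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
    Y_score = np.array([[0.8, 0.2, 0.6], [0.3, 0.9, 0.1], [0.7, 0.4, 0.2], [0.1, 0.3, 0.9]])
    print("macro:", roc_auc_score(Y_true, Y_score, average="macro"))
    print("micro:", roc_auc_score(Y_true, Y_score, average="micro"))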

7) Classification: Confusion Matrix Stats (Per Class)

True Positives / Negatives (TP / TN): Correct predictions for positives and negatives, respectively.

False Positives / Negatives (FP / FN): Incorrect predictions: false alarms and missed detections, respectively.
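
A minimal sketch that pulls per-class TP, FP, FN, and TN out of a multiclass confusion matrix, assuming scikit-learn and NumPy; the labels are illustrative.

    import numpy as np
    from sklearn.metrics import confusion_matrix

    y_true = [0, 0, 1, 1, 2, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0, 2]

    cm = confusion_matrix(y_true, y_pred)   # rows = actual class, columns = predicted class
    tp = np.diag(cm)                        # correct predictions per class
    fp = cm.sum(axis=0) - tp                # predicted as the class, but actually another
    fn = cm.sum(axis=1) - tp                # actually the class, but predicted as another
    tn = cm.sum() - (tp + fp + fn)          # everything else

    for cls, (a, b, c, d) in enumerate(zip(tp, fp, fn, tn)):
        print(f"class {cls}: TP={a} FP={b} FN={c} TN={d}")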

8) Classification: Ranking Metrics

Hits at K: Measures whether the true label is among the top-K predictions. Common in recommendation systems and retrieval tasks.
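
A minimal sketch of Hits at K, assuming NumPy only; hits_at_k below is a hypothetical helper written for illustration (scikit-learn's top_k_accuracy_score covers the same idea for classification). scores holds one row of ranking scores per sample.

    import numpy as np

    def hits_at_k(y_true, scores, k=3):
        """Fraction of samples whose true label appears among the k highest-scored items."""
        top_k = np.argsort(-scores, axis=1)[:, :k]   # indices of the k highest scores per row
        return np.mean([label in row for label, row in zip(y_true, top_k)])

    scores = np.array([[0.1, 0.5, 0.2, 0.2],
                       [0.7, 0.1, 0.1, 0.1],
                       [0.2, 0.2, 0.3, 0.3]])
    y_true = np.array([1, 3, 2])

    print(hits_at_k(y_true, scores, k=2))   # labels 1 and 2 are in their rows' top-2, label 3 is not -> 0.67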

9) Other Metrics (Classification)

Cohen's Kappa: Measures agreement between predicted and actual labels, adjusted for chance. Useful for multiclass classification with imbalanced data.

Matthews Correlation Coefficient (MCC): Balanced measure using TP, TN, FP, and FN. Effective for imbalanced datasets.
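
A minimal sketch of both chance-corrected metrics, assuming scikit-learn; the labels are illustrative.

    from sklearn.metrics import cohen_kappa_score, matthews_corrcoef

    y_true = [0, 0, 0, 1, 1, 2, 2, 2]
    y_pred = [0, 0, 1, 1, 1, 2, 2, 0]

    kappa = cohen_kappa_score(y_true, y_pred)   # agreement adjusted for chance, 1 = perfect
    mcc = matthews_corrcoef(y_true, y_pred)     # correlation built from TP/TN/FP/FN, 1 = perfect

    print(kappa, mcc)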

10) Metric Recommendations

 - Regression: Use RMSE or MAE for general evaluation, MAPE for relative errors, and R² to assess model fit. Use MSE or RMSPE when large errors are critical.
 - Classification (Balanced Data): Use Accuracy and F1 for overall performance.
 - Classification (Imbalanced Data): Use Precision, Recall, and ROC-AUC to focus on minority class performance.
 - Multilabel or Imbalanced Classification: Use Micro Precision/Recall/F1 or Micro ROC-AUC.
 - Balanced Multiclass: Use Macro Precision/Recall/F1 or Macro ROC-AUC.
 - Class Frequency Matters: Use Weighted Precision/Recall/F1 to account for class imbalance.
 - Recommendation/Ranking: Use Hits at K for retrieval tasks.
 - Detailed Analysis: Use Confusion Matrix stats for class-wise performance in classification.