Implemented Performance Measures
This page lists, in alphabetical order, the performance measures available for the different types of learning problems, as well as general performance measures. (See also the documentation of measures and makeMeasure for available measures and their properties.)
If you find that a measure is missing, you can either open an issue or try to implement a measure yourself.
Column Minim. indicates whether the measure is minimized during, e.g., tuning or feature selection. Best and Worst show the best and worst values the performance measure can attain. For classification, column Multi indicates whether a measure is suitable for multi-class problems. If not, the measure can only be used for binary classification problems.
The next six columns refer to information required to calculate the performance measure.
- Pred.: The Prediction object.
- Truth: The true values of the response variable(s) (for supervised learning).
- Probs: The predicted probabilities (might be needed for classification).
- Model: The WrappedModel (e.g., for calculating the training time).
- Task: The Task (relevant for cost-sensitive classification).
- Feats: The feature data of the predicted observations (relevant for clustering).
Aggr. shows the default aggregation method tied to the measure.
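These properties can also be inspected programmatically, since each measure is an ordinary R object. A minimal sketch (assuming mlr is loaded; the slot names below are those of mlr's Measure objects):

```r
library(mlr)

# Each measure is a Measure object whose slots mirror the table columns
mmce$minimize    # TRUE -> column Minim.
mmce$best        # 0 -> column Best
mmce$worst       # 1 -> column Worst
mmce$properties  # "classif.multi" -> Multi; "req.pred", "req.truth" -> Pred./Truth
mmce$aggr$id     # "test.mean" -> column Aggr.

# List all measures applicable to a given task
listMeasures(iris.task)
```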
Classification
ID / Name | Minim. | Best | Worst | Multi | Pred. | Truth | Probs | Model | Task | Feats | Aggr. | Note
---|---|---|---|---|---|---|---|---|---|---|---|---
acc Accuracy | | 1 | 0 | X | X | X | | | | | test.mean | 
auc Area under the curve | | 1 | 0 | | X | X | X | | | | test.mean | 
bac Balanced accuracy | | 1 | 0 | | X | X | | | | | test.mean | Mean of true positive rate and true negative rate.
ber Balanced error rate | X | 0 | 1 | X | X | X | | | | | test.mean | Mean of misclassification error rates on all individual classes.
brier Brier score | X | 0 | 1 | | X | X | X | | | | test.mean | 
brier.scaled Brier scaled | | 1 | 0 | | X | X | X | | | | test.mean | Brier score scaled to [0, 1], see http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3575184/.
f1 F1 measure | | 1 | 0 | | X | X | | | | | test.mean | 
fdr False discovery rate | X | 0 | 1 | | X | X | | | | | test.mean | 
fn False negatives | X | 0 | Inf | | X | X | | | | | test.mean | Also called misses.
fnr False negative rate | X | 0 | 1 | | X | X | | | | | test.mean | 
fp False positives | X | 0 | Inf | | X | X | | | | | test.mean | Also called false alarms.
fpr False positive rate | X | 0 | 1 | | X | X | | | | | test.mean | Also called false alarm rate or fall-out.
gmean G-mean | | 1 | 0 | | X | X | | | | | test.mean | Geometric mean of recall and specificity.
gpr Geometric mean of precision and recall | | 1 | 0 | | X | X | | | | | test.mean | 
logloss Logarithmic loss | X | 0 | Inf | X | | X | X | | | | test.mean | Defined as -mean(log(p_i)), where p_i is the predicted probability of the true class of observation i. Inspired by https://www.kaggle.com/wiki/MultiClassLogLoss.
mcc Matthews correlation coefficient | | 1 | -1 | | X | X | | | | | test.mean | 
mmce Mean misclassification error | X | 0 | 1 | X | X | X | | | | | test.mean | 
multiclass.au1p Weighted average 1 vs. 1 multiclass AUC | | 1 | 0.5 | X | X | X | X | | | | test.mean | Computes the AUC of c(c - 1) binary classifiers while considering the a priori distribution of the classes. See Ferri et al.: https://www.math.ucdavis.edu/~saito/data/roc/ferri-class-perf-metrics.pdf
multiclass.au1u Average 1 vs. 1 multiclass AUC | | 1 | 0.5 | X | X | X | X | | | | test.mean | Computes the AUC of c(c - 1) binary classifiers (all possible pairwise combinations) while considering a uniform distribution of the classes. See Ferri et al.: https://www.math.ucdavis.edu/~saito/data/roc/ferri-class-perf-metrics.pdf
multiclass.aunp Weighted average multiclass AUC | | 1 | 0.5 | X | X | X | X | | | | test.mean | Computes the AUC treating a c-dimensional classifier as c two-dimensional classifiers, taking into account the prior probability p(j) of each class. See Ferri et al.: https://www.math.ucdavis.edu/~saito/data/roc/ferri-class-perf-metrics.pdf
multiclass.aunu Average multiclass AUC | | 1 | 0.5 | X | X | X | X | | | | test.mean | Computes the AUC treating a c-dimensional classifier as c two-dimensional classifiers, where classes are assumed to have a uniform distribution, in order to obtain a measure that is independent of changes in the class distribution. See Ferri et al.: https://www.math.ucdavis.edu/~saito/data/roc/ferri-class-perf-metrics.pdf
multiclass.brier Multiclass Brier score | X | 0 | 2 | X | X | X | X | | | | test.mean | Defined as (1/n) sum_i sum_j (y_ij - p_ij)^2, where y_ij = 1 if observation i has class j (else 0) and p_ij is the predicted probability of observation i for class j. From http://docs.lib.noaa.gov/rescue/mwr/078/mwr-078-01-0001.pdf
npv Negative predictive value | | 1 | 0 | | X | X | | | | | test.mean | 
ppv Positive predictive value | | 1 | 0 | | X | X | | | | | test.mean | Also called precision.
tn True negatives | | Inf | 0 | | X | X | | | | | test.mean | Also called correct rejections.
tnr True negative rate | | 1 | 0 | | X | X | | | | | test.mean | Also called specificity.
tp True positives | | Inf | 0 | | X | X | | | | | test.mean | 
tpr True positive rate | | 1 | 0 | | X | X | | | | | test.mean | Also called hit rate or recall.
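As an illustration, here is a minimal sketch that computes several of the measures above on the binary sonar.task that ships with mlr. Note that predict.type = "prob" is required for the probability-based measures (column Probs), such as auc and brier:

```r
library(mlr)

# Probabilities are needed for auc and brier (column Probs above)
lrn = makeLearner("classif.rpart", predict.type = "prob")
mod = train(lrn, sonar.task)
pred = predict(mod, sonar.task)

# Measures can be mixed freely in one call
performance(pred, measures = list(acc, mmce, auc, brier))
```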
Regression
ID / Name | Minim. | Best | Worst | Pred. | Truth | Probs | Model | Task | Feats | Aggr. | Note
---|---|---|---|---|---|---|---|---|---|---|---
adjrsq Adjusted coefficient of determination | | 1 | 0 | X | X | | | | | test.mean | Adjusted R-squared; only defined for normal linear regression.
expvar Explained variance | | 1 | 0 | X | X | | | | | test.mean | Similar to measure rsq (R-squared). Defined as explained_sum_of_squares / total_sum_of_squares.
mae Mean of absolute errors | X | 0 | Inf | X | X | | | | | test.mean | 
medae Median of absolute errors | X | 0 | Inf | X | X | | | | | test.mean | 
medse Median of squared errors | X | 0 | Inf | X | X | | | | | test.mean | 
mse Mean of squared errors | X | 0 | Inf | X | X | | | | | test.mean | 
rmse Root mean square error | X | 0 | Inf | X | X | | | | | test.rmse | The RMSE is aggregated as sqrt(mean(rmse.vals.on.test.sets^2)). If you do not want this, you can also use test.mean (see the sketch below).
rsq Coefficient of determination | | 1 | -Inf | X | X | | | | | test.mean | Also called R-squared, defined as 1 - residual_sum_of_squares / total_sum_of_squares.
sae Sum of absolute errors | X | 0 | Inf | X | X | | | | | test.mean | 
sse Sum of squared errors | X | 0 | Inf | X | X | | | | | test.mean | 
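The aggregation note for rmse can be tried out directly: setAggregation replaces a measure's default aggregation. A minimal sketch, using the BostonHousing task bh.task and the 3-fold cross-validation description cv3 that ship with mlr:

```r
library(mlr)

# Default aggregation is test.rmse, i.e. sqrt(mean(rmse.per.test.set^2));
# here we build a variant that simply averages the per-fold RMSE values
rmse.plain = setAggregation(rmse, test.mean)

r = resample("regr.lm", bh.task, cv3, measures = list(rmse, rmse.plain))
r$aggr  # compare both aggregations
```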
Survival analysis
ID / Name | Minim. | Best | Worst | Pred. | Truth | Probs | Model | Task | Feats | Aggr. | Note
---|---|---|---|---|---|---|---|---|---|---|---
cindex Concordance index | | 1 | 0 | X | X | | | | | test.mean | 
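A minimal sketch of computing the concordance index, using the lung.task survival task that ships with mlr:

```r
library(mlr)

# Cox proportional hazards model on the bundled lung cancer task
mod = train("surv.coxph", lung.task)
pred = predict(mod, lung.task)
performance(pred, measures = cindex)
```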
Cluster analysis
ID / Name | Minim. | Best | Worst | Pred. | Truth | Probs | Model | Task | Feats | Aggr. | Note
---|---|---|---|---|---|---|---|---|---|---|---
db Davies-Bouldin cluster separation measure | X | 0 | Inf | X | | | | | X | test.mean | See ?clusterSim::index.DB.
dunn Dunn index | | Inf | 0 | X | | | | | X | test.mean | See ?clValid::dunn.
G1 Calinski-Harabasz pseudo F statistic | | Inf | 0 | X | | | | | X | test.mean | See ?clusterSim::index.G1.
G2 Baker and Hubert adaptation of Goodman-Kruskal's gamma statistic | | Inf | 0 | X | | | | | X | test.mean | See ?clusterSim::index.G2.
silhouette Rousseeuw's silhouette internal cluster quality index | | Inf | 0 | X | | | | | X | test.mean | See ?clusterSim::index.S.
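Since the cluster measures need the feature data (column Feats), it has to be passed to performance explicitly. A sketch, assuming the mtcars.task cluster task that ships with mlr and the suggested packages clusterSim and clValid being installed:

```r
library(mlr)

lrn = makeLearner("cluster.kmeans", centers = 3)
mod = train(lrn, mtcars.task)
pred = predict(mod, mtcars.task)

# The feature data (column Feats above) is required for the cluster indices
performance(pred, measures = list(db, dunn), feats = getTaskData(mtcars.task))
```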
Cost-sensitive classification
ID / Name | Minim. | Best | Worst | Pred. | Truth | Probs | Model | Task | Feats | Aggr. | Note
---|---|---|---|---|---|---|---|---|---|---|---
mcp Misclassification penalty | X | 0 | Inf | X | | | | X | | test.mean | Average difference between costs of oracle and model prediction.
meancosts Mean costs of the predicted choices | X | 0 | Inf | X | | | | X | | test.mean | 
Note that in case of ordinary misclassification costs you can also generate performance measures from cost matrices with the function makeCostMeasure, as sketched below. For details see the tutorial page on cost-sensitive classification and also the page on custom performance measures.
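A minimal sketch of makeCostMeasure (the cost matrix values here are made up purely for illustration):

```r
library(mlr)

# Rows and columns of the cost matrix must be named with the class levels
costs = matrix(c(0, 1, 1,
                 5, 0, 1,
                 5, 1, 0), nrow = 3, byrow = TRUE)
rownames(costs) = colnames(costs) = getTaskClassLevels(iris.task)

cost.measure = makeCostMeasure(costs = costs, best = 0, worst = 5)

mod = train("classif.rpart", iris.task)
pred = predict(mod, iris.task)
performance(pred, measures = cost.measure)
```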
Multilabel classification
ID / Name | Minim. | Best | Worst | Pred. | Truth | Probs | Model | Task | Feats | Aggr. | Note
---|---|---|---|---|---|---|---|---|---|---|---
multilabel.acc Accuracy (multilabel) | | 1 | 0 | X | X | | | | | test.mean | Mean of the proportion of correctly predicted labels with respect to the total number of labels for each instance, following the definition by Charte and Charte: https://journal.r-project.org/archive/2015-2/charte-charte.pdf
multilabel.f1 F1 measure (multilabel) | | 1 | 0 | X | X | | | | | test.mean | Harmonic mean of precision and recall on a per-instance basis (Micro-F1), following the definition by Montanes et al.: http://www.sciencedirect.com/science/article/pii/S0031320313004019
multilabel.hamloss Hamming loss | X | 0 | 1 | X | X | | | | | test.mean | Proportion of labels whose relevance is incorrectly predicted, following the definition by Charte and Charte: https://journal.r-project.org/archive/2015-2/charte-charte.pdf
multilabel.ppv Positive predictive value (multilabel) | | 1 | 0 | X | X | | | | | test.mean | Also called precision. Mean of the ratio of truly predicted labels for each instance, following the definition by Charte and Charte: https://journal.r-project.org/archive/2015-2/charte-charte.pdf
multilabel.subset01 Subset-0-1 loss | X | 0 | 1 | X | X | | | | | test.mean | Proportion of observations where the complete multilabel set (all 0-1 labels) is not correctly predicted, following the definition by Charte and Charte: https://journal.r-project.org/archive/2015-2/charte-charte.pdf
multilabel.tpr TPR (multilabel) | | 1 | 0 | X | X | | | | | test.mean | Also called recall. Mean of the proportion of predicted labels that are relevant for each instance, following the definition by Charte and Charte: https://journal.r-project.org/archive/2015-2/charte-charte.pdf
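A minimal sketch on the multilabel yeast.task that ships with mlr, using a binary relevance wrapper around rpart:

```r
library(mlr)

# Binary relevance: one rpart classifier per label
lrn = makeMultilabelBinaryRelevanceWrapper("classif.rpart")
mod = train(lrn, yeast.task)
pred = predict(mod, yeast.task)
performance(pred, measures = list(multilabel.hamloss, multilabel.subset01, multilabel.f1))
```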
General performance measures
ID / Name | Minim. | Best | Worst | Pred. | Truth | Probs | Model | Task | Feats | Aggr. | Note
---|---|---|---|---|---|---|---|---|---|---|---
featperc Percentage of original features used for model | X | 0 | 1 | X | | | X | | | test.mean | Useful for feature selection.
timeboth timetrain + timepredict | X | 0 | Inf | X | | | X | | | test.mean | 
timepredict Time of predicting test set | X | 0 | Inf | X | | | | | | test.mean | 
timetrain Time of fitting the model | X | 0 | Inf | | | | X | | | test.mean | 
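Since timetrain and timeboth need the fitted model (column Model), it has to be passed to performance explicitly. A minimal sketch:

```r
library(mlr)

mod = train("classif.rpart", iris.task)
pred = predict(mod, iris.task)

# The model argument is required for timetrain and timeboth
performance(pred, measures = list(timetrain, timepredict, timeboth), model = mod)
```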