Implemented Performance Measures

This page lists, in alphabetical order, the performance measures available for the different types of learning problems, as well as general performance measures. (See also the documentation of measures and makeMeasure for available measures and their properties.)

If you find that a measure is missing, you can either open an issue or try to implement a measure yourself.

Column Minim. indicates whether the measure is minimized during, e.g., tuning or feature selection. Best and Worst show the best and worst values the measure can attain. For classification, column Multi indicates whether a measure is suitable for multi-class problems; if not, the measure can only be used for binary classification.

The next six columns (Pred., Truth, Probs, Model, Task, Feats) indicate the information required to calculate the performance measure: the prediction, the true labels, the predicted probabilities, the fitted model, the task, and the feature values.

Aggr. shows the default aggregation method tied to the measure.
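
As a sketch of how the default aggregation schemes behave (plain Python for illustration, not mlr; the fold scores are made up), test.mean simply averages the per-test-set values, while test.rmse averages on the squared scale:

```python
# Illustration in plain Python (not mlr): how default aggregation
# schemes combine one performance value per resampling test set.
# The fold scores below are invented for the example.
from math import sqrt

fold_scores = [0.10, 0.20, 0.15]  # e.g. mmce on three test sets

# "test.mean": plain mean over the test sets
test_mean = sum(fold_scores) / len(fold_scores)

# "test.rmse": RMSE values are aggregated as sqrt(mean(rmse_k^2))
test_rmse = sqrt(sum(s ** 2 for s in fold_scores) / len(fold_scores))
```

Note that test.rmse is only the default for the rmse measure; most measures aggregate with test.mean.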

Classification

| ID | Name | Minim. | Best | Worst | Multi | Pred. | Truth | Probs | Model | Task | Feats | Aggr. | Note |
|----|------|--------|------|-------|-------|-------|-------|-------|-------|------|-------|-------|------|
| acc | Accuracy | | 1 | 0 | X | X | X | | | | | test.mean | |
| auc | Area under the curve | | 1 | 0 | | X | X | X | | | | test.mean | |
| bac | Balanced accuracy | | 1 | 0 | | X | X | | | | | test.mean | Mean of true positive rate and true negative rate. |
| ber | Balanced error rate | X | 0 | 1 | X | X | X | | | | | test.mean | Mean of misclassification error rates on all individual classes. |
| brier | Brier score | X | 0 | 1 | | X | X | X | | | | test.mean | |
| brier.scaled | Brier scaled | | 1 | 0 | | X | X | X | | | | test.mean | Brier score scaled to [0, 1], see http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3575184/. |
| f1 | F1 measure | | 1 | 0 | | X | X | | | | | test.mean | |
| fdr | False discovery rate | X | 0 | 1 | | X | X | | | | | test.mean | |
| fn | False negatives | X | 0 | Inf | | X | X | | | | | test.mean | Also called misses. |
| fnr | False negative rate | X | 0 | 1 | | X | X | | | | | test.mean | |
| fp | False positives | X | 0 | Inf | | X | X | | | | | test.mean | Also called false alarms. |
| fpr | False positive rate | X | 0 | 1 | | X | X | | | | | test.mean | Also called false alarm rate or fall-out. |
| gmean | G-mean | | 1 | 0 | | X | X | | | | | test.mean | Geometric mean of recall and specificity. |
| gpr | Geometric mean of precision and recall | | 1 | 0 | | X | X | | | | | test.mean | |
| logloss | Logarithmic loss | X | 0 | Inf | X | | X | X | | | | test.mean | Defined as -mean(log(p_i)), where p_i is the predicted probability of the true class of observation i. Inspired by https://www.kaggle.com/wiki/MultiClassLogLoss. |
| mcc | Matthews correlation coefficient | | 1 | -1 | | X | X | | | | | test.mean | |
| mmce | Mean misclassification error | X | 0 | 1 | X | X | X | | | | | test.mean | |
| multiclass.au1p | Weighted average 1 vs. 1 multiclass AUC | | 1 | 0.5 | X | X | X | X | | | | test.mean | Computes the AUC of c(c - 1) binary classifiers while considering the a priori distribution of the classes. See Ferri et al.: https://www.math.ucdavis.edu/~saito/data/roc/ferri-class-perf-metrics.pdf |
| multiclass.au1u | Average 1 vs. 1 multiclass AUC | | 1 | 0.5 | X | X | X | X | | | | test.mean | Computes the AUC of c(c - 1) binary classifiers (all possible pairwise combinations) while considering a uniform distribution of the classes. See Ferri et al.: https://www.math.ucdavis.edu/~saito/data/roc/ferri-class-perf-metrics.pdf |
| multiclass.aunp | Weighted average multiclass AUC | | 1 | 0.5 | X | X | X | X | | | | test.mean | Computes the AUC treating a c-dimensional classifier as c two-dimensional classifiers, taking into account the prior probability p(j) of each class. See Ferri et al.: https://www.math.ucdavis.edu/~saito/data/roc/ferri-class-perf-metrics.pdf |
| multiclass.aunu | Average multiclass AUC | | 1 | 0.5 | X | X | X | X | | | | test.mean | Computes the AUC treating a c-dimensional classifier as c two-dimensional classifiers, where classes are assumed to have a uniform distribution, in order to obtain a measure that is independent of changes in the class distribution. See Ferri et al.: https://www.math.ucdavis.edu/~saito/data/roc/ferri-class-perf-metrics.pdf |
| multiclass.brier | Multiclass Brier score | X | 0 | 2 | X | X | X | X | | | | test.mean | Defined as (1/n) sum_i sum_j (y_ij - p_ij)^2, where y_ij = 1 if observation i has class j (else 0), and p_ij is the predicted probability of observation i for class j. From http://docs.lib.noaa.gov/rescue/mwr/078/mwr-078-01-0001.pdf |
| npv | Negative predictive value | | 1 | 0 | | X | X | | | | | test.mean | |
| ppv | Positive predictive value | | 1 | 0 | | X | X | | | | | test.mean | Also called precision. |
| tn | True negatives | | Inf | 0 | | X | X | | | | | test.mean | Also called correct rejections. |
| tnr | True negative rate | | 1 | 0 | | X | X | | | | | test.mean | Also called specificity. |
| tp | True positives | | Inf | 0 | | X | X | | | | | test.mean | |
| tpr | True positive rate | | 1 | 0 | | X | X | | | | | test.mean | Also called hit rate or recall. |

Regression

| ID | Name | Minim. | Best | Worst | Pred. | Truth | Probs | Model | Task | Feats | Aggr. | Note |
|----|------|--------|------|-------|-------|-------|-------|-------|------|-------|-------|------|
| adjrsq | Adjusted coefficient of determination | | 1 | 0 | X | X | | | | | test.mean | Adjusted R-squared is only defined for normal linear regression. |
| expvar | Explained variance | | 1 | 0 | X | X | | | | | test.mean | Similar to measure rsq (R-squared). Defined as explained_sum_of_squares / total_sum_of_squares. |
| mae | Mean of absolute errors | X | 0 | Inf | X | X | | | | | test.mean | |
| medae | Median of absolute errors | X | 0 | Inf | X | X | | | | | test.mean | |
| medse | Median of squared errors | X | 0 | Inf | X | X | | | | | test.mean | |
| mse | Mean of squared errors | X | 0 | Inf | X | X | | | | | test.mean | |
| rmse | Root mean square error | X | 0 | Inf | X | X | | | | | test.rmse | The RMSE is aggregated as sqrt(mean(rmse.vals.on.test.sets^2)). If you do not want this, you can also use test.mean. |
| rsq | Coefficient of determination | | 1 | -Inf | X | X | | | | | test.mean | Also called R-squared; defined as 1 - residual_sum_of_squares / total_sum_of_squares. |
| sae | Sum of absolute errors | X | 0 | Inf | X | X | | | | | test.mean | |
| sse | Sum of squared errors | X | 0 | Inf | X | X | | | | | test.mean | |
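
The regression measures follow directly from the residuals, and the special test.rmse aggregation from the note above is easy to spell out. A sketch in plain Python (not mlr; toy numbers):

```python
# Illustration in plain Python (not mlr) of the regression measures
# and the test.rmse aggregation; all numbers are made up.
from math import sqrt
from statistics import median

y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.5, 2.0, 8.0]

errors = [t - p for t, p in zip(y_true, y_pred)]
n = len(errors)

mae = sum(abs(e) for e in errors) / n    # mae
mse = sum(e ** 2 for e in errors) / n    # mse
rmse = sqrt(mse)                         # rmse on a single test set
medae = median(abs(e) for e in errors)   # medae
sse = sum(e ** 2 for e in errors)        # sse

# rsq: 1 - residual_sum_of_squares / total_sum_of_squares
mean_y = sum(y_true) / n
rsq = 1 - sse / sum((t - mean_y) ** 2 for t in y_true)

# test.rmse aggregation: per-test-set RMSE values are combined as
# sqrt(mean(rmse_k^2)), not as their plain mean
fold_rmse = [0.5, 0.7, 0.6]
aggregated = sqrt(sum(r ** 2 for r in fold_rmse) / len(fold_rmse))
```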

Survival analysis

| ID | Name | Minim. | Best | Worst | Pred. | Truth | Probs | Model | Task | Feats | Aggr. | Note |
|----|------|--------|------|-------|-------|-------|-------|-------|------|-------|-------|------|
| cindex | Concordance index | | 1 | 0 | X | X | | | | | test.mean | |
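
A minimal sketch of the idea behind the concordance index (plain Python, not mlr; uncensored toy data, with tied survival times skipped for simplicity): among all comparable pairs of observations, count how often the predicted risks rank the pair in the same order as the observed survival times.

```python
# Simplified concordance index in plain Python (not mlr):
# all observations uncensored, tied times skipped; toy data.
times = [5.0, 8.0, 3.0, 9.0]  # observed survival times
risk = [0.9, 0.4, 1.2, 0.5]   # higher predicted risk => shorter expected survival

concordant = 0
comparable = 0
n = len(times)
for i in range(n):
    for j in range(i + 1, n):
        if times[i] == times[j]:
            continue  # skip tied survival times in this sketch
        comparable += 1
        # concordant if the subject with the shorter survival time
        # has the higher predicted risk
        shorter, longer = (i, j) if times[i] < times[j] else (j, i)
        if risk[shorter] > risk[longer]:
            concordant += 1

cindex = concordant / comparable
```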

Cluster analysis

| ID | Name | Minim. | Best | Worst | Pred. | Truth | Probs | Model | Task | Feats | Aggr. | Note |
|----|------|--------|------|-------|-------|-------|-------|-------|------|-------|-------|------|
| db | Davies-Bouldin cluster separation measure | X | 0 | Inf | X | | | | | X | test.mean | See ?clusterSim::index.DB. |
| dunn | Dunn index | | Inf | 0 | X | | | | | X | test.mean | See ?clValid::dunn. |
| G1 | Calinski-Harabasz pseudo F statistic | | Inf | 0 | X | | | | | X | test.mean | See ?clusterSim::index.G1. |
| G2 | Baker and Hubert adaptation of Goodman-Kruskal's gamma statistic | | Inf | 0 | X | | | | | X | test.mean | See ?clusterSim::index.G2. |
| silhouette | Rousseeuw's silhouette internal cluster quality index | | Inf | 0 | X | | | | | X | test.mean | See ?clusterSim::index.S. |
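
To illustrate the kind of quantity these cluster indices capture, here is a minimal sketch of the Dunn index (plain Python, not the clValid implementation; toy data): the smallest between-cluster distance divided by the largest within-cluster distance, so larger values indicate compact, well-separated clusters.

```python
# Simplified Dunn index in plain Python (not clValid); toy data.
# dunn = min distance between points of different clusters
#        / max distance between points of the same cluster
from math import dist  # Python >= 3.8

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
clusters = [0, 0, 1, 1]  # assigned cluster per point

def pair_dists(keep):
    # distances over all point pairs whose cluster pair satisfies `keep`
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            if keep(clusters[i], clusters[j]):
                yield dist(points[i], points[j])

min_between = min(pair_dists(lambda a, b: a != b))
max_within = max(pair_dists(lambda a, b: a == b))
dunn = min_between / max_within
```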

Cost-sensitive classification

| ID | Name | Minim. | Best | Worst | Pred. | Truth | Probs | Model | Task | Feats | Aggr. | Note |
|----|------|--------|------|-------|-------|-------|-------|-------|------|-------|-------|------|
| mcp | Misclassification penalty | X | 0 | Inf | X | | | | X | | test.mean | Average difference between costs of oracle and model prediction. |
| meancosts | Mean costs of the predicted choices | X | 0 | Inf | X | | | | X | | test.mean | |
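
A minimal sketch of these two measures (plain Python, not mlr; the per-observation costs are invented), where costs[i][k] holds the cost of predicting class k for observation i:

```python
# Illustration in plain Python (not mlr) of meancosts and mcp;
# the per-observation cost entries are made up.
costs = [
    {"a": 0.0, "b": 2.0},
    {"a": 1.0, "b": 0.0},
    {"a": 3.0, "b": 1.0},
]
y_pred = ["a", "a", "b"]

n = len(costs)
# meancosts: average cost of the predicted choices
meancosts = sum(c[p] for c, p in zip(costs, y_pred)) / n

# mcp: average difference between the model's costs and those of the
# "oracle" that always picks the cheapest class per observation
mcp = meancosts - sum(min(c.values()) for c in costs) / n
```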

Note that in the case of ordinary misclassification costs you can also generate performance measures from cost matrices with the function makeCostMeasure. For details, see the tutorial page on cost-sensitive classification and the page on custom performance measures.

Multilabel classification

| ID | Name | Minim. | Best | Worst | Pred. | Truth | Probs | Model | Task | Feats | Aggr. | Note |
|----|------|--------|------|-------|-------|-------|-------|-------|------|-------|-------|------|
| multilabel.acc | Accuracy (multilabel) | | 1 | 0 | X | X | | | | | test.mean | Mean of the proportion of correctly predicted labels with respect to the total number of labels for each instance, following the definition by Charte and Charte: https://journal.r-project.org/archive/2015-2/charte-charte.pdf |
| multilabel.f1 | F1 measure (multilabel) | | 1 | 0 | X | X | | | | | test.mean | Harmonic mean of precision and recall on a per-instance basis (micro-F1), following the definition by Montanes et al.: http://www.sciencedirect.com/science/article/pii/S0031320313004019 |
| multilabel.hamloss | Hamming loss | X | 0 | 1 | X | X | | | | | test.mean | Proportion of labels whose relevance is incorrectly predicted, following the definition by Charte and Charte: https://journal.r-project.org/archive/2015-2/charte-charte.pdf |
| multilabel.ppv | Positive predictive value (multilabel) | | 1 | 0 | X | X | | | | | test.mean | Also called precision. Mean of the ratio of correctly predicted labels for each instance, following the definition by Charte and Charte: https://journal.r-project.org/archive/2015-2/charte-charte.pdf |
| multilabel.subset01 | Subset-0-1 loss | X | 0 | 1 | X | X | | | | | test.mean | Proportion of observations where the complete multilabel set (all 0-1 labels) is not correctly predicted, following the definition by Charte and Charte: https://journal.r-project.org/archive/2015-2/charte-charte.pdf |
| multilabel.tpr | TPR (multilabel) | | 1 | 0 | X | X | | | | | test.mean | Also called recall. Mean of the proportion of predicted labels that are relevant for each instance, following the definition by Charte and Charte: https://journal.r-project.org/archive/2015-2/charte-charte.pdf |
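
The multilabel definitions above can be illustrated with a small sketch on 0/1 label matrices (plain Python, not mlr; rows are observations, columns are labels, and the data are invented):

```python
# Illustration in plain Python (not mlr) of three multilabel measures
# on made-up 0/1 label matrices (rows = observations, columns = labels).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]

n = len(y_true)       # number of observations
n_labels = len(y_true[0])

# multilabel.hamloss: proportion of label entries predicted incorrectly
hamloss = sum(t != p for ti, pi in zip(y_true, y_pred)
              for t, p in zip(ti, pi)) / (n * n_labels)

# multilabel.subset01: proportion of observations whose complete
# label set is not predicted exactly
subset01 = sum(ti != pi for ti, pi in zip(y_true, y_pred)) / n

# multilabel.acc: per instance, correctly predicted labels relative to
# the union of true and predicted labels, averaged over instances
def instance_acc(ti, pi):
    inter = sum(t and p for t, p in zip(ti, pi))
    union = sum(t or p for t, p in zip(ti, pi))
    return inter / union if union else 1.0

ml_acc = sum(instance_acc(ti, pi) for ti, pi in zip(y_true, y_pred)) / n
```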

General performance measures

| ID | Name | Minim. | Best | Worst | Pred. | Truth | Probs | Model | Task | Feats | Aggr. | Note |
|----|------|--------|------|-------|-------|-------|-------|-------|------|-------|-------|------|
| featperc | Percentage of original features used for model | X | 0 | 1 | | | | X | X | | test.mean | Useful for feature selection. |
| timeboth | timetrain + timepredict | X | 0 | Inf | X | | | X | | | test.mean | |
| timepredict | Time of predicting test set | X | 0 | Inf | X | | | | | | test.mean | |
| timetrain | Time of fitting the model | X | 0 | Inf | | | | X | | | test.mean | |