Integrated Learners

This page lists the learning methods already integrated in mlr.

The columns Num., Fac., NAs, and Weights indicate whether a method can cope with numeric and factor predictors, whether it can deal with missing values in a meaningful way (other than simply removing observations with missing values), and whether observation weights are supported.

The column Props shows further properties of the learning methods. ordered indicates that a method can deal with ordered factor features. For classification, you can see whether binary and/or multi-class problems are supported and whether the learner accepts class weights. For survival analysis, the censoring type is shown; for example, rcens means that the learning method can deal with right censored data. Moreover, the type of prediction is displayed, where prob indicates that probabilities can be predicted. For regression, se means that, in addition to the mean response, standard errors can be predicted. See also RLearner for details.
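
The same overview can be queried programmatically. A minimal sketch, assuming a reasonably recent mlr version in which listLearners() returns a data frame and getLearnerProperties() is available; the chosen property filter is only an example:

```r
library(mlr)

# List all classification learners that can predict probabilities
# and handle missing values ("missings" is mlr's name for the NAs column).
lrns = listLearners("classif", properties = c("prob", "missings"))
head(lrns[, c("class", "package")])

# Inspect the properties of a single learner.
lrn = makeLearner("classif.rpart")
getLearnerProperties(lrn)
```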

Classification (70)

ID / Short Name Name Packages Num. Fac. NAs Weights Props Note
classif.ada
ada
ada Boosting ada X X X prob
twoclass
classif.avNNet
avNNet
Neural Network nnet X X X multiclass
prob
twoclass
size has been set to 3 by default. Bagged training of nnet models is performed if bag = TRUE.
classif.bartMachine
bartmachine
Bayesian Additive Regression Trees bartMachine X X X prob
twoclass
'use_missing_data' has been set to TRUE by default to allow missing data support
classif.bdk
bdk
Bi-Directional Kohonen map kohonen X multiclass
prob
twoclass
classif.binomial
binomial
Binomial Regression stats X X X prob
twoclass
Delegates to glm with a binomial family; the link function can be chosen via the learner parameter 'link'.
classif.blackboost
blackbst
Gradient Boosting With Regression Trees mboost
party
X X X X prob
twoclass
see ?ctree_control for possible breakage for nominal features with missingness
classif.boosting
adabag
Adabag Boosting adabag
rpart
X X X multiclass
prob
twoclass
xval has been set to 0 by default for speed.
classif.bst
bst
Gradient Boosting bst X twoclass The argument learner has been renamed to Learner due to a name conflict with setHyperPars. Learner has been set to lm by default.
classif.cforest
cforest
Random forest based on conditional inference trees party X X X X multiclass
ordered
prob
twoclass
see ?ctree_control for possible breakage for nominal features with missingness
classif.clusterSVM
clusterSVM
Clustered Support Vector Machines SwarmSVM
LiblineaR
X twoclass centers set to 2 by default
classif.ctree
ctree
Conditional Inference Trees party X X X X multiclass
ordered
prob
twoclass
see ?ctree_control for possible breakage for nominal features with missingness
classif.dbnDNN
dbn.dnn
Deep neural network with weights initialized by DBN deepnet X multiclass
prob
twoclass
output set to softmax by default
classif.dcSVM
dcSVM
Divided-Conquer Support Vector Machines SwarmSVM X twoclass
classif.extraTrees
extraTrees
Extremely Randomized Trees extraTrees X X multiclass
prob
twoclass
classif.fnn
fnn
Fast k-Nearest Neighbour FNN X multiclass
twoclass
classif.gaterSVM
gaterSVM
Mixture of SVMs with Neural Network Gater Function SwarmSVM
e1071
X twoclass m set to 3 and max.iter set to 1 by default
classif.gbm
gbm
Gradient Boosting Machine gbm X X X X multiclass
prob
twoclass
classif.geoDA
geoda
Geometric Predictive Discriminant Analysis DiscriMiner X multiclass
twoclass
classif.glmboost
glmbst
Boosting for GLMs mboost X X X prob
twoclass
family has been set to Binomial() by default. Maximum number of boosting iterations is set via 'mstop', the actual number used for prediction is controlled by 'm'.
classif.glmnet
glmnet
GLM with Lasso or Elasticnet Regularization glmnet X X X multiclass
prob
twoclass
Factors automatically get converted to dummy columns, ordered factors to integer
classif.hdrda
hdrda
High-Dimensional Regularized Discriminant Analysis sparsediscrim X prob
twoclass
classif.IBk
ibk
k-Nearest Neighbours RWeka X X multiclass
prob
twoclass
classif.J48
j48
J48 Decision Trees RWeka X X X multiclass
prob
twoclass
NAs are directly passed to WEKA with na.action = na.pass
classif.JRip
jrip
Propositional Rule Learner RWeka X X X multiclass
prob
twoclass
NAs are directly passed to WEKA with na.action = na.pass
classif.kknn
kknn
k-Nearest Neighbor kknn X X multiclass
prob
twoclass
classif.knn
knn
k-Nearest Neighbor class X multiclass
twoclass
classif.ksvm
ksvm
Support Vector Machines kernlab X X class.weights
multiclass
prob
twoclass
Kernel parameters have to be passed directly and not by using the kpar list in ksvm. Note that fit has been set to FALSE by default for speed.
classif.lda
lda
Linear Discriminant Analysis MASS X X multiclass
prob
twoclass
Learner param 'predict.method' maps to 'method' in predict.lda.
classif.LiblineaRL1L2SVC
liblinl1l2svc
L1-Regularized L2-Loss Support Vector Classification LiblineaR X class.weights
multiclass
twoclass
classif.LiblineaRL1LogReg
liblinl1logreg
L1-Regularized Logistic Regression LiblineaR X class.weights
multiclass
prob
twoclass
classif.LiblineaRL2L1SVC
liblinl2l1svc
L2-Regularized L1-Loss Support Vector Classification LiblineaR X class.weights
multiclass
twoclass
classif.LiblineaRL2LogReg
liblinl2logreg
L2-Regularized Logistic Regression LiblineaR X class.weights
multiclass
prob
twoclass
type 0 is primal and type 7 is dual problem
classif.LiblineaRL2SVC
liblinl2svc
L2-Regularized L2-Loss Support Vector Classification LiblineaR X class.weights
multiclass
twoclass
type 2 is primal and type 1 is dual problem
classif.LiblineaRMultiClassSVC
liblinmulticlasssvc
Support Vector Classification by Crammer and Singer LiblineaR X class.weights
multiclass
twoclass
classif.linDA
linda
Linear Discriminant Analysis DiscriMiner X multiclass
twoclass
classif.logreg
logreg
Logistic Regression stats X X X prob
twoclass
Delegates to glm with family binomial/logit.
classif.lqa
lqa
Fitting penalized Generalized Linear Models with the LQA algorithm lqa X X prob
twoclass
penalty has been set to lasso and lambda to 0.1 by default.
classif.lssvm
lssvm
Least Squares Support Vector Machine kernlab X X multiclass
twoclass
fitted has been set to FALSE by default for speed.
classif.lvq1
lvq1
Learning Vector Quantization class X multiclass
twoclass
classif.mda
mda
Mixture Discriminant Analysis mda X X multiclass
prob
twoclass
keep.fitted has been set to FALSE by default for speed, and start.method = 'lvq' is used for more robust behavior (fewer technical crashes).
classif.mlp
mlp
Multi-Layer Perceptron RSNNS X multiclass
prob
twoclass
classif.multinom
multinom
Multinomial Regression nnet X X X multiclass
prob
twoclass
classif.naiveBayes
nbayes
Naive Bayes e1071 X X X multiclass
prob
twoclass
classif.neuralnet
neuralnet
Neural Network from neuralnet neuralnet X prob
twoclass
err.fct has been set to ce to do classification.
classif.nnet
nnet
Neural Network nnet X X X multiclass
prob
twoclass
size has been set to 3 by default.
classif.nnTrain
nn.train
Training Neural Network by Backpropagation deepnet X multiclass
prob
twoclass
output set to softmax by default
classif.nodeHarvest
nodeHarvest
Node Harvest nodeHarvest X X prob
twoclass
classif.OneR
oner
1-R Classifier RWeka X X X multiclass
prob
twoclass
NAs are directly passed to WEKA with na.action = na.pass
classif.pamr
pamr
Nearest shrunken centroid pamr X prob
twoclass
threshold for prediction (threshold.predict) has been set to 1 by default
classif.PART
part
PART Decision Lists RWeka X X X multiclass
prob
twoclass
NAs are directly passed to WEKA with na.action = na.pass
classif.plr
plr
Logistic Regression with a L2 Penalty stepPlr X X X prob
twoclass
AIC and BIC penalty types can be selected via the new parameter cp.type
classif.plsdaCaret
plsdacaret
Partial Least Squares (PLS) Discriminant Analysis caret X prob
twoclass
classif.probit
probit
Probit Regression stats X X X prob
twoclass
Delegates to glm with family binomial/probit.
classif.qda
qda
Quadratic Discriminant Analysis MASS X X multiclass
prob
twoclass
Learner param 'predict.method' maps to 'method' in predict.qda.
classif.quaDA
quada
Quadratic Discriminant Analysis DiscriMiner X multiclass
twoclass
classif.randomForest
rf
Random Forest randomForest X X class.weights
multiclass
ordered
prob
twoclass
classif.randomForestSRC
rfsrc
Random Forest randomForestSRC X X X multiclass
prob
twoclass
'na.action' has been set to 'na.impute' by default to allow missing data support
classif.ranger
ranger
Random Forests ranger X X multiclass
prob
twoclass
By default, internal parallelization is switched off (num.threads = 1) and verbose output is disabled. Both settings are changeable.
classif.rda
rda
Regularized Discriminant Analysis klaR X X multiclass
prob
twoclass
estimate.error has been set to FALSE by default for speed.
classif.rFerns
rFerns
Random ferns rFerns X X multiclass
ordered
twoclass
classif.rknn
rknn
Random k-Nearest-Neighbors rknn X multiclass
ordered
twoclass
classif.rotationForest
rotationForest
Rotation Forest rotationForest X X ordered
prob
twoclass
classif.rpart
rpart
Decision Tree rpart X X X X multiclass
ordered
prob
twoclass
xval has been set to 0 by default for speed.
classif.rrlda
rrlda
Robust Regularized Linear Discriminant Analysis rrlda X multiclass
twoclass
classif.saeDNN
sae.dnn
Deep neural network with weights initialized by Stacked AutoEncoder deepnet X multiclass
prob
twoclass
output set to softmax by default
classif.sda
sda
Shrinkage Discriminant Analysis sda X multiclass
prob
twoclass
classif.sparseLDA
sparseLDA
Sparse Discriminant Analysis sparseLDA
MASS
elasticnet
X multiclass
prob
twoclass
Arguments Q and stop are not yet provided as they depend on the task.
classif.svm
svm
Support Vector Machines (libsvm) e1071 X X class.weights
multiclass
prob
twoclass
classif.xgboost
xgboost
eXtreme Gradient Boosting xgboost X X X multiclass
prob
twoclass
All settings are passed directly, rather than through xgboost's 'param'. 'rounds' set to 1 by default
classif.xyf
xyf
X-Y fused self-organising maps kohonen X multiclass
prob
twoclass
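
As a usage sketch (the learner, the ntree value, and the use of iris.task below are purely illustrative), any classification learner listed above can be constructed with makeLearner and trained on a task; predict.type = "prob" is only available for learners with the prob property:

```r
library(mlr)

# Random forest with probability prediction on the iris task shipped with mlr.
lrn = makeLearner("classif.randomForest", predict.type = "prob", ntree = 200)

mod = train(lrn, iris.task)
pred = predict(mod, task = iris.task)
head(getPredictionProbabilities(pred))
```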

Regression (52)

ID / Short Name Name Packages Num. Fac. NAs Weights Props Note
regr.avNNet
avNNet
Neural Network nnet X X X size has been set to 3 by default.
regr.bartMachine
bartmachine
Bayesian Additive Regression Trees bartMachine X X X 'use_missing_data' has been set to TRUE by default to allow missing data support
regr.bcart
bcart
Bayesian CART tgp X X se
regr.bdk
bdk
Bi-Directional Kohonen map kohonen X
regr.bgp
bgp
Bayesian Gaussian Process tgp X se
regr.bgpllm
bgpllm
Bayesian Gaussian Process with jumps to the Limiting Linear Model tgp X se
regr.blackboost
blackbst
Gradient Boosting with Regression Trees mboost
party
X X X X see ?ctree_control for possible breakage for nominal features with missingness
regr.blm
blm
Bayesian Linear Model tgp X se
regr.brnn
brnn
Bayesian regularization for feed-forward neural networks brnn X X
regr.bst
bst
Gradient Boosting bst X The argument learner has been renamed to Learner due to a name conflict with setHyperPars
regr.btgp
btgp
Bayesian Treed Gaussian Process tgp X X se
regr.btgpllm
btgpllm
Bayesian Treed Gaussian Process with jumps to the Limiting Linear Model tgp X X se
regr.btlm
btlm
Bayesian Treed Linear Model tgp X X se
regr.cforest
cforest
Random Forest Based on Conditional Inference Trees party X X X X ordered see ?ctree_control for possible breakage for nominal features with missingness
regr.crs
crs
Regression Splines crs X X X se
regr.ctree
ctree
Conditional Inference Trees party X X X X ordered see ?ctree_control for possible breakage for nominal features with missingness
regr.cubist
cubist
Cubist Cubist X X X
regr.earth
earth
Multivariate Adaptive Regression Splines earth X X
regr.elmNN
elmNN
Extreme Learning Machine for Single Hidden Layer Feedforward Neural Networks elmNN X nhid has been set to 1 and actfun has been set to "sig" by default
regr.extraTrees
extraTrees
Extremely Randomized Trees extraTrees X X
regr.fnn
fnn
Fast k-Nearest Neighbor FNN X
regr.frbs
frbs
Fuzzy Rule-based Systems frbs X
regr.gbm
gbm
Gradient Boosting Machine gbm X X X X distribution has been set to gaussian by default.
regr.glmboost
glmboost
Boosting for GLMs mboost X X X Maximum number of boosting iterations is set via 'mstop', the actual number used is controlled by 'm'.
regr.glmnet
glmnet
GLM with Lasso or Elasticnet Regularization glmnet X X X ordered Factors automatically get converted to dummy columns, ordered factors to integer
regr.IBk
ibk
K-Nearest Neighbours RWeka X X
regr.kknn
kknn
K-Nearest-Neighbor regression kknn X X
regr.km
km
Kriging DiceKriging X se In predict, we currently always use type = 'SK'. The extra param 'jitter' (default is FALSE) enables adding a very small jitter (order 1e-12) to the x-values before prediction, because predict.km reproduces the exact y-values of the training data points when you pass them in, even if the nugget effect is turned on.
regr.ksvm
ksvm
Support Vector Machines kernlab X X Kernel parameters have to be passed directly and not by using the kpar list in ksvm. Note that fit has been set to FALSE by default for speed.
regr.laGP
laGP
Local Approximate Gaussian Process laGP X se
regr.LiblineaRL2L1SVR
liblinl2l1svr
L2-Regularized L1-Loss Support Vector Regression LiblineaR X
regr.LiblineaRL2L2SVR
liblinl2l2svr
L2-Regularized L2-Loss Support Vector Regression LiblineaR X type 11 is primal and 12 is dual problem
regr.lm
lm
Simple Linear Regression stats X X X se
regr.mars
mars
Multivariate Adaptive Regression Splines mda X
regr.mob
mob
Model-based Recursive Partitioning Yielding a Tree with Fitted Models Associated with each Terminal Node party X X X
regr.nnet
nnet
Neural Network nnet X X X size has been set to 3 by default.
regr.nodeHarvest
nodeHarvest
Node Harvest nodeHarvest X X
regr.pcr
pcr
Principal Component Regression pls X X
regr.penalized.lasso
lasso
Lasso Regression penalized X X
regr.penalized.ridge
ridge
Penalized Ridge Regression penalized X X
regr.plsr
plsr
Partial Least Squares Regression pls X X
regr.randomForest
rf
Random Forest randomForest X X ordered
se
regr.randomForestSRC
rfsrc
Random Forest randomForestSRC X X X 'na.action' has been set to 'na.impute' by default to allow missing data support
regr.ranger
ranger
Random Forests ranger X X By default, internal parallelization is switched off (num.threads = 1) and verbose output is disabled. Both settings are changeable.
regr.rknn
rknn
Random k-Nearest-Neighbors rknn X ordered
regr.rpart
rpart
Decision Tree rpart X X X X ordered xval has been set to 0 by default for speed.
regr.rsm
rsm
Response Surface Regression rsm X You select the order of the regression via modelfun = "FO" (first order), "TWI" (two-way interactions, including first-order terms), or "SO" (full second order).
regr.rvm
rvm
Relevance Vector Machine kernlab X X Kernel parameters have to be passed directly and not by using the kpar list in rvm. Note that fit has been set to FALSE by default for speed.
regr.slim
slim
Sparse Linear Regression using Nonsmooth Loss Functions and L1 Regularization flare X lambda.idx has been set to 3 by default
regr.svm
svm
Support Vector Machines (libsvm) e1071 X X
regr.xgboost
xgboost
eXtreme Gradient Boosting xgboost X X X All settings are passed directly, rather than through xgboost's 'param'. 'rounds' set to 1 by default
regr.xyf
xyf
X-Y fused self-organising maps kohonen X
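
A corresponding regression sketch (again illustrative; the learner and the Boston Housing task bh.task shipped with mlr are just examples): regr.randomForest is listed with the se property, so standard errors can be requested in addition to the mean response:

```r
library(mlr)

# Request standard errors in addition to the mean response.
lrn = makeLearner("regr.randomForest", predict.type = "se")

mod = train(lrn, bh.task)           # Boston Housing task shipped with mlr
pred = predict(mod, task = bh.task)
head(getPredictionSE(pred))         # estimated standard errors
```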

Survival analysis (11)

ID / Short Name Name Packages Num. Fac. NAs Weights Props Note
surv.cforest
crf
Random Forest based on Conditional Inference Trees party
survival
X X X X ordered
rcens
see ?ctree_control for possible breakage for nominal features with missingness
surv.CoxBoost
coxboost
Cox Proportional Hazards Model with Componentwise Likelihood based Boosting CoxBoost X X X ordered
rcens
Factors automatically get converted to dummy columns, ordered factors to integer
surv.coxph
coxph
Cox Proportional Hazard Model survival X X X X prob
rcens
surv.cvglmnet
cvglmnet
GLM with Regularization (Cross Validated Lambda) glmnet X X X ordered
rcens
Factors automatically get converted to dummy columns, ordered factors to integer
surv.glmboost
glmboost
Gradient Boosting with Componentwise Linear Models survival
mboost
X X X ordered
rcens
family has been set to CoxPH() by default. Maximum number of boosting iterations is set via 'mstop', the actual number used for prediction is controlled by 'm'.
surv.glmnet
glmnet
GLM with Regularization glmnet X X X ordered
rcens
Factors automatically get converted to dummy columns, ordered factors to integer
surv.optimCoxBoostPenalty
optimCoxBoostPenalty
Cox Proportional Hazards Model with Componentwise Likelihood based Boosting, automatic tuning enabled CoxBoost X X X rcens Factors automatically get converted to dummy columns, ordered factors to integer
surv.penalized
penalized
Penalized Regression penalized X X ordered
rcens
Factors automatically get converted to dummy columns, ordered factors to integer
surv.randomForestSRC
rfsrc
Random Forests for Survival survival
randomForestSRC
X X X ordered
rcens
'na.action' has been set to 'na.impute' by default to allow missing data support
surv.ranger
ranger
Random Forests ranger X X prob
rcens
By default, internal parallelization is switched off (num.threads = 1) and verbose output is disabled. Both settings are changeable.
surv.rpart
rpart
Survival Tree rpart X X X X ordered
rcens
xval has been set to 0 by default for speed.
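
A survival sketch (illustrative only; lung.task is a right censored example task shipped with mlr): surv.coxph handles right censored data (rcens) and is used like any other learner:

```r
library(mlr)

# Cox proportional hazards model on right censored data.
lrn = makeLearner("surv.coxph")

mod = train(lrn, lung.task)
pred = predict(mod, task = lung.task)
head(as.data.frame(pred))           # predicted risk (linear predictor)
```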

Cluster analysis (7)

ID / Short Name Name Packages Num. Fac. NAs Weights Props Note
cluster.cmeans
cmeans
Fuzzy C-Means Clustering e1071
clue
X prob The 'predict' method uses 'cl_predict' from the 'clue' package to compute cluster memberships for new data. The default 'centers = 2' is only set so that the method runs without further parameters; in practice it should of course be changed by the user.
cluster.Cobweb
cobweb
Cobweb Clustering Algorithm RWeka X
cluster.EM
em
Expectation-Maximization Clustering RWeka X
cluster.FarthestFirst
farthestfirst
FarthestFirst Clustering Algorithm RWeka X
cluster.kmeans
kmeans
K-Means stats
clue
X prob The 'predict' method uses 'cl_predict' from the 'clue' package to compute cluster memberships for new data. The default 'centers = 2' is only set so that the method runs without further parameters; in practice it should of course be changed by the user.
cluster.SimpleKMeans
simplekmeans
K-Means Clustering RWeka X
cluster.XMeans
xmeans
XMeans (k-means with automatic determination of k) RWeka X You may have to install the XMeans Weka package: WPM('install-package', 'XMeans').
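
A clustering sketch (illustrative only; the number of clusters and the mtcars.task shipped with mlr are just examples), overriding the centers = 2 default mentioned in the notes above:

```r
library(mlr)

# K-means with three clusters.
lrn = makeLearner("cluster.kmeans", centers = 3)

mod = train(lrn, mtcars.task)
pred = predict(mod, task = mtcars.task)
head(as.data.frame(pred))           # cluster membership per observation
```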

Cost-sensitive classification

For ordinary misclassification costs you can use all the standard classification methods listed above.

For example-dependent costs there are several ways to generate cost-sensitive learners from ordinary regression and classification learners. See section cost-sensitive classification and the documentation of makeCostSensClassifWrapper, makeCostSensRegrWrapper and makeCostSensWeightedPairsWrapper for details.
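
A minimal sketch of the example-dependent case, assuming the current makeCostSensTask interface; the random cost matrix (one row per observation, one column per class) is purely illustrative:

```r
library(mlr)
set.seed(1)

# Build a cost-sensitive task from iris with artificial example-dependent costs.
df = iris
costs = matrix(runif(nrow(df) * nlevels(df$Species), 0, 10), nrow = nrow(df))
colnames(costs) = levels(df$Species)
df$Species = NULL                   # a cost-sensitive task has no target column
costsens.task = makeCostSensTask(data = df, costs = costs)

# Wrap an ordinary classification learner for cost-sensitive learning;
# makeCostSensRegrWrapper and makeCostSensWeightedPairsWrapper work analogously.
lrn = makeCostSensClassifWrapper(makeLearner("classif.rpart"))
mod = train(lrn, costsens.task)
```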

Multilabel classification (1)

ID / Short Name Name Packages Num. Fac. NAs Weights Props Note
multilabel.rFerns
rFerns
Random ferns rFerns X X ordered

Moreover, you can use the binary relevance method to apply ordinary classification learners to the multilabel problem. See the documentation of function makeMultilabelBinaryRelevanceWrapper and the tutorial section on multilabel classification for details.
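
A multilabel sketch (illustrative only; yeast.task is a multilabel example task shipped with mlr, and the base learner choice is arbitrary):

```r
library(mlr)

# Either use the integrated multilabel learner directly ...
lrn.ferns = makeLearner("multilabel.rFerns")

# ... or turn an ordinary binary classifier into a multilabel learner
# via the binary relevance method.
lrn.br = makeMultilabelBinaryRelevanceWrapper(makeLearner("classif.rpart"))

mod = train(lrn.br, yeast.task)
pred = predict(mod, task = yeast.task)
performance(pred, measures = multilabel.hamloss)
```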