Integrated Learners

This page lists the learning methods already integrated in mlr.

Columns Num., Fac., Ord., NAs, and Weights indicate if a method can cope with numerical, factor, and ordered factor predictors, if it can deal with missing values in a meaningful way (other than simply removing observations with missing values) and if observation weights are supported.

Column Props shows further properties of the learning methods specific to the type of learning task. See also RLearner for details.
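The properties shown in these columns can also be queried from R. The following is a minimal sketch, assuming the mlr and rpart packages are installed; classif.rpart is used only as an example, and the property names returned by your mlr version may differ slightly from the column labels used on this page.

```r
library(mlr)

# Construct a learner from its class name as listed in the tables below.
lrn = makeLearner("classif.rpart")

# Character vector of property names supported by this learner.
props = getLearnerProperties(lrn)
print(props)

# Check whether the learner handles missing values and observation weights.
print(hasLearnerProperties(lrn, c("missings", "weights")))
```

Checking properties this way before training can help avoid errors caused by, for example, passing weights to a learner that does not support them.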

Classification (72)

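For classification the following additional learner properties are relevant and shown in column Props: prob (the method can predict probabilities), twoclass and multiclass (two-class or multi-class problems can be handled), and class.weights (class weights can be handled).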

Class / Short Name / Name Packages Num. Fac. Ord. NAs Weights Props Note
classif.ada
ada

ada Boosting
ada X X X prob
twoclass
xval has been set to 0 by default for speed.
classif.avNNet
avNNet

Neural Network
nnet X X X prob
twoclass
multiclass
size has been set to 3 by default. Bagged training of nnet is performed if bag = TRUE.
classif.bartMachine
bartmachine

Bayesian Additive Regression Trees
bartMachine X X X prob
twoclass
use_missing_data has been set to TRUE by default to allow missing data support.
classif.bdk
bdk

Bi-Directional Kohonen map
kohonen X prob
twoclass
multiclass
classif.binomial
binomial

Binomial Regression
stats X X X prob
twoclass
Delegates to glm with freely choosable binomial link function via learner parameter link.
classif.blackboost
blackbst

Gradient Boosting With Regression Trees
mboost
party
X X X X prob
twoclass
See ?ctree_control for possible breakage for nominal features with missingness.
classif.boosting
adabag

Adabag Boosting
adabag
rpart
X X X prob
twoclass
multiclass
xval has been set to 0 by default for speed.
classif.bst
bst

Gradient Boosting
bst X twoclass Renamed parameter learner to Learner due to a name clash with setHyperPars. Default changes: Learner = "ls", xval = 0, and maxdepth = 1.
classif.cforest
cforest

Random forest based on conditional inference trees
party X X X X X prob
twoclass
multiclass
See ?ctree_control for possible breakage for nominal features with missingness.
classif.clusterSVM
clusterSVM

Clustered Support Vector Machines
SwarmSVM
LiblineaR
X twoclass centers set to 2 by default.
classif.ctree
ctree

Conditional Inference Trees
party X X X X X prob
twoclass
multiclass
See ?ctree_control for possible breakage for nominal features with missingness.
classif.cvglmnet
cvglmnet

GLM with Lasso or Elasticnet Regularization (Cross Validated Lambda)
glmnet X X X prob
twoclass
multiclass
The family parameter is set to binomial for two-class problems and to multinomial otherwise. Factors automatically get converted to dummy columns, ordered factors to integer.
classif.dbnDNN
dbn.dnn

Deep neural network with weights initialized by DBN
deepnet X prob
twoclass
multiclass
output set to "softmax" by default.
classif.dcSVM
dcSVM

Divided-Conquer Support Vector Machines
SwarmSVM X twoclass
classif.extraTrees
extraTrees

Extremely Randomized Trees
extraTrees X X prob
twoclass
multiclass
classif.fnn
fnn

Fast k-Nearest Neighbour
FNN X twoclass
multiclass
classif.gaterSVM
gaterSVM

Mixture of SVMs with Neural Network Gater Function
SwarmSVM
e1071
X twoclass m set to 3 and max.iter set to 1 by default.
classif.gbm
gbm

Gradient Boosting Machine
gbm X X X X prob
twoclass
multiclass
Note on param 'distribution': gbm will select 'bernoulli' by default for 2 classes, and 'multinomial' for multiclass problems. The latter is the only setting that works for > 2 classes.
classif.geoDA
geoda

Geometric Predictive Discriminant Analysis
DiscriMiner X twoclass
multiclass
classif.glmboost
glmbst

Boosting for GLMs
mboost X X X prob
twoclass
family has been set to Binomial() by default. Maximum number of boosting iterations is set via mstop, the actual number used for prediction is controlled by m.
classif.glmnet
glmnet

GLM with Lasso or Elasticnet Regularization
glmnet X X X prob
twoclass
multiclass
The family parameter is set to binomial for two-class problems and to multinomial otherwise. Factors automatically get converted to dummy columns, ordered factors to integer.
classif.hdrda
hdrda

High-Dimensional Regularized Discriminant Analysis
sparsediscrim X prob
twoclass
classif.IBk
ibk

k-Nearest Neighbours
RWeka X X prob
twoclass
multiclass
classif.J48
j48

J48 Decision Trees
RWeka X X X prob
twoclass
multiclass
NAs are directly passed to WEKA with na.action = na.pass.
classif.JRip
jrip

Propositional Rule Learner
RWeka X X X prob
twoclass
multiclass
NAs are directly passed to WEKA with na.action = na.pass.
classif.kknn
kknn

k-Nearest Neighbor
kknn X X prob
twoclass
multiclass
classif.knn
knn

k-Nearest Neighbor
class X twoclass
multiclass
classif.ksvm
ksvm

Support Vector Machines
kernlab X X prob
twoclass
multiclass
class.weights
Kernel parameters have to be passed directly and not by using the kpar list in ksvm. Note that fit has been set to FALSE by default for speed.
classif.lda
lda

Linear Discriminant Analysis
MASS X X prob
twoclass
multiclass
Learner parameter predict.method maps to method in predict.lda.
classif.LiblineaRL1L2SVC
liblinl1l2svc

L1-Regularized L2-Loss Support Vector Classification
LiblineaR X twoclass
multiclass
class.weights
classif.LiblineaRL1LogReg
liblinl1logreg

L1-Regularized Logistic Regression
LiblineaR X prob
twoclass
multiclass
class.weights
classif.LiblineaRL2L1SVC
liblinl2l1svc

L2-Regularized L1-Loss Support Vector Classification
LiblineaR X twoclass
multiclass
class.weights
classif.LiblineaRL2LogReg
liblinl2logreg

L2-Regularized Logistic Regression
LiblineaR X prob
twoclass
multiclass
class.weights
type = 0 (the default) solves the primal problem and type = 7 the dual problem.
classif.LiblineaRL2SVC
liblinl2svc

L2-Regularized L2-Loss Support Vector Classification
LiblineaR X twoclass
multiclass
class.weights
type = 2 (the default) solves the primal problem and type = 1 the dual problem.
classif.LiblineaRMultiClassSVC
liblinmulticlasssvc

Support Vector Classification by Crammer and Singer
LiblineaR X twoclass
multiclass
class.weights
classif.linDA
linda

Linear Discriminant Analysis
DiscriMiner X twoclass
multiclass
Set validation = NULL by default to disable internal test set validation.
classif.logreg
logreg

Logistic Regression
stats X X X prob
twoclass
Delegates to glm with family = binomial(link = "logit").
classif.lqa
lqa

Fitting penalized Generalized Linear Models with the LQA algorithm
lqa X X prob
twoclass
penalty has been set to "lasso" and lambda to 0.1 by default.
classif.lssvm
lssvm

Least Squares Support Vector Machine
kernlab X X twoclass
multiclass
fitted has been set to FALSE by default for speed.
classif.lvq1
lvq1

Learning Vector Quantization
class X twoclass
multiclass
classif.mda
mda

Mixture Discriminant Analysis
mda X X prob
twoclass
multiclass
keep.fitted has been set to FALSE by default for speed, and start.method = "lvq" is used for more robust behavior and fewer technical crashes.
classif.mlp
mlp

Multi-Layer Perceptron
RSNNS X prob
twoclass
multiclass
classif.multinom
multinom

Multinomial Regression
nnet X X X prob
twoclass
multiclass
classif.naiveBayes
nbayes

Naive Bayes
e1071 X X X prob
twoclass
multiclass
classif.neuralnet
neuralnet

Neural Network from neuralnet
neuralnet X prob
twoclass
err.fct has been set to ce to do classification.
classif.nnet
nnet

Neural Network
nnet X X X prob
twoclass
multiclass
size has been set to 3 by default.
classif.nnTrain
nn.train

Training Neural Network by Backpropagation
deepnet X prob
twoclass
multiclass
output set to "softmax" by default.
classif.nodeHarvest
nodeHarvest

Node Harvest
nodeHarvest X X prob
twoclass
classif.OneR
oner

1-R Classifier
RWeka X X X prob
twoclass
multiclass
NAs are directly passed to WEKA with na.action = na.pass.
classif.pamr
pamr

Nearest shrunken centroid
pamr X prob
twoclass
Threshold for prediction (threshold.predict) has been set to 1 by default.
classif.PART
part

PART Decision Lists
RWeka X X X prob
twoclass
multiclass
NAs are directly passed to WEKA with na.action = na.pass.
classif.plr
plr

Logistic Regression with a L2 Penalty
stepPlr X X X prob
twoclass
AIC and BIC penalty types can be selected via the new parameter cp.type.
classif.plsdaCaret
plsdacaret

Partial Least Squares (PLS) Discriminant Analysis
caret X prob
twoclass
classif.probit
probit

Probit Regression
stats X X X prob
twoclass
Delegates to glm with family = binomial(link = "probit").
classif.qda
qda

Quadratic Discriminant Analysis
MASS X X prob
twoclass
multiclass
Learner parameter predict.method maps to method in predict.qda.
classif.quaDA
quada

Quadratic Discriminant Analysis
DiscriMiner X twoclass
multiclass
classif.randomForest
rf

Random Forest
randomForest X X X prob
twoclass
multiclass
class.weights
Note that rf can freeze the R process if trained on a task with a single feature that is constant. This can happen during forward feature selection or as a result of resampling; remove such features with removeConstantFeatures.
classif.randomForestSRC
rfsrc

Random Forest
randomForestSRC X X X prob
twoclass
multiclass
na.action has been set to na.impute by default to allow missing data support.
classif.randomForestSRCSyn
rfsrcSyn

Synthetic Random Forest
randomForestSRC X X X prob
twoclass
multiclass
na.action has been set to na.impute by default to allow missing data support.
classif.ranger
ranger

Random Forests
ranger X X prob
twoclass
multiclass
By default, internal parallelization is switched off (num.threads = 1) and verbose output is disabled. Both settings are changeable.
classif.rda
rda

Regularized Discriminant Analysis
klaR X X prob
twoclass
multiclass
estimate.error has been set to FALSE by default for speed.
classif.rFerns
rFerns

Random ferns
rFerns X X X twoclass
multiclass
classif.rknn
rknn

Random k-Nearest-Neighbors
rknn X X twoclass
multiclass
classif.rotationForest
rotationForest

Rotation Forest
rotationForest X X X prob
twoclass
classif.rpart
rpart

Decision Tree
rpart X X X X X prob
twoclass
multiclass
xval has been set to 0 by default for speed.
classif.rrlda
rrlda

Robust Regularized Linear Discriminant Analysis
rrlda X twoclass
multiclass
classif.saeDNN
sae.dnn

Deep neural network with weights initialized by Stacked AutoEncoder
deepnet X prob
twoclass
multiclass
output set to "softmax" by default.
classif.sda
sda

Shrinkage Discriminant Analysis
sda X prob
twoclass
multiclass
classif.sparseLDA
sparseLDA

Sparse Discriminant Analysis
sparseLDA
MASS
elasticnet
X prob
twoclass
multiclass
Arguments Q and stop are not yet provided as they depend on the task.
classif.svm
svm

Support Vector Machines (libsvm)
e1071 X X prob
twoclass
multiclass
class.weights
classif.xgboost
xgboost

eXtreme Gradient Boosting
xgboost X X X prob
twoclass
multiclass
All settings are passed directly, rather than through xgboost's params argument. nrounds has been set to 1 by default.
classif.xyf
xyf

X-Y fused self-organising maps
kohonen X prob
twoclass
multiclass
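Several notes in the table above mention defaults that mlr changes (for example xval = 0 for rpart) and recommend removeConstantFeatures for the random forest learner. The sketch below shows how both might look in practice; the constant column const is a hypothetical addition to the iris data, and the mlr and rpart packages are assumed to be installed.

```r
library(mlr)

# Toy task with an artificial constant feature (hypothetical column "const").
df = iris
df$const = 1
task = makeClassifTask(data = df, target = "Species")

# Drop constant features before training learners that cannot handle them.
task = removeConstantFeatures(task)
print(getTaskFeatureNames(task))

# Override a default that mlr changed: re-enable 10-fold internal
# cross-validation in rpart instead of the mlr default xval = 0.
lrn = makeLearner("classif.rpart", xval = 10)
print(getHyperPars(lrn))
```

Hyperparameters passed to makeLearner take precedence over the defaults listed in the Note column.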

Regression (53)

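For regression the following additional learner property is relevant and shown in column Props: se (standard errors can be predicted).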

Class / Short Name / Name Packages Num. Fac. Ord. NAs Weights Props Note
regr.avNNet
avNNet

Neural Network
nnet X X X size has been set to 3 by default.
regr.bartMachine
bartmachine

Bayesian Additive Regression Trees
bartMachine X X X use_missing_data has been set to TRUE by default to allow missing data support.
regr.bcart
bcart

Bayesian CART
tgp X X se
regr.bdk
bdk

Bi-Directional Kohonen map
kohonen X
regr.bgp
bgp

Bayesian Gaussian Process
tgp X se
regr.bgpllm
bgpllm

Bayesian Gaussian Process with jumps to the Limiting Linear Model
tgp X se
regr.blackboost
blackbst

Gradient Boosting with Regression Trees
mboost
party
X X X X See ?ctree_control for possible breakage for nominal features with missingness.
regr.blm
blm

Bayesian Linear Model
tgp X se
regr.brnn
brnn

Bayesian regularization for feed-forward neural networks
brnn X X
regr.bst
bst

Gradient Boosting
bst X Renamed parameter learner to Learner due to a name clash with setHyperPars. Default changes: Learner = "ls", xval = 0, and maxdepth = 1.
regr.btgp
btgp

Bayesian Treed Gaussian Process
tgp X X se
regr.btgpllm
btgpllm

Bayesian Treed Gaussian Process with jumps to the Limiting Linear Model
tgp X X se
regr.btlm
btlm

Bayesian Treed Linear Model
tgp X X se
regr.cforest
cforest

Random Forest Based on Conditional Inference Trees
party X X X X X See ?ctree_control for possible breakage for nominal features with missingness.
regr.crs
crs

Regression Splines
crs X X X se
regr.ctree
ctree

Conditional Inference Trees
party X X X X X See ?ctree_control for possible breakage for nominal features with missingness.
regr.cubist
cubist

Cubist
Cubist X X X
regr.earth
earth

Multivariate Adaptive Regression Splines
earth X X
regr.elmNN
elmNN

Extreme Learning Machine for Single Hidden Layer Feedforward Neural Networks
elmNN X nhid has been set to 1 and actfun has been set to "sig" by default.
regr.extraTrees
extraTrees

Extremely Randomized Trees
extraTrees X X
regr.fnn
fnn

Fast k-Nearest Neighbor
FNN X
regr.frbs
frbs

Fuzzy Rule-based Systems
frbs X
regr.gbm
gbm

Gradient Boosting Machine
gbm X X X X distribution has been set to "gaussian" by default.
regr.glmboost
glmboost

Boosting for GLMs
mboost X X X Maximum number of boosting iterations is set via mstop, the actual number used is controlled by m.
regr.glmnet
glmnet

GLM with Lasso or Elasticnet Regularization
glmnet X X X X Factors automatically get converted to dummy columns, ordered factors to integer.
regr.IBk
ibk

K-Nearest Neighbours
RWeka X X
regr.kknn
kknn

K-Nearest-Neighbor regression
kknn X X
regr.km
km

Kriging
DiceKriging X se In predict, we currently always use type = "SK". The extra parameter jitter (default is FALSE) enables adding a very small jitter (order 1e-12) to the x-values before prediction, as predict.km reproduces the exact y-values of the training data points, when you pass them in, even if the nugget effect is turned on.
regr.ksvm
ksvm

Support Vector Machines
kernlab X X Kernel parameters have to be passed directly and not by using the kpar list in ksvm. Note that fit has been set to FALSE by default for speed.
regr.laGP
laGP

Local Approximate Gaussian Process
laGP X se
regr.LiblineaRL2L1SVR
liblinl2l1svr

L2-Regularized L1-Loss Support Vector Regression
LiblineaR X Parameter svr_eps has been set to 0.1 by default.
regr.LiblineaRL2L2SVR
liblinl2l2svr

L2-Regularized L2-Loss Support Vector Regression
LiblineaR X type = 11 (the default) solves the primal problem and type = 12 the dual problem. Parameter svr_eps has been set to 0.1 by default.
regr.lm
lm

Simple Linear Regression
stats X X X se
regr.mars
mars

Multivariate Adaptive Regression Splines
mda X
regr.mob
mob

Model-based Recursive Partitioning Yielding a Tree with Fitted Models Associated with each Terminal Node
party X X X
regr.nnet
nnet

Neural Network
nnet X X X size has been set to 3 by default.
regr.nodeHarvest
nodeHarvest

Node Harvest
nodeHarvest X X
regr.pcr
pcr

Principal Component Regression
pls X X
regr.penalized.lasso
lasso

Lasso Regression
penalized X X
regr.penalized.ridge
ridge

Penalized Ridge Regression
penalized X X
regr.plsr
plsr

Partial Least Squares Regression
pls X X
regr.randomForest
rf

Random Forest
randomForest X X X se See ?regr.randomForest for information about se estimation. Note that rf can freeze the R process if trained on a task with a single feature that is constant. This can happen during forward feature selection or as a result of resampling; remove such features with removeConstantFeatures.
regr.randomForestSRC
rfsrc

Random Forest
randomForestSRC X X X na.action has been set to na.impute by default to allow missing data support.
regr.randomForestSRCSyn
rfsrcSyn

Synthetic Random Forest
randomForestSRC X X X na.action has been set to na.impute by default to allow missing data support.
regr.ranger
ranger

Random Forests
ranger X X By default, internal parallelization is switched off (num.threads = 1) and verbose output is disabled. Both settings are changeable.
regr.rknn
rknn

Random k-Nearest-Neighbors
rknn X X
regr.rpart
rpart

Decision Tree
rpart X X X X X xval has been set to 0 by default for speed.
regr.rsm
rsm

Response Surface Regression
rsm X Select the order of the regression via modelfun = "FO" (first order), "TWI" (two-way interactions, including the first-order terms), or "SO" (full second order).
regr.rvm
rvm

Relevance Vector Machine
kernlab X X Kernel parameters have to be passed directly and not by using the kpar list in rvm. Note that fit has been set to FALSE by default for speed.
regr.slim
slim

Sparse Linear Regression using Nonsmooth Loss Functions and L1 Regularization
flare X lambda.idx has been set to 3 by default.
regr.svm
svm

Support Vector Machines (libsvm)
e1071 X X
regr.xgboost
xgboost

eXtreme Gradient Boosting
xgboost X X X All settings are passed directly, rather than through xgboost's params argument. nrounds has been set to 1 by default.
regr.xyf
xyf

X-Y fused self-organising maps
kohonen X
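For learners marked with se in the Props column, standard errors can be requested through the predict.type argument. A minimal sketch using regr.lm on the built-in mtcars data (the choice of data set and target is only for illustration):

```r
library(mlr)

# Regression task on a built-in data set; "mpg" is the target.
task = makeRegrTask(data = mtcars, target = "mpg")

# Request standard errors in addition to the point prediction.
lrn = makeLearner("regr.lm", predict.type = "se")
mod = train(lrn, task)
pred = predict(mod, task = task)

# The prediction data frame contains an "se" column next to "response".
print(head(as.data.frame(pred)))
```

Requesting predict.type = "se" for a learner without the se property results in an error when the learner is constructed.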

Survival analysis (11)

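For survival analysis the following additional learner properties are relevant and shown in column Props: prob (probabilities can be predicted) and rcens (right-censored data can be handled).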

Class / Short Name / Name Packages Num. Fac. Ord. NAs Weights Props Note
surv.cforest
crf

Random Forest based on Conditional Inference Trees
party
survival
X X X X X rcens See ?ctree_control for possible breakage for nominal features with missingness.
surv.CoxBoost
coxboost

Cox Proportional Hazards Model with Componentwise Likelihood based Boosting
CoxBoost X X X X rcens Factors automatically get converted to dummy columns, ordered factors to integer.
surv.coxph
coxph

Cox Proportional Hazard Model
survival X X X X prob
rcens
surv.cvglmnet
cvglmnet

GLM with Regularization (Cross Validated Lambda)
glmnet X X X X rcens Factors automatically get converted to dummy columns, ordered factors to integer.
surv.glmboost
glmboost

Gradient Boosting with Componentwise Linear Models
survival
mboost
X X X X rcens family has been set to CoxPH() by default. Maximum number of boosting iterations is set via mstop, the actual number used for prediction is controlled by m.
surv.glmnet
glmnet

GLM with Regularization
glmnet X X X X rcens Factors automatically get converted to dummy columns, ordered factors to integer.
surv.optimCoxBoostPenalty
optimCoxBoostPenalty

Cox Proportional Hazards Model with Componentwise Likelihood based Boosting, automatic tuning enabled
CoxBoost X X X rcens Factors automatically get converted to dummy columns, ordered factors to integer.
surv.penalized
penalized

Penalized Regression
penalized X X X rcens Factors automatically get converted to dummy columns, ordered factors to integer.
surv.randomForestSRC
rfsrc

Random Forests for Survival
survival
randomForestSRC
X X X X rcens na.action has been set to na.impute by default to allow missing data support.
surv.ranger
ranger

Random Forests
ranger X X prob
rcens
By default, internal parallelization is switched off (num.threads = 1) and verbose output is disabled. Both settings are changeable.
surv.rpart
rpart

Survival Tree
rpart X X X X X rcens xval has been set to 0 by default for speed.

Cluster analysis (8)

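For cluster analysis the following additional learner property is relevant and shown in column Props: prob (probabilities can be predicted).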

Class / Short Name / Name Packages Num. Fac. Ord. NAs Weights Props Note
cluster.cmeans
cmeans

Fuzzy C-Means Clustering
e1071
clue
X prob The predict method uses cl_predict from the clue package to compute the cluster memberships for new data. The default centers = 2 is added so that the method runs without setting parameters, but in practice the user should change it.
cluster.Cobweb
cobweb

Cobweb Clustering Algorithm
RWeka X
cluster.dbscan
dbscan

DBScan Clustering
fpc X A cluster index of NA indicates noise points. Specify method = "dist" if the data should be interpreted as dissimilarity matrix or object. Otherwise Euclidean distances will be used.
cluster.EM
em

Expectation-Maximization Clustering
RWeka X
cluster.FarthestFirst
farthestfirst

FarthestFirst Clustering Algorithm
RWeka X
cluster.kmeans
kmeans

K-Means
stats
clue
X prob The predict method uses cl_predict from the clue package to compute the cluster memberships for new data. The default centers = 2 is added so that the method runs without setting parameters, but in practice the user should change it.
cluster.SimpleKMeans
simplekmeans

K-Means Clustering
RWeka X
cluster.XMeans
xmeans

XMeans (k-means with automatic determination of k)
RWeka X You may have to install the XMeans Weka package: WPM('install-package', 'XMeans').

Cost-sensitive classification

For ordinary misclassification costs you can use all the standard classification methods listed above.

For example-dependent costs there are several ways to generate cost-sensitive learners from ordinary regression and classification learners. See the tutorial section on cost-sensitive classification and the documentation of makeCostSensClassifWrapper, makeCostSensRegrWrapper and makeCostSensWeightedPairsWrapper for details.
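As a rough illustration of the wrapper approach, the sketch below builds a cost-sensitive task from the iris data and wraps an ordinary classification learner. The cost matrix is randomly generated and purely hypothetical (one row per observation, one column per class, zero cost for the true class); the mlr and rpart packages are assumed to be installed.

```r
library(mlr)

set.seed(1)
# Hypothetical example-dependent costs: random costs for wrong classes,
# zero cost for predicting the true class of each observation.
costs = matrix(runif(150 * 3, 0, 2000), nrow = 150) *
  (1 - diag(3))[as.integer(iris$Species), ]
colnames(costs) = levels(iris$Species)

# The task holds only the features; the costs replace the target.
task = makeCostSensTask(data = iris[, 1:4], costs = costs)

# Turn an ordinary classification learner into a cost-sensitive one.
lrn = makeCostSensClassifWrapper(makeLearner("classif.rpart"))
mod = train(lrn, task)
pred = predict(mod, task = task)
print(head(as.data.frame(pred)))
```

makeCostSensRegrWrapper and makeCostSensWeightedPairsWrapper are used in the same way, with a regression learner or a weight-supporting classification learner respectively.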

Multilabel classification (1)

Class / Short Name / Name Packages Num. Fac. Ord. NAs Weights Props Note
multilabel.rFerns
rFerns

Random ferns
rFerns X X X

Moreover, you can use the binary relevance method to apply ordinary classification learners to the multilabel problem. See the documentation of function makeMultilabelBinaryRelevanceWrapper and the tutorial section on multilabel classification for details.
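A minimal sketch of the binary relevance approach, assuming the mlr and rpart packages are installed. The data frame and its two logical label columns (label_a, label_b) are made up for illustration.

```r
library(mlr)

set.seed(1)
# Toy data: two numeric features and two logical label columns.
df = data.frame(x1 = rnorm(100), x2 = rnorm(100))
df$label_a = df$x1 > 0
df$label_b = df$x2 > 0

# Multilabel tasks take several logical target columns.
task = makeMultilabelTask(data = df, target = c("label_a", "label_b"))

# Fit one binary rpart classifier per label.
lrn = makeMultilabelBinaryRelevanceWrapper(makeLearner("classif.rpart"))
mod = train(lrn, task)
pred = predict(mod, task = task)
print(head(as.data.frame(pred)))
```

Any classification learner from the table above that supports two-class problems can be substituted for classif.rpart.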