Integrated Learners
This page lists the learning methods already integrated in mlr.
Columns Num., Fac., NAs, and Weights indicate whether a method can cope with numerical and factor predictors, whether it can deal with missing values in a meaningful way (other than simply removing observations with missing values), and whether observation weights are supported.
Column Props shows further properties of the learning methods. ordered indicates that a method can deal with ordered factor features. For classification, you can see if binary and/or multi-class problems are supported and if the learner accepts class weights. For survival analysis, the censoring type is shown. For example, rcens means that the learning method can deal with right-censored data. Moreover, the type of prediction is displayed, where prob indicates that probabilities can be predicted. For regression, se means that standard errors can be predicted in addition to the mean response. See also RLearner for details.
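The properties described above can also be queried from within R. The following is a minimal sketch, assuming a reasonably recent mlr version in which listLearners accepts a properties argument and getLearnerProperties is available; the exact shape of the returned object may differ between versions.

```r
library(mlr)

# List classification learners that can predict probabilities ("prob")
# and handle missing values ("missings"). Depending on the mlr version,
# the result is a character vector or a data frame.
lrns = listLearners("classif", properties = c("prob", "missings"))
print(head(lrns))

# Inspect the properties of a single learner.
lrn = makeLearner("classif.rpart")
props = getLearnerProperties(lrn)
print(props)

# Check for specific properties before configuring a learner.
hasLearnerProperties(lrn, c("prob", "weights"))
```

Checking properties this way avoids, for instance, requesting predict.type = "prob" from a learner that only predicts class labels.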
Classification (70)
ID / Short Name | Name | Packages | Num. | Fac. | NAs | Weights | Props | Note |
---|---|---|---|---|---|---|---|---|
classif.ada ada |
ada Boosting | ada | X | X | X | prob twoclass |
||
classif.avNNet avNNet |
Neural Network | nnet | X | X | X | multiclass prob twoclass |
size has been set to 3 by default. Bagged training of nnet is performed if bag = TRUE is set. |
|
classif.bartMachine bartmachine |
Bayesian Additive Regression Trees | bartMachine | X | X | X | prob twoclass |
'use_missing_data' has been set to TRUE by default to allow missing data support | |
classif.bdk bdk |
Bi-Directional Kohonen map | kohonen | X | multiclass prob twoclass |
||||
classif.binomial binomial |
Binomial Regression | stats | X | X | X | prob twoclass |
Delegates to glm with freely choosable binomial link function via learner param 'link'. | |
classif.blackboost blackbst |
Gradient Boosting With Regression Trees | mboost party |
X | X | X | X | prob twoclass |
see ?ctree_control for possible breakage for nominal features with missingness |
classif.boosting adabag |
Adabag Boosting | adabag rpart |
X | X | X | multiclass prob twoclass |
xval has been set to 0 by default for speed. |
|
classif.bst bst |
Gradient Boosting | bst | X | twoclass | The argument learner has been renamed to Learner due to a name conflict with setHyperPars. Learner has been set to lm by default. |
|||
classif.cforest cforest |
Random forest based on conditional inference trees | party | X | X | X | X | multiclass ordered prob twoclass |
see ?ctree_control for possible breakage for nominal features with missingness |
classif.clusterSVM clusterSVM |
Clustered Support Vector Machines | SwarmSVM LiblineaR |
X | twoclass | centers set to 2 by default |
|||
classif.ctree ctree |
Conditional Inference Trees | party | X | X | X | X | multiclass ordered prob twoclass |
see ?ctree_control for possible breakage for nominal features with missingness |
classif.dbnDNN dbn.dnn |
Deep neural network with weights initialized by DBN | deepnet | X | multiclass prob twoclass |
output set to softmax by default |
|||
classif.dcSVM dcSVM |
Divided-Conquer Support Vector Machines | SwarmSVM | X | twoclass | ||||
classif.extraTrees extraTrees |
Extremely Randomized Trees | extraTrees | X | X | multiclass prob twoclass |
|||
classif.fnn fnn |
Fast k-Nearest Neighbour | FNN | X | multiclass twoclass |
||||
classif.gaterSVM gaterSVM |
Mixture of SVMs with Neural Network Gater Function | SwarmSVM e1071 |
X | twoclass | m set to 3 and max.iter set to 1 by default | |||
classif.gbm gbm |
Gradient Boosting Machine | gbm | X | X | X | X | multiclass prob twoclass |
|
classif.geoDA geoda |
Geometric Predictive Discriminant Analysis | DiscriMiner | X | multiclass twoclass |
||||
classif.glmboost glmbst |
Boosting for GLMs | mboost | X | X | X | prob twoclass |
family has been set to Binomial() by default. Maximum number of boosting iterations is set via 'mstop', the actual number used for prediction is controlled by 'm'. |
|
classif.glmnet glmnet |
GLM with Lasso or Elasticnet Regularization | glmnet | X | X | X | multiclass prob twoclass |
Factors automatically get converted to dummy columns, ordered factors to integer | |
classif.hdrda hdrda |
High-Dimensional Regularized Discriminant Analysis | sparsediscrim | X | prob twoclass |
||||
classif.IBk ibk |
k-Nearest Neighbours | RWeka | X | X | multiclass prob twoclass |
|||
classif.J48 j48 |
J48 Decision Trees | RWeka | X | X | X | multiclass prob twoclass |
NAs are directly passed to WEKA with na.action = na.pass |
|
classif.JRip jrip |
Propositional Rule Learner | RWeka | X | X | X | multiclass prob twoclass |
NAs are directly passed to WEKA with na.action = na.pass |
|
classif.kknn kknn |
k-Nearest Neighbor | kknn | X | X | multiclass prob twoclass |
|||
classif.knn knn |
k-Nearest Neighbor | class | X | multiclass twoclass |
||||
classif.ksvm ksvm |
Support Vector Machines | kernlab | X | X | class.weights multiclass prob twoclass |
Kernel parameters have to be passed directly and not by using the kpar list in ksvm. Note that fit has been set to FALSE by default for speed. |
||
classif.lda lda |
Linear Discriminant Analysis | MASS | X | X | multiclass prob twoclass |
Learner param 'predict.method' maps to 'method' in predict.lda. | ||
classif.LiblineaRL1L2SVC liblinl1l2svc |
L1-Regularized L2-Loss Support Vector Classification | LiblineaR | X | class.weights multiclass twoclass |
||||
classif.LiblineaRL1LogReg liblinl1logreg |
L1-Regularized Logistic Regression | LiblineaR | X | class.weights multiclass prob twoclass |
||||
classif.LiblineaRL2L1SVC liblinl2l1svc |
L2-Regularized L1-Loss Support Vector Classification | LiblineaR | X | class.weights multiclass twoclass |
||||
classif.LiblineaRL2LogReg liblinl2logreg |
L2-Regularized Logistic Regression | LiblineaR | X | class.weights multiclass prob twoclass |
type 0 is primal and type 7 is dual problem | |||
classif.LiblineaRL2SVC liblinl2svc |
L2-Regularized L2-Loss Support Vector Classification | LiblineaR | X | class.weights multiclass twoclass |
type 2 is primal and type 1 is dual problem | |||
classif.LiblineaRMultiClassSVC liblinmulticlasssvc |
Support Vector Classification by Crammer and Singer | LiblineaR | X | class.weights multiclass twoclass |
||||
classif.linDA linda |
Linear Discriminant Analysis | DiscriMiner | X | multiclass twoclass |
||||
classif.logreg logreg |
Logistic Regression | stats | X | X | X | prob twoclass |
Delegates to glm with family binomial/logit. | |
classif.lqa lqa |
Fitting penalized Generalized Linear Models with the LQA algorithm | lqa | X | X | prob twoclass |
penalty has been set to lasso and lambda to 0.1 by default. |
||
classif.lssvm lssvm |
Least Squares Support Vector Machine | kernlab | X | X | multiclass twoclass |
fitted has been set to FALSE by default for speed. |
||
classif.lvq1 lvq1 |
Learning Vector Quantization | class | X | multiclass twoclass |
||||
classif.mda mda |
Mixture Discriminant Analysis | mda | X | X | multiclass prob twoclass |
keep.fitted has been set to FALSE by default for speed, and start.method='lvq' is used for more robust behavior and fewer technical crashes. |
||
classif.mlp mlp |
Multi-Layer Perceptron | RSNNS | X | multiclass prob twoclass |
||||
classif.multinom multinom |
Multinomial Regression | nnet | X | X | X | multiclass prob twoclass |
||
classif.naiveBayes nbayes |
Naive Bayes | e1071 | X | X | X | multiclass prob twoclass |
||
classif.neuralnet neuralnet |
Neural Network from neuralnet | neuralnet | X | prob twoclass |
err.fct has been set to ce to do classification. |
|||
classif.nnet nnet |
Neural Network | nnet | X | X | X | multiclass prob twoclass |
size has been set to 3 by default. |
|
classif.nnTrain nn.train |
Training Neural Network by Backpropagation | deepnet | X | multiclass prob twoclass |
output set to softmax by default |
|||
classif.nodeHarvest nodeHarvest |
Node Harvest | nodeHarvest | X | X | prob twoclass |
|||
classif.OneR oner |
1-R Classifier | RWeka | X | X | X | multiclass prob twoclass |
NAs are directly passed to WEKA with na.action = na.pass |
|
classif.pamr pamr |
Nearest shrunken centroid | pamr | X | prob twoclass |
threshold for prediction (threshold.predict) has been set to 1 by default |
|||
classif.PART part |
PART Decision Lists | RWeka | X | X | X | multiclass prob twoclass |
NAs are directly passed to WEKA with na.action = na.pass |
|
classif.plr plr |
Logistic Regression with a L2 Penalty | stepPlr | X | X | X | prob twoclass |
AIC and BIC penalty types can be selected via the new parameter cp.type |
|
classif.plsdaCaret plsdacaret |
Partial Least Squares (PLS) Discriminant Analysis | caret | X | prob twoclass |
||||
classif.probit probit |
Probit Regression | stats | X | X | X | prob twoclass |
Delegates to glm with family binomial/probit. | |
classif.qda qda |
Quadratic Discriminant Analysis | MASS | X | X | multiclass prob twoclass |
Learner param 'predict.method' maps to 'method' in predict.qda. | ||
classif.quaDA quada |
Quadratic Discriminant Analysis | DiscriMiner | X | multiclass twoclass |
||||
classif.randomForest rf |
Random Forest | randomForest | X | X | class.weights multiclass ordered prob twoclass |
|||
classif.randomForestSRC rfsrc |
Random Forest | randomForestSRC | X | X | X | multiclass prob twoclass |
'na.action' has been set to 'na.impute' by default to allow missing data support | |
classif.ranger ranger |
Random Forests | ranger | X | X | multiclass prob twoclass |
By default, internal parallelization is switched off (num.threads = 1) and verbose output is disabled. Both settings are changeable. | ||
classif.rda rda |
Regularized Discriminant Analysis | klaR | X | X | multiclass prob twoclass |
estimate.error has been set to FALSE by default for speed. |
||
classif.rFerns rFerns |
Random ferns | rFerns | X | X | multiclass ordered twoclass |
|||
classif.rknn rknn |
Random k-Nearest-Neighbors | rknn | X | multiclass ordered twoclass |
||||
classif.rotationForest rotationForest |
Rotation Forest | rotationForest | X | X | ordered prob twoclass |
|||
classif.rpart rpart |
Decision Tree | rpart | X | X | X | X | multiclass ordered prob twoclass |
xval has been set to 0 by default for speed. |
classif.rrlda rrlda |
Robust Regularized Linear Discriminant Analysis | rrlda | X | multiclass twoclass |
||||
classif.saeDNN sae.dnn |
Deep neural network with weights initialized by Stacked AutoEncoder | deepnet | X | multiclass prob twoclass |
output set to softmax by default |
|||
classif.sda sda |
Shrinkage Discriminant Analysis | sda | X | multiclass prob twoclass |
||||
classif.sparseLDA sparseLDA |
Sparse Discriminant Analysis | sparseLDA MASS elasticnet |
X | multiclass prob twoclass |
Arguments Q and stop are not yet provided as they depend on the task. | |||
classif.svm svm |
Support Vector Machines (libsvm) | e1071 | X | X | class.weights multiclass prob twoclass |
|||
classif.xgboost xgboost |
eXtreme Gradient Boosting | xgboost | X | X | X | multiclass prob twoclass |
All settings are passed directly, rather than through xgboost's 'params' argument. 'nrounds' has been set to 1 by default | |
classif.xyf xyf |
X-Y fused self-organising maps | kohonen | X | multiclass prob twoclass |
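Several notes in the table describe how hyperparameters are passed to the underlying package. As an illustration of the classif.ksvm note, here is a minimal sketch, assuming the kernlab package is installed; the values of sigma and C are arbitrary choices for demonstration, not recommended settings.

```r
library(mlr)

# Kernel parameters such as sigma are given as ordinary hyperparameters,
# not wrapped in a kpar list as in a direct call to kernlab::ksvm.
lrn = makeLearner("classif.ksvm", kernel = "rbfdot", sigma = 0.5, C = 1,
  predict.type = "prob")

# iris.task is an example task shipped with mlr.
mod = train(lrn, iris.task)
pred = predict(mod, task = iris.task)

# One probability column per class, as indicated by the "prob" property.
probs = getPredictionProbabilities(pred)
print(head(probs))
```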
Regression (52)
ID / Short Name | Name | Packages | Num. | Fac. | NAs | Weights | Props | Note |
---|---|---|---|---|---|---|---|---|
regr.avNNet avNNet |
Neural Network | nnet | X | X | X | size has been set to 3 by default. |
||
regr.bartMachine bartmachine |
Bayesian Additive Regression Trees | bartMachine | X | X | X | 'use_missing_data' has been set to TRUE by default to allow missing data support | ||
regr.bcart bcart |
Bayesian CART | tgp | X | X | se | |||
regr.bdk bdk |
Bi-Directional Kohonen map | kohonen | X | |||||
regr.bgp bgp |
Bayesian Gaussian Process | tgp | X | se | ||||
regr.bgpllm bgpllm |
Bayesian Gaussian Process with jumps to the Limiting Linear Model | tgp | X | se | ||||
regr.blackboost blackbst |
Gradient Boosting with Regression Trees | mboost party |
X | X | X | X | see ?ctree_control for possible breakage for nominal features with missingness | |
regr.blm blm |
Bayesian Linear Model | tgp | X | se | ||||
regr.brnn brnn |
Bayesian regularization for feed-forward neural networks | brnn | X | X | ||||
regr.bst bst |
Gradient Boosting | bst | X | The argument learner has been renamed to Learner due to a name conflict with setHyperPars |
||||
regr.btgp btgp |
Bayesian Treed Gaussian Process | tgp | X | X | se | |||
regr.btgpllm btgpllm |
Bayesian Treed Gaussian Process with jumps to the Limiting Linear Model | tgp | X | X | se | |||
regr.btlm btlm |
Bayesian Treed Linear Model | tgp | X | X | se | |||
regr.cforest cforest |
Random Forest Based on Conditional Inference Trees | party | X | X | X | X | ordered | see ?ctree_control for possible breakage for nominal features with missingness |
regr.crs crs |
Regression Splines | crs | X | X | X | se | ||
regr.ctree ctree |
Conditional Inference Trees | party | X | X | X | X | ordered | see ?ctree_control for possible breakage for nominal features with missingness |
regr.cubist cubist |
Cubist | Cubist | X | X | X | |||
regr.earth earth |
Multivariate Adaptive Regression Splines | earth | X | X | ||||
regr.elmNN elmNN |
Extreme Learning Machine for Single Hidden Layer Feedforward Neural Networks | elmNN | X | nhid has been set to 1 and actfun has been set to "sig" by default | ||||
regr.extraTrees extraTrees |
Extremely Randomized Trees | extraTrees | X | X | ||||
regr.fnn fnn |
Fast k-Nearest Neighbor | FNN | X | |||||
regr.frbs frbs |
Fuzzy Rule-based Systems | frbs | X | |||||
regr.gbm gbm |
Gradient Boosting Machine | gbm | X | X | X | X | distribution has been set to gaussian by default. |
|
regr.glmboost glmboost |
Boosting for GLMs | mboost | X | X | X | Maximum number of boosting iterations is set via 'mstop', the actual number used is controlled by 'm'. | ||
regr.glmnet glmnet |
GLM with Lasso or Elasticnet Regularization | glmnet | X | X | X | ordered | Factors automatically get converted to dummy columns, ordered factors to integer | |
regr.IBk ibk |
K-Nearest Neighbours | RWeka | X | X | ||||
regr.kknn kknn |
K-Nearest-Neighbor regression | kknn | X | X | ||||
regr.km km |
Kriging | DiceKriging | X | se | In predict, we currently always use type = 'SK'. The extra param 'jitter' (default is FALSE) enables adding a very small jitter (order 1e-12) to the x-values before prediction, as predict.km reproduces the exact y-values of the training data points, when you pass them in, even if the nugget effect is turned on. | |||
regr.ksvm ksvm |
Support Vector Machines | kernlab | X | X | Kernel parameters have to be passed directly and not by using the kpar list in ksvm. Note that fit has been set to FALSE by default for speed. |
|||
regr.laGP laGP |
Local Approximate Gaussian Process | laGP | X | se | ||||
regr.LiblineaRL2L1SVR liblinl2l1svr |
L2-Regularized L1-Loss Support Vector Regression | LiblineaR | X | |||||
regr.LiblineaRL2L2SVR liblinl2l2svr |
L2-Regularized L2-Loss Support Vector Regression | LiblineaR | X | type 11 is primal and 12 is dual problem | ||||
regr.lm lm |
Simple Linear Regression | stats | X | X | X | se | ||
regr.mars mars |
Multivariate Adaptive Regression Splines | mda | X | |||||
regr.mob mob |
Model-based Recursive Partitioning Yielding a Tree with Fitted Models Associated with each Terminal Node | party | X | X | X | |||
regr.nnet nnet |
Neural Network | nnet | X | X | X | size has been set to 3 by default. |
||
regr.nodeHarvest nodeHarvest |
Node Harvest | nodeHarvest | X | X | ||||
regr.pcr pcr |
Principal Component Regression | pls | X | X | ||||
regr.penalized.lasso lasso |
Lasso Regression | penalized | X | X | ||||
regr.penalized.ridge ridge |
Penalized Ridge Regression | penalized | X | X | ||||
regr.plsr plsr |
Partial Least Squares Regression | pls | X | X | ||||
regr.randomForest rf |
Random Forest | randomForest | X | X | ordered se |
|||
regr.randomForestSRC rfsrc |
Random Forest | randomForestSRC | X | X | X | 'na.action' has been set to 'na.impute' by default to allow missing data support | ||
regr.ranger ranger |
Random Forests | ranger | X | X | By default, internal parallelization is switched off (num.threads = 1) and verbose output is disabled. Both settings are changeable. | |||
regr.rknn rknn |
Random k-Nearest-Neighbors | rknn | X | ordered | ||||
regr.rpart rpart |
Decision Tree | rpart | X | X | X | X | ordered | xval has been set to 0 by default for speed. |
regr.rsm rsm |
Response Surface Regression | rsm | X | You select the order of the regression by using modelfun = "FO" (first order), "TWI" (two-way interactions, this includes the first-order terms!) and "SO" (full second order) | ||||
regr.rvm rvm |
Relevance Vector Machine | kernlab | X | X | Kernel parameters have to be passed directly and not by using the kpar list in rvm. Note that fit has been set to FALSE by default for speed. |
|||
regr.slim slim |
Sparse Linear Regression using Nonsmooth Loss Functions and L1 Regularization | flare | X | lambda.idx has been set to 3 by default | ||||
regr.svm svm |
Support Vector Machines (libsvm) | e1071 | X | X | ||||
regr.xgboost xgboost |
eXtreme Gradient Boosting | xgboost | X | X | X | All settings are passed directly, rather than through xgboost's 'params' argument. 'nrounds' has been set to 1 by default | ||
regr.xyf xyf |
X-Y fused self-organising maps | kohonen | X |
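For regression learners marked with se in the Props column, standard errors can be requested at construction time. The following is a minimal sketch using regr.lm and the bh.task example task, assuming a version of mlr that ships that task.

```r
library(mlr)

# Request standard errors in addition to the mean response.
lrn = makeLearner("regr.lm", predict.type = "se")

mod = train(lrn, bh.task)
pred = predict(mod, task = bh.task)

# The prediction data holds the mean response and its standard error.
print(head(pred$data))
se = getPredictionSE(pred)
```

Setting predict.type = "se" for a learner without the se property results in an error when the learner is created.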
Survival analysis (11)
ID / Short Name | Name | Packages | Num. | Fac. | NAs | Weights | Props | Note |
---|---|---|---|---|---|---|---|---|
surv.cforest crf |
Random Forest based on Conditional Inference Trees | party survival |
X | X | X | X | ordered rcens |
see ?ctree_control for possible breakage for nominal features with missingness |
surv.CoxBoost coxboost |
Cox Proportional Hazards Model with Componentwise Likelihood based Boosting | CoxBoost | X | X | X | ordered rcens |
Factors automatically get converted to dummy columns, ordered factors to integer | |
surv.coxph coxph |
Cox Proportional Hazard Model | survival | X | X | X | X | prob rcens |
|
surv.cvglmnet cvglmnet |
GLM with Regularization (Cross Validated Lambda) | glmnet | X | X | X | ordered rcens |
Factors automatically get converted to dummy columns, ordered factors to integer | |
surv.glmboost glmboost |
Gradient Boosting with Componentwise Linear Models | survival mboost |
X | X | X | ordered rcens |
family has been set to CoxPH() by default. Maximum number of boosting iterations is set via 'mstop', the actual number used for prediction is controlled by 'm'. |
|
surv.glmnet glmnet |
GLM with Regularization | glmnet | X | X | X | ordered rcens |
Factors automatically get converted to dummy columns, ordered factors to integer | |
surv.optimCoxBoostPenalty optimCoxBoostPenalty |
Cox Proportional Hazards Model with Componentwise Likelihood based Boosting, automatic tuning enabled | CoxBoost | X | X | X | rcens | Factors automatically get converted to dummy columns, ordered factors to integer | |
surv.penalized penalized |
Penalized Regression | penalized | X | X | ordered rcens |
Factors automatically get converted to dummy columns, ordered factors to integer | ||
surv.randomForestSRC rfsrc |
Random Forests for Survival | survival randomForestSRC |
X | X | X | ordered rcens |
'na.action' has been set to 'na.impute' by default to allow missing data support | |
surv.ranger ranger |
Random Forests | ranger | X | X | prob rcens |
By default, internal parallelization is switched off (num.threads = 1) and verbose output is disabled. Both settings are changeable. | ||
surv.rpart rpart |
Survival Tree | rpart | X | X | X | X | ordered rcens |
xval has been set to 0 by default for speed. |
Cluster analysis (7)
ID / Short Name | Name | Packages | Num. | Fac. | NAs | Weights | Props | Note |
---|---|---|---|---|---|---|---|---|
cluster.cmeans cmeans | Fuzzy C-Means Clustering | e1071 clue | X | | | | prob | The 'predict' method uses 'cl_predict' from the 'clue' package to compute the cluster memberships for new data. The default 'centers=2' is added so the method runs without setting params, but this must in reality of course be changed by the user. |
cluster.Cobweb cobweb | Cobweb Clustering Algorithm | RWeka | X | | | | | |
cluster.EM em | Expectation-Maximization Clustering | RWeka | X | | | | | |
cluster.FarthestFirst farthestfirst | FarthestFirst Clustering Algorithm | RWeka | X | | | | | |
cluster.kmeans kmeans | K-Means | stats clue | X | | | | prob | The 'predict' method uses 'cl_predict' from the 'clue' package to compute the cluster memberships for new data. The default 'centers=2' is added so the method runs without setting params, but this must in reality of course be changed by the user. |
cluster.SimpleKMeans simplekmeans | K-Means Clustering | RWeka | X | | | | | |
cluster.XMeans xmeans | XMeans (k-means with automatic determination of k) | RWeka | X | | | | | You may have to install the XMeans Weka package: WPM('install-package', 'XMeans'). |
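As the notes for cluster.cmeans and cluster.kmeans point out, the default centers = 2 is only a placeholder. The following is a minimal sketch of overriding it, assuming the clue package is installed; using three centers on the iris measurements is an illustrative choice.

```r
library(mlr)

# Cluster tasks have no target column, so only the numeric features are used.
task = makeClusterTask(data = iris[, 1:4])

# Override the placeholder default of centers = 2.
lrn = makeLearner("cluster.kmeans", centers = 3)

mod = train(lrn, task)
pred = predict(mod, task = task)

# Cluster assignments for the training observations.
clusters = getPredictionResponse(pred)
print(table(clusters))
```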
Cost-sensitive classification
For ordinary misclassification costs you can use all the standard classification methods listed above.
For example-dependent costs there are several ways to generate cost-sensitive learners from ordinary regression and classification learners. See section cost-sensitive classification and the documentation of makeCostSensClassifWrapper, makeCostSensRegrWrapper and makeCostSensWeightedPairsWrapper for details.
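A minimal sketch of the wrapper approach for example-dependent costs is given below. The cost matrix is randomly generated purely for illustration (zero cost for the true class, random cost otherwise), and classif.rpart is an arbitrary choice of base learner that supports observation weights.

```r
library(mlr)
set.seed(1)

# Features only; the class labels are used solely to build the costs.
df = iris[, 1:4]

# Hypothetical example-dependent costs: one row per observation,
# one column per class, zero for the true class.
costs = matrix(runif(nrow(iris) * 3, 0, 2000), nrow(iris)) *
  (1 - diag(3))[iris$Species, ]
colnames(costs) = levels(iris$Species)

task = makeCostSensTask(id = "iris", data = df, costs = costs)

# Turn an ordinary classification learner into a cost-sensitive one.
lrn = makeCostSensWeightedPairsWrapper(makeLearner("classif.rpart"))

mod = train(lrn, task)
pred = predict(mod, task = task)
response = getPredictionResponse(pred)
print(table(response))
```

makeCostSensClassifWrapper and makeCostSensRegrWrapper are constructed in the same way from a classification or regression learner, respectively.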
Multilabel classification (1)
ID / Short Name | Name | Packages | Num. | Fac. | NAs | Weights | Props | Note |
---|---|---|---|---|---|---|---|---|
multilabel.rFerns rFerns | Random ferns | rFerns | X | X | | | ordered | |
Moreover, you can use the binary relevance method to apply ordinary classification learners to the multilabel problem. See the documentation of function makeMultilabelBinaryRelevanceWrapper and the tutorial section on multilabel classification for details.
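The binary relevance approach can be sketched as follows, assuming a version of mlr that ships the yeast.task example task; classif.rpart is an arbitrary choice of base learner.

```r
library(mlr)

# Any ordinary binary classification learner can serve as the base learner.
base.lrn = makeLearner("classif.rpart", predict.type = "prob")

# The wrapper fits one binary model per label.
br.lrn = makeMultilabelBinaryRelevanceWrapper(base.lrn)

# yeast.task is a multilabel example task with 14 labels.
mod = train(br.lrn, yeast.task)
pred = predict(mod, task = yeast.task)
print(head(pred$data[, 1:6]))
```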