Random regression forest.
Calls ranger()
from package ranger.
Details
In addition to the uncertainty estimation methods provided by the ranger package, the learner offers two further methods: ensemble variance and law of total variance. Both compute the empirical mean and variance of the training data points that fall into the predicted leaf nodes. The ensemble variance method takes the variance of the leaf-node means. The law of total variance method takes the mean of the leaf-node variances plus the variance of the leaf-node means. Formulas for both methods are given in Hutter et al. (2015).
For these two methods, the parameter sigma2.threshold
sets a lower bound on the leaf-node variance: any leaf-node variance below this threshold is raised to the threshold value (as described in the paper).
Default is 1e-2.
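The two methods are selected via the se.method parameter. A minimal sketch (assuming mlr3 and mlr3learners are installed; the internal handling of leaf-node statistics is done by the learner itself):

```r
library(mlr3)
library(mlr3learners)

# Request standard errors computed with the law of total variance method;
# sigma2.threshold floors the leaf-node variances (default 0.01).
learner = lrn("regr.ranger",
  predict_type = "se",
  se.method = "law_of_total_variance",
  sigma2.threshold = 1e-2
)
task = tsk("mtcars")
learner$train(task)
pred = learner$predict(task)
head(pred$se)  # per-observation standard errors
```

Setting se.method = "ensemble_variance" instead selects the other method; both accept the same sigma2.threshold parameter.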
Dictionary
This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():
Meta Information
Task type: “regr”
Predict Types: “response”, “se”, “quantiles”
Feature Types: “logical”, “integer”, “numeric”, “character”, “factor”, “ordered”
Required Packages: mlr3, mlr3learners, ranger
Parameters
Id | Type | Default | Levels | Range |
always.split.variables | untyped | - | - | - |
holdout | logical | FALSE | TRUE, FALSE | - |
importance | character | - | none, impurity, impurity_corrected, permutation | - |
keep.inbag | logical | FALSE | TRUE, FALSE | - |
max.depth | integer | NULL | - | \([1, \infty)\) |
min.bucket | integer | 1 | - | \([1, \infty)\) |
min.node.size | integer | 5 | - | \([1, \infty)\) |
mtry | integer | - | - | \([1, \infty)\) |
mtry.ratio | numeric | - | - | \([0, 1]\) |
na.action | character | na.learn | na.learn, na.omit, na.fail | - |
node.stats | logical | FALSE | TRUE, FALSE | - |
num.random.splits | integer | 1 | - | \([1, \infty)\) |
num.threads | integer | 1 | - | \([1, \infty)\) |
num.trees | integer | 500 | - | \([1, \infty)\) |
oob.error | logical | TRUE | TRUE, FALSE | - |
poisson.tau | numeric | 1 | - | \((-\infty, \infty)\) |
regularization.factor | untyped | 1 | - | - |
regularization.usedepth | logical | FALSE | TRUE, FALSE | - |
replace | logical | TRUE | TRUE, FALSE | - |
respect.unordered.factors | character | - | ignore, order, partition | - |
sample.fraction | numeric | - | - | \([0, 1]\) |
save.memory | logical | FALSE | TRUE, FALSE | - |
scale.permutation.importance | logical | FALSE | TRUE, FALSE | - |
se.method | character | infjack | jack, infjack, ensemble_variance, law_of_total_variance | - |
seed | integer | NULL | - | \((-\infty, \infty)\) |
sigma2.threshold | numeric | 0.01 | - | \((-\infty, \infty)\) |
split.select.weights | untyped | NULL | - | - |
splitrule | character | variance | variance, extratrees, maxstat, beta, poisson | - |
verbose | logical | TRUE | TRUE, FALSE | - |
write.forest | logical | TRUE | TRUE, FALSE | - |
Custom mlr3 parameters
mtry
: This hyperparameter can alternatively be set via the hyperparameter mtry.ratio as mtry = max(ceiling(mtry.ratio * n_features), 1). Note that mtry and mtry.ratio are mutually exclusive.
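The conversion above can be worked through in plain R (the feature count of 10 here is just an illustrative value, matching e.g. the mtcars task):

```r
# Worked example of the mtry.ratio conversion:
# mtry = max(ceiling(mtry.ratio * n_features), 1)
n_features = 10
mtry.ratio = 0.3
mtry = max(ceiling(mtry.ratio * n_features), 1)
mtry  # 3
```

The max(..., 1) guard ensures at least one feature is sampled per split even for very small ratios.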
Initial parameter values
num.threads
: Actual default: 2, using two threads, while also respecting environment variable R_RANGER_NUM_THREADS, options(ranger.num.threads = N), or options(Ncpus = N), with precedence in that order.
Adjusted value: 1.
Reason for change: Conflicting with parallelization via future.
References
Wright, Marvin N., Ziegler, Andreas (2017). “ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R.” Journal of Statistical Software, 77(1), 1–17. doi:10.18637/jss.v077.i01.
Breiman, Leo (2001). “Random Forests.” Machine Learning, 45(1), 5–32. ISSN 1573-0565, doi:10.1023/A:1010933404324.
Hutter, Frank, Xu, Lin, Hoos, Holger H., Leyton-Brown, Kevin (2015). “Algorithm runtime prediction: methods and evaluation.” In Proceedings of the 24th International Conference on Artificial Intelligence, series IJCAI'15, 4197–4201. doi:10.5555/2832747.2832840.
See also
Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html#sec-learners
Package mlr3extralearners for more learners.
as.data.table(mlr_learners) for a table of available Learners in the running session (depending on the loaded packages).
mlr3pipelines to combine learners with pre- and postprocessing steps.
Extension packages for additional task types:
mlr3proba for probabilistic supervised regression and survival analysis.
mlr3cluster for unsupervised clustering.
mlr3tuning for tuning of hyperparameters, mlr3tuningspaces for established default tuning spaces.
Other Learner:
mlr_learners_classif.cv_glmnet, mlr_learners_classif.glmnet, mlr_learners_classif.kknn, mlr_learners_classif.lda, mlr_learners_classif.log_reg, mlr_learners_classif.multinom, mlr_learners_classif.naive_bayes, mlr_learners_classif.nnet, mlr_learners_classif.qda, mlr_learners_classif.ranger, mlr_learners_classif.svm, mlr_learners_classif.xgboost, mlr_learners_regr.cv_glmnet, mlr_learners_regr.glmnet, mlr_learners_regr.kknn, mlr_learners_regr.km, mlr_learners_regr.lm, mlr_learners_regr.nnet, mlr_learners_regr.svm, mlr_learners_regr.xgboost
Super classes
mlr3::Learner
-> mlr3::LearnerRegr
-> LearnerRegrRanger
Methods
Inherited methods
Method importance()
The importance scores are extracted from the model slot variable.importance.
Parameter importance.mode must be set to "impurity", "impurity_corrected", or "permutation".
Returns
Named numeric().
Examples
# Define the Learner and set parameter values
learner = lrn("regr.ranger")
print(learner)
#>
#> ── <LearnerRegrRanger> (regr.ranger): Random Forest ────────────────────────────
#> • Model: -
#> • Parameters: num.threads=1, sigma2.threshold=0.01
#> • Packages: mlr3, mlr3learners, and ranger
#> • Predict Types: [response], se, and quantiles
#> • Feature Types: logical, integer, numeric, character, factor, and ordered
#> • Encapsulation: none (fallback: -)
#> • Properties: hotstart_backward, importance, missings, oob_error,
#> selected_features, and weights
#> • Other settings: use_weights = 'use'
# Define a Task
task = tsk("mtcars")
# Create train and test set
ids = partition(task)
# Train the learner on the training ids
learner$train(task, row_ids = ids$train)
# Print the model
print(learner$model)
#> $model
#> Ranger result
#>
#> Call:
#> ranger::ranger(dependent.variable.name = task$target_names, data = data, num.threads = 1L)
#>
#> Type: Regression
#> Number of trees: 500
#> Sample size: 21
#> Number of independent variables: 10
#> Mtry: 3
#> Target node size: 5
#> Variable importance mode: none
#> Splitrule: variance
#> OOB prediction error (MSE): 8.593097
#> R squared (OOB): 0.8063534
#>
# Importance method
if ("importance" %in% learner$properties) print(learner$importance)
#> function ()
#> .__LearnerRegrRanger__importance(self = self, private = private,
#> super = super)
#> <environment: 0x56100c2a1708>
# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)
# Score the predictions
predictions$score()
#> regr.mse
#> 4.013859