
Random regression forest. Calls ranger() from package ranger.

Details

In addition to the uncertainty estimation methods provided by the ranger package, the learner offers an ensemble variance and a law of total variance uncertainty estimate. Both methods compute the empirical mean and variance of the training data points that fall into the predicted leaf nodes. The ensemble variance method calculates the variance of the leaf-node means across trees. The law of total variance method calculates the mean of the leaf-node variances plus the variance of the leaf-node means. Formulas for both methods are given in Hutter et al. (2015).

For these two methods, the parameter sigma2.threshold sets a lower bound on the leaf-node variance: if a leaf's variance falls below this threshold, it is replaced by the threshold value (as described in the paper). The default is 1e-2.
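As a minimal sketch of the two estimates (the numeric values below are illustrative only, not taken from a fitted forest), both can be computed from the per-tree leaf-node means and variances:

```r
# Illustrative values: empirical means mu_b and variances s2_b of the training
# points in the leaves a single test point falls into, one entry per tree.
mu = c(20.1, 21.5, 19.8, 20.9)   # leaf-node means
s2 = c(1.2, 0.8, 1.5, 1.0)       # leaf-node variances
mu_hat = mean(mu)                # ensemble prediction: mean of the leaf means
ev = mean((mu - mu_hat)^2)       # ensemble variance: variance of the means
ltv = mean(s2) + ev              # law of total variance: mean variance + ev
ltv = max(ltv, 1e-2)             # apply the sigma2.threshold floor
```
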

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("regr.ranger")
lrn("regr.ranger")

Meta Information

  • Task type: “regr”

  • Predict Types: “response”, “se”, “quantiles”

  • Feature Types: “logical”, “integer”, “numeric”, “character”, “factor”, “ordered”

  • Required Packages: mlr3, mlr3learners, ranger

Parameters

| Id | Type | Default | Levels | Range |
|----|------|---------|--------|-------|
| always.split.variables | untyped | - | - | - |
| holdout | logical | FALSE | TRUE, FALSE | - |
| importance | character | - | none, impurity, impurity_corrected, permutation | - |
| keep.inbag | logical | FALSE | TRUE, FALSE | - |
| max.depth | integer | NULL | - | \([1, \infty)\) |
| min.bucket | integer | 1 | - | \([1, \infty)\) |
| min.node.size | integer | 5 | - | \([1, \infty)\) |
| mtry | integer | - | - | \([1, \infty)\) |
| mtry.ratio | numeric | - | - | \([0, 1]\) |
| na.action | character | na.learn | na.learn, na.omit, na.fail | - |
| node.stats | logical | FALSE | TRUE, FALSE | - |
| num.random.splits | integer | 1 | - | \([1, \infty)\) |
| num.threads | integer | 1 | - | \([1, \infty)\) |
| num.trees | integer | 500 | - | \([1, \infty)\) |
| oob.error | logical | TRUE | TRUE, FALSE | - |
| poisson.tau | numeric | 1 | - | \((-\infty, \infty)\) |
| regularization.factor | untyped | 1 | - | - |
| regularization.usedepth | logical | FALSE | TRUE, FALSE | - |
| replace | logical | TRUE | TRUE, FALSE | - |
| respect.unordered.factors | character | - | ignore, order, partition | - |
| sample.fraction | numeric | - | - | \([0, 1]\) |
| save.memory | logical | FALSE | TRUE, FALSE | - |
| scale.permutation.importance | logical | FALSE | TRUE, FALSE | - |
| se.method | character | infjack | jack, infjack, ensemble_variance, law_of_total_variance | - |
| sigma2.threshold | numeric | 0.01 | - | \((-\infty, \infty)\) |
| seed | integer | NULL | - | \((-\infty, \infty)\) |
| split.select.weights | untyped | NULL | - | - |
| splitrule | character | variance | variance, extratrees, maxstat, beta, poisson | - |
| verbose | logical | TRUE | TRUE, FALSE | - |
| write.forest | logical | TRUE | TRUE, FALSE | - |

Custom mlr3 parameters

  • mtry:

    • This hyperparameter can alternatively be set via the hyperparameter mtry.ratio as mtry = max(ceiling(mtry.ratio * n_features), 1). Note that mtry and mtry.ratio are mutually exclusive.
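As a worked example of this conversion (the task size and ratio are hypothetical; 10 features matches e.g. mtcars):

```r
# Hypothetical task with 10 features and mtry.ratio = 0.3.
n_features = 10
mtry.ratio = 0.3
mtry = max(ceiling(mtry.ratio * n_features), 1)
mtry
#> [1] 3
```
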

Initial parameter values

  • num.threads:

    • Actual default: 2 (two threads), while also respecting the environment variable R_RANGER_NUM_THREADS, options(ranger.num.threads = N), or options(Ncpus = N), with precedence in that order.

    • Adjusted value: 1.

    • Reason for change: conflicts with parallelization via the future package.

References

Wright, Marvin N, Ziegler, Andreas (2017). “ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R.” Journal of Statistical Software, 77(1), 1–17. doi:10.18637/jss.v077.i01.

Breiman, Leo (2001). “Random Forests.” Machine Learning, 45(1), 5–32. ISSN 1573-0565, doi:10.1023/A:1010933404324.

Hutter, Frank, Xu, Lin, Hoos, Holger H, Leyton-Brown, Kevin (2015). “Algorithm runtime prediction: methods and evaluation.” In Proceedings of the 24th International Conference on Artificial Intelligence, series IJCAI'15, 4197–4201. doi:10.5555/2832747.2832840.

See also

Other Learner: mlr_learners_classif.cv_glmnet, mlr_learners_classif.glmnet, mlr_learners_classif.kknn, mlr_learners_classif.lda, mlr_learners_classif.log_reg, mlr_learners_classif.multinom, mlr_learners_classif.naive_bayes, mlr_learners_classif.nnet, mlr_learners_classif.qda, mlr_learners_classif.ranger, mlr_learners_classif.svm, mlr_learners_classif.xgboost, mlr_learners_regr.cv_glmnet, mlr_learners_regr.glmnet, mlr_learners_regr.kknn, mlr_learners_regr.km, mlr_learners_regr.lm, mlr_learners_regr.nnet, mlr_learners_regr.svm, mlr_learners_regr.xgboost

Super classes

mlr3::Learner -> mlr3::LearnerRegr -> LearnerRegrRanger

Methods

Inherited methods


Method new()

Creates a new instance of this R6 class.

Usage

LearnerRegrRanger$new()

Method importance()

The importance scores are extracted from the model slot variable.importance. The parameter importance must be set to "impurity", "impurity_corrected", or "permutation".

Usage

LearnerRegrRanger$importance()

Returns

Named numeric().


Method oob_error()

The out-of-bag error, extracted from model slot prediction.error.

Usage

LearnerRegrRanger$oob_error()

Returns

numeric(1)


Method selected_features()

The set of features used for node splitting in the forest.

Usage

LearnerRegrRanger$selected_features()

Returns

character().


Method clone()

The objects of this class are cloneable with this method.

Usage

LearnerRegrRanger$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

# Define the Learner and set parameter values
learner = lrn("regr.ranger")
print(learner)
#> 
#> ── <LearnerRegrRanger> (regr.ranger): Random Forest ────────────────────────────
#> • Model: -
#> • Parameters: num.threads=1, sigma2.threshold=0.01
#> • Packages: mlr3, mlr3learners, and ranger
#> • Predict Types: [response], se, and quantiles
#> • Feature Types: logical, integer, numeric, character, factor, and ordered
#> • Encapsulation: none (fallback: -)
#> • Properties: hotstart_backward, importance, missings, oob_error,
#> selected_features, and weights
#> • Other settings: use_weights = 'use'

# Define a Task
task = tsk("mtcars")

# Create train and test set
ids = partition(task)

# Train the learner on the training ids
learner$train(task, row_ids = ids$train)

# Print the model
print(learner$model)
#> $model
#> Ranger result
#> 
#> Call:
#>  ranger::ranger(dependent.variable.name = task$target_names, data = data,      num.threads = 1L) 
#> 
#> Type:                             Regression 
#> Number of trees:                  500 
#> Sample size:                      21 
#> Number of independent variables:  10 
#> Mtry:                             3 
#> Target node size:                 5 
#> Variable importance mode:         none 
#> Splitrule:                        variance 
#> OOB prediction error (MSE):       8.593097 
#> R squared (OOB):                  0.8063534 
#> 

# Importance method
if ("importance" %in% learner$properties) print(learner$importance)
#> function () 
#> .__LearnerRegrRanger__importance(self = self, private = private, 
#>     super = super)
#> <environment: 0x56100c2a1708>

# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)

# Score the predictions
predictions$score()
#> regr.mse 
#> 4.013859