
Random regression forest. Calls ranger() from package ranger.

Details

In addition to the uncertainty estimation methods provided by the ranger package, the learner offers an ensemble variance and a law of total variance uncertainty estimate. Both methods compute the empirical mean and variance of the training data points that fall into the predicted leaf nodes. The ensemble variance method calculates the variance of the leaf-node means across trees. The law of total variance method calculates the mean of the leaf-node variances plus the variance of the leaf-node means. Formulas for both methods are given in Hutter et al. (2015).

For these two methods, the parameter sigma2.threshold sets a lower bound on the leaf-node variance: if a leaf's variance falls below this threshold, it is replaced by the threshold value (as described in the paper). The default is 1e-2.
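As a minimal sketch of the two estimates (the numeric values below are illustrative only, not taken from a fitted forest), both can be computed from the per-tree leaf-node means and variances:

```r
# Illustrative values: empirical means mu_b and variances s2_b of the training
# points in the leaves a single test point falls into, one entry per tree.
mu = c(20.1, 21.5, 19.8, 20.9)   # leaf-node means
s2 = c(1.2, 0.8, 1.5, 1.0)       # leaf-node variances
mu_hat = mean(mu)                # ensemble prediction: mean of the leaf means
ev = mean((mu - mu_hat)^2)       # ensemble variance: variance of the means
ltv = mean(s2) + ev              # law of total variance: mean variance + ev
ltv = max(ltv, 1e-2)             # apply the sigma2.threshold floor
```
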

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("regr.ranger")
lrn("regr.ranger")

Meta Information

  • Task type: “regr”

  • Predict Types: “response”, “se”, “quantiles”

  • Feature Types: “logical”, “integer”, “numeric”, “character”, “factor”, “ordered”

  • Required Packages: mlr3, mlr3learners, ranger

Parameters

| Id | Type | Default | Levels | Range |
|----|------|---------|--------|-------|
| always.split.variables | untyped | - | - | - |
| holdout | logical | FALSE | TRUE, FALSE | - |
| importance | character | - | none, impurity, impurity_corrected, permutation | - |
| keep.inbag | logical | FALSE | TRUE, FALSE | - |
| max.depth | integer | NULL | - | \([1, \infty)\) |
| min.bucket | integer | 1 | - | \([1, \infty)\) |
| min.node.size | integer | 5 | - | \([1, \infty)\) |
| mtry | integer | - | - | \([1, \infty)\) |
| mtry.ratio | numeric | - | - | \([0, 1]\) |
| na.action | character | na.learn | na.learn, na.omit, na.fail | - |
| node.stats | logical | FALSE | TRUE, FALSE | - |
| num.random.splits | integer | 1 | - | \([1, \infty)\) |
| num.threads | integer | 1 | - | \([1, \infty)\) |
| num.trees | integer | 500 | - | \([1, \infty)\) |
| oob.error | logical | TRUE | TRUE, FALSE | - |
| poisson.tau | numeric | 1 | - | \((-\infty, \infty)\) |
| regularization.factor | untyped | 1 | - | - |
| regularization.usedepth | logical | FALSE | TRUE, FALSE | - |
| replace | logical | TRUE | TRUE, FALSE | - |
| respect.unordered.factors | character | - | ignore, order, partition | - |
| sample.fraction | numeric | - | - | \([0, 1]\) |
| save.memory | logical | FALSE | TRUE, FALSE | - |
| scale.permutation.importance | logical | FALSE | TRUE, FALSE | - |
| se.method | character | infjack | jack, infjack, ensemble_variance, law_of_total_variance | - |
| sigma2.threshold | numeric | 0.01 | - | \((-\infty, \infty)\) |
| seed | integer | NULL | - | \((-\infty, \infty)\) |
| split.select.weights | untyped | NULL | - | - |
| splitrule | character | variance | variance, extratrees, maxstat, beta, poisson | - |
| verbose | logical | TRUE | TRUE, FALSE | - |
| write.forest | logical | TRUE | TRUE, FALSE | - |

Custom mlr3 parameters

  • mtry:

    • This hyperparameter can alternatively be set via the hyperparameter mtry.ratio as mtry = max(ceiling(mtry.ratio * n_features), 1). Note that mtry and mtry.ratio are mutually exclusive.
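As a worked example of this conversion (the task size and ratio are hypothetical; 10 features matches e.g. mtcars):

```r
# Hypothetical task with 10 features and mtry.ratio = 0.3.
n_features = 10
mtry.ratio = 0.3
mtry = max(ceiling(mtry.ratio * n_features), 1)
mtry
#> [1] 3
```
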

Initial parameter values

  • num.threads:

    • Actual default: 2 (two threads), while also respecting the environment variable R_RANGER_NUM_THREADS, options(ranger.num.threads = N), or options(Ncpus = N), with precedence in that order.

    • Adjusted value: 1.

    • Reason for change: conflicts with parallelization via the future package.

References

Wright, Marvin N, Ziegler, Andreas (2017). “ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R.” Journal of Statistical Software, 77(1), 1–17. doi:10.18637/jss.v077.i01.

Breiman, Leo (2001). “Random Forests.” Machine Learning, 45(1), 5–32. ISSN 1573-0565, doi:10.1023/A:1010933404324.

Hutter, Frank, Xu, Lin, Hoos, Holger H, Leyton-Brown, Kevin (2015). “Algorithm runtime prediction: methods and evaluation.” In Proceedings of the 24th International Conference on Artificial Intelligence, series IJCAI'15, 4197–4201. doi:10.5555/2832747.2832840.

See also

Other Learner: mlr_learners_classif.cv_glmnet, mlr_learners_classif.glmnet, mlr_learners_classif.kknn, mlr_learners_classif.lda, mlr_learners_classif.log_reg, mlr_learners_classif.multinom, mlr_learners_classif.naive_bayes, mlr_learners_classif.nnet, mlr_learners_classif.qda, mlr_learners_classif.ranger, mlr_learners_classif.svm, mlr_learners_classif.xgboost, mlr_learners_regr.cv_glmnet, mlr_learners_regr.glmnet, mlr_learners_regr.kknn, mlr_learners_regr.km, mlr_learners_regr.lm, mlr_learners_regr.nnet, mlr_learners_regr.svm, mlr_learners_regr.xgboost

Super classes

mlr3::Learner -> mlr3::LearnerRegr -> LearnerRegrRanger

Methods

Inherited methods


Method new()

Creates a new instance of this R6 class.

Usage

LearnerRegrRanger$new()

Method importance()

The importance scores are extracted from the model slot variable.importance. The parameter importance must be set to "impurity", "impurity_corrected", or "permutation".

Usage

LearnerRegrRanger$importance()

Returns

Named numeric().


Method oob_error()

The out-of-bag error, extracted from model slot prediction.error.

Usage

LearnerRegrRanger$oob_error()

Returns

numeric(1)


Method selected_features()

The set of features used for node splitting in the forest.

Usage

LearnerRegrRanger$selected_features()

Returns

character().


Method clone()

The objects of this class are cloneable with this method.

Usage

LearnerRegrRanger$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

# Define the Learner and set parameter values
learner = lrn("regr.ranger")
print(learner)
#> 
#> ── <LearnerRegrRanger> (regr.ranger): Random Forest ────────────────────────────
#> • Model: -
#> • Parameters: num.threads=1, sigma2.threshold=0.01
#> • Packages: mlr3, mlr3learners, and ranger
#> • Predict Types: [response], se, and quantiles
#> • Feature Types: logical, integer, numeric, character, factor, and ordered
#> • Encapsulation: none (fallback: -)
#> • Properties: hotstart_backward, importance, missings, oob_error,
#> selected_features, and weights
#> • Other settings: use_weights = 'use'

# Define a Task
task = tsk("mtcars")

# Create train and test set
ids = partition(task)

# Train the learner on the training ids
learner$train(task, row_ids = ids$train)

# Print the model
print(learner$model)
#> $model
#> Ranger result
#> 
#> Call:
#>  ranger::ranger(dependent.variable.name = task$target_names, data = data,      num.threads = 1L) 
#> 
#> Type:                             Regression 
#> Number of trees:                  500 
#> Sample size:                      21 
#> Number of independent variables:  10 
#> Mtry:                             3 
#> Target node size:                 5 
#> Variable importance mode:         none 
#> Splitrule:                        variance 
#> OOB prediction error (MSE):       8.593097 
#> R squared (OOB):                  0.8063534 
#> 

# Importance method
if ("importance" %in% learner$properties) print(learner$importance)
#> function () 
#> .__LearnerRegrRanger__importance(self = self, private = private, 
#>     super = super)
#> <environment: 0x56100c2a1708>

# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)

# Score the predictions
predictions$score()
#> regr.mse 
#> 4.013859