Extreme Gradient Boosting Classification Learner
Source: R/LearnerClassifXgboost.R
mlr_learners_classif.xgboost.Rd
eXtreme Gradient Boosting classification.
Calls xgboost::xgb.train() from package xgboost.
Note that passing the evals parameter directly will lead to problems when wrapping this mlr3::Learner in a mlr3pipelines GraphLearner, as the preprocessing steps will not be applied to the data in evals.
See the section Early Stopping and Validation below on how to configure validation data instead.
Note
To compute on GPUs, you first need to compile xgboost yourself and link against CUDA. See https://xgboost.readthedocs.io/en/stable/build.html#building-with-gpu-support.
The outputmargin, predcontrib, predinteraction, and predleaf parameters are not supported.
You can still call e.g. predict(learner$model, newdata = newdata, outputmargin = TRUE) to get these predictions.
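For example, a minimal sketch (assuming learner has already been trained and newdata is a numeric feature matrix with the same columns as the training data):
# obtain margin predictions directly from the wrapped xgb.Booster
margin_preds = predict(learner$model, newdata = newdata, outputmargin = TRUE)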
Initial parameter values
nrounds: Actual default: no default.
Adjusted default: 1000.
Reason for change: Without a default, construction of the learner would error. The lightgbm learner has a default of 1000, so we use the same here.
nthread: Actual value: Undefined, triggering auto-detection of the number of CPUs.
Adjusted value: 1.
Reason for change: Conflicts with parallelization via future.
verbose: Actual default: 1.
Adjusted default: 0.
Reason for change: Reduce verbosity.
verbosity: Actual default: 1.
Adjusted default: 0.
Reason for change: Reduce verbosity.
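These adjusted values can be overridden at construction. A minimal sketch (the concrete values are illustrative, not recommendations):
# override the adjusted defaults when constructing the learner
learner = lrn("classif.xgboost", nrounds = 500, nthread = 2, verbose = 1)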
Early Stopping and Validation
To monitor validation performance during training, you can set the $validate field of the Learner.
For information on how to configure the validation set, see the Validation section of mlr3::Learner.
This validation data can also be used for early stopping, which can be enabled by setting the early_stopping_rounds parameter.
The final validation scores (or, when early stopping is used, those of the best iteration) can be accessed via $internal_valid_scores, and the optimal nrounds via $internal_tuned_values.
The internal validation measure can be set via the custom_metric parameter, which accepts an mlr3::Measure, a function, or a character string naming one of the internal xgboost measures.
Using an mlr3::Measure is slower than the internal xgboost measures, but allows using the same measure for tuning and validation.
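A minimal sketch of this workflow, mirroring the extended example at the end of this page (the concrete values are illustrative). Because the validation data is constructed by the Learner itself, this also works when the learner is wrapped in a mlr3pipelines GraphLearner:
# early stopping on an internal validation split (30% of the training data)
learner = lrn("classif.xgboost", nrounds = 500, early_stopping_rounds = 20, validate = 0.3)
# optionally, use an mlr3 measure as the internal validation metric (slower, see above):
# learner$param_set$set_values(custom_metric = msr("classif.logloss"))
learner$train(tsk("sonar"))
learner$internal_tuned_values$nrounds  # boosting rounds selected by early stopping
learner$internal_valid_scores          # validation score(s) at that iteration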
Dictionary
This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():
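# construct via the dictionary or with the sugar function
mlr_learners$get("classif.xgboost")
lrn("classif.xgboost")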
Meta Information
Task type: “classif”
Predict Types: “response”, “prob”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3learners, xgboost
Parameters
| Id | Type | Default | Levels | Range |
| alpha | numeric | 0 | - | \([0, \infty)\) |
| approxcontrib | logical | FALSE | TRUE, FALSE | - |
| base_score | numeric | - | - | \((-\infty, \infty)\) |
| booster | character | gbtree | gbtree, gblinear, dart | - |
| callbacks | untyped | list() | - | - |
| colsample_bylevel | numeric | 1 | - | \([0, 1]\) |
| colsample_bynode | numeric | 1 | - | \([0, 1]\) |
| colsample_bytree | numeric | 1 | - | \([0, 1]\) |
| device | untyped | "cpu" | - | - |
| disable_default_eval_metric | logical | FALSE | TRUE, FALSE | - |
| early_stopping_rounds | integer | NULL | - | \([1, \infty)\) |
| eta | numeric | 0.3 | - | \([0, 1]\) |
| evals | untyped | NULL | - | - |
| eval_metric | untyped | - | - | - |
| custom_metric | untyped | - | - | - |
| extmem_single_page | logical | FALSE | TRUE, FALSE | - |
| feature_selector | character | cyclic | cyclic, shuffle, random, greedy, thrifty | - |
| gamma | numeric | 0 | - | \([0, \infty)\) |
| grow_policy | character | depthwise | depthwise, lossguide | - |
| interaction_constraints | untyped | - | - | - |
| iterationrange | untyped | - | - | - |
| lambda | numeric | 1 | - | \([0, \infty)\) |
| max_bin | integer | 256 | - | \([2, \infty)\) |
| max_cached_hist_node | integer | 65536 | - | \((-\infty, \infty)\) |
| max_cat_to_onehot | integer | - | - | \((-\infty, \infty)\) |
| max_cat_threshold | numeric | - | - | \((-\infty, \infty)\) |
| max_delta_step | numeric | 0 | - | \([0, \infty)\) |
| max_depth | integer | 6 | - | \([0, \infty)\) |
| max_leaves | integer | 0 | - | \([0, \infty)\) |
| maximize | logical | NULL | TRUE, FALSE | - |
| min_child_weight | numeric | 1 | - | \([0, \infty)\) |
| missing | numeric | NA | - | \((-\infty, \infty)\) |
| monotone_constraints | untyped | 0 | - | - |
| nrounds | integer | - | - | \([1, \infty)\) |
| normalize_type | character | tree | tree, forest | - |
| nthread | integer | - | - | \([1, \infty)\) |
| num_parallel_tree | integer | 1 | - | \([1, \infty)\) |
| objective | untyped | "binary:logistic" | - | - |
| one_drop | logical | FALSE | TRUE, FALSE | - |
| print_every_n | integer | 1 | - | \([1, \infty)\) |
| rate_drop | numeric | 0 | - | \([0, 1]\) |
| refresh_leaf | logical | TRUE | TRUE, FALSE | - |
| seed | integer | - | - | \((-\infty, \infty)\) |
| seed_per_iteration | logical | FALSE | TRUE, FALSE | - |
| sampling_method | character | uniform | uniform, gradient_based | - |
| sample_type | character | uniform | uniform, weighted | - |
| save_name | untyped | NULL | - | - |
| save_period | integer | NULL | - | \([0, \infty)\) |
| scale_pos_weight | numeric | 1 | - | \((-\infty, \infty)\) |
| skip_drop | numeric | 0 | - | \([0, 1]\) |
| subsample | numeric | 1 | - | \([0, 1]\) |
| top_k | integer | 0 | - | \([0, \infty)\) |
| training | logical | FALSE | TRUE, FALSE | - |
| tree_method | character | auto | auto, exact, approx, hist, gpu_hist | - |
| tweedie_variance_power | numeric | 1.5 | - | \([1, 2]\) |
| updater | untyped | - | - | - |
| use_rmm | logical | - | TRUE, FALSE | - |
| validate_features | logical | TRUE | TRUE, FALSE | - |
| verbose | integer | - | - | \([0, 2]\) |
| verbosity | integer | - | - | \([0, 2]\) |
| xgb_model | untyped | NULL | - | - |
| use_pred_offset | logical | - | TRUE, FALSE | - |
Offset
If a Task has a column with the role offset, it will automatically be used during training.
The offset is incorporated through the xgboost::xgb.DMatrix interface, using the base_margin field.
During prediction, the offset column from the test set is used only if use_pred_offset = TRUE (the default) and the Task has a column with the role offset.
The test set offsets are passed via the base_margin argument of xgboost::predict.xgb.Booster().
Otherwise, i.e. if use_pred_offset = FALSE or the Task has no column with the offset role, the (possibly estimated) global intercept from the training set is applied.
See https://xgboost.readthedocs.io/en/stable/tutorials/intercept.html.
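A minimal sketch of using an offset (the column name "my_offset" is hypothetical; the Task must contain such a numeric column):
# assign the "offset" role to a (hypothetical) numeric column "my_offset"
task$set_col_roles("my_offset", roles = "offset")
# the offset is passed as base_margin during training; with use_pred_offset = TRUE
# (the default) the test set offsets are also used at prediction time
learner = lrn("classif.xgboost", nrounds = 100, use_pred_offset = TRUE)
learner$train(task)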
References
Chen, Tianqi, Guestrin, Carlos (2016). “XGBoost: A scalable tree boosting system.” In Proceedings of the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 785–794. ACM. doi:10.1145/2939672.2939785.
See also
Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html#sec-learners
Package mlr3extralearners for more learners.
as.data.table(mlr_learners) for a table of available Learners in the running session (depending on the loaded packages).
mlr3pipelines to combine learners with pre- and postprocessing steps.
Extension packages for additional task types:
mlr3proba for probabilistic supervised regression and survival analysis.
mlr3cluster for unsupervised clustering.
mlr3tuning for tuning of hyperparameters, mlr3tuningspaces for established default tuning spaces.
Other Learner:
mlr_learners_classif.cv_glmnet,
mlr_learners_classif.glmnet,
mlr_learners_classif.kknn,
mlr_learners_classif.lda,
mlr_learners_classif.log_reg,
mlr_learners_classif.multinom,
mlr_learners_classif.naive_bayes,
mlr_learners_classif.nnet,
mlr_learners_classif.qda,
mlr_learners_classif.ranger,
mlr_learners_classif.svm,
mlr_learners_regr.cv_glmnet,
mlr_learners_regr.glmnet,
mlr_learners_regr.kknn,
mlr_learners_regr.km,
mlr_learners_regr.lm,
mlr_learners_regr.nnet,
mlr_learners_regr.ranger,
mlr_learners_regr.svm,
mlr_learners_regr.xgboost
Super classes
mlr3::Learner -> mlr3::LearnerClassif -> LearnerClassifXgboost
Active bindings
internal_valid_scores (named list() or NULL)
The validation scores extracted from model$evaluation_log. If early stopping is activated, this contains the validation scores of the model for the optimal nrounds, otherwise the scores are taken from the final boosting round nrounds.
internal_tuned_values (named list() or NULL)
If early stopping is activated, this returns a list with nrounds, which is extracted from $best_iteration of the model, and NULL otherwise.
validate (numeric(1) or character(1) or NULL)
How to construct the internal validation data. This parameter can be either NULL, a ratio, "test", or "predefined".
model (any)
The fitted model. Only available after $train() has been called.
Methods
Inherited methods
mlr3::Learner$base_learner()
mlr3::Learner$configure()
mlr3::Learner$encapsulate()
mlr3::Learner$format()
mlr3::Learner$help()
mlr3::Learner$predict()
mlr3::Learner$predict_newdata()
mlr3::Learner$print()
mlr3::Learner$reset()
mlr3::Learner$selected_features()
mlr3::Learner$train()
mlr3::LearnerClassif$predict_newdata_fast()
Method importance()
The importance scores are calculated with xgboost::xgb.importance().
Returns
Named numeric().
Examples
# Define the Learner and set parameter values
learner = lrn("classif.xgboost")
print(learner)
#>
#> ── <LearnerClassifXgboost> (classif.xgboost): Extreme Gradient Boosting ────────
#> • Model: -
#> • Parameters: nrounds=1000, nthread=1, verbose=0, verbosity=0,
#> use_pred_offset=TRUE
#> • Validate: NULL
#> • Packages: mlr3, mlr3learners, and xgboost
#> • Predict Types: [response] and prob
#> • Feature Types: logical, integer, and numeric
#> • Encapsulation: none (fallback: -)
#> • Properties: hotstart_forward, importance, internal_tuning, missings,
#> multiclass, offset, twoclass, validation, and weights
#> • Other settings: use_weights = 'use'
# Define a Task
task = tsk("sonar")
# Create train and test set
ids = partition(task)
# Train the learner on the training ids
learner$train(task, row_ids = ids$train)
# Print the model
print(learner$model)
#> ##### xgb.Booster
#> call:
#> xgboost::xgb.train(params = pv[names(pv) %in% formalArgs(xgboost::xgb.params)],
#> data = xgb_data, nrounds = pv$nrounds, evals = pv$evals,
#> custom_metric = pv$custom_metric, verbose = pv$verbose, print_every_n = pv$print_every_n,
#> early_stopping_rounds = pv$early_stopping_rounds, maximize = pv$maximize,
#> save_period = pv$save_period, save_name = pv$save_name, callbacks = pv$callbacks %??%
#> list())
#> # of features: 60
#> # of rounds: 1000
# Importance method
if ("importance" %in% learner$properties) print(learner$importance())
#> V12 V52 V45 V20 V48 V36
#> 0.1913429724 0.1156073538 0.0754178506 0.0727729692 0.0705704052 0.0649181658
#> V11 V23 V49 V44 V47 V37
#> 0.0511647632 0.0342193616 0.0322207985 0.0284559922 0.0269096965 0.0199268940
#> V31 V28 V25 V51 V43 V21
#> 0.0189730382 0.0185971956 0.0169041089 0.0162353892 0.0125918171 0.0120586522
#> V9 V15 V5 V24 V34 V4
#> 0.0104133960 0.0092412054 0.0090557387 0.0083255828 0.0079496959 0.0071776699
#> V40 V46 V10 V55 V38 V17
#> 0.0070807163 0.0070548352 0.0070059491 0.0065617249 0.0061949628 0.0060334511
#> V32 V53 V54 V60 V59 V50
#> 0.0058413994 0.0041317725 0.0033295236 0.0032392012 0.0026154487 0.0021014714
#> V8 V39 V33 V29 V27 V18
#> 0.0015051329 0.0013138548 0.0011376653 0.0011284622 0.0006982279 0.0006765504
#> V57 V58 V1
#> 0.0005266029 0.0004945725 0.0002777623
# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)
# Score the predictions
predictions$score()
#> classif.ce
#> 0.2028986
# Early stopping
learner = lrn("classif.xgboost", nrounds = 100, early_stopping_rounds = 10, validate = 0.3)
# Train learner with early stopping
learner$train(task)
# Inspect optimal nrounds and validation performance
learner$internal_tuned_values
#> $nrounds
#> [1] 32
#>
learner$internal_valid_scores
#> $logloss
#> [1] 0.3309164
#>