
Configure GradientBoostingClassifier "init" Parameter

The init parameter in scikit-learn's GradientBoostingClassifier lets you specify an estimator whose predictions serve as the starting point for the boosting ensemble.

By default, init is set to None, which fits a dummy estimator that predicts the class prior probabilities of the training data. You can set init to any estimator that implements the fit and predict_proba methods, or to the string 'zero' to start boosting from all-zero raw predictions.

Using a more sophisticated initial estimator can sometimes improve the performance of the ensemble, especially if the initial estimator is a good fit for the problem.
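As a minimal sketch of this idea before the full example below, any classifier exposing fit and predict_proba can serve as init; the choice of LogisticRegression here is illustrative, not part of the original example:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Default: init=None fits a dummy estimator predicting class priors
gb_default = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Custom init: any estimator with fit and predict_proba works
gb_custom = GradientBoostingClassifier(
    init=LogisticRegression(), random_state=0).fit(X_train, y_train)

print(f"default init: {gb_default.score(X_test, y_test):.3f}")
print(f"custom init:  {gb_custom.score(X_test, y_test):.3f}")
```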

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.dummy import DummyClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=3, n_informative=5,
                           n_redundant=0, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different init values
init_values = [None,
               DummyClassifier(strategy='most_frequent'),
               DummyClassifier(strategy='stratified'),
               DecisionTreeClassifier(max_depth=1),
               DecisionTreeClassifier(max_depth=2),
               DecisionTreeClassifier(max_depth=3)]

accuracies = []

for init in init_values:
    gb = GradientBoostingClassifier(n_estimators=100, init=init, random_state=42)
    gb.fit(X_train, y_train)
    y_pred = gb.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"init={init}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

init=None, Accuracy: 0.785
init=DummyClassifier(strategy='most_frequent'), Accuracy: 0.575
init=DummyClassifier(strategy='stratified'), Accuracy: 0.645
init=DecisionTreeClassifier(max_depth=1), Accuracy: 0.800
init=DecisionTreeClassifier(max_depth=2), Accuracy: 0.775
init=DecisionTreeClassifier(max_depth=3), Accuracy: 0.740

The key steps in this example are:

  1. Generate a synthetic multi-class classification dataset
  2. Split the data into train and test sets
  3. Train GradientBoostingClassifier models with different init values
  4. Evaluate the accuracy of each model on the test set
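The loop above scores each candidate on a single held-out split. A more robust variant (sketched here as an assumption, not part of the original example) treats init like any other hyperparameter and selects it by cross-validation with GridSearchCV:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_classes=3, n_informative=5,
                           n_redundant=0, random_state=42)

# Candidate init estimators, searched over with 3-fold cross-validation
param_grid = {'init': [None,
                       DummyClassifier(strategy='most_frequent'),
                       DecisionTreeClassifier(max_depth=1)]}
grid = GridSearchCV(
    GradientBoostingClassifier(n_estimators=50, random_state=42),
    param_grid, cv=3)
grid.fit(X, y)

print(f"Best init: {grid.best_params_['init']}")
print(f"Best CV accuracy: {grid.best_score_:.3f}")
```

This avoids picking an init value that only looks good on one particular train/test split.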

Some tips and heuristics for setting init:

  1. The default (None) fits a dummy estimator that predicts the class priors, which is a reasonable starting point for most problems
  2. A custom init estimator must implement fit and predict_proba; an estimator without predict_proba will raise an error
  3. Treat init as a hyperparameter and compare candidates with cross-validation rather than a single train/test split

Issues to consider:

  1. A strong init estimator adds its own training time and can itself overfit
  2. The benefit of a sophisticated init tends to shrink as n_estimators grows, since later boosting stages can correct a weak start
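Besides an estimator, init also accepts the string 'zero', which starts boosting from all-zero raw predictions instead of the class priors. A minimal sketch (the dataset settings here are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# init='zero' skips the initial estimator entirely:
# boosting starts from raw predictions of zero
gb_zero = GradientBoostingClassifier(init='zero', random_state=0)
gb_zero.fit(X_train, y_train)
print(f"Accuracy with init='zero': {gb_zero.score(X_test, y_test):.3f}")
```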
