
Configure GradientBoostingRegressor "validation_fraction" Parameter

The validation_fraction parameter in scikit-learn’s GradientBoostingRegressor controls the fraction of training data to set aside for validation during training.

Gradient boosting builds an ensemble of decision trees sequentially, fitting each new tree to reduce the loss of the current ensemble. GradientBoostingRegressor applies this technique to regression problems.

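The validation_fraction parameter specifies the fraction of training data to set aside as validation data for early stopping. It is used only when n_iter_no_change is set to an integer; with the default n_iter_no_change=None, early stopping is disabled and validation_fraction is ignored. When early stopping is enabled, training stops once the validation loss fails to improve by at least tol for n_iter_no_change consecutive iterations, which helps prevent overfitting.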
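As a minimal sketch of how early stopping interacts with this parameter, the snippet below fits one model with early stopping left off and one with it switched on through n_iter_no_change. The dataset size, noise level, and the values n_estimators=200 and n_iter_no_change=5 are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Illustrative synthetic data (sizes and noise chosen arbitrarily)
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# Early stopping off (n_iter_no_change=None by default):
# validation_fraction is ignored and every requested stage is fitted
no_stop = GradientBoostingRegressor(
    n_estimators=200, validation_fraction=0.2, random_state=0
)
no_stop.fit(X, y)

# Early stopping on: 20% of the training data is held out and monitored
early = GradientBoostingRegressor(
    n_estimators=200,
    validation_fraction=0.2,
    n_iter_no_change=5,
    tol=1e-4,
    random_state=0,
)
early.fit(X, y)

# n_estimators_ reports how many boosting stages were actually fitted
print(f"without early stopping: {no_stop.n_estimators_} stages")
print(f"with early stopping:    {early.n_estimators_} stages")
```

The fitted n_estimators_ attribute equals n_estimators when early stopping is off, and can be smaller when it is on.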

The default value for validation_fraction is 0.1 (10% of the training data). The value must lie strictly between 0 and 1.

In practice, values between 0.1 and 0.3 are commonly used depending on the dataset size and complexity.
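To get a feel for what these fractions mean in rows, the sketch below splits a hypothetical 800-row training set at each fraction. It assumes the estimator's internal hold-out behaves like train_test_split with test_size set to validation_fraction; treat the counts as an approximation of what the estimator does rather than a guarantee.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Hypothetical training set of 800 rows
X_train, y_train = make_regression(
    n_samples=800, n_features=20, noise=0.1, random_state=42
)

held_out_sizes = {}
for vf in [0.1, 0.2, 0.3]:
    # Assumption: the internal validation split resembles train_test_split
    X_fit, X_val, y_fit, y_val = train_test_split(
        X_train, y_train, test_size=vf, random_state=42
    )
    held_out_sizes[vf] = (len(X_fit), len(X_val))
    print(f"validation_fraction={vf}: fit on {len(X_fit)} rows, "
          f"validate on {len(X_val)} rows")
```

A larger fraction gives a steadier validation signal but leaves fewer rows for fitting the trees, which matters most on small datasets.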

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic regression dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different validation_fraction values
validation_fraction_values = [0.1, 0.2, 0.3]
mse_scores = []

for vf in validation_fraction_values:
    gbr = GradientBoostingRegressor(validation_fraction=vf, n_estimators=100, random_state=42)
    gbr.fit(X_train, y_train)
    y_pred = gbr.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"validation_fraction={vf}, MSE: {mse:.3f}")

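Running the example gives an output like the following. The three scores are identical because n_iter_no_change is left at its default of None, so early stopping is disabled and validation_fraction is ignored: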

validation_fraction=0.1, MSE: 3052.375
validation_fraction=0.2, MSE: 3052.375
validation_fraction=0.3, MSE: 3052.375

The key steps in this example are:

  1. Generate a synthetic regression dataset with noise
  2. Split the data into train and test sets
  3. Train GradientBoostingRegressor models with different validation_fraction values
  4. Evaluate the mean squared error (MSE) of each model on the test set
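Because validation_fraction only takes effect when n_iter_no_change is an integer, a variant of the example with early stopping enabled may be more informative. The sketch below keeps the same data and split; n_iter_no_change=10 and n_estimators=300 are illustrative assumptions, and the resulting scores will depend on the data and the scikit-learn version.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Same synthetic dataset and split as the example above
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

results = {}
for vf in [0.1, 0.2, 0.3]:
    gbr = GradientBoostingRegressor(
        validation_fraction=vf,
        n_iter_no_change=10,  # assumption: enables early stopping
        n_estimators=300,     # upper bound on the number of boosting stages
        random_state=42,
    )
    gbr.fit(X_train, y_train)
    mse = mean_squared_error(y_test, gbr.predict(X_test))
    # Record the stages actually fitted alongside the test error
    results[vf] = (gbr.n_estimators_, mse)
    print(f"validation_fraction={vf}, stages={gbr.n_estimators_}, MSE: {mse:.3f}")
```

With early stopping on, each model is fitted on a different share of the training data, so both the number of stages and the test MSE can differ between runs.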


