
Configure ExtraTreesRegressor "random_state" Parameter

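The random_state parameter in scikit-learn’s ExtraTreesRegressor controls the randomness used while building the trees. It governs three things: the random split thresholds drawn for each candidate feature, the order and subset of features considered at each split (the subset only matters when max_features is smaller than the number of features), and, if bootstrap=True, the bootstrap sample drawn for each tree.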
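To see how the single ensemble-level seed reaches the individual trees, the sketch below fits the same small model twice with random_state=0. In scikit-learn's forest implementation each fitted tree in estimators_ receives its own integer seed drawn from the ensemble's random_state, exposed as that tree's random_state attribute; the fitted split thresholds live in tree_.threshold. The dataset and the helper fit_forest are illustrative assumptions, and the regressor's defaults are used (bootstrap=False, so no bootstrap samples are drawn here).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

# Small synthetic dataset, used only for illustration
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

def fit_forest(seed):
    # Hypothetical helper: fit a tiny ensemble with the given ensemble-level seed
    return ExtraTreesRegressor(n_estimators=5, random_state=seed).fit(X, y)

a, b = fit_forest(0), fit_forest(0)

# Per-tree integer seeds derived from the ensemble-level random_state
seeds_a = [tree.random_state for tree in a.estimators_]
seeds_b = [tree.random_state for tree in b.estimators_]

# Split thresholds of the first tree (leaf nodes hold a placeholder value)
thr_a = a.estimators_[0].tree_.threshold
thr_b = b.estimators_[0].tree_.threshold

print("per-tree seeds identical:", seeds_a == seeds_b)
print("first tree's thresholds identical:", np.array_equal(thr_a, thr_b))
```

Because every tree's seed is derived from the one value you pass, fixing random_state fixes every random draw made during fit.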

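ExtraTreesRegressor (Extremely Randomized Trees) is an ensemble method that builds multiple decision trees and averages their predictions to improve accuracy and reduce overfitting. It is similar to Random Forest, but instead of searching for the best threshold for each candidate feature it draws the threshold at random, and by default (bootstrap=False) each tree is trained on the whole training set rather than on a bootstrap sample.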
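The random-threshold idea can be sketched in a few lines of NumPy. This is a simplified illustration of the Extra-Trees split rule for a single node, not scikit-learn's actual implementation: the function extra_trees_split is hypothetical, n_candidate_features plays the role of max_features, and the rng argument stands in for the generator that random_state seeds.

```python
import numpy as np

def extra_trees_split(X, y, n_candidate_features, rng):
    """Hypothetical helper: choose one split the Extra-Trees way (simplified).

    For each randomly chosen candidate feature, draw ONE threshold uniformly
    between that feature's min and max in the node (no threshold search),
    then keep the candidate with the largest drop in squared error.
    Returns (score, feature_index, threshold), or None if no split is possible.
    """
    n_features = X.shape[1]
    features = rng.choice(n_features, size=n_candidate_features, replace=False)
    parent_sse = np.sum((y - y.mean()) ** 2)
    best = None
    for f in features:
        lo, hi = X[:, f].min(), X[:, f].max()
        if lo == hi:
            continue  # constant feature in this node: cannot split on it
        threshold = rng.uniform(lo, hi)  # the random cut point
        left = X[:, f] <= threshold
        right = ~left
        if not left.any() or not right.any():
            continue
        child_sse = (np.sum((y[left] - y[left].mean()) ** 2)
                     + np.sum((y[right] - y[right].mean()) ** 2))
        score = parent_sse - child_sse  # reduction in squared error
        if best is None or score > best[0]:
            best = (score, f, threshold)
    return best

# Usage: a seeded generator stands in for what random_state controls
data_rng = np.random.RandomState(42)
X = data_rng.normal(size=(100, 4))
y = 3 * X[:, 0] + data_rng.normal(scale=0.1, size=100)
split = extra_trees_split(X, y, n_candidate_features=4, rng=np.random.RandomState(0))
print(split)
```

Calling the function again with a generator seeded the same way returns exactly the same split, which is the behaviour random_state gives the full estimator. A Random Forest split would instead scan many thresholds per feature and keep the best one.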

The random_state parameter makes results reproducible. When it is set to a fixed integer, fitting the model again on the same data with the same parameters produces the same trees and therefore the same predictions.
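A quick way to confirm this is to fit twice and compare predictions on points the model was not trained on. The sketch below assumes an illustrative synthetic dataset and a hypothetical helper predictions. It predicts on held-out rows deliberately: with default settings the trees are grown until their leaves are pure, so predictions on the training rows tend to match the targets regardless of the seed.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=0.1, random_state=1)
X_train, y_train, X_new = X[:250], y[:250], X[250:]  # hold out the last 50 rows

def predictions(seed):
    # Hypothetical helper: fit with the given seed, predict on held-out rows
    model = ExtraTreesRegressor(n_estimators=50, random_state=seed)
    return model.fit(X_train, y_train).predict(X_new)

same_seed_match = np.array_equal(predictions(7), predictions(7))
diff_seed_match = np.array_equal(predictions(7), predictions(8))
print("same seed, identical predictions:", same_seed_match)
print("different seeds, identical predictions:", diff_seed_match)
```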

The default value for random_state is None, in which case the estimator draws from NumPy’s global random state (np.random), so repeated runs generally produce different models and scores. In practice, it’s common to set random_state to a fixed integer (e.g., 42) for reproducibility.
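random_state accepts None, an integer, or a numpy.random.RandomState instance, and the three behave differently across repeated fits. The sketch below, on an assumed synthetic dataset with a hypothetical helper fit_predict, shows that None becomes repeatable only if the global generator is re-seeded before each fit, that sharing one RandomState instance between fits is not repeatable because the first fit advances it, and that an integer seed is repeatable on its own.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=0.1, random_state=1)
X_train, y_train, X_new = X[:250], y[:250], X[250:]

def fit_predict(random_state):
    # Hypothetical helper: fit a small ensemble and predict on held-out rows
    model = ExtraTreesRegressor(n_estimators=20, random_state=random_state)
    return model.fit(X_train, y_train).predict(X_new)

# None uses NumPy's global RandomState, so re-seeding it before each fit
# makes the result repeatable (fragile: any other np.random call in between
# changes the outcome)
np.random.seed(0)
p1 = fit_predict(None)
np.random.seed(0)
p2 = fit_predict(None)
print("None + np.random.seed:", np.array_equal(p1, p2))

# One shared RandomState instance: the first fit advances the generator,
# so the second fit sees different random draws
shared = np.random.RandomState(0)
p3 = fit_predict(shared)
p4 = fit_predict(shared)
print("shared RandomState instance:", np.array_equal(p3, p4))

# An integer seed builds a fresh generator on every fit
print("integer seed:", np.array_equal(fit_predict(0), fit_predict(0)))
```

The integer form is usually the simplest choice, because it does not depend on any other code that might touch NumPy's global generator.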

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different random_state values
random_state_values = [None, 0, 42, 100]
mse_scores = []

for rs in random_state_values:
    etr = ExtraTreesRegressor(n_estimators=100, random_state=rs)
    etr.fit(X_train, y_train)
    y_pred = etr.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"random_state={rs}, MSE: {mse:.4f}")

# Check variability with random_state=None
none_mse_scores = []
for _ in range(5):
    etr = ExtraTreesRegressor(n_estimators=100, random_state=None)
    etr.fit(X_train, y_train)
    y_pred = etr.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    none_mse_scores.append(mse)

print(f"random_state=None, MSE range: {min(none_mse_scores):.4f} - {max(none_mse_scores):.4f}")

Running the example gives an output like:

random_state=None, MSE: 1948.6322
random_state=0, MSE: 1916.8944
random_state=42, MSE: 2036.1826
random_state=100, MSE: 1906.2123
random_state=None, MSE range: 1886.5176 - 2018.3518

The key steps in this example are:

  1. Generate a synthetic regression dataset
  2. Split the data into train and test sets
  3. Train ExtraTreesRegressor models with different random_state values
  4. Evaluate the Mean Squared Error (MSE) of each model on the test set
  5. Demonstrate the variability of results when random_state=None
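In the output above, the test MSE moves between roughly 1886 and 2036 depending only on the seed, so a single seed's score should not be over-interpreted when comparing settings. A common remedy, sketched below under the assumption that five arbitrary seeds are enough for illustration, is to repeat the fit over several fixed seeds and report the mean and spread of the scores. It reuses the article's dataset and split.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

seeds = range(5)  # arbitrary seeds, chosen only for illustration
scores = []
for seed in seeds:
    model = ExtraTreesRegressor(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    scores.append(mean_squared_error(y_test, model.predict(X_test)))

scores = np.array(scores)
# Sample standard deviation (ddof=1) of the test MSE across seeds
print(f"MSE over {len(scores)} seeds: mean={scores.mean():.4f}, std={scores.std(ddof=1):.4f}")
```

Because each seed is fixed, the whole comparison stays reproducible while still showing how much of a difference between two configurations could be due to randomness alone.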

Some tips for setting random_state:

Issues to consider:


