Configure ExtraTreesClassifier "random_state" Parameter

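The random_state parameter in scikit-learn’s ExtraTreesClassifier seeds the random number generator behind the model’s random operations: the features sampled at each split, the split thresholds drawn for those features, and the bootstrap samples when bootstrap=True.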

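ExtraTreesClassifier (extremely randomized trees) is an ensemble method that builds many decision trees with randomized splits: at each node it draws random candidate thresholds for a random subset of features and keeps the best of them. The final classification comes from averaging the trees’ predicted class probabilities.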

The random_state parameter makes results reproducible by fixing the random number generation. When it is set to a specific integer, the same sequence of random numbers is generated on every run, so fitting on the same data with the same settings produces the same trees.

By default, random_state is set to None, which means the model draws from NumPy’s global random state when it is fit, so results generally differ from run to run. For reproducible results, it’s common to set random_state to a fixed integer value.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different random_state values
random_states = [None, 42, 123, 456]
accuracies = []

for rs in random_states:
    etc = ExtraTreesClassifier(n_estimators=100, random_state=rs)
    etc.fit(X_train, y_train)
    y_pred = etc.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"random_state={rs}, Accuracy: {accuracy:.4f}")

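Running the example gives output like the following. The random_state=None line can change from run to run, while the lines with integer seeds should repeat exactly in the same environment: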

random_state=None, Accuracy: 0.8600
random_state=42, Accuracy: 0.8450
random_state=123, Accuracy: 0.8350
random_state=456, Accuracy: 0.8550

The key steps in this example are:

Generate a synthetic multi-class classification dataset
Split the data into train and test sets
Train ExtraTreesClassifier models with different random_state values
Evaluate the accuracy of each model on the test set
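
The reproducibility claim can be checked directly. The following is a minimal sketch, assuming a smaller synthetic dataset and an arbitrary seed of 42 (both chosen only for illustration): it fits two models with the same random_state and compares their predicted probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split

# Illustrative data; sizes and seeds here are arbitrary assumptions
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

def fit_and_predict_proba(seed):
    """Fit a fresh model with the given seed and return test-set probabilities."""
    model = ExtraTreesClassifier(n_estimators=50, random_state=seed)
    model.fit(X_train, y_train)
    return model.predict_proba(X_test)

proba_a = fit_and_predict_proba(42)
proba_b = fit_and_predict_proba(42)

# Same seed, same data, same settings: the outputs should match exactly
same_seed_match = bool(np.array_equal(proba_a, proba_b))
print(f"Same seed gives identical probabilities: {same_seed_match}")
```

Comparing predict_proba rather than accuracy is a deliberate choice: two different forests can reach the same accuracy by chance, whereas matching probabilities on every test sample is a much stricter check.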

Some tips for setting random_state:

Use a fixed integer value for reproducibility in research or production environments
Experiment with different random states to assess model stability
Keep the random state consistent across model comparisons for fair evaluations
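
One way to assess stability across random states is to repeat the fit over several seeds and summarize the spread of the scores. This is a rough sketch, assuming the same kind of synthetic data as the main example; the choice of ten seeds (0 to 9) is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Fixed data and split, so the model seed is the only thing that varies
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

seeds = range(10)  # arbitrary set of seeds for illustration
scores = []
for seed in seeds:
    model = ExtraTreesClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    scores.append(accuracy_score(y_test, model.predict(X_test)))

scores = np.array(scores)
print(f"mean={scores.mean():.4f}, std={scores.std():.4f}, "
      f"min={scores.min():.4f}, max={scores.max():.4f}")
```

A small standard deviation relative to the difference between two candidate models suggests that the comparison is not just an artifact of the seed.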

Issues to consider:

Different random states can lead to variations in model performance
Using None as the random state may result in different outcomes each run
The impact of random state can vary depending on dataset characteristics and model parameters
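
To see how a model parameter can change the influence of the seed, the spread of accuracy across seeds can be compared for different ensemble sizes. The sketch below is an illustration only: the helper name accuracy_spread, the tree counts (5 and 100), and the five seeds are assumptions made for this example, and the numbers it prints will depend on the data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

def accuracy_spread(n_estimators, seeds=range(5)):
    """Hypothetical helper: std of test accuracy across seeds for one setting."""
    scores = []
    for seed in seeds:
        model = ExtraTreesClassifier(n_estimators=n_estimators, random_state=seed)
        model.fit(X_train, y_train)
        scores.append(accuracy_score(y_test, model.predict(X_test)))
    return float(np.std(scores))

# Compare how much the seed matters for a small and a larger ensemble
spreads = {n: accuracy_spread(n) for n in (5, 100)}
for n, spread in spreads.items():
    print(f"n_estimators={n}: std of accuracy across seeds = {spread:.4f}")
```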

See Also