
Configure ExtraTreesClassifier "random_state" Parameter

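The random_state parameter in scikit-learn’s ExtraTreesClassifier seeds the random number generator behind every random choice the model makes. These include bootstrap sampling of the training rows (only when bootstrap=True; the default is False), the subset of features considered at each split (when max_features is smaller than the number of features), and the randomly drawn split thresholds.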
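Internally, the forest uses its seed to draw a separate integer seed for each tree it builds. The sketch below fits two forests with the same seed and compares the per-tree seeds stored on the fitted trees in estimators_. The dataset size and number of trees are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Small arbitrary dataset, just for illustration
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Fit two forests with the same integer seed
forest_a = ExtraTreesClassifier(n_estimators=5, random_state=42).fit(X, y)
forest_b = ExtraTreesClassifier(n_estimators=5, random_state=42).fit(X, y)

# Each fitted tree stores the integer seed that the forest drew for it
seeds_a = [tree.random_state for tree in forest_a.estimators_]
seeds_b = [tree.random_state for tree in forest_b.estimators_]
print(seeds_a)
print(seeds_a == seeds_b)  # True: the same forest seed yields the same per-tree seeds
```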

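ExtraTreesClassifier (extremely randomized trees) is an ensemble method that builds many decision trees. At each split it draws a random threshold for every candidate feature and keeps the best of these random splits, instead of searching for the optimal threshold. The final classification is made by averaging the trees’ predicted class probabilities.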
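A quick way to check how the trees are combined is to average the class probabilities of the individual trees by hand and compare the result with the forest’s own output. This is a minimal sketch; the dataset parameters and the number of trees are arbitrary, and it relies only on the public estimators_, predict_proba and classes_ attributes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
etc = ExtraTreesClassifier(n_estimators=50, random_state=42).fit(X_train, y_train)

# Average the class-probability estimates of the individual trees
tree_probas = np.stack([tree.predict_proba(X_test) for tree in etc.estimators_])
mean_proba = tree_probas.mean(axis=0)

# The forest's probabilities match the per-tree average
print(np.allclose(mean_proba, etc.predict_proba(X_test)))  # True

# The predicted label is the class with the highest averaged probability
manual_pred = etc.classes_[np.argmax(mean_proba, axis=1)]
print(np.array_equal(manual_pred, etc.predict(X_test)))  # True
```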

The random_state parameter makes results reproducible by fixing the seed of the random number generator. When it is set to an integer, the same sequence of random numbers is generated on every run, so fitting on the same data with the same settings produces the same model.
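The sketch below checks this by fitting the same forest three times and comparing the learned feature importances. Two fits share a seed and the third uses a different one. The seeds 7 and 8 and the helper name fit_forest are arbitrary, hypothetical choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)


def fit_forest(seed):
    """Hypothetical helper: fit a small forest with the given random_state."""
    return ExtraTreesClassifier(n_estimators=50, random_state=seed).fit(X, y)


run_1 = fit_forest(7)
run_2 = fit_forest(7)
run_3 = fit_forest(8)

# Same seed, same data, same settings: identical trees, identical importances
same = np.array_equal(run_1.feature_importances_, run_2.feature_importances_)
# Different seed: different trees, so the importances differ
different = not np.array_equal(run_1.feature_importances_, run_3.feature_importances_)
print(same, different)  # True True
```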

By default, random_state is None. In that case the model draws from NumPy’s global random state, so results usually differ every time fit is called. For reproducible results, it’s common to set random_state to a fixed integer value.
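To illustrate the default, the sketch below fits two forests with random_state=None and compares their feature importances. It then shows that re-seeding NumPy’s global generator with np.random.seed before each fit also makes None repeatable. That second trick is shown only to explain where the randomness comes from; an integer random_state is the more local and less error-prone fix. The helper name importances is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)


def importances(random_state):
    """Hypothetical helper: fit a small forest and return its feature importances."""
    model = ExtraTreesClassifier(n_estimators=50, random_state=random_state)
    return model.fit(X, y).feature_importances_


# random_state=None: each fit keeps drawing from NumPy's global RNG,
# so two consecutive fits produce different forests
unseeded_1 = importances(None)
unseeded_2 = importances(None)
print(np.array_equal(unseeded_1, unseeded_2))  # False

# Re-seeding the global RNG before each fit makes None repeatable as well
np.random.seed(0)
reseeded_1 = importances(None)
np.random.seed(0)
reseeded_2 = importances(None)
print(np.array_equal(reseeded_1, reseeded_2))  # True
```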

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different random_state values
random_states = [None, 42, 123, 456]
accuracies = []

for rs in random_states:
    etc = ExtraTreesClassifier(n_estimators=100, random_state=rs)
    etc.fit(X_train, y_train)
    y_pred = etc.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"random_state={rs}, Accuracy: {accuracy:.4f}")
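Because the data and the train/test split are fixed, the accuracy differences above come only from the seed. Picking the seed with the best test score would just fit noise in the test set. A common alternative is to fit several seeds and report the mean and spread, as in this sketch, which reuses the dataset from the example above. The choice of ten seeds is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score

# Same data and split as the example above
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

seeds = range(10)  # ten arbitrary integer seeds
scores = []
for seed in seeds:
    model = ExtraTreesClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    scores.append(accuracy_score(y_test, model.predict(X_test)))

# Summarize the seed-to-seed variation instead of reporting a single run
scores = np.array(scores)
print(f"mean accuracy: {scores.mean():.4f}, std: {scores.std():.4f}")
print(f"range: {scores.min():.4f} to {scores.max():.4f}")
```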

Running the example gives output similar to the following (the random_state=None line will change from run to run):

random_state=None, Accuracy: 0.8600
random_state=42, Accuracy: 0.8450
random_state=123, Accuracy: 0.8350
random_state=456, Accuracy: 0.8550

The key steps in this example are:

  1. Generate a synthetic multi-class classification dataset
  2. Split the data into train and test sets
  3. Train ExtraTreesClassifier models with different random_state values
  4. Evaluate the accuracy of each model on the test set

Some tips for setting random_state:

Issues to consider:


