
Configure DecisionTreeClassifier "random_state" Parameter

The random_state parameter in scikit-learn’s DecisionTreeClassifier controls the randomness of the model training process.

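Training a decision tree in scikit-learn involves random choices even with the default splitter="best": the candidate features are randomly permuted at each split. When max_features is smaller than the number of features, a random subset of features is considered at each split, and when several candidate splits give an identical improvement, the one selected depends on that random order. Setting random_state to a fixed integer ensures that the same random choices are made each time the model is trained, leading to reproducible results.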

If random_state is not set (or set to None), the random choices can differ from run to run, which may produce slightly different models even with the same training data and parameters.
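As a minimal sketch of this behaviour, the snippet below fits the same tree twice with the same seed and compares the fitted structures. The dataset sizes, the helper name fit_tree, and the choice of max_features="sqrt" (used here only to make the randomness more pronounced) are illustrative assumptions, not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Small synthetic dataset (sizes chosen only for illustration)
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=5, random_state=0)


def fit_tree(seed):
    # max_features="sqrt" makes each split consider a random feature subset,
    # so the seed has a clear influence on the fitted tree
    return DecisionTreeClassifier(max_features="sqrt", random_state=seed).fit(X, y)


tree_a = fit_tree(42)
tree_b = fit_tree(42)

# Compare the internal tree arrays: split features and thresholds per node
same_structure = (
    tree_a.tree_.node_count == tree_b.tree_.node_count
    and np.array_equal(tree_a.tree_.feature, tree_b.tree_.feature)
    and np.array_equal(tree_a.tree_.threshold, tree_b.tree_.threshold)
)
print("Same seed gives identical trees:", same_structure)
```

Repeating the comparison with random_state=None in fit_tree is a quick way to check whether the trees differ on a given dataset; they may or may not, depending on how often ties and feature subsets come into play.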

The default value for random_state is None.

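In practice, it’s common to set random_state to an arbitrary fixed integer (e.g., 42). The specific value has no special meaning; what matters is that it stays the same between runs, so the tree’s random choices are repeated exactly and results can be reproduced.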
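One way to apply this in practice is to keep a single seed constant and pass it to every step that accepts random_state. The sketch below assumes a hypothetical SEED constant and a hypothetical run_experiment helper; both are illustrative names, not scikit-learn API.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

SEED = 42  # arbitrary fixed value; any integer works


def run_experiment(seed):
    # Every randomised step receives the same seed
    X, y = make_classification(n_samples=1000, n_features=10,
                               n_informative=5, random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    model = DecisionTreeClassifier(random_state=seed)
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


first = run_experiment(SEED)
second = run_experiment(SEED)
print(f"Run 1: {first:.3f}, Run 2: {second:.3f}, identical: {first == second}")
```

Seeding the data generation and the split as well as the model means the whole experiment is repeatable, not just the tree fitting step.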

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, n_features=10,
                           n_informative=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different random_state values
random_state_values = [None, 42, 123, 456]
accuracies = []

for rs in random_state_values:
    dt = DecisionTreeClassifier(random_state=rs)
    dt.fit(X_train, y_train)
    y_pred = dt.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"random_state={rs}, Accuracy: {accuracy:.3f}")

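The output would look something like the following (exact values depend on the scikit-learn version, and the random_state=None line may change between runs):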

random_state=None, Accuracy: 0.890
random_state=42, Accuracy: 0.895
random_state=123, Accuracy: 0.895
random_state=456, Accuracy: 0.900

The key steps in this example are:

  1. Generate a synthetic binary classification dataset
  2. Split the data into train and test sets
  3. Train DecisionTreeClassifier models with different random_state values
  4. Evaluate the accuracy of each model on the test set
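The steps above can be extended to gauge how much the test accuracy depends on the seed. The following sketch, which assumes an arbitrary range of ten seeds, repeats the training step for each seed and summarises the spread of the scores.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same data setup as the example above
X, y = make_classification(n_samples=1000, n_classes=2, n_features=10,
                           n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Train one tree per seed and record its test accuracy
seeds = range(10)  # arbitrary choice of seeds for illustration
scores = []
for seed in seeds:
    dt = DecisionTreeClassifier(random_state=seed)
    dt.fit(X_train, y_train)
    scores.append(accuracy_score(y_test, dt.predict(X_test)))

scores = np.array(scores)
print(f"mean={scores.mean():.3f}, std={scores.std():.3f}, "
      f"min={scores.min():.3f}, max={scores.max():.3f}")
```

If the spread is small relative to the differences between candidate models, the seed is unlikely to affect conclusions; a large spread suggests reporting results averaged over several seeds.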

Tips and heuristics for setting random_state:

Issues to consider:


