
Configure DecisionTreeClassifier "splitter" Parameter

The splitter parameter in scikit-learn’s DecisionTreeClassifier controls the strategy used for splitting nodes when building the decision tree.

Decision Trees are a non-parametric supervised learning method used for classification and regression. The splitter parameter determines how the splits are made at each node.

The default value for splitter is "best", which evaluates all candidate thresholds for each feature and chooses the split that most reduces impurity (Gini impurity by default; entropy is also available via the criterion parameter). The alternative value is "random", which draws a random threshold for each candidate feature and keeps the best of those random splits. This is cheaper per node and injects extra randomness into the tree.
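The difference is easy to see in miniature: with splitter="random", the tree structure depends on the seed, so two different seeds usually produce different trees, while the same seed reproduces the tree exactly. A small sketch (dataset parameters chosen arbitrarily for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

# Two random-splitter trees with different seeds usually pick different thresholds
t1 = DecisionTreeClassifier(splitter="random", max_depth=2, random_state=0).fit(X, y)
t2 = DecisionTreeClassifier(splitter="random", max_depth=2, random_state=1).fit(X, y)

print(export_text(t1))
print(export_text(t2))
```

Re-running with the same random_state reproduces the same tree, which is why fixing the seed matters more with "random" than with "best".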

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different splitter values
splitter_values = ["best", "random"]
accuracies = []

for splitter in splitter_values:
    dt = DecisionTreeClassifier(splitter=splitter, random_state=42)
    dt.fit(X_train, y_train)
    y_pred = dt.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"splitter={splitter}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

splitter=best, Accuracy: 0.875
splitter=random, Accuracy: 0.815

The key steps in this example are:

  1. Generate a synthetic binary classification dataset
  2. Split the data into train and test sets
  3. Train DecisionTreeClassifier models with different splitter values
  4. Evaluate the accuracy of each model on the test set
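Accuracy is only part of the trade-off; the two strategies also differ in training cost and tree shape. A rough sketch comparing fit time, depth, and leaf count (dataset size chosen arbitrarily for illustration):

```python
import time

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=50, n_informative=10,
                           random_state=42)

for splitter in ["best", "random"]:
    dt = DecisionTreeClassifier(splitter=splitter, random_state=42)
    start = time.perf_counter()
    dt.fit(X, y)
    elapsed = time.perf_counter() - start
    print(f"splitter={splitter}: fit {elapsed:.3f}s, "
          f"depth {dt.get_depth()}, leaves {dt.get_n_leaves()}")
```

On most datasets the random strategy fits noticeably faster but typically grows a deeper tree, because each individual split is weaker.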

Some tips and heuristics for setting splitter:

  - Keep the default "best" for a standalone tree; evaluating every candidate threshold usually yields higher accuracy.
  - Try "random" when training is slow on wide datasets, since drawing one random threshold per feature is much cheaper than scanning all thresholds.
  - The extra randomness of "random" pays off mainly when many trees are combined in an ensemble (the idea behind Extremely Randomized Trees).

Issues to consider:

  - With splitter="random", results vary between runs, so fix random_state for reproducibility.
  - A tree grown with random splits often needs more depth to reach the same training accuracy, so tune max_depth (or other pruning parameters) alongside splitter.


See Also