
Configure DecisionTreeRegressor "splitter" Parameter

The splitter parameter in scikit-learn’s DecisionTreeRegressor controls the strategy used to split nodes when building the tree.

Decision Tree Regression is a non-parametric supervised learning algorithm that learns a hierarchy of if-then-else decision rules to predict a target variable. The splitter parameter determines how the splits are made at each node.

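The splitter parameter can be set to either "best" or "random". When set to "best", the algorithm evaluates the candidate thresholds of every feature under consideration (all features, or a random subset of max_features of them) and chooses the split that most improves the criterion, such as squared error ("squared_error") or absolute error ("absolute_error"). When set to "random", the algorithm draws a random threshold for each feature under consideration and keeps the best of those random splits.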
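To make the difference concrete, the sketch below contrasts the two strategies on a single feature using NumPy. It is a simplified illustration, not scikit-learn's internal implementation: the helper names (sse, split_cost, best_threshold, random_threshold) and the toy data are assumptions made for this example.

```python
import numpy as np


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty array)."""
    return float(((values - values.mean()) ** 2).sum()) if values.size else 0.0


def split_cost(x, y, threshold):
    """Total squared error of the two children produced by x <= threshold."""
    left = x <= threshold
    return sse(y[left]) + sse(y[~left])


def best_threshold(x, y):
    """'best'-style search: try every midpoint between sorted unique values."""
    values = np.unique(x)
    candidates = (values[:-1] + values[1:]) / 2.0
    costs = [split_cost(x, y, t) for t in candidates]
    return float(candidates[int(np.argmin(costs))])


def random_threshold(x, rng):
    """'random'-style draw: one threshold sampled uniformly from the feature's range."""
    return float(rng.uniform(x.min(), x.max()))


# Toy data (hypothetical): a step in the target at x = 4, plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=200)
y = np.where(x > 4.0, 5.0, -5.0) + rng.normal(scale=0.5, size=200)

t_best = best_threshold(x, y)
t_rand = random_threshold(x, rng)
print(f"exhaustive threshold: {t_best:.2f}, cost: {split_cost(x, y, t_best):.1f}")
print(f"random threshold:     {t_rand:.2f}, cost: {split_cost(x, y, t_rand):.1f}")
```

The exhaustive search can never do worse than the random draw on the training data for a single split. The random draw skips the scan over all candidate thresholds, which is where the added randomness comes from.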

The default value for splitter is "best", which usually gives lower error because every split is chosen greedily, but a fully grown tree may overfit on noisy datasets.
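As a quick check of the default, and of the two usual ways to change it, a minimal sketch could look like this. The variable names are illustrative only.

```python
from sklearn.tree import DecisionTreeRegressor

# Inspect the default strategy on a freshly constructed estimator
default_tree = DecisionTreeRegressor()
print(default_tree.get_params()["splitter"])

# Option 1: set the strategy in the constructor
random_tree = DecisionTreeRegressor(splitter="random", random_state=42)

# Option 2: change it on an existing (not yet fitted) estimator
default_tree.set_params(splitter="random")
print(default_tree.get_params()["splitter"])
```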

Using "random" can help reduce overfitting by introducing randomness into the tree-building process, but a single randomized tree may also have higher error than one built with "best".
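Whether the randomization actually helps depends on the data, so it is worth comparing training and test error side by side. The sketch below does this for both strategies and also reports tree size. The synthetic dataset and the results dictionary are assumptions chosen for illustration, and the numbers will differ on real data.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (assumed for this sketch)
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

results = {}
for splitter in ["best", "random"]:
    dt = DecisionTreeRegressor(splitter=splitter, random_state=42)
    dt.fit(X_train, y_train)
    results[splitter] = {
        "depth": dt.get_depth(),        # depth of the fitted tree
        "leaves": dt.get_n_leaves(),    # number of leaf nodes
        "train_mse": mean_squared_error(y_train, dt.predict(X_train)),
        "test_mse": mean_squared_error(y_test, dt.predict(X_test)),
    }
    print(splitter, results[splitter])
```

A large gap between train_mse and test_mse points to overfitting. If both strategies show that gap, limiting tree size (for example with max_depth or min_samples_leaf) is likely to matter more than the choice of splitter.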

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different splitter values
splitter_values = ["best", "random"]
mse_scores = []

for splitter in splitter_values:
    dt = DecisionTreeRegressor(splitter=splitter, random_state=42)
    dt.fit(X_train, y_train)
    y_pred = dt.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"splitter='{splitter}', MSE: {mse:.3f}")

Running the example gives an output like:

splitter='best', MSE: 6350.428
splitter='random', MSE: 6464.066

The key steps in this example are:

  1. Generate a synthetic regression dataset with noise
  2. Split the data into train and test sets
  3. Train DecisionTreeRegressor models with different splitter values
  4. Evaluate the mean squared error of each model on the test set
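A single train/test split can be noisy, so one possible extension of the steps above is to let cross-validation pick the strategy. The sketch below uses GridSearchCV with a grid containing only splitter. The dataset, the 5-fold setting, and the scoring choice are assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Same kind of synthetic data as in the main example (assumed here)
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Search over the two supported strategies
param_grid = {"splitter": ["best", "random"]}
search = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",  # scikit-learn maximizes scores, hence the negation
    cv=5,
)
search.fit(X, y)

print("selected splitter:", search.best_params_["splitter"])
print("cross-validated MSE:", -search.best_score_)
```

In practice the grid would usually include other parameters such as max_depth as well, since they interact with the splitting strategy.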



