The `warm_start` parameter in scikit-learn's `ExtraTreesClassifier` allows for incremental learning by adding trees to an existing forest.
Extra Trees Classifier is an ensemble method that builds a forest of unpruned decision trees. It’s similar to Random Forest but with two key differences: it splits nodes by choosing cut-points fully at random and uses the whole learning sample to grow the trees.
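To make that contrast concrete, here is a minimal sketch comparing the two ensembles under their scikit-learn defaults: `ExtraTreesClassifier` defaults to `bootstrap=False` (each tree sees the whole learning sample), while `RandomForestClassifier` defaults to `bootstrap=True`. The dataset and the comparison itself are illustrative, not part of the original example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Extra Trees: whole learning sample per tree (bootstrap=False by default),
# split thresholds drawn at random for each candidate feature
et = ExtraTreesClassifier(n_estimators=100, random_state=0)

# Random Forest: bootstrap samples per tree (bootstrap=True by default),
# best threshold searched among the candidate features at each split
rf = RandomForestClassifier(n_estimators=100, random_state=0)

print("Extra Trees CV accuracy:  ", cross_val_score(et, X, y).mean())
print("Random Forest CV accuracy:", cross_val_score(rf, X, y).mean())
```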
The `warm_start` parameter, when set to `True`, allows you to fit additional trees to an existing forest rather than creating a new forest from scratch each time you fit the model.
By default, `warm_start` is set to `False`. It's commonly set to `True` when you want to incrementally train your model on new data without discarding previously learned information.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=0, random_state=42)

# Split into initial training set and new data
X_initial, X_new, y_initial, y_new = train_test_split(X, y, test_size=0.5, random_state=42)

# Create ExtraTreesClassifier with warm_start=False
clf = ExtraTreesClassifier(n_estimators=50, random_state=42, warm_start=False)

# Train on initial data and evaluate on the held-out new data
clf.fit(X_initial, y_initial)
y_pred = clf.predict(X_new)
initial_accuracy = accuracy_score(y_new, y_pred)
initial_n_trees = len(clf.estimators_)
print(f"Initial training - Accuracy: {initial_accuracy:.3f}, Trees: {initial_n_trees}")

# Set warm_start=True and raise n_estimators to the new total (100),
# so the next fit adds 50 trees instead of rebuilding the forest
clf.set_params(warm_start=True, n_estimators=100)
clf.fit(X_new, y_new)

# Note: X_new has now been used for training, so this is in-sample accuracy
y_pred = clf.predict(X_new)
final_accuracy = accuracy_score(y_new, y_pred)
final_n_trees = len(clf.estimators_)
print(f"After incremental learning - Accuracy: {final_accuracy:.3f}, Trees: {final_n_trees}")
```
Running the example gives output like:

```
Initial training - Accuracy: 0.912, Trees: 50
After incremental learning - Accuracy: 1.000, Trees: 100
```

Read the perfect final score with care: after the incremental fit, `X_new` has become training data for the added trees, so the second accuracy is in-sample rather than a held-out estimate.
The key steps in this example are:
- Generate a synthetic classification dataset
- Split the data into an initial training set and new data
- Train an `ExtraTreesClassifier` with `warm_start=False` on the initial data
- Set `warm_start=True` and train on the new data, adding more trees
- Compare the accuracy and number of trees before and after incremental learning
Some tips for using `warm_start`:

- Use `warm_start=True` when you want to add trees to an existing forest
- Increase `n_estimators` to the new total number of trees; the difference from the previous value is how many trees the next `fit` adds
- Monitor performance to ensure the model is still improving with new data (see the sketch after this list)
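As a minimal sketch of the last two tips, you can grow the forest in stages and check held-out accuracy after each increment. The step size, validation split, and dataset here are illustrative choices, not part of the original example:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3,
                                                  random_state=42)

clf = ExtraTreesClassifier(n_estimators=25, warm_start=True, random_state=42)

# Grow the forest in stages of 25 trees, checking validation accuracy each time
for total_trees in range(25, 201, 25):
    clf.set_params(n_estimators=total_trees)  # running total, not trees added
    clf.fit(X_train, y_train)
    acc = accuracy_score(y_val, clf.predict(X_val))
    print(f"{total_trees:3d} trees - validation accuracy: {acc:.3f}")
```

Because `n_estimators` is a running total, each `set_params` call adds 25 trees to the existing forest rather than retraining it, and you can stop growing once the validation score plateaus.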
Issues to consider:

- Memory usage increases with the number of trees
- The model may become biased towards more recently seen data
- It's important to shuffle your data when using `warm_start` to avoid order-dependent results (a sketch follows this list)
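Here is a minimal sketch of that shuffling advice, assuming the incremental batches are taken as contiguous slices of one array; the dataset and batch sizes are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.utils import shuffle

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Shuffle once up front so each incremental batch is a representative
# sample rather than a contiguous, possibly ordered, slice of the data
X, y = shuffle(X, y, random_state=0)

clf = ExtraTreesClassifier(n_estimators=50, warm_start=True, random_state=0)
clf.fit(X[:500], y[:500])          # first batch grows 50 trees

clf.set_params(n_estimators=100)   # raise the total to add 50 more
clf.fit(X[500:], y[500:])          # second batch grows the new trees
print(len(clf.estimators_))        # 100
```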