
Configure RandomForestRegressor "n_estimators" Parameter

The n_estimators parameter in scikit-learn’s RandomForestRegressor controls the number of decision trees in the ensemble.

Random Forest is an ensemble learning method that combines predictions from multiple decision trees to improve generalization performance. For regression, the forest's prediction is the average of the individual trees' predictions, and n_estimators sets how many trees contribute to that average.

Generally, using more trees leads to better performance, because averaging over more trees reduces the variance of the model without increasing its bias. The gains diminish as the forest grows, however, while training time, prediction time, and memory use all increase roughly linearly with the number of trees.
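
One way to see the variance reduction is to train several forests that differ only in their random seed and measure how much their predictions disagree. The sketch below is illustrative only: the helper name seed_spread, the dataset sizes, and the choice of 5 versus 100 trees are assumptions made for this example, not part of the original article.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Small synthetic dataset (sizes chosen only to keep the sketch fast)
X, y = make_regression(n_samples=300, n_features=10, n_informative=5,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)


def seed_spread(n_trees, seeds=range(5)):
    """Hypothetical helper: mean std of test predictions across random seeds."""
    preds = [
        RandomForestRegressor(n_estimators=n_trees, random_state=s)
        .fit(X_train, y_train)
        .predict(X_test)
        for s in seeds
    ]
    # Std across seeds for each test point, averaged over all test points
    return float(np.std(preds, axis=0).mean())


spread = {n: seed_spread(n) for n in (5, 100)}
for n, s in spread.items():
    print(f"n_estimators={n}: prediction spread across seeds = {s:.2f}")
```

A smaller spread for the larger forest indicates that its predictions depend less on the random seed, which is the variance reduction described above.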

The default value for n_estimators is 100 (it was 10 before scikit-learn 0.22). In practice, values between 100 and 1000 are commonly used, depending on the size and complexity of the dataset.
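
Rather than picking a value by hand, n_estimators can be chosen with cross-validation. The following is a minimal sketch using GridSearchCV; the candidate grid, the 3-fold split, and the dataset size are assumptions chosen to keep the example quick, not recommended settings.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Small synthetic dataset (illustrative sizes)
X, y = make_regression(n_samples=300, n_features=10, n_informative=5,
                       random_state=42)

# Candidate forest sizes to compare (an assumed, deliberately short grid)
param_grid = {"n_estimators": [50, 100, 200]}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",  # scikit-learn maximizes scores, so MSE is negated
    cv=3,
)
search.fit(X, y)

print(f"Best n_estimators: {search.best_params_['n_estimators']}")
print(f"Cross-validated MSE: {-search.best_score_:.3f}")
```

Because scores for neighboring grid values are often close, it can be reasonable to prefer the smallest forest whose score is near the best one.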

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different n_estimators values
n_estimators_values = [10, 100, 500, 1000]
mse_scores = []

for n in n_estimators_values:
    rf = RandomForestRegressor(n_estimators=n, random_state=42)
    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"n_estimators={n}, MSE: {mse:.3f}")

Running this code produces output similar to the following (exact values may vary with the scikit-learn version):

n_estimators=10, MSE: 225.568
n_estimators=100, MSE: 176.680
n_estimators=500, MSE: 172.514
n_estimators=1000, MSE: 171.448

The test MSE drops sharply between 10 and 100 trees (225.568 to 176.680) and only slightly from 100 to 1000 trees (176.680 to 171.448), which illustrates the diminishing returns of adding more trees.

The key steps in this example are:

  1. Generate a synthetic regression dataset with informative features
  2. Split the data into train and test sets
  3. Train RandomForestRegressor models with different n_estimators values
  4. Evaluate the mean squared error (MSE) of each model on the test set
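
The steps above retrain a new forest from scratch for every value of n_estimators. As an alternative sketch, the warm_start option of RandomForestRegressor keeps the trees that are already fitted and only adds the missing ones when n_estimators is increased. The tree counts and dataset size below are assumptions for illustration, and the scores of a forest grown this way may not match those of a freshly trained forest of the same size exactly.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Small synthetic dataset (illustrative sizes)
X, y = make_regression(n_samples=500, n_features=10, n_informative=5,
                       random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# warm_start=True reuses existing trees on later calls to fit()
rf = RandomForestRegressor(n_estimators=10, warm_start=True, random_state=42)

results = {}
for n in [10, 50, 100, 200]:
    rf.set_params(n_estimators=n)  # raise the target forest size
    rf.fit(X_train, y_train)       # fits only the trees that are still missing
    results[n] = mean_squared_error(y_test, rf.predict(X_test))
    print(f"n_estimators={n}, trees fitted={len(rf.estimators_)}, "
          f"MSE: {results[n]:.3f}")
```

This makes it cheaper to scan increasing forest sizes, since each step trains only the additional trees.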

Some tips and heuristics for setting n_estimators:

Issues to consider:


