Configure Ridge "random_state" Parameter

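The random_state parameter in scikit-learn’s Ridge class seeds the pseudo-random number generator so that results are reproducible across runs.

Ridge Regression is a linear regression technique that adds L2 regularization to ordinary least squares. The random_state parameter sets the seed of the pseudo-random number generator used when shuffling the data, which only happens with the stochastic solvers 'sag' and 'saga'. The other solvers are deterministic and ignore it, including the one that the default solver='auto' selects for dense data like that used below.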

By default, random_state is set to None, which means the global random state from numpy.random is used. With the 'sag' or 'saga' solver, this can cause slightly different results each time the model is fit.
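As a rough illustration of what None means here, the sketch below uses scikit-learn's check_random_state utility, which turns a random_state argument into a generator. That Ridge resolves its seed in exactly this way is an assumption made for illustration, and the variable names are hypothetical.

```python
import numpy as np
from sklearn.utils import check_random_state

# None resolves to NumPy's global RandomState, so np.random.seed() affects it
np.random.seed(0)
draws_a = check_random_state(None).randint(0, 1000, size=5)
np.random.seed(0)
draws_b = check_random_state(None).randint(0, 1000, size=5)
print("global state, reseeded:", np.array_equal(draws_a, draws_b))

# An integer creates a fresh generator seeded with that value
seeded_a = check_random_state(42).randint(0, 1000, size=5)
seeded_b = check_random_state(42).randint(0, 1000, size=5)
print("integer seed, repeated:", np.array_equal(seeded_a, seeded_b))
```

Without the calls to np.random.seed, the draws from the None case depend on whatever else has consumed the global state, which is why results can vary between runs.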

To ensure reproducibility, set random_state to an integer value. With the same data, parameters, and library versions, the model then produces the same results each time it is fit.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

# Generate synthetic dataset
X, y = make_regression(n_samples=100, n_features=1, noise=20, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different random_state values (the default solver used here ignores random_state)
random_state_values = [None, 42, 42, 128]
r2_scores = []

for rs in random_state_values:
    ridge = Ridge(alpha=1.0, random_state=rs)
    ridge.fit(X_train, y_train)
    y_pred = ridge.predict(X_test)
    r2 = r2_score(y_test, y_pred)
    r2_scores.append(r2)
    print(f"random_state={rs}, R-squared: {r2:.3f}, Coefficients: {ridge.coef_}")

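Running the example gives an output like the following. The results are identical for every random_state value because the default solver used here is deterministic: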

random_state=None, R-squared: 0.800, Coefficients: [46.04348947]
random_state=42, R-squared: 0.800, Coefficients: [46.04348947]
random_state=42, R-squared: 0.800, Coefficients: [46.04348947]
random_state=128, R-squared: 0.800, Coefficients: [46.04348947]

The key steps in this example are:

Generate a synthetic regression dataset with noise
Split the data into train and test sets
Train Ridge models with different random_state values
Evaluate the R-squared of each model on the test set
Compare the model coefficients and scores
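Since random_state only matters for the stochastic solvers, a variant that sets solver="sag" makes its role visible. The following is a minimal sketch under that assumption; fit_sag is a hypothetical helper, and the size of the difference between seeds depends on the data and the solver tolerance.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic data, generated the same way as in the example above
X, y = make_regression(n_samples=100, n_features=1, noise=20, random_state=42)

def fit_sag(seed):
    """Fit Ridge with the stochastic 'sag' solver and return its coefficients."""
    model = Ridge(alpha=1.0, solver="sag", random_state=seed)
    model.fit(X, y)
    return model.coef_

coef_first = fit_sag(42)
coef_repeat = fit_sag(42)
coef_other = fit_sag(128)

# Repeating a seed should reproduce the fit; a different seed may shift it slightly
print("same seed identical:", np.array_equal(coef_first, coef_repeat))
print("seed 42 vs 128 difference:", np.abs(coef_first - coef_other).max())
```

Any difference between seeds is typically small, because both runs converge toward the same ridge solution and stop within the solver tolerance.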

Some tips for setting random_state:

Use an integer value for reproducibility across runs
Models trained with the same random_state, data, and parameters will be identical

Issues to consider:

With the 'sag' or 'saga' solver, the default None value can produce slightly different results each time; other solvers are unaffected
Consistent seeding is important when comparing models or for reproducibility
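One way to keep seeding consistent when comparing models is to reuse a single seed for every random component. The sketch below assumes a hypothetical SEED constant and compares two alpha values with the 'sag' solver, so that alpha is the only thing that changes between fits.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

SEED = 42  # hypothetical shared seed, reused for every random component

X, y = make_regression(n_samples=100, n_features=1, noise=20, random_state=SEED)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=SEED
)

results = {}
for alpha in (1.0, 100.0):
    # Same split and same solver seed, so only alpha differs between the fits
    model = Ridge(alpha=alpha, solver="sag", random_state=SEED)
    model.fit(X_train, y_train)
    r2 = r2_score(y_test, model.predict(X_test))
    results[alpha] = (model.coef_[0], r2)
    print(f"alpha={alpha}: coefficient={model.coef_[0]:.3f}, R-squared={r2:.3f}")
```

With the seed held fixed, a change in the coefficient or score can be attributed to alpha rather than to a different shuffle or split.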
