The random_state parameter in scikit-learn's Ridge class seeds the pseudo-random number generator used by the stochastic solvers, so that their results are reproducible across runs.
Ridge regression is a linear regression technique that adds L2 regularization to ordinary least squares. The random_state parameter sets the seed of the pseudo-random number generator used to shuffle the data, which happens only when solver is 'sag' or 'saga'. The other solvers are deterministic and ignore it.
By default, random_state is None, which means the global random state from numpy.random is used. With the 'sag' or 'saga' solver, this can lead to slightly different results each time the model is fit.
To make such runs reproducible, set random_state to an integer. The same data, parameters, and seed will then produce the same fitted model on every run.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
# Generate synthetic dataset
X, y = make_regression(n_samples=100, n_features=1, noise=20, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different random_state values
random_state_values = [None, 42, 42, 128]
r2_scores = []
for rs in random_state_values:
    ridge = Ridge(alpha=1.0, random_state=rs)
    ridge.fit(X_train, y_train)
    y_pred = ridge.predict(X_test)
    r2 = r2_score(y_test, y_pred)
    r2_scores.append(r2)
    print(f"random_state={rs}, R-squared: {r2:.3f}, Coefficients: {ridge.coef_}")
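Running the example gives output like the following. Every row is identical because the default solver (solver='auto') selects a deterministic method for this dense dataset, so random_state has no effect here: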
random_state=None, R-squared: 0.800, Coefficients: [46.04348947]
random_state=42, R-squared: 0.800, Coefficients: [46.04348947]
random_state=42, R-squared: 0.800, Coefficients: [46.04348947]
random_state=128, R-squared: 0.800, Coefficients: [46.04348947]
The key steps in this example are:
- Generate a synthetic regression dataset with noise
- Split the data into train and test sets
- Train Ridge models with different random_state values
- Evaluate the R-squared of each model on the test set
- Compare the model coefficients and scores
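The example above uses the default solver, where random_state makes no visible difference. As a minimal sketch of a case where it does matter, the snippet below switches to the stochastic 'sag' solver. The dataset size, the seeds, and the helper name fit_sag are illustrative assumptions, not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic data (sizes chosen only for illustration)
X, y = make_regression(n_samples=200, n_features=5, noise=20, random_state=42)


def fit_sag(seed):
    """Hypothetical helper: fit Ridge with the 'sag' solver and return coef_."""
    model = Ridge(alpha=1.0, solver="sag", random_state=seed)
    model.fit(X, y)
    return model.coef_


coef_a = fit_sag(42)   # first run with seed 42
coef_b = fit_sag(42)   # second run with the same seed
coef_c = fit_sag(128)  # run with a different seed

# Same seed: the data is shuffled the same way, so the fits should match
print("same seed matches:", np.allclose(coef_a, coef_b))
# Different seed: any gap reflects the solver's convergence tolerance
print("max abs difference between seeds:", np.max(np.abs(coef_a - coef_c)))
```

Any difference between seeds is typically small, since both runs converge toward the same ridge solution within the solver's tolerance.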
Some tips for setting random_state:
- Use an integer value for reproducibility across runs
- Models trained with the same random state, data, and parameters will be identical
Issues to consider:
- The default None value can produce slightly different results each time, but only with the 'sag' or 'saga' solver
- Consistent seeding is important when comparing models or for reproducibility
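To get a feel for how much seed-to-seed variation to expect before comparing models, one option is to fit the same model under several seeds and measure the spread. The sketch below assumes the 'sag' solver and an arbitrary list of seeds, and uses the default deterministic solver as a reference point.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=100, n_features=1, noise=20, random_state=42)

# Reference coefficient from the default (deterministic) solver
reference = Ridge(alpha=1.0).fit(X, y).coef_[0]

# Coefficient from the stochastic 'sag' solver under several seeds
seeds = [0, 1, 2, 3, 4]  # arbitrary illustrative seeds
sag_coefs = np.array(
    [Ridge(alpha=1.0, solver="sag", random_state=s).fit(X, y).coef_[0] for s in seeds]
)

spread = sag_coefs.max() - sag_coefs.min()       # variation caused by the seed alone
max_gap = np.abs(sag_coefs - reference).max()    # distance from the reference fit

print(f"reference coefficient: {reference:.4f}")
print(f"spread across seeds:   {spread:.6f}")
print(f"max gap to reference:  {max_gap:.6f}")
```

If the difference between two candidate models is no larger than this spread, it may be seed noise rather than a real effect, which is why fixing the seed matters when comparing them.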