
Scikit-Learn RANSACRegressor Model

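RANSAC (RANdom SAmple Consensus) is a robust regression algorithm that repeatedly fits a model to small random subsets of the data, classifies every sample as an inlier or an outlier by its residual, and keeps the candidate with the largest set of inliers. The final model is then refit on those inliers only, which makes it well suited to regression problems where some observations are grossly wrong.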
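To make the loop concrete, here is a minimal from-scratch sketch for a one-dimensional line fit. It is illustrative only and is not scikit-learn's implementation: the function name ransac_line, its parameters, and the toy data are all assumptions made for this example.

```python
import numpy as np

def ransac_line(x, y, n_trials=100, min_samples=2, threshold=1.0, seed=0):
    """Toy RANSAC for a line y = a*x + b (hypothetical helper, not sklearn code)."""
    rng = np.random.default_rng(seed)
    best_mask = None
    for _ in range(n_trials):
        # 1. fit a candidate line to a small random subset
        idx = rng.choice(len(x), size=min_samples, replace=False)
        a, b = np.polyfit(x[idx], y[idx], deg=1)
        # 2. samples whose absolute residual is within the threshold are inliers
        mask = np.abs(y - (a * x + b)) <= threshold
        # 3. keep the candidate with the largest consensus set
        if best_mask is None or mask.sum() > best_mask.sum():
            best_mask = mask
    # 4. refit on all inliers of the best candidate
    a, b = np.polyfit(x[best_mask], y[best_mask], deg=1)
    return a, b, best_mask

# toy data: a clean line y = 2x + 1 with five gross outliers
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=50)
y[:5] += 30.0

slope, intercept, inliers = ransac_line(x, y, threshold=0.5)
print(slope, intercept, inliers.sum())
```

The recovered slope and intercept should land close to 2 and 1, since the five shifted points fall far outside the residual threshold of any line that agrees with the rest of the data.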

The key hyperparameters of RANSACRegressor include estimator (the base model to fit; LinearRegression is used when it is left as None), min_samples (the number of samples drawn at random for each candidate fit), residual_threshold (the maximum residual for a sample to be classified as an inlier; by default the median absolute deviation of the target values), and max_trials (the maximum number of random sampling iterations, 100 by default).

It is most useful when the outliers are gross errors that would otherwise pull an ordinary least-squares fit away from the bulk of the data.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import RANSACRegressor, LinearRegression
from sklearn.metrics import mean_absolute_error

# generate regression dataset
X, y = make_regression(n_samples=100, n_features=1, noise=20, random_state=1)

# introduce outliers by overwriting the first 10 samples
np.random.seed(0)
n_outliers = 10
X[:n_outliers] = 3 + 0.5 * np.random.normal(size=(n_outliers, 1))
y[:n_outliers] = -3 + 10 * np.random.normal(size=n_outliers)

# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# create model
ransac = RANSACRegressor(estimator=LinearRegression())

# fit model
ransac.fit(X_train, y_train)

# evaluate model
yhat = ransac.predict(X_test)
mae = mean_absolute_error(y_test, yhat)
print('Mean Absolute Error: %.3f' % mae)

# make a prediction
row = [[1.5]]
yhat = ransac.predict(row)
print('Predicted: %.3f' % yhat[0])

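Running the example gives output like the following. RANSAC draws its subsets at random, so the exact values can differ between runs unless random_state is set on the RANSACRegressor.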

Mean Absolute Error: 15.119
Predicted: 125.165

The steps are as follows:

  1. First, a synthetic regression dataset is generated using the make_regression() function with added Gaussian noise. Outliers are then introduced manually by overwriting the first 10 samples with values that do not follow the underlying trend.

  2. The dataset is split into training and test sets using train_test_split().

  3. A RANSACRegressor model is instantiated with LinearRegression as the base estimator. The model is then fit on the training data using the fit() method.

  4. The performance of the model is evaluated by predicting on the test set and calculating the Mean Absolute Error (MAE) using mean_absolute_error().

  5. A single prediction can be made by passing a new data sample to the predict() method.

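This example shows RANSACRegressor applied to a regression task with outliers. The final model is fit only to the samples that RANSAC classifies as inliers (available afterwards in the inlier_mask_ attribute), so the outliers do not influence the fitted coefficients. Note that the test set here may itself contain some of the injected outliers, which inflates the reported MAE.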


