Theil-Sen Regression is a robust linear regression algorithm that is resistant to outliers. It is particularly useful when the dataset contains a significant amount of noise or outliers.
Key hyperparameters include:
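max_subpopulation: the maximum number of sample subsets (subpopulations) the estimator considers; when the number of possible subsets exceeds this limit, a random selection of that size is used instead.
n_subsamples: the number of samples in each subset used to compute a candidate set of parameters.
max_iter: the maximum number of iterations for computing the spatial median of the candidate parameters.
tol: the tolerance used to declare convergence of the spatial median computation.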
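As a rough sketch of how these hyperparameters are passed, the snippet below sets each one explicitly. The specific values are illustrative assumptions rather than tuned recommendations, and random_state is added only so the random subset selection is repeatable.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import TheilSenRegressor

# small synthetic dataset (assumed here purely for illustration)
X, y = make_regression(n_samples=100, n_features=5, noise=0.2, random_state=1)

# illustrative values only; they are assumptions, not tuned recommendations
model = TheilSenRegressor(
    max_subpopulation=5000,  # cap on the number of sample subsets considered
    n_subsamples=20,         # samples per subset (must exceed n_features)
    max_iter=500,            # iteration limit for the spatial median
    tol=1e-4,                # convergence tolerance for the spatial median
    random_state=1,          # makes the random subset selection repeatable
)
model.fit(X, y)

# one coefficient per input feature
print(model.coef_)
```

Larger n_subsamples values generally trade some robustness for efficiency, so the right setting depends on how contaminated the data is expected to be.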
The algorithm is appropriate for regression problems.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import TheilSenRegressor
from sklearn.metrics import mean_absolute_error
# generate regression dataset
X, y = make_regression(n_samples=100, n_features=5, noise=0.2, random_state=1)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# create model
model = TheilSenRegressor()
# fit model
model.fit(X_train, y_train)
# evaluate model
yhat = model.predict(X_test)
mae = mean_absolute_error(y_test, yhat)
print('Mean Absolute Error: %.3f' % mae)
# make a prediction
row = [[0.5, -1.2, 0.3, 0.8, 1.5]]
yhat = model.predict(row)
print('Predicted: %.3f' % yhat[0])
Running the example gives an output like:
Mean Absolute Error: 0.191
Predicted: 5.554
The steps are as follows:
First, a synthetic regression dataset is generated using the make_regression() function. This creates a dataset with a specified number of samples (n_samples), features (n_features), and a fixed random seed (random_state) for reproducibility. The dataset is split into training and test sets using train_test_split().
Next, a TheilSenRegressor model is instantiated with default hyperparameters. The model is then fit on the training data using the fit() method.
The performance of the model is evaluated by comparing the predictions (yhat) to the actual values (y_test) using the mean absolute error metric.
A single prediction can be made by passing a new data sample to the predict() method.
This example demonstrates how to set up and use a TheilSenRegressor model for regression tasks, an algorithm chosen for its robustness to outliers and noise in the data.
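The dataset above contains no deliberate outliers, so the robustness claim is not directly visible in it. The following sketch, built on an assumed one-feature dataset with a known slope of 3.0 and a handful of corrupted targets, compares TheilSenRegressor against ordinary least squares (LinearRegression). The data, the corruption scheme, and the variable names are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, TheilSenRegressor

rng = np.random.RandomState(0)

# one feature with a known linear relationship: y = 3.0 * x + 1.0 + noise
true_slope = 3.0
X = rng.uniform(0, 10, size=(100, 1))
y = true_slope * X[:, 0] + 1.0 + rng.normal(scale=0.5, size=100)

# corrupt the targets of the 10 samples with the largest x values
y_corrupt = y.copy()
outlier_idx = np.argsort(X[:, 0])[-10:]
y_corrupt[outlier_idx] -= 50.0

# fit both models on the corrupted data
ols = LinearRegression().fit(X, y_corrupt)
ts = TheilSenRegressor(random_state=0).fit(X, y_corrupt)

# compare how far each estimated slope is from the true slope
ols_error = abs(ols.coef_[0] - true_slope)
ts_error = abs(ts.coef_[0] - true_slope)
print("OLS slope error:       %.3f" % ols_error)
print("Theil-Sen slope error: %.3f" % ts_error)
```

In a setup like this, the least squares slope is pulled toward the corrupted points, while the Theil-Sen estimate is expected to stay much closer to the true slope.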