Multi-task regression involves predicting multiple targets simultaneously. In scikit-learn, MultiTaskLasso and MultiTaskLassoCV are two algorithms designed for this task. This example compares their performance and highlights their key differences.
MultiTaskLasso is a linear model trained with a mixed L1/L2-norm regularizer, which encourages sparsity in the model weights so that the same features are selected across all targets. Key hyperparameters include alpha (regularization strength) and fit_intercept (whether to calculate the intercept for the model). Tuning alpha manually can be time-consuming and requires domain knowledge.
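To see what manual tuning looks like in practice, here is a minimal sketch that fits MultiTaskLasso over a small grid of candidate alpha values and scores each on a held-out validation split. The candidate values, dataset size, and 80/20 split are arbitrary choices for illustration, not recommendations.

from sklearn.datasets import make_regression
from sklearn.linear_model import MultiTaskLasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Illustrative data and split; sizes chosen arbitrarily for this sketch
X, y = make_regression(n_samples=500, n_targets=3, noise=0.1, random_state=42)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Hypothetical candidate grid; in real use it would come from domain knowledge
for alpha in [0.01, 0.1, 1.0, 10.0]:
    model = MultiTaskLasso(alpha=alpha)
    model.fit(X_tr, y_tr)
    mse = mean_squared_error(y_val, model.predict(X_val))
    print(f"alpha={alpha:<5} validation MSE: {mse:.3f}")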
MultiTaskLassoCV automates the hyperparameter tuning process using cross-validation. Its key hyperparameters include alphas (the list of alpha values to try) and cv (the number of cross-validation folds). This automation selects the best alpha from the candidate grid, but it increases computational cost because multiple models are trained during cross-validation.
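As a minimal sketch of that configuration, the snippet below passes an explicit alphas grid and a cv count to MultiTaskLassoCV; the specific grid values and cv=3 are arbitrary choices for illustration.

from sklearn.datasets import make_regression
from sklearn.linear_model import MultiTaskLassoCV

# Illustrative data; the alpha grid and cv=3 are arbitrary example values
X, y = make_regression(n_samples=500, n_targets=3, noise=0.1, random_state=42)

cv_model = MultiTaskLassoCV(alphas=[0.01, 0.1, 1.0, 10.0], cv=3)
cv_model.fit(X, y)
print(f"Selected alpha: {cv_model.alpha_}")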
The main difference between these algorithms is that MultiTaskLasso requires manual tuning of alpha, while MultiTaskLassoCV automates this process. MultiTaskLasso is ideal for quick prototyping when you have prior knowledge of good alpha values, whereas MultiTaskLassoCV is preferred for thorough model selection and hyperparameter tuning, especially with new datasets.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import MultiTaskLasso, MultiTaskLassoCV
from sklearn.metrics import mean_squared_error
# Generate synthetic multi-target regression dataset
X, y = make_regression(n_samples=1000, n_targets=3, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Fit and evaluate MultiTaskLasso with default alpha
mt_lasso = MultiTaskLasso(alpha=1.0, random_state=42)
mt_lasso.fit(X_train, y_train)
y_pred_mt_lasso = mt_lasso.predict(X_test)
print(f"MultiTaskLasso MSE: {mean_squared_error(y_test, y_pred_mt_lasso):.3f}")
# Fit and evaluate MultiTaskLassoCV with cross-validation
mt_lasso_cv = MultiTaskLassoCV(cv=5, random_state=42)
mt_lasso_cv.fit(X_train, y_train)
y_pred_mt_lasso_cv = mt_lasso_cv.predict(X_test)
print(f"\nMultiTaskLassoCV MSE: {mean_squared_error(y_test, y_pred_mt_lasso_cv):.3f}")
print(f"Best alpha: {mt_lasso_cv.alpha_}")
Running the example gives an output like:
MultiTaskLasso MSE: 3.530
MultiTaskLassoCV MSE: 0.054
Best alpha: 0.1131858327284376
The steps are as follows:
- Generate a synthetic multi-target regression dataset using make_regression.
- Split the data into training and testing sets using train_test_split.
- Instantiate MultiTaskLasso with a default alpha, fit it on the training data, and evaluate its performance on the test set.
- Instantiate MultiTaskLassoCV with 5-fold cross-validation, fit it on the training data, and evaluate its performance on the test set.
- Compare the test set performance (MSE) of both models and print the best alpha found by MultiTaskLassoCV (a sketch for inspecting the fitted model further is shown below).
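If you want to go beyond the single test MSE, the fitted MultiTaskLassoCV object exposes the cross-validation errors and the learned coefficients. The minimal sketch below continues from the mt_lasso_cv fitted in the example above, averages mse_path_ over the folds to find the best alpha on the grid, and counts how many feature rows the mixed-norm penalty zeroed out across all targets.

import numpy as np

# Sketch: inspect the fitted mt_lasso_cv from the example above.
# mse_path_ has shape (n_alphas, n_folds); average over folds for each alpha.
mean_cv_mse = mt_lasso_cv.mse_path_.mean(axis=1)
best_idx = int(np.argmin(mean_cv_mse))
print(f"Best alpha on the CV grid: {mt_lasso_cv.alphas_[best_idx]:.4f} "
      f"(mean CV MSE {mean_cv_mse[best_idx]:.3f})")

# coef_ has shape (n_targets, n_features); with the mixed-norm penalty a
# feature is either kept for all targets or zeroed out for all of them.
zero_features = int(np.sum(np.all(mt_lasso_cv.coef_ == 0, axis=0)))
print(f"Features zeroed out across all targets: {zero_features}")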