The multi_class parameter in LogisticRegression controls the strategy used to handle multiclass classification problems.
Logistic Regression is a linear model for binary and multiclass classification. It predicts class probabilities by passing a linear combination of the input features through the logistic (sigmoid) function, or through the softmax function when all classes are modeled jointly.
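The sketch below makes the "linear combination" concrete for the binary case. It assumes scikit-learn and SciPy are installed and uses a small synthetic dataset. It checks that applying the sigmoid (`scipy.special.expit`) to `X @ coef_ + intercept_` reproduces the model's `predict_proba` output for the positive class.

```python
import numpy as np
from scipy.special import expit
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Small synthetic binary problem, for illustration only
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression().fit(X, y)

# Linear combination of the input features: z = X @ w + b
z = X @ clf.coef_.ravel() + clf.intercept_[0]

# The sigmoid (logistic) function maps z to P(y = 1 | x)
p_manual = expit(z)
p_sklearn = clf.predict_proba(X)[:, 1]

print(np.allclose(p_manual, p_sklearn))  # True
```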
The default value is `'auto'`, which selects `'ovr'` if the data is binary or if `solver='liblinear'`, and `'multinomial'` otherwise. The two explicit options are `'ovr'` (one-vs-rest, which fits one binary classifier per class) and `'multinomial'` (a single multinomial logistic regression model fitted over all classes at once).
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, n_classes=3, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different multi_class values
multi_class_values = ['ovr', 'multinomial']
accuracies = []

for multi_class in multi_class_values:
    lr = LogisticRegression(multi_class=multi_class, solver='lbfgs', max_iter=200, random_state=42)
    lr.fit(X_train, y_train)
    y_pred = lr.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"multi_class={multi_class}, Accuracy: {accuracy:.3f}")
```
Running the example gives an output like:
```
multi_class=ovr, Accuracy: 0.695
multi_class=multinomial, Accuracy: 0.715
```
Note that the `multi_class` parameter was deprecated in scikit-learn 1.5, so you may see a warning like:

```
FutureWarning: 'multi_class' was deprecated in version 1.5 and will be removed in 1.7. Use OneVsRestClassifier(LogisticRegression(..)) instead. Leave it to its default value to avoid this warning.
```
The key steps in this example are:
- Generate a synthetic multiclass classification dataset.
- Split the data into training and testing sets.
- Train `LogisticRegression` models with different `multi_class` values.
- Evaluate the accuracy of each model on the test set.
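Because `multi_class` is deprecated, the same comparison can be written without it, following the replacement that the deprecation warning suggests. Wrap the estimator in `sklearn.multiclass.OneVsRestClassifier` for one-vs-rest. For multinomial, rely on the default behavior, which already fits a multinomial model for multiclass data with the `lbfgs` solver. This minimal sketch reuses the dataset settings from the example above. The printed scores are not guaranteed to match the earlier output exactly.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    # One-vs-rest: wrap the estimator instead of passing multi_class='ovr'
    "ovr": OneVsRestClassifier(LogisticRegression(max_iter=200, random_state=42)),
    # Multinomial: the default behavior for multiclass data with lbfgs
    "multinomial": LogisticRegression(max_iter=200, random_state=42),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: Accuracy: {accuracies[name]:.3f}")
```

Neither model passes `multi_class`, so this version should not trigger the `FutureWarning`.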
Some tips and heuristics for setting multi_class:
- `ovr` works for both binary and multiclass problems. However, it trains an independent binary classifier for each class, so it can be less accurate than `multinomial` on multiclass problems.
- `multinomial` is often preferable for multiclass problems when the solver supports it, because it fits all classes jointly in a single model using a softmax over the class scores.
- Ensure the solver you use (`lbfgs` in this case) supports the `multinomial` option; `liblinear`, for example, does not.
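The two strategies differ in how they produce probabilities. The sketch below assumes scikit-learn and SciPy, uses synthetic three-class data, and uses `OneVsRestClassifier` to stand in for `multi_class='ovr'`. It checks two things. First, the multinomial model's probabilities are a softmax over its class scores. Second, one-vs-rest applies a sigmoid to each class's independently fitted score and then rescales each row to sum to 1.

```python
import numpy as np
from scipy.special import expit, softmax
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Three-class synthetic problem, for illustration only
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)

# Multinomial: with lbfgs and more than two classes, the default fits one
# model whose per-class scores are turned into probabilities by a softmax.
multi = LogisticRegression(max_iter=1000).fit(X, y)
scores = X @ multi.coef_.T + multi.intercept_        # shape (300, 3)
p_multi = softmax(scores, axis=1)
print(np.allclose(p_multi, multi.predict_proba(X)))    # True

# One-vs-rest: one independent binary model per class; each class's sigmoid
# probability is computed separately, then rows are rescaled to sum to 1.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
raw = np.column_stack([expit(est.decision_function(X)) for est in ovr.estimators_])
p_ovr = raw / raw.sum(axis=1, keepdims=True)
print(np.allclose(p_ovr, ovr.predict_proba(X)))        # True
```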
Issues to consider:
- The choice between `ovr` and `multinomial` can affect computational cost and memory usage: `ovr` solves one binary problem per class, while `multinomial` solves a single, larger optimization problem over all classes.
- For highly imbalanced datasets, the choice of `multi_class` strategy can have a significant impact on model performance.
- Check that the `multi_class` value is compatible with the solver being used.
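The cost difference follows from what each strategy fits. In this sketch, which uses synthetic four-class data and `OneVsRestClassifier` in place of `multi_class='ovr'`, one-vs-rest runs four separate binary fits. The multinomial model learns all four coefficient rows in a single fit.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Four-class synthetic problem, for illustration only
X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           n_redundant=0, n_classes=4, random_state=0)

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
multi = LogisticRegression(max_iter=1000).fit(X, y)

# One-vs-rest: one independent binary fit per class
print(len(ovr.estimators_))                          # 4
print([est.coef_.shape for est in ovr.estimators_])  # [(1, 8), (1, 8), (1, 8), (1, 8)]

# Multinomial: a single optimization over all 4 x 8 coefficients at once
print(multi.coef_.shape)                             # (4, 8)
```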