LogisticRegression is a linear model for classification, most often used for binary problems. It models the probability of the positive class by passing a linear combination of the features through the logistic (sigmoid) function.
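The feature_names_in_ attribute of a fitted LogisticRegression model stores, as a NumPy array, the feature names seen during fitting. It is only defined when the training data carries feature names that are all strings, for example a pandas DataFrame with string column names. This is useful for mapping coefficients back to features or for understanding feature importance.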
Accessing feature_names_in_ helps in interpreting the model by relating each coefficient back to its original feature name. This matters when analyzing the model’s decisions and presenting results.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import pandas as pd
# Generate a synthetic binary classification dataset with named features
X, y = make_classification(n_samples=1000, n_features=4, n_informative=2, n_redundant=0,
                           random_state=42, shuffle=False)
feature_names = ['feature1', 'feature2', 'feature3', 'feature4']
X = pd.DataFrame(X, columns=feature_names)
# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize and fit the LogisticRegression model
lr = LogisticRegression()
lr.fit(X_train, y_train)
# Access and print the feature_names_in_ attribute
print(f"Feature names used in model fitting: {lr.feature_names_in_}")
Running the example gives an output like:
Feature names used in model fitting: ['feature1' 'feature2' 'feature3' 'feature4']
The key steps in this example are:
- Generate a synthetic binary classification dataset using make_classification and create a DataFrame with named features.
- Split the dataset into training and testing sets using train_test_split.
- Initialize and fit a LogisticRegression model on the training data.
- Access and print the feature_names_in_ attribute to see the feature names used in model fitting.