Get LogisticRegression "feature_names_in_" Attribute

LogisticRegression is a linear model for classification. In the binary case, it models the probability of the positive class by applying the logistic (sigmoid) function to a linear combination of the features.

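The feature_names_in_ attribute of a fitted LogisticRegression model stores the names of the features seen during fitting, as a NumPy array. It is only defined when the training data carries feature names that are all strings, such as a pandas DataFrame with string column names. This is useful for mapping coefficients or understanding feature importance.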

Because the names in feature_names_in_ follow the same order as the columns of coef_, they let you relate each coefficient back to its original feature. This matters when analyzing the model’s decisions and presenting results.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import pandas as pd

# Generate a synthetic binary classification dataset with named features
X, y = make_classification(n_samples=1000, n_features=4, n_informative=2, n_redundant=0,
                           random_state=42, shuffle=False)
feature_names = ['feature1', 'feature2', 'feature3', 'feature4']
X = pd.DataFrame(X, columns=feature_names)

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and fit the LogisticRegression model
lr = LogisticRegression()
lr.fit(X_train, y_train)

# Access and print the feature_names_in_ attribute
print(f"Feature names used in model fitting: {lr.feature_names_in_}")

Running the example gives an output like:

Feature names used in model fitting: ['feature1' 'feature2' 'feature3' 'feature4']

The key steps in this example are:

1. Generate a synthetic binary classification dataset using make_classification and create a DataFrame with named features.
2. Split the dataset into training and testing sets using train_test_split.
3. Initialize and fit a LogisticRegression model on the training data.
4. Access and print the feature_names_in_ attribute to see the feature names used in model fitting.

See Also