
Get LogisticRegression "n_features_in_" Attribute

LogisticRegression is a linear model for classification. It estimates the probability that a given input belongs to each class and predicts the most probable one. It is most often used for binary classification, although scikit-learn's implementation also handles multiclass problems.

The n_features_in_ attribute of a fitted LogisticRegression model holds the number of features seen during fit, that is, the number of columns in the training data. It is set when the fit method is called, so it does not exist on an unfitted model, and it provides a simple way to verify the dimensionality of the input data.
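As a minimal sketch of that behavior, the snippet below checks for the attribute before and after fitting. The tiny 6-by-3 array and the variable names (model, before_fit, after_fit) are made up for illustration and are not part of the main example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy dataset: 6 samples, 3 features
X = np.array([
    [0.0, 1.0, 2.0],
    [1.0, 0.5, 1.5],
    [2.0, 0.0, 1.0],
    [3.0, 1.5, 0.5],
    [4.0, 2.0, 0.0],
    [5.0, 2.5, 1.0],
])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()

# Before fit, the attribute has not been set yet
before_fit = hasattr(model, "n_features_in_")
print(f"Attribute present before fit: {before_fit}")

model.fit(X, y)

# After fit, the attribute matches the number of columns in X
after_fit = model.n_features_in_
print(f"n_features_in_ after fit: {after_fit}")
```

Checking with hasattr avoids the AttributeError that direct access would raise on an unfitted model.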

Accessing n_features_in_ confirms how many features were used to train the model, which helps verify that the preprocessing steps were applied as intended. This is particularly important for datasets that have undergone feature selection or transformation, where the model may see fewer or more columns than the raw data contains.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=0, random_state=42, shuffle=False)

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize a LogisticRegression model
lr = LogisticRegression(random_state=42)

# Fit the model on the training data
lr.fit(X_train, y_train)

# Access and print the n_features_in_ attribute
print(f"Number of features seen during fit: {lr.n_features_in_}")

Running the example gives an output like:

Number of features seen during fit: 10

The key steps in this example are:

  1. Generate a synthetic binary classification dataset using make_classification with a predefined number of features.
  2. Split the dataset into training and testing sets using train_test_split.
  3. Initialize a LogisticRegression model and fit it on the training data.
  4. Access the n_features_in_ attribute of the fitted model to get the number of input features.
  5. Print the value of n_features_in_ to confirm the dimensionality of the input data seen by the model.
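The feature selection case mentioned earlier can be sketched in the same way. The snippet below assumes SelectKBest with f_classif as the selection step and k=5 as the number of features to keep. These choices are illustrative, not something the example above prescribes.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Same synthetic dataset as in the main example: 10 features
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42, shuffle=False)

# Assumed preprocessing step: keep the 5 highest-scoring features
selector = SelectKBest(score_func=f_classif, k=5)
X_selected = selector.fit_transform(X, y)

# Fit the model on the reduced feature set
lr = LogisticRegression(random_state=42)
lr.fit(X_selected, y)

# The selector saw the raw data, the model saw only the selected columns
print(f"Features seen by selector: {selector.n_features_in_}")
print(f"Features seen by model: {lr.n_features_in_}")

# Passing the unreduced data to the model is rejected with a ValueError
try:
    lr.predict(X)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
print(f"Mismatch detected: {mismatch_detected}")
```

Comparing lr.n_features_in_ with the number of columns in new data before calling predict is a cheap way to catch a skipped preprocessing step early.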

