The LogisticRegression algorithm is a linear model used for binary classification tasks. It estimates the probability that a given input belongs to a particular class and makes predictions based on this probability.
The n_features_in_ attribute of a fitted LogisticRegression model holds the number of features (columns of the input array) seen during fit. The attribute is set when the fit method is called and provides a simple way to verify the dimensionality of the input data.
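Because the attribute only exists after fitting, a quick check with hasattr can show the difference. The following is a minimal sketch; the dataset size and the variable names has_attr_before_fit and has_attr_after_fit are illustrative choices, not part of the original example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Small synthetic dataset with 6 features (sizes chosen only for illustration)
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

lr = LogisticRegression()

# Before fit, the estimator has not seen any data, so the attribute is absent
has_attr_before_fit = hasattr(lr, "n_features_in_")

lr.fit(X, y)

# After fit, the attribute records the number of columns in X
has_attr_after_fit = hasattr(lr, "n_features_in_")

print(f"Present before fit: {has_attr_before_fit}")
print(f"Present after fit: {has_attr_after_fit}")
print(f"n_features_in_: {lr.n_features_in_}")
```

Using hasattr avoids the AttributeError that a direct access on an unfitted model would raise.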
Accessing the n_features_in_ attribute is useful to confirm the number of features that were used to train the model, ensuring that the data preprocessing steps were correctly applied. This can be particularly important when dealing with datasets that have undergone feature selection or transformation.
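As a rough illustration of the feature selection case, the sketch below places a SelectKBest step in front of LogisticRegression inside a Pipeline and compares the feature counts each step saw. The choice of SelectKBest, k=4, and the step names "select" and "clf" are assumptions made for this example only.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Synthetic dataset with 10 raw features
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)

# Keep the 4 highest-scoring features, then fit the classifier on them
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=4)),
    ("clf", LogisticRegression()),
])
pipe.fit(X, y)

# Each fitted step records the number of features it received
raw_features = pipe.named_steps["select"].n_features_in_
model_features = pipe.named_steps["clf"].n_features_in_

print(f"Features seen by the selector: {raw_features}")
print(f"Features seen by the classifier: {model_features}")
```

If the classifier reports the raw feature count instead of the reduced one, the selection step was probably not applied before fitting.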
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=0, random_state=42, shuffle=False)
# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize a LogisticRegression model
lr = LogisticRegression(random_state=42)
# Fit the model on the training data
lr.fit(X_train, y_train)
# Access and print the n_features_in_ attribute
print(f"Number of features seen during fit: {lr.n_features_in_}")
Running the example gives an output like:
Number of features seen during fit: 10
The key steps in this example are:
- Generate a synthetic binary classification dataset using make_classification with a predefined number of features.
- Split the dataset into training and testing sets using train_test_split.
- Initialize a LogisticRegression model and fit it on the training data.
- Access the n_features_in_ attribute of the fitted model to get the number of input features.
- Print the value of n_features_in_ to confirm the dimensionality of the input data seen by the model.
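The same attribute can also guard prediction calls against data with the wrong number of columns. The sketch below uses a hypothetical helper, has_expected_feature_count, which is not part of scikit-learn; it only compares the column count of new data with n_features_in_.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)
lr = LogisticRegression().fit(X, y)


def has_expected_feature_count(model, X_new):
    """Hypothetical helper: True if X_new is 2D with as many columns as the model saw."""
    X_new = np.asarray(X_new)
    return X_new.ndim == 2 and X_new.shape[1] == model.n_features_in_


good_batch = X[:5]      # all 10 columns, matches the fitted model
bad_batch = X[:5, :8]   # only 8 columns, e.g. a preprocessing step dropped two

good_ok = has_expected_feature_count(lr, good_batch)
bad_ok = has_expected_feature_count(lr, bad_batch)

print(f"Good batch matches: {good_ok}")
print(f"Bad batch matches: {bad_ok}")
```

A check like this can give a clearer error message than waiting for predict to fail on mismatched input.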