The flatten_transform parameter in scikit-learn’s VotingClassifier controls the shape of the array returned by the transform method.
VotingClassifier is an ensemble method that combines predictions from multiple base classifiers. With voting='soft', its transform method returns the class probabilities predicted by each base classifier, and flatten_transform determines how those probabilities are arranged.
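When flatten_transform=True, transform returns a 2D array of shape (n_samples, n_classifiers * n_classes), in which each classifier’s probabilities are concatenated horizontally. When flatten_transform=False, it returns a 3D array of shape (n_classifiers, n_samples, n_classes), with a separate probability array for each classifier.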
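To make the two layouts concrete, here is a minimal NumPy-only sketch. The probabilities are made-up random values standing in for real classifier outputs, and the variable names are illustrative. It assumes the flattened layout is the horizontal stack of the per-classifier arrays, which matches the shapes described above.

```python
import numpy as np

# Hypothetical probabilities for 2 classifiers, 4 samples, 3 classes
# (random values, normalized so each row sums to 1).
rng = np.random.default_rng(0)
n_classifiers, n_samples, n_classes = 2, 4, 3
raw = rng.random((n_classifiers, n_samples, n_classes))
probas_3d = raw / raw.sum(axis=2, keepdims=True)

# Layout corresponding to flatten_transform=True:
# the per-classifier blocks are placed side by side.
probas_flat = np.hstack(list(probas_3d))
print(probas_flat.shape)

# The first n_classes columns belong to classifier 0, the next to classifier 1.
assert np.array_equal(probas_flat[:, :n_classes], probas_3d[0])
assert np.array_equal(probas_flat[:, n_classes:], probas_3d[1])

# The 3D layout can be recovered from the flat one with a reshape and transpose.
recovered = probas_flat.reshape(n_samples, n_classifiers, n_classes).transpose(1, 0, 2)
assert np.array_equal(recovered, probas_3d)
```

Because one layout can be converted into the other, the parameter is a matter of convenience rather than of information content.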
The default value for flatten_transform is True.
In practice, the choice depends on how you plan to use the transformed data. True is often more convenient for further processing, while False can be useful for analyzing individual classifier outputs.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=3,
                           n_informative=10, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create base classifiers
clf1 = LogisticRegression(random_state=42)
clf2 = RandomForestClassifier(random_state=42)
clf3 = SVC(probability=True, random_state=42)
# Create VotingClassifiers with different flatten_transform values
vc_flat = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('svc', clf3)],
                           voting='soft', flatten_transform=True)
vc_3d = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('svc', clf3)],
                         voting='soft', flatten_transform=False)
# Fit both VotingClassifiers
vc_flat.fit(X_train, y_train)
vc_3d.fit(X_train, y_train)
# Transform test set
X_flat = vc_flat.transform(X_test)
X_3d = vc_3d.transform(X_test)
print(f"Flattened shape: {X_flat.shape}")
print(f"3D shape: {X_3d.shape}")
# Evaluate models
y_pred_flat = vc_flat.predict(X_test)
y_pred_3d = vc_3d.predict(X_test)
print(f"Accuracy (flatten_transform=True): {accuracy_score(y_test, y_pred_flat):.3f}")
print(f"Accuracy (flatten_transform=False): {accuracy_score(y_test, y_pred_3d):.3f}")
Running the example gives an output like:
Flattened shape: (200, 9)
3D shape: (3, 200, 3)
Accuracy (flatten_transform=True): 0.850
Accuracy (flatten_transform=False): 0.850
The key steps in this example are:
- Generate a synthetic multi-class classification dataset
- Split the data into train and test sets
- Create base classifiers (LogisticRegression, RandomForestClassifier, SVC)
- Initialize two VotingClassifier instances with different flatten_transform values
- Fit both VotingClassifier models
- Transform the test set using both models and compare output shapes
- Evaluate the accuracy of both models
Some tips for using flatten_transform:
- Use True when you need a 2D array for further processing or as input to other sklearn estimators
- Use False when you want to analyze or process each classifier’s probabilities separately
- Consider memory usage: both layouts hold the same number of values (n_classifiers * n_samples * n_classes), so the setting itself has little effect on memory
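The first two tips can be sketched as follows. This is an illustrative example on a small synthetic dataset, not a recommended pipeline. The meta-model step is a simplified assumption: it is fit on in-sample probabilities, which tends to be optimistic, and scikit-learn's StackingClassifier handles this more carefully with cross-validated predictions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

estimators = [("lr", LogisticRegression(max_iter=1000)),
              ("rf", RandomForestClassifier(n_estimators=50, random_state=0))]

# flatten_transform=False: one (n_samples, n_classes) slice per classifier,
# which makes per-classifier analysis a simple loop.
vc_3d = VotingClassifier(estimators=estimators, voting="soft",
                         flatten_transform=False).fit(X_train, y_train)
probas_3d = vc_3d.transform(X_test)
for (name, _), probas in zip(estimators, probas_3d):
    labels = vc_3d.classes_[probas.argmax(axis=1)]
    print(f"{name}: accuracy {np.mean(labels == y_test):.3f}")

# flatten_transform=True: a 2D array that another estimator accepts directly.
vc_flat = VotingClassifier(estimators=estimators, voting="soft",
                           flatten_transform=True).fit(X_train, y_train)
meta_train = vc_flat.transform(X_train)
meta_test = vc_flat.transform(X_test)

# Simplified meta-model trained on in-sample probabilities (optimistic).
meta = LogisticRegression(max_iter=1000).fit(meta_train, y_train)
print(f"meta-model accuracy: {meta.score(meta_test, y_test):.3f}")
```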
Issues to consider:
- The choice doesn’t affect the final predictions, only the structure of transformed data
- The transformed array grows with the number of classifiers, samples, and classes in either layout, so many classifiers or large datasets can lead to high memory usage regardless of the setting
- The parameter only has an effect when voting='soft' and the transform method is used
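The first and last points can be checked with a short sketch. The dataset, the base classifiers, and the build helper are illustrative choices, not part of the example above. With voting='hard', transform returns each classifier's predicted labels, so the output shape should be the same for both settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)

def build(voting, flatten):
    """Hypothetical helper: fit a two-model VotingClassifier on X, y."""
    return VotingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)),
                    ("dt", DecisionTreeClassifier(random_state=0))],
        voting=voting, flatten_transform=flatten).fit(X, y)

# Soft voting: predictions match; only the transform layout differs.
soft_flat = build("soft", True)
soft_3d = build("soft", False)
same_preds = np.array_equal(soft_flat.predict(X), soft_3d.predict(X))
print("Same predictions:", same_preds)
print("Soft shapes:", soft_flat.transform(X).shape, soft_3d.transform(X).shape)

# Hard voting: transform returns predicted labels, one column per classifier,
# and flatten_transform makes no difference.
hard_flat = build("hard", True)
hard_3d = build("hard", False)
hard_shapes = (hard_flat.transform(X).shape, hard_3d.transform(X).shape)
print("Hard shapes:", hard_shapes)
```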