Configure VotingClassifier "flatten_transform" Parameter

The flatten_transform parameter in scikit-learn’s VotingClassifier controls the shape of the array returned by the transform method when voting='soft'.

VotingClassifier is an ensemble method that combines predictions from multiple base classifiers. With voting='soft', its transform method returns the class probabilities predicted by each base classifier, and flatten_transform determines whether those probabilities come back as a single 2D array or as a 3D array.

When flatten_transform=True, transform returns a 2D array of shape (n_samples, n_classifiers * n_classes), where each classifier’s probabilities are concatenated horizontally. When False, it returns a 3D array of shape (n_classifiers, n_samples, n_classes), with a separate probability array for each classifier.
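The relationship between the two layouts can be sketched with NumPy alone. The probabilities below are made up for illustration (2 classifiers, 4 samples, 3 classes) and do not come from a fitted model; the variable names are hypothetical.

```python
import numpy as np

# Made-up probabilities for illustration: 2 classifiers, 4 samples, 3 classes.
rng = np.random.default_rng(0)
raw = rng.random((2, 4, 3))
probas_3d = raw / raw.sum(axis=2, keepdims=True)  # each row sums to 1

n_classifiers, n_samples, n_classes = probas_3d.shape

# Flattened layout: the per-classifier blocks are placed side by side.
probas_2d = np.hstack(list(probas_3d))
print(probas_2d.shape)  # (4, 6)

# Columns 0..2 belong to classifier 0, columns 3..5 to classifier 1.
first_block_matches = np.array_equal(probas_2d[:, :n_classes], probas_3d[0])

# Going back from 2D to 3D is a reshape followed by an axis swap.
restored = probas_2d.reshape(n_samples, n_classifiers, n_classes).transpose(1, 0, 2)
round_trip_ok = np.array_equal(restored, probas_3d)
```

Both layouts hold exactly the same values, so one can be converted to the other after the fact if the wrong setting was chosen.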

The default value for flatten_transform is True.

In practice, the choice depends on how you plan to use the transformed data. True is often more convenient for further processing, while False can be useful for analyzing individual classifier outputs.
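As a rough sketch of the "analyzing individual classifier outputs" case, the 3D output can be indexed along its first axis to score each base classifier on its own. The dataset sizes and the choice of base classifiers here are assumptions made for a small, fast example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

vc = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("tree", DecisionTreeClassifier(random_state=0)),
                ("nb", GaussianNB())],
    voting="soft", flatten_transform=False)
vc.fit(X_train, y_train)

probas = vc.transform(X_test)  # shape: (n_classifiers, n_samples, n_classes)

# Take one classifier's slice at a time, pick the most probable class per
# sample, and map the column index back to the original class label.
per_clf_acc = {}
for (name, _), clf_probas in zip(vc.estimators, probas):
    labels = vc.classes_[np.argmax(clf_probas, axis=1)]
    per_clf_acc[name] = accuracy_score(y_test, labels)
    print(f"{name}: {per_clf_acc[name]:.3f}")
```

With the flattened layout the same analysis is still possible, but each classifier's block of columns has to be sliced out by hand.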

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=3,
                           n_informative=10, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create base classifiers
clf1 = LogisticRegression(random_state=42)
clf2 = RandomForestClassifier(random_state=42)
clf3 = SVC(probability=True, random_state=42)

# Create VotingClassifiers with different flatten_transform values
vc_flat = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('svc', clf3)],
                           voting='soft', flatten_transform=True)
vc_3d = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('svc', clf3)],
                         voting='soft', flatten_transform=False)

# Fit both VotingClassifiers
vc_flat.fit(X_train, y_train)
vc_3d.fit(X_train, y_train)

# Transform test set
X_flat = vc_flat.transform(X_test)
X_3d = vc_3d.transform(X_test)

print(f"Flattened shape: {X_flat.shape}")
print(f"3D shape: {X_3d.shape}")

# Evaluate models
y_pred_flat = vc_flat.predict(X_test)
y_pred_3d = vc_3d.predict(X_test)

print(f"Accuracy (flatten_transform=True): {accuracy_score(y_test, y_pred_flat):.3f}")
print(f"Accuracy (flatten_transform=False): {accuracy_score(y_test, y_pred_3d):.3f}")

Running the example gives an output like:

Flattened shape: (200, 9)
3D shape: (3, 200, 3)
Accuracy (flatten_transform=True): 0.850
Accuracy (flatten_transform=False): 0.850
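The 3D output also makes it easy to see how soft voting arrives at its prediction. The sketch below assumes no weights are passed to VotingClassifier, in which case averaging the per-classifier probabilities over the first axis should match predict_proba, and the most probable class should match predict. The smaller dataset and lighter base classifiers are assumptions chosen to keep the example quick.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

vc = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("tree", DecisionTreeClassifier(random_state=0)),
                ("nb", GaussianNB())],
    voting="soft", flatten_transform=False)
vc.fit(X_train, y_train)

probas = vc.transform(X_test)   # (n_classifiers, n_samples, n_classes)
avg = probas.mean(axis=0)       # unweighted average over classifiers

# Compare the manual average with the ensemble's own outputs.
same_proba = np.allclose(avg, vc.predict_proba(X_test))
same_pred = np.array_equal(vc.classes_[avg.argmax(axis=1)], vc.predict(X_test))
print(f"Matches predict_proba: {same_proba}")
print(f"Matches predict: {same_pred}")
```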

The key steps in this example are:

Generate a synthetic multi-class classification dataset
Split the data into train and test sets
Create base classifiers (LogisticRegression, RandomForestClassifier, SVC)
Initialize two VotingClassifier instances with different flatten_transform values
Fit both VotingClassifier models
Transform the test set using both models and compare output shapes
Evaluate the accuracy of both models

Some tips for using flatten_transform:

Use True when you need a 2D array for further processing or as input to other sklearn estimators
Use False when you want to analyze or process each classifier’s probabilities separately
Don’t choose based on memory usage: both settings return the same values and differ only in how the array is laid out
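The first tip can be illustrated by passing the flattened probabilities to another estimator. This is only a sketch: the downstream LogisticRegression (named meta here) is a hypothetical choice, and because it is trained on probabilities computed for the same rows the ensemble was fitted on, its score should not be read as a fair benchmark.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

vc = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("tree", DecisionTreeClassifier(random_state=0)),
                ("nb", GaussianNB())],
    voting="soft", flatten_transform=True)
vc.fit(X_train, y_train)

# 2D output: one row per sample, one column per (classifier, class) pair.
Z_train = vc.transform(X_train)
Z_test = vc.transform(X_test)
print(Z_train.shape, Z_test.shape)

# A 2D array can be used directly as the feature matrix of another estimator.
meta = LogisticRegression(max_iter=1000)  # hypothetical downstream model
meta.fit(Z_train, y_train)
meta_score = meta.score(Z_test, y_test)
print(f"Downstream accuracy: {meta_score:.3f}")
```

The 3D layout would need to be reshaped first, since scikit-learn estimators expect a 2D feature matrix.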

Issues to consider:

The choice doesn’t affect the final predictions, only the structure of transformed data
With many classifiers or large datasets the transformed output can be large under either setting, since it holds n_samples * n_classifiers * n_classes values
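The parameter only has an effect when voting='soft' and the transform method is called; with voting='hard', transform returns each classifier’s predicted class labels in an array of shape (n_samples, n_classifiers)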
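The points in this list can be checked with a short experiment that fits the ensemble under each combination of voting and flatten_transform. The helper make_estimators and the dataset sizes are assumptions introduced for this sketch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)


def make_estimators():
    """Hypothetical helper: build a fresh list of base classifiers."""
    return [("lr", LogisticRegression(max_iter=1000)),
            ("tree", DecisionTreeClassifier(random_state=0)),
            ("nb", GaussianNB())]


shapes, preds = {}, {}
for voting in ("soft", "hard"):
    for flatten in (True, False):
        vc = VotingClassifier(estimators=make_estimators(),
                              voting=voting, flatten_transform=flatten)
        vc.fit(X_train, y_train)
        shapes[(voting, flatten)] = vc.transform(X_test).shape
        preds[(voting, flatten)] = vc.predict(X_test)
        print(f"voting={voting!r}, flatten_transform={flatten}: "
              f"{shapes[(voting, flatten)]}")

# Predictions should not depend on flatten_transform in either voting mode.
soft_same = np.array_equal(preds[("soft", True)], preds[("soft", False)])
hard_same = np.array_equal(preds[("hard", True)], preds[("hard", False)])
print(f"Soft predictions identical: {soft_same}")
print(f"Hard predictions identical: {hard_same}")
```

Only the two soft-voting runs should report different transform shapes; the two hard-voting runs should report the same 2D shape regardless of the flag.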
