
Configure VotingClassifier "flatten_transform" Parameter

The flatten_transform parameter in scikit-learn's VotingClassifier controls how the outputs of the individual classifiers are arranged in the array returned by the transform method.

VotingClassifier is an ensemble method that combines the predictions of multiple base classifiers. With voting='soft', the transform method returns the class-probability estimates of each base classifier, and flatten_transform determines how those probability arrays are structured.

When flatten_transform=True, transform returns a 2D array of shape (n_samples, n_classifiers * n_classes), with each classifier's probabilities concatenated horizontally. When False, it returns a 3D array of shape (n_classifiers, n_samples, n_classes), keeping each classifier's probabilities in a separate block.

The default value for flatten_transform is True.

In practice, the choice depends on how you plan to use the transformed data. True is usually more convenient for further processing, since most downstream estimators expect a 2D feature matrix, while False is useful for analyzing the individual classifiers' outputs.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=3,
                           n_informative=10, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create base classifiers
clf1 = LogisticRegression(random_state=42)
clf2 = RandomForestClassifier(random_state=42)
clf3 = SVC(probability=True, random_state=42)

# Create VotingClassifiers with different flatten_transform values
vc_flat = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('svc', clf3)],
                           voting='soft', flatten_transform=True)
vc_3d = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('svc', clf3)],
                         voting='soft', flatten_transform=False)

# Fit both VotingClassifiers
vc_flat.fit(X_train, y_train)
vc_3d.fit(X_train, y_train)

# Transform test set
X_flat = vc_flat.transform(X_test)
X_3d = vc_3d.transform(X_test)

print(f"Flattened shape: {X_flat.shape}")
print(f"3D shape: {X_3d.shape}")

# Evaluate models
y_pred_flat = vc_flat.predict(X_test)
y_pred_3d = vc_3d.predict(X_test)

print(f"Accuracy (flatten_transform=True): {accuracy_score(y_test, y_pred_flat):.3f}")
print(f"Accuracy (flatten_transform=False): {accuracy_score(y_test, y_pred_3d):.3f}")

Running the example gives an output like:

Flattened shape: (200, 9)
3D shape: (3, 200, 3)
Accuracy (flatten_transform=True): 0.850
Accuracy (flatten_transform=False): 0.850
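
The two layouts contain the same probability values, only arranged differently: the flattened array is the horizontal concatenation of the per-classifier probability blocks. This can be checked directly on the arrays produced above (NumPy is imported here just for the comparison):

import numpy as np

# X_3d has shape (n_classifiers, n_samples, n_classes); stacking its 2D blocks
# side by side reproduces the flattened (n_samples, 9) layout. Both ensembles
# were fit on the same data with the same random_state, so their probabilities match.
print(np.allclose(np.hstack(X_3d), X_flat))  # should print True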

The key steps in this example are:

  1. Generate a synthetic multi-class classification dataset
  2. Split the data into train and test sets
  3. Create base classifiers (LogisticRegression, RandomForestClassifier, SVC)
  4. Initialize two VotingClassifier instances with different flatten_transform values
  5. Fit both VotingClassifier models
  6. Transform the test set using both models and compare output shapes
  7. Evaluate the accuracy of both models
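
Because flatten_transform=True produces an ordinary 2D feature matrix, the output of transform can be fed directly into another estimator. The sketch below is illustrative and not part of the example above; it reuses the fitted vc_flat and picks LogisticRegression as the downstream model purely for demonstration:

# Use the concatenated class probabilities as features for a downstream model
meta_train = vc_flat.transform(X_train)  # shape (n_train_samples, 9)
meta_test = vc_flat.transform(X_test)

meta_model = LogisticRegression(max_iter=1000, random_state=42)
meta_model.fit(meta_train, y_train)
print(f"Downstream model accuracy: {meta_model.score(meta_test, y_test):.3f}")

Note that this simple setup reuses the same training data the base classifiers were fit on; scikit-learn's StackingClassifier implements this pattern properly with cross-validated predictions.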

Some tips for using flatten_transform:

  * Keep the default flatten_transform=True when the transform output feeds into another estimator or pipeline step, since most estimators expect a 2D array.
  * Set flatten_transform=False when you want to inspect or compare the probability estimates of each base classifier separately.
  * The parameter changes only the shape of the transform output; predict, predict_proba, and the ensemble's accuracy are unaffected, as the identical accuracies above show.

Issues to consider:

  * flatten_transform only takes effect with voting='soft'; with voting='hard', transform returns class labels and the parameter is ignored (see the check below).
  * Soft voting requires every base classifier to provide predict_proba, which is why SVC is created with probability=True in the example.
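
As noted above, flatten_transform only takes effect with soft voting. A quick check with voting='hard', reusing the classifiers defined earlier (illustrative, not part of the original example):

# With hard voting, transform returns per-classifier label predictions,
# so flatten_transform has no effect on the output shape
vc_hard = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('svc', clf3)],
                           voting='hard')
vc_hard.fit(X_train, y_train)
print(vc_hard.transform(X_test).shape)  # (200, 3): one column per classifier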


