How does vecstack.StackingTransformer differ from sklearn.ensemble.StackingClassifier? #37
Hi, both classes are based on the same algorithm, described in the paper Stacked Generalization by David H. Wolpert, but their conceptual design and intended use are very different. The most important difference is the architecture: vecstack.StackingTransformer is a transformer, whereas sklearn.ensemble.StackingClassifier is a predictor.
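The transformer side of this distinction can be sketched with scikit-learn and NumPy alone. The helper `stack_features` below is hypothetical (it is not vecstack's or sklearn's implementation); it assumes 5-fold stratified CV without shuffling and the variant 'B' behavior used later in this thread: train-set meta-features are out-of-fold predictions, and test-set meta-features come from each base model refit on the full training set. The point is that the meta-features are handed back to the caller as ordinary arrays.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (StratifiedKFold, cross_val_predict,
                                     train_test_split)


def stack_features(estimators, X_train, y_train, X_test, n_folds=5):
    """Hypothetical helper (not part of vecstack or sklearn).

    Returns (S_train, S_test): S_train holds out-of-fold class
    probabilities, so no base model predicts rows it was trained on;
    S_test holds probabilities from each base model refit on all of
    X_train (mirroring vecstack's variant 'B').
    """
    cv = StratifiedKFold(n_splits=n_folds, shuffle=False)
    train_cols, test_cols = [], []
    for _, est in estimators:
        # cross_val_predict fits a fresh clone of est on each fold
        train_cols.append(cross_val_predict(est, X_train, y_train, cv=cv,
                                            method='predict_proba'))
        # Refit on the full training set, then predict the test set once
        test_cols.append(clone(est).fit(X_train, y_train).predict_proba(X_test))
    return np.hstack(train_cols), np.hstack(test_cols)


X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           n_redundant=1, n_classes=3, flip_y=0,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)
estimators = [
    ('et', ExtraTreesClassifier(n_estimators=100, random_state=0)),
    ('rf', RandomForestClassifier(n_estimators=100, random_state=0))]

# Transformer step: the meta-features are returned to the caller...
S_train, S_test = stack_features(estimators, X_train, y_train, X_test)
print(S_train.shape, S_test.shape)  # (400, 6) (100, 6)

# ...so any second-level model can be fit on them as a separate step
meta_model = LogisticRegression(random_state=0).fit(S_train, y_train)
y_pred = meta_model.predict_proba(S_test)
```

A predictor such as StackingClassifier instead wraps both steps in a single fit/predict_proba call around its final_estimator.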
Below is a self-contained example that shows the common ground between the two implementations (a configuration in which the results are exactly the same). You can easily modify it to compare other aspects that are important for your use case:
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import StackingClassifier
from sklearn.pipeline import Pipeline
from vecstack import StackingTransformer

X, y = make_classification(n_samples=500, n_features=5,
                           n_informative=3, n_redundant=1,
                           n_classes=3, flip_y=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=0)

estimators = [
    ('et', ExtraTreesClassifier(n_estimators=100, random_state=0)),
    ('rf', RandomForestClassifier(n_estimators=100, random_state=0))]
final_estimator = LogisticRegression(random_state=0)

#-------------------------------------------------------------------------------
# vecstack.StackingTransformer
#-------------------------------------------------------------------------------
# Variant 'B': out-of-fold predictions for the train set; the test set is
# predicted once by each base model refit on the full train set.
# 5-fold stratified CV without shuffling matches sklearn's default cv.
stack = StackingTransformer(estimators=estimators,
                            regression=False,
                            variant='B',
                            n_folds=5,
                            shuffle=False,
                            stratified=True,
                            needs_proba=True)
steps = [('stack', stack),
         ('final_estimator', final_estimator)]
pipe = Pipeline(steps)
y_pred_vecstack = pipe.fit(X_train, y_train).predict_proba(X_test)

#-------------------------------------------------------------------------------
# sklearn.ensemble.StackingClassifier
#-------------------------------------------------------------------------------
# The default cv=None means 5-fold StratifiedKFold for a classifier
clf = StackingClassifier(estimators=estimators,
                         final_estimator=final_estimator,
                         stack_method='predict_proba')
y_pred_sklearn = clf.fit(X_train, y_train).predict_proba(X_test)
print((y_pred_vecstack == y_pred_sklearn).all())  # True

#-------------------------------------------------------------------------------
# Compare transformation
#-------------------------------------------------------------------------------
# Test set: both return predictions of the base models refit on the train set
S_test_vecstack = stack.transform(X_test)
S_test_sklearn = clf.transform(X_test)
print((S_test_vecstack == S_test_sklearn).all())  # True

# Train set: vecstack returns the out-of-fold predictions computed during fit,
# while sklearn returns predictions of the refit base models
S_train_vecstack = stack.transform(X_train)
S_train_sklearn = clf.transform(X_train)
print((S_train_vecstack == S_train_sklearn).all())  # False

# Check: sklearn's train-set transformation equals the in-sample predictions
# of base models fit on the full train set
et = ExtraTreesClassifier(random_state=0, n_estimators=100)
rf = RandomForestClassifier(random_state=0, n_estimators=100)
y_pred_et = et.fit(X_train, y_train).predict_proba(X_train)
y_pred_rf = rf.fit(X_train, y_train).predict_proba(X_train)
print((S_train_sklearn == np.hstack([y_pred_et, y_pred_rf])).all())  # True
```
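The last checks show that `transform(X_train)` behaves differently in the two libraries: vecstack returns out-of-fold predictions, while StackingClassifier returns in-sample predictions of the refit base models. That does not mean StackingClassifier trains its final estimator on in-sample predictions. A minimal sketch, assuming StackingClassifier's default `cv` (5-fold `StratifiedKFold` without shuffling for a classifier), rebuilds the out-of-fold meta-features with `cross_val_predict` and checks that a `LogisticRegression` fit on them matches `clf.final_estimator_`. The names `S_train_oof`, `S_train_full` and `meta` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (StratifiedKFold, cross_val_predict,
                                     train_test_split)

# Same data and base models as in the example above
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           n_redundant=1, n_classes=3, flip_y=0,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)
estimators = [
    ('et', ExtraTreesClassifier(n_estimators=100, random_state=0)),
    ('rf', RandomForestClassifier(n_estimators=100, random_state=0))]

clf = StackingClassifier(estimators=estimators,
                         final_estimator=LogisticRegression(random_state=0),
                         stack_method='predict_proba').fit(X_train, y_train)

# Out-of-fold meta-features, assuming StackingClassifier's default cv
# (5-fold StratifiedKFold without shuffling for a classifier)
cv = StratifiedKFold(n_splits=5)
S_train_oof = np.hstack([
    cross_val_predict(est, X_train, y_train, cv=cv, method='predict_proba')
    for _, est in estimators])

# What clf.transform(X_train) returns: in-sample predictions of the
# base models refit on all of X_train
S_train_full = clf.transform(X_train)

# A final estimator fit on the out-of-fold features reproduces clf's ...
meta = LogisticRegression(random_state=0).fit(S_train_oof, y_train)
print(np.allclose(meta.coef_, clf.final_estimator_.coef_))  # True
# ... even though transform(X_train) does not return those features
print(np.allclose(S_train_oof, S_train_full))  # False
```

So in this configuration both libraries fit the final estimator on out-of-fold predictions; the `False` in the train-set comparison comes only from what `transform` returns for the training data.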
This is really informative! Thank you for the write-up, and thank you for the great package! (I think your comment would be a great addition to the README, by the way.) Have you considered adding your package to sklearn-contrib?
Thanks for your kind words!
This might be useful to add to the README.