take() got an unexpected keyword argument 'axis' #84

JiaLeXian · 2021-05-29T06:55:19Z

I get an error with the following code:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

from deepforest import CascadeForestClassifier

model = CascadeForestClassifier(random_state=1)
model.fit(X_train, y_train)

TypeError Traceback (most recent call last)
in
6
7 model = CascadeForestClassifier(random_state=1)
----> 8 model.fit(X_train, y_train.values.ravel())

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/deepforest/cascade.py in fit(self, X, y, sample_weight)
1395 y = self._encode_class_labels(y)
1396
-> 1397 super().fit(X, y, sample_weight)
1398
1399 def predict_proba(self, X):

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/deepforest/cascade.py in fit(self, X, y, sample_weight)
754
755 # Bin the training data
--> 756 X_train_ = self._bin_data(binner, X, is_training_data=True)
757 X_train_ = self.buffer_.cache_data(0, X_train_, is_training_data=True)
758

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/deepforest/cascade.py in _bin_data(self, binner, X, is_training_data)
665 tic = time.time()
666 if is_training_data:
--> 667 X_binned = binner.fit_transform(X)
668 else:
669 X_binned = binner.transform(X)

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/base.py in fit_transform(self, X, y, **fit_params)
697 if y is None:
698 # fit method of arity 1 (unsupervised transformation)
--> 699 return self.fit(X, **fit_params).transform(X)
700 else:
701 # fit method of arity 2 (supervised transformation)

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/deepforest/_binner.py in fit(self, X)
128 self.validate_params()
129
--> 130 self.bin_thresholds = _find_binning_thresholds(
131 X,
132 self.n_bins - 1,

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/deepforest/_binner.py in _find_binning_thresholds(X, n_bins, bin_subsample, bin_type, random_state)
75 if n_samples > bin_subsample:
76 subset = rng.choice(np.arange(n_samples), bin_subsample, replace=False)
---> 77 X = X.take(subset, axis=0)
78
79 binning_thresholds = []

TypeError: take() got an unexpected keyword argument 'axis'

The dataset is loaded with vaex. Is this problem specific to vaex?

xuyxu · 2021-05-29T13:00:07Z

Hi @JiaLeXian, thanks for reporting! I will take a look at vaex when I get a moment. For now, you can manually convert your data into a numpy array in order to use deep forest.
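As a rough illustration of the workaround above, here is a minimal sketch that stacks per-column data into a plain numpy array. It assumes each column can be materialized in memory as a 1-D sequence. How the columns are pulled out of vaex is not shown, and `columns_to_array` and the sample columns are hypothetical names used only for this example.

```python
import numpy as np


def columns_to_array(columns, dtype=np.float64):
    """Stack equally sized 1-D columns into an (n_samples, n_features) array.

    Hypothetical helper for illustration; not part of deep forest or vaex.
    """
    return np.column_stack([np.asarray(c, dtype=dtype) for c in columns])


# Stand-in columns; in practice these would come from the loaded dataset.
cols = {"f0": [0.1, 0.2, 0.3], "f1": [1.0, 2.0, 3.0]}
X_train = columns_to_array(cols.values())

# A numpy array accepts the call that failed in the traceback.
subset = X_train.take([0, 2], axis=0)
```

The resulting `X_train` could then be passed to `model.fit` in place of the vaex object.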

xuyxu · 2021-05-30T02:53:19Z

It looks like vaex does not support slicing (vaexio/vaex#911), which is an essential operation in deep forest, e.g., bootstrap sampling when building random forests. At least for now, this problem cannot be solved :-(

Thanks for reporting anyway.
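To make the row-slicing requirement concrete, the sketch below shows the two kinds of row selection mentioned in this thread, run on a small numpy array: subsampling without replacement (the pattern visible in `_find_binning_thresholds` in the traceback) and bootstrap sampling with replacement. The array contents and sizes are made up, and this is not deep forest's actual implementation.

```python
import numpy as np

rng = np.random.RandomState(1)
X = np.arange(20, dtype=np.float64).reshape(10, 2)  # toy data: 10 samples, 2 features
n_samples = X.shape[0]

# Subsample rows without replacement, mirroring the failing line in the traceback.
subset = rng.choice(np.arange(n_samples), 4, replace=False)
X_sub = X.take(subset, axis=0)

# Bootstrap sample: draw n_samples row indices with replacement.
boot_idx = rng.randint(0, n_samples, size=n_samples)
X_boot = X[boot_idx]
```

Any container passed as `X` would need to support this kind of integer-index row selection.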

JiaLeXian · 2021-06-01T23:04:43Z

Hi @xuyxu, thanks for investigating the problem. Appreciated! So, for DF, is it best to use a numpy array or a plain pandas dataframe?

In our case, we have more than 100 million rows of data. That's why we use vaex to load the data and reduce memory usage. We still want to try DF on our dataset, so we will explore other approaches. Thank you!

xuyxu · 2021-06-02T01:45:43Z

Could you take a look at numpy.memmap? With memmap, it looks like there is also no need to load the entire dataset into memory.

Besides, feel free to tell me if you have any problem when trying out this solution ;-). We are willing to further improve the functionality of DF when faced with such large datasets.
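The numpy.memmap suggestion above might look roughly like the sketch below. It assumes the data can first be written to a flat binary file in chunks. The file name, shape, dtype, and chunk size are all made up for illustration, and whether deep forest keeps memory usage low throughout fitting on such an array is not confirmed in this thread.

```python
import os
import tempfile

import numpy as np

n_samples, n_features = 1000, 8  # illustrative sizes only
path = os.path.join(tempfile.mkdtemp(), "X_train.dat")  # hypothetical file name

# Write the data to disk chunk by chunk so it never sits fully in memory.
X_out = np.memmap(path, dtype=np.float32, mode="w+", shape=(n_samples, n_features))
rng = np.random.RandomState(0)
for start in range(0, n_samples, 250):
    X_out[start:start + 250] = rng.rand(250, n_features)
X_out.flush()
del X_out

# Re-open read-only; pages are loaded from disk lazily on access.
X = np.memmap(path, dtype=np.float32, mode="r", shape=(n_samples, n_features))

# A memmap is an ndarray subclass, so the row selection from the traceback works.
subset = np.sort(rng.choice(n_samples, 100, replace=False))
X_sub = X.take(subset, axis=0)
```

Sorting the sampled indices is optional; it only makes the disk reads more sequential.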

JiaLeXian · 2021-06-02T06:32:26Z

@xuyxu thanks for the quick reply. Thanks for suggesting numpy.memmap. We will try this option in the following days. Will keep you posted. Thank you!

xuyxu added the "needtriage" label (Further information is requested) on May 29, 2021

xuyxu added the "wontfix" label (This will not be worked on) and removed the "needtriage" label on May 30, 2021

xuyxu added the "enhancement" label (Miscellaneous improvements) and removed the "wontfix" label on Jun 2, 2021
