Handle Project producing zero columns #912
Comments
Martin, we are exploring whether we can add constraints to the planner after using the Lale Project operators to customize the search space to the dataset's characteristics. If that works out, this issue has lower priority. However, we would very much like the ability to project text. Thanks much!
One thing that is not clear to me is the expected behaviour here. scikit-learn's answer is to fail explicitly, because the pipeline is doing something that is not valid. Do we want to correct the pipeline automatically, in a data-dependent manner? Also, +1 on text, and maybe datetime. I wonder which pandas data types we can leverage here.
It would be nice if the user could provide a pipeline with more preprocessing subpipelines than necessary. For example, if a pipeline contains a branch with one-hot encoding for string columns, but the data only has numeric columns, it would be convenient if it worked anyway. Unfortunately, some sklearn operators raise an exception when their input data has zero columns. This issue proposes preventing that exception during fit, and possibly even pruning the affected branches from the pipeline returned by fit.
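To make the proposal concrete, here is a rough sketch of the guarding idea in plain Python. It is not the original example from this issue and does not use Lale or sklearn APIs: `StrictScaler` is a toy stand-in for an operator that rejects zero-column input, and `ZeroColumnGuard`, `num_columns`, and the `skipped_` attribute are hypothetical names made up for illustration.

```python
def num_columns(X):
    """Number of columns in a list-of-rows table (0 for an empty table)."""
    return len(X[0]) if X else 0


class StrictScaler:
    """Toy stand-in for an operator that raises on zero-column input."""

    def fit(self, X):
        if num_columns(X) == 0:
            raise ValueError("found input with 0 columns")
        n_rows = len(X)
        # Per-column mean, computed column by column.
        self.means_ = [sum(col) / n_rows for col in zip(*X)]
        return self

    def transform(self, X):
        return [[v - m for v, m in zip(row, self.means_)] for row in X]


class ZeroColumnGuard:
    """Hypothetical wrapper: skip the wrapped operator when fit sees no columns."""

    def __init__(self, op):
        self.op = op
        self.skipped_ = False

    def fit(self, X):
        # Record whether this branch received any columns at fit time.
        self.skipped_ = num_columns(X) == 0
        if not self.skipped_:
            self.op.fit(X)
        return self

    def transform(self, X):
        if self.skipped_:
            # Pass through an empty table with the same number of rows.
            return [[] for _ in X]
        return self.op.transform(X)


if __name__ == "__main__":
    empty = [[], [], []]  # three rows, zero columns
    guarded = ZeroColumnGuard(StrictScaler()).fit(empty)
    print(guarded.skipped_, guarded.transform(empty))
```

A flag like `skipped_` is one way the pipeline returned by fit could decide which branches to prune; whether pruning should happen automatically is the open question raised in the comments.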
Example:
This prints: