OHE: allow encoding of specific, user desired categories #303

solegalli · 2021-08-30T13:22:10Z

As per this thread, the user may want to encode certain categories that are not necessarily the most frequent ones.

Morgan-Sell · 2022-02-19T01:21:12Z

Hi @solegalli,

This looks interesting!

Are you envisioning that OHE allows the following functionality?

- The init method has a parameter called variables_w_new_category that specifies which variables may contain the new category(ies).
- new_categories: a list of categories that should be encoded in the variables listed in variables_w_new_category.
- The new_categories values would be added to the category lists stored under the respective keys (the variables in variables_w_new_category) in self.encoder_dict_.
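A rough sketch of the proposal above, assuming encoder_dict_ maps each variable to the list of categories that will get a dummy column. The parameter names variables_w_new_category and new_categories come from the comment; the helper add_new_categories is hypothetical and is not part of feature-engine.

```python
from typing import Dict, List


def add_new_categories(
    encoder_dict: Dict[str, List[str]],
    variables_w_new_category: List[str],
    new_categories: List[str],
) -> Dict[str, List[str]]:
    """Hypothetical helper: return a copy of encoder_dict with new_categories
    appended to the category list of each variable in variables_w_new_category."""
    # Copy the lists so the learned dictionary is not mutated in place
    updated = {var: list(cats) for var, cats in encoder_dict.items()}
    for var in variables_w_new_category:
        if var not in updated:
            raise KeyError(f"{var!r} is not in encoder_dict")
        for cat in new_categories:
            # Skip categories that are already selected for this variable
            if cat not in updated[var]:
                updated[var].append(cat)
    return updated


# Categories learned during fit (e.g. the most frequent ones)
encoder_dict_ = {"colour": ["red", "blue"], "size": ["S", "M"]}
print(add_new_categories(encoder_dict_, ["colour"], ["green"]))
# {'colour': ['red', 'blue', 'green'], 'size': ['S', 'M']}
```

Note that the same new_categories list is applied to every listed variable, which is the limitation raised later in the thread.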

Morgan-Sell · 2022-12-06T16:00:49Z

@solegalli, should this task be closed? It seems like task #403 resolved this issue.

solegalli · 2022-12-07T10:50:39Z

#403 is essentially asking for the same functionality; that's probably why I closed it. I have now flagged it as a duplicate. This issue remains open.

Morgan-Sell · 2023-04-18T20:12:01Z

@solegalli, resurrecting this issue ;)

When someone selects this functionality, do we want to limit the user to one variable?

I imagine that the user will select values that are specific to one variable. It seems odd for multiple categorical variables to have the same values.

solegalli · 2023-04-20T14:07:41Z

I think the most straightforward approach would be to add a new parameter or, perhaps better, extend top_categories to take a dictionary with the variables as keys and the categories to encode as values. Then, for each variable, the transformer would create dummies only for the categories indicated by the user.
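A minimal pandas sketch of the dictionary idea, written independently of feature-engine. The function name encode_selected_categories and the variable_category column naming are assumptions for illustration. Because each key carries its own category list, different variables can have different selected categories.

```python
from typing import Dict, List

import pandas as pd


def encode_selected_categories(
    X: pd.DataFrame, categories: Dict[str, List[str]]
) -> pd.DataFrame:
    """Hypothetical encoder: create dummies only for the user-selected
    categories of each variable. Any other value of that variable ends up
    as all zeros across its dummy columns."""
    X = X.copy()
    for var, cats in categories.items():
        for cat in cats:
            # Binary dummy: 1 where the variable equals the chosen category
            X[f"{var}_{cat}"] = (X[var] == cat).astype(int)
        # Replace the original categorical column with its dummies
        X = X.drop(columns=var)
    return X


df = pd.DataFrame({
    "colour": ["red", "blue", "green", "red"],
    "size": ["S", "M", "L", "S"],
})
out = encode_selected_categories(df, {"colour": ["green"], "size": ["S", "L"]})
print(out)
```

Here "green" need not be the most frequent colour; it is encoded only because the user asked for it.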

Will you pick this one up?

Morgan-Sell · 2023-04-23T16:41:42Z

@solegalli,

I like the idea of using a dictionary. However, I'm unsure if the dictionary should be accepted by top_categories.

Would it be cleaner to have a separate parameter called custom_categories? We would then check that top_categories and custom_categories are not both set.
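The check described above could look roughly like this. SketchOneHotEncoder is a hypothetical class that only illustrates the validation; it is not the actual feature-engine OneHotEncoder, and the type checks are assumptions about how the two parameters might be typed.

```python
from typing import Dict, List, Optional


class SketchOneHotEncoder:
    """Hypothetical constructor showing only the parameter validation
    discussed above; not the actual feature-engine OneHotEncoder."""

    def __init__(
        self,
        top_categories: Optional[int] = None,
        custom_categories: Optional[Dict[str, List[str]]] = None,
    ):
        # The two ways of choosing categories are mutually exclusive
        if top_categories is not None and custom_categories is not None:
            raise ValueError(
                "Set either top_categories or custom_categories, not both."
            )
        if top_categories is not None and (
            not isinstance(top_categories, int) or top_categories <= 0
        ):
            raise ValueError("top_categories must be a positive integer.")
        if custom_categories is not None and not isinstance(custom_categories, dict):
            raise ValueError(
                "custom_categories must be a dict mapping variables to lists of categories."
            )
        self.top_categories = top_categories
        self.custom_categories = custom_categories


enc = SketchOneHotEncoder(custom_categories={"colour": ["green"]})
```

scikit-learn's developer guidelines recommend keeping __init__ free of logic and validating parameters in fit; the check sits in __init__ here only to keep the sketch short.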

solegalli · 2023-04-27T08:28:15Z

Sounds good to me!

Morgan-Sell mentioned this issue on Feb 24, 2022 in "time series forecasting: window features" #343 (Closed).

solegalli mentioned this issue on Mar 30, 2022 in "Adding default categories in OneHotEncoder" #403 (Closed).

Morgan-Sell linked a pull request on Apr 28, 2023 that will close this issue: "OHE: User can select which categories to encoded for selected variables" #667 (Open).
