Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Running Python UDFs in Weld. #523

Open
kchasialis opened this issue May 9, 2022 · 0 comments
Open

Running Python UDFs in Weld. #523

kchasialis opened this issue May 9, 2022 · 0 comments

Comments

@kchasialis
Copy link

I am trying to run a UDF pipeline on a dataset using Weld (or grizzly, I suppose).

Grizzly, however, (as far as I know) does not offer an optimized function to apply for example a scalar UDF on a specific column of the dataset.

I found that one way to do it is to access the internal data using to_pandas() which has a function called “apply” and use this function to run a Python UDF on a column.

The problem is that I want to measure Weld’s performance on UDFs and by accessing the internal data and applying the functions just like a normal python program would do is not a fair way to measure Weld’s performance regarding (Python) UDF execution.

How can I apply a python UDF on a column of the dataset in an optimized way using Weld?

Thanks in advance!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant