HashJoin has problematic interaction with Merge #61

Open
oscar-stripe opened this issue Feb 2, 2018 · 5 comments

Comments


oscar-stripe commented Feb 2, 2018

See this planner graph, produced on Cascading 3.2.1:
https://www.dropbox.com/s/iffadh9x7unrg5w/01-BalanceAssembly-init.dot.png?dl=0

You can see the full planner logs here:
https://www.dropbox.com/s/7qyc4a9pxtstwio/E552D2.tgz?dl=0

We are merging two HashJoins after some Each operations. In this particular graph, it seems possible to fix the issue by adding Checkpoints after all but one of the HashJoins. This is not a great solution, since it is hard to know what a graph will look like once many pipes are combined with functions.
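
To make the "all but one" workaround concrete, here is a minimal sketch in plain Java. It is not Cascading or Scalding API: `CheckpointPlan` and `checkpointAllButFirst` are hypothetical names, and leaving the first branch unwrapped is an arbitrary choice. It only shows the shape of the rule, and it is not confirmed that this avoids the planner failure for graphs other than the one above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical helper, not part of Cascading or Scalding. */
final class CheckpointPlan {
    private CheckpointPlan() {}

    /**
     * Returns the branches in their original order, with every branch except
     * the first passed through the supplied wrapper.
     *
     * Assumption: with Cascading, the wrapper would be something like
     * p -> new Checkpoint(p), and the returned branches would then be
     * handed to Merge. That wiring is left to the caller.
     */
    static <T> List<T> checkpointAllButFirst(List<T> branches, UnaryOperator<T> checkpoint) {
        List<T> result = new ArrayList<>(branches.size());
        for (int i = 0; i < branches.size(); i++) {
            T branch = branches.get(i);
            // Index 0 is left untouched; which branch to skip is arbitrary here.
            result.add(i == 0 ? branch : checkpoint.apply(branch));
        }
        return result;
    }
}
```

The helper is generic so it can be tried without a Cascading dependency. The catch described above still applies: the caller has to know which pipes end in a HashJoin, and that is exactly what is hard to see once many pipes are combined.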

It would be great either to have a clear rule to follow when generating the graphs or to have this restriction removed, since we would like to use Cascading 3 in Scalding by default.

Thanks.

@oscar-stripe oscar-stripe changed the title HashJoin have problematic interaction with Merge HashJoin has problematic interaction with Merge Feb 2, 2018

cwensel commented Feb 2, 2018

I can reproduce this on MapReduce and have a tentative fix, but the new tests also fail on Tez, so I will need more time to find a resolution.

The fix for MR is here: https://gist.github.com/cwensel/0b116bc7196af667d736f552ab8cb358

oscar-stripe commented

@cwensel thanks a ton for looking at this!


cwensel commented Feb 3, 2018

I went ahead and pushed the fix for MR (it worked fine in local). I’ll have to address the Tez issue at a later date, after any additional MR issues are resolved. You should see 3.3-wip-18 in 9 hours or so, assuming no test failures.

Keep an eye out here: http://conjars.org/search?q=cascading-3.3


cwensel commented Feb 3, 2018

Also watch ‘recent versions’ here; new versions show up there once published: http://conjars.org/cascading/cascading-core


cwensel commented Feb 8, 2018

If this is resolved for MR, feel free to close this issue.
