This repository has been archived by the owner on Jan 28, 2021. It is now read-only.

Improve GROUP BY on columns that are sorted #414

Open
smola opened this issue Oct 5, 2018 · 2 comments
Labels
proposal proposal for new additions or changes

Comments

@smola
Collaborator

smola commented Oct 5, 2018

We have some cases where the data has some notion of order. For example, AFAIK in gitbase repositories, all rows with the same repository_id will be together, so a group's result can be output as soon as the repository_id changes.

Taking this into account, a lot of aggregations do not need to keep full results in memory.
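
To make the idea concrete, here is a minimal sketch of a streaming aggregation over input that is already grouped by key. It is not gitbase or go-mysql-server code: the function name `stream_group_by` and the `init`/`step` callbacks are hypothetical, and rows are assumed to be `(key, value)` pairs where equal keys are contiguous.

```python
def stream_group_by(rows, init, step):
    """Yield (key, aggregate) pairs from rows whose equal keys are adjacent.

    Assumption: rows is an iterable of (key, value) pairs already ordered so
    that all rows sharing a key are contiguous. Only one accumulator is held
    in memory at any time.
    """
    have_group = False
    current_key = None
    acc = None
    for key, value in rows:
        if not have_group or key != current_key:
            if have_group:
                # The key changed, so the previous group is complete.
                yield current_key, acc
            current_key, acc, have_group = key, init(), True
        acc = step(acc, value)
    if have_group:
        # Flush the last group once the input is exhausted.
        yield current_key, acc


# Hypothetical rows: (repository_id, number of files in a commit).
rows = [("repo_a", 1), ("repo_a", 2), ("repo_b", 5)]
totals = list(stream_group_by(rows, lambda: 0, lambda acc, v: acc + v))
```

With this shape, a SUM or COUNT per repository_id needs memory for a single group rather than for every group seen so far.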

@smola smola added the proposal proposal for new additions or changes label Oct 5, 2018
@erizocosmico
Contributor

That is only true if parallelism is 1; otherwise, rows are just returned as they come from several partitions.

@smola
Collaborator Author

smola commented Oct 8, 2018

@erizocosmico Then we would need info about the partition each row comes from 😕
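
A rough sketch of what that could look like, assuming each row carried a partition tag: keep one open group per partition and emit it when that partition's key changes. Everything here is hypothetical (the function name, the `(partition, key, value)` row shape), and it additionally assumes a given key only ever appears in one partition; otherwise the partial results would still have to be merged.

```python
def stream_group_by_partitioned(rows, init, step):
    """Yield (key, aggregate) pairs from interleaved, partition-tagged rows.

    Assumptions: rows is an iterable of (partition, key, value) tuples; within
    a single partition equal keys are contiguous; a key never spans partitions.
    Memory is bounded by the number of partitions, not the number of groups.
    """
    open_groups = {}  # partition -> [current key, accumulator]
    for partition, key, value in rows:
        group = open_groups.get(partition)
        if group is not None and group[0] != key:
            # This partition moved on to a new key, so its old group is done.
            yield group[0], group[1]
            group = None
        if group is None:
            group = [key, init()]
            open_groups[partition] = group
        group[1] = step(group[1], value)
    # Flush whatever is still open in each partition.
    for key, acc in open_groups.values():
        yield key, acc


# Hypothetical rows arriving interleaved from partitions 0 and 1.
interleaved = [
    (0, "repo_a", 1), (1, "repo_c", 10), (0, "repo_a", 2),
    (1, "repo_c", 5), (0, "repo_b", 7), (1, "repo_d", 1),
]
sums = list(stream_group_by_partitioned(interleaved, lambda: 0, lambda acc, v: acc + v))
```

Output order across partitions is not sorted by key here; it only follows the order in which groups close.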
