Is your feature request related to a problem? Please describe.
I have many documents (200M+) with a complex hierarchical structure. It cannot be expressed with nested structures such as arrays due to limitations in indexing support, and it does not make sense to search across all fields of these structures. The natural way to decouple them would be a parent-child relationship. Unfortunately, parent documents are always replicated globally, which means huge overhead on content nodes.
Describe the solution you'd like
Some way of describing sharding logic where parent IDs are used as keys. This would let us avoid copying all parent documents to all content nodes.
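A minimal sketch of what the request amounts to, assuming Vespa's `g=<group>` document-ID modifier were usable for this: if a child document joins a group named after its parent's ID, parent and children hash to the same placement key. The namespaces and document-type names below are illustrative, not from the issue.

```python
# Hypothetical sketch: build co-locating document IDs so a child document
# and its parent share a group key (the parent's ID). Vespa document IDs
# support a "g=<group>" modifier; "acme", "parent" and "child" are
# made-up names for illustration.

def parent_doc_id(parent_id: str) -> str:
    # The parent lives in the group named after its own ID.
    return f"id:acme:parent:g={parent_id}:{parent_id}"

def child_doc_id(parent_id: str, child_id: str) -> str:
    # The child joins the parent's group, so both map to the same bucket set.
    return f"id:acme:child:g={parent_id}:{child_id}"

print(parent_doc_id("p42"))       # id:acme:parent:g=p42:p42
print(child_doc_id("p42", "c7"))  # id:acme:child:g=p42:c7
```

The point of the sketch is only the key derivation; today, global (parent) document types are replicated to every content node regardless of grouping.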
Describe alternatives you've considered
I have considered all the alternatives available today; each of them has certain limitations:
- nested structures
- parent-child with fewer, beefier content nodes to reduce storage overhead
- full denormalisation.
This has been considered before and does make sense. We would need to use some variant of document IDs with groups and distribute the same groups to the same nodes across those document types. I think the main issue is that we'll easily end up with badly balanced clusters.
This could be done in theory with consistent hashing. ZooKeeper is already used in Vespa anyway.
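For concreteness, a minimal consistent-hash ring shows the property being relied on here: a parent-derived group key always maps to the same node, and adding or removing a node only moves a small fraction of groups. This is a generic sketch, not Vespa's actual distribution code; node names and vnode counts are arbitrary.

```python
import bisect
import hashlib

def _h(key: str) -> int:
    # Stable hash for ring positions and lookup keys.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Minimal consistent-hash ring mapping group keys to nodes,
    with a fixed number of virtual nodes per physical node."""

    def __init__(self, nodes, vnodes=100):
        self._ring = sorted((_h(f"{n}#{i}"), n)
                            for n in nodes for i in range(vnodes))
        self._keys = [k for k, _ in self._ring]

    def node_for(self, group_key: str) -> str:
        # First virtual node clockwise from the key's position (wraps around).
        idx = bisect.bisect(self._keys, _h(group_key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["node0", "node1", "node2"])
# The same parent-derived group key always resolves to the same node.
assert ring.node_for("parent-42") == ring.node_for("parent-42")
```

What it does not solve, as noted below, is balancing when group sizes are wildly unequal.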
Alternatively, this concern could be shifted to the user, e.g. "you've been warned: this grouping is meant to be used with well-balanced data."
Yes, we do use something similar to consistent hashing in Vespa (the CRUSH algorithm), but here we need to distribute each group to a limited set of nodes to avoid placing all global documents on all nodes, while we have no control over the size of each group. I'm not sure how well this can be solved, but it certainly adds new complexity to the balancing problem.
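The balancing concern can be illustrated with rendezvous (highest-random-weight) hashing as a simplified stand-in for CRUSH-style placement; this is not Vespa's implementation, and the group sizes below are invented. Each group is pinned to a limited replica set of k nodes, so parents need only be copied to those k nodes, but a single huge group still lands entirely on its k nodes, however large it grows.

```python
import hashlib
from collections import Counter

def _score(node: str, group: str) -> int:
    # Per-(node, group) score; the k highest-scoring nodes host the group.
    return int(hashlib.sha1(f"{node}:{group}".encode()).hexdigest(), 16)

def replica_nodes(group: str, nodes: list, k: int = 2) -> list:
    """Rendezvous hashing: deterministically pick k nodes per group,
    so global/parent documents only need replicas on those k nodes.
    (A simplified stand-in for CRUSH-style placement, not Vespa's code.)"""
    return sorted(nodes, key=lambda n: _score(n, group), reverse=True)[:k]

nodes = [f"node{i}" for i in range(6)]
# Skewed, invented group sizes expose the balancing problem: the huge
# group still lands entirely on just k=2 nodes.
group_sizes = {"g0": 1_000_000, "g1": 500, "g2": 500, "g3": 500}
load = Counter()
for g, size in group_sizes.items():
    for n in replica_nodes(g, nodes):
        load[n] += size
print(load)  # two nodes carry ~1M docs each; the rest stay near-idle
```

Since placement is by group identity rather than data volume, no amount of rehashing fixes the skew; only splitting groups or migrating them by observed size would.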