autocomplete: extend additional name fields used in multimatch queries #1620
I'm not sure if we want to keep using `best_fields`; maybe `cross_fields` is better if it doesn't suffer the same norms issue.
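
For reference, the two query types being compared differ only in the `type` parameter of the Elasticsearch `multi_match` query. Below is a minimal sketch of the request body. The field names (`name.default`, `name.en`), the input text and the helper `build_multi_match` are all hypothetical and are not taken from this PR:

```python
def build_multi_match(text, fields, match_type="best_fields"):
    """Return an Elasticsearch multi_match query body as a plain dict.

    The field names passed in are illustrative only; this is not the
    query that pelias/api actually generates.
    """
    if match_type not in ("best_fields", "cross_fields"):
        raise ValueError(f"unsupported multi_match type: {match_type}")
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": match_type,
                "fields": list(fields),
            }
        }
    }


# Hypothetical field names and input text, for illustration only.
best = build_multi_match("union square", ["name.default", "name.en"])
cross = build_multi_match("union square", ["name.default", "name.en"], "cross_fields")
```

`best_fields` scores a document by its single best-matching field, while `cross_fields` treats the listed fields as one combined field and matches term by term. Whether the latter avoids the norms issue is the open question raised in the comment above.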
Interestingly, the For example, this |
These tests have not had a great pass rate until pelias/api#1620, so we didn't know that the `distanceThresh` value for the coordinate checks wasn't quite correct. Hopefully they will be passing soon!
As discussed offline, I've pushed a new commit which changes this behaviour to use wildcards instead of explicit field names; I feel like this is more flexible. The |
As-is, this PR is safe to merge since it's backward compatible.
this PR is an experiment with splitting up the `name.*` fields in order to avoid the negative effects of field norms due to field length, reported in pelias/openstreetmap#507 and better explained in pelias/pelias#862

in particular we see this issue in `OSM` and `WOF` due to those sources having more alt names than others, although it applies to all sources.

as discussed on our call today, it might be that pelias/openstreetmap#435 exacerbated the issue (albeit unknown at the time) so reversing that method and moving back to multiple fields using a `multi_match` query should result in a significant reduction in the effects of the field norms issue on scoring.

although fairly arbitrary, I've identified 4 new fields to begin with:

- `alt` - this field will contain all alternative names, so the norms penalty will no longer apply to the primary name. this includes variants, colloquialisms & other alternatives
- `abbr` - abbreviations, ie. succinct representations of the primary name
- `code` - similar to above but distinct in the case of airports, stop IDs etc.
- `org` - brands, operators etc.

we may very well change these: maybe `abbr` and `code` can be merged, or `org` omitted, that's up for discussion.

the main difference is that we attempt to have only a single token indexed per field.
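
to illustrate why a field holding many alt names is penalised, here is a rough sketch of the length-normalised term-frequency part of BM25. it assumes the index uses BM25 with the commonly cited defaults `k1=1.2` and `b=0.75`. the token counts are made up, the helper `bm25_tf` is hypothetical, and the similarity actually configured for a Pelias index may differ:

```python
def bm25_tf(term_freq, field_len, avg_field_len, k1=1.2, b=0.75):
    """Length-normalised term-frequency component of BM25 (IDF omitted)."""
    # Fields longer than average get norm > 1, which shrinks the score.
    norm = 1.0 - b + b * (field_len / avg_field_len)
    return (term_freq * (k1 + 1.0)) / (term_freq + k1 * norm)


# Made-up numbers: the query term appears once in both fields.
avg_len = 3.0
single_name = bm25_tf(term_freq=1, field_len=1, avg_field_len=avg_len)
many_alt_names = bm25_tf(term_freq=1, field_len=12, avg_field_len=avg_len)

# The longer field receives the lower score for the same single match.
print(round(single_name, 3), round(many_alt_names, 3))
```

moving alternative names into their own field keeps the primary name field short, so a match on the primary name is not diluted by the extra tokens.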