Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to Ignore MissingDocument when calling reindex on a collection #1565

Open
mltsy opened this issue May 31, 2022 · 2 comments
Open

Option to Ignore MissingDocument when calling reindex on a collection #1565

mltsy opened this issue May 31, 2022 · 2 comments

Comments

@mltsy
Copy link

mltsy commented May 31, 2022

Is your feature request related to a problem? Please describe.
We have workers that run reindex on a specific collection of records in some cases where it doesn't automatically happen, but sometimes, by the time searchkick gets around to indexing a particular record, the corresponding document has been deleted, and ElasticSearch returns a MissingDocument error, which, in our case, we don't care about - if we're trying to update a record that has been deleted, we can ignore that. This error reporting shouldn't affect the rest of the reindex (since it's a bulk query), but what it does affect, since only the first error is reported in the raised exception, is the fact that we don't know if there were any other errors that we do care about.

Describe the solution you'd like
It would be nice to have an option to ignore MissingDocuments for update operations during a reindex, so that if an error we actually care about is reported, we'll see it. That could look be a new parameter to reindex, something like ignoreMissingOnUpdate: true.

Alternatively, it would work to simply embed a full list of errors in the raised exception, or add a new option to return a list of errors rather than raising an exception if any errors are present. That way, if an error does occur, there would be a way to filter them for ones that actually matter in the given use case. However, I think the former solution is simpler and probably covers the most common use cases.

@ankane
Copy link
Owner

ankane commented Jun 16, 2022

Hey @mltsy, thanks for the suggestion! Added an allow_missing option to reindex in the allow_missing branch. It adds a bit of complexity to the code, so want to think about it before deciding on merging.

@mltsy
Copy link
Author

mltsy commented Jun 17, 2022

Wow, awesome! Oof, you're right, that is more complex than I imagined 😅 Especially the bit where you have to set an instance variable on the RecordData to track the setting individually for each record in a bulk operation...

I wonder if it would make more sense to just make this a global searchkick config option (or at least apply it at a higher level, similar to how mode works)? Because I would think this is actually a sensible default (i.e. if you're trying to reindex a model collection, and a record is missing, the most likely explanation is that the model instance has been deleted, and therefore doesn't need reindexing, right?), so it's not something anyone should want to configure on a per-record basis anyway. But you wouldn't want to change the default behavior without a major version bump of course, so a global configuration setting seems worth consideration - and simpler! Unless of course there are reasons someone would want to handle that case differently for a particular model/operation...?

The other thing I noticed is that it looks like you added the option for the instance method Model#reindex, which... probably isn't necessary, right? If we think about a sensible default, I think it makes sense to raise an exception or return an error if you try to update a single record and the record is missing (in that case the user has the option to handle the error however they want to, since there's only one error to handle, as opposed to the bulk operation where there may be multiple errors, but you only see the first one)

Thanks for taking a look at this! 💚

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants