You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem? Please describe.
We have workers that run reindex on a specific collection of records in some cases where it doesn't automatically happen, but sometimes, by the time searchkick gets around to indexing a particular record, the corresponding document has been deleted, and ElasticSearch returns a MissingDocument error, which, in our case, we don't care about - if we're trying to update a record that has been deleted, we can ignore that. This error reporting shouldn't affect the rest of the reindex (since it's a bulk query), but what it does affect, since only the first error is reported in the raised exception, is the fact that we don't know if there were any other errors that we do care about.
Describe the solution you'd like
It would be nice to have an option to ignore MissingDocuments for update operations during a reindex, so that if an error we actually care about is reported, we'll see it. That could look be a new parameter to reindex, something like ignoreMissingOnUpdate: true.
Alternatively, it would work to simply embed a full list of errors in the raised exception, or add a new option to return a list of errors rather than raising an exception if any errors are present. That way, if an error does occur, there would be a way to filter them for ones that actually matter in the given use case. However, I think the former solution is simpler and probably covers the most common use cases.
The text was updated successfully, but these errors were encountered:
Hey @mltsy, thanks for the suggestion! Added an allow_missing option to reindex in the allow_missing branch. It adds a bit of complexity to the code, so want to think about it before deciding on merging.
Wow, awesome! Oof, you're right, that is more complex than I imagined 😅 Especially the bit where you have to set an instance variable on the RecordData to track the setting individually for each record in a bulk operation...
I wonder if it would make more sense to just make this a global searchkick config option (or at least apply it at a higher level, similar to how mode works)? Because I would think this is actually a sensible default (i.e. if you're trying to reindex a model collection, and a record is missing, the most likely explanation is that the model instance has been deleted, and therefore doesn't need reindexing, right?), so it's not something anyone should want to configure on a per-record basis anyway. But you wouldn't want to change the default behavior without a major version bump of course, so a global configuration setting seems worth consideration - and simpler! Unless of course there are reasons someone would want to handle that case differently for a particular model/operation...?
The other thing I noticed is that it looks like you added the option for the instance method Model#reindex, which... probably isn't necessary, right? If we think about a sensible default, I think it makes sense to raise an exception or return an error if you try to update a single record and the record is missing (in that case the user has the option to handle the error however they want to, since there's only one error to handle, as opposed to the bulk operation where there may be multiple errors, but you only see the first one)
Is your feature request related to a problem? Please describe.
We have workers that run
reindex
on a specific collection of records in some cases where it doesn't automatically happen, but sometimes, by the time searchkick gets around to indexing a particular record, the corresponding document has been deleted, and ElasticSearch returns a MissingDocument error, which, in our case, we don't care about - if we're trying to update a record that has been deleted, we can ignore that. This error reporting shouldn't affect the rest of the reindex (since it's a bulk query), but what it does affect, since only the first error is reported in the raised exception, is the fact that we don't know if there were any other errors that we do care about.Describe the solution you'd like
It would be nice to have an option to ignore MissingDocuments for
update
operations during a reindex, so that if an error we actually care about is reported, we'll see it. That could look be a new parameter toreindex
, something likeignoreMissingOnUpdate: true
.Alternatively, it would work to simply embed a full list of errors in the raised exception, or add a new option to return a list of errors rather than raising an exception if any errors are present. That way, if an error does occur, there would be a way to filter them for ones that actually matter in the given use case. However, I think the former solution is simpler and probably covers the most common use cases.
The text was updated successfully, but these errors were encountered: