Cursor timeouts common #35

Closed

mkhorton opened this issue Aug 22, 2018 · 8 comments

Comments

@mkhorton
Member

mkhorton commented Aug 22, 2018

Anybody else having this issue? It seems to happen if a batch of process_item calls takes too long. It's possible to pass the no_cursor_timeout=True kwarg to Mongolike.query, but I'm not sure whether that should be done by default.

I'm not sure what the default timeout is, but we could also catch this exception with a recommendation to the user on how to increase the timeout...

Edit: I'm also not sure whether the server will respect an increased timeout even if we set it in maggma.
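
For reference, a minimal sketch of what that kwarg does at the plain pymongo level (the database/collection names and the process_item stub are hypothetical, not maggma's actual plumbing):

```python
from pymongo import MongoClient

def process_item(doc):
    """Stand-in for a slow per-document build step."""

client = MongoClient()
coll = client["builder_db"]["source"]  # hypothetical names

# no_cursor_timeout=True asks the server not to reap the cursor after its
# idle timeout (10 minutes by default). Such cursors must be closed
# explicitly or they hold server resources, hence the try/finally.
cursor = coll.find({}, no_cursor_timeout=True)
try:
    for doc in cursor:
        process_item(doc)
finally:
    cursor.close()
```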

@montoyjh
Contributor

Yep, this is occasionally an issue for me as well. It's also a pain if you use a command cursor (e.g. for aggregation pipelines), because it's harder to ensure that command cursors don't time out.
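
To illustrate the asymmetry at the pymongo level: find() accepts no_cursor_timeout, but aggregate() does not, so about the only client-side lever for a CommandCursor is batch size (names hypothetical):

```python
from pymongo import MongoClient

coll = MongoClient()["builder_db"]["source"]  # hypothetical names

# find() can opt out of the server's idle timeout entirely...
find_cursor = coll.find({}, no_cursor_timeout=True)

# ...but aggregate() has no such kwarg. Smaller batches mean more frequent
# getMore round-trips, each of which resets the server's idle clock, so
# the cursor is less likely to sit untouched past the timeout.
pipeline = [{"$group": {"_id": "$key", "docs": {"$push": "$$ROOT"}}}]
agg_cursor = coll.aggregate(pipeline, batchSize=100, allowDiskUse=True)
```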

@mkhorton
Member Author

Could an alternative option be to just re-run get_items()? If the builder has been written appropriately, it should be harmless to generate a new cursor.
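
A rough sketch of what runner-side retry logic could look like, assuming an idempotent builder with maggma's get_items/process_item/update_targets interface (the retry loop itself is hypothetical, not an existing maggma feature):

```python
from pymongo.errors import CursorNotFound

def run_builder(builder, max_retries=3):
    """Re-call get_items() whenever the server reaps the cursor.

    Only safe if the builder is idempotent: re-processing an item
    that was already handled must be harmless.
    """
    for _ in range(max_retries):
        try:
            for item in builder.get_items():
                builder.update_targets([builder.process_item(item)])
            return
        except CursorNotFound:
            continue  # get_items() will hand back a fresh cursor
    raise RuntimeError("builder still timing out after %d retries" % max_retries)
```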

@montoyjh
Contributor

Would that not break incremental builds? Presumably, if documents have been added to the target collection with the current timestamp, the last_updated filters would exclude all of the documents from the initial cursor upon re-execution. Also, it seems like most cursor timeouts would probably recur on a re-run.

@mkhorton
Member Author

Yeah, I see that. It depends on how the incremental builds are structured, I guess -- but doesn't this also mean that any interrupted incremental build will require a complete re-build? What if a build in a cron job fails silently one day? This seems like a recipe for something bad to happen.

You could store the initial cursor query, but that feels a bit hacky (and it wouldn't prevent the interrupted-build issue either, just the cursor timeouts).

@montoyjh
Contributor

Agreed, incremental builds based on date filters have this general problem at present (and have for a long time). I've thought a bit about how to make this more robust, but haven't implemented anything yet. One idea would be to build into a temporary staging collection and then copy to the target collection, which would presumably only fail if the copying step itself were interrupted (rather than anywhere in the process_item -> update_targets pipeline). This puts a bit more of a space load on the database, though, and could still fail if the final copy is interrupted.

I think there's probably a more effective strategy via some additional logic in the build bookkeeping or a validation step: perhaps all of the new documents in a collection get an "incremental_validation" field (or something similar) that is only set collection-wide once an incremental build completes.
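
A minimal sketch of that marker idea with plain pymongo (the incremental_validation field and collection names are hypothetical, not an implemented maggma feature):

```python
from datetime import datetime, timezone
from pymongo import MongoClient

target = MongoClient()["builder_db"]["target"]  # hypothetical names
build_started = datetime.now(timezone.utc)

# ... the incremental build runs here, writing documents without the marker ...

# Stamp the documents only after the build finishes cleanly. A later run
# can then treat unstamped documents as leftovers of an interrupted build
# and redo them, instead of trusting a collection-wide last_updated cutoff.
target.update_many(
    {"last_updated": {"$gte": build_started},
     "incremental_validation": {"$exists": False}},
    {"$set": {"incremental_validation": True}},
)
```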

@mkhorton
Member Author

mkhorton commented Aug 22, 2018

Well, MongoDB 4.0 supports multi-document ACID transactions, so in principle you could wrap all of the update_targets writes into a single session/transaction. This is a very Mongo-specific feature, however, so it probably wouldn't play nicely with maggma's Stores abstraction(?)
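
For concreteness, the MongoDB 4.0 transaction API in pymongo (3.7+) looks roughly like this; everything written in the session commits or aborts as one unit (the names and the built_docs iterable are hypothetical):

```python
from pymongo import MongoClient

client = MongoClient()  # transactions require a replica set
target = client["builder_db"]["target"]  # hypothetical names
built_docs = []  # hypothetical: documents produced by process_item

with client.start_session() as session:
    with session.start_transaction():
        # All writes sharing this session commit or abort together, so an
        # interrupted build leaves the target collection untouched.
        for doc in built_docs:
            target.replace_one({"key": doc["key"]}, doc,
                               upsert=True, session=session)
```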

I've thought about a secondary bookkeeping collection too, which has the benefit of making it easy to display builder analytics, but that's another layer we'd have to implement.

I agree a staging collection could work, but it's a bit unfortunate -- e.g. will the staging collection need to be cleaned/emptied automatically after failed builds? What happens if someone specifies the wrong staging collection and accidentally wipes data? etc. I prefer the "incremental_validation" field idea, in that it's the easiest to implement and quite lightweight -- or perhaps a builder_in_progress field that gets set to True at the start and to False when the build completes gracefully... Still not a big fan, but it might be the best we can do right now :/

@dwinston
Member

I recently added a get_criteria helper to support more robust incremental building, and I use it for XAS building in emmet by subclassing MapBuilder. Additional indexes are required, but builder runs should be more resilient to interruption. A builder whose get_items uses get_criteria should be amenable to @mkhorton's suggestion of having the runner catch cursor timeout errors and re-call the builder's get_items. The basic idea is that last-updated filtering is done at the level of individual documents (uniquely identified by key) rather than at the collection/store level -- this is why additional (compound) indexes are needed on the stores. @montoyjh I'd be particularly interested to hear whether this approach works for your command-cursor-based builder(s) because, as you mentioned, guarding against timeouts there is more involved.
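
The helper itself lives in maggma, but the per-document idea is roughly this: compare source and target timestamps key by key instead of applying one collection-wide cutoff (a simplified sketch, not the actual get_criteria implementation; names hypothetical):

```python
from pymongo import MongoClient

db = MongoClient()["builder_db"]  # hypothetical names
source, target = db["source"], db["target"]

# Compound (key, last_updated) indexes make these projections cheap, e.g.:
#   source.create_index([("key", 1), ("last_updated", -1)])

target_lu = {d["key"]: d["last_updated"]
             for d in target.find({}, ["key", "last_updated"])}

# A key needs rebuilding if it is absent from the target or the source
# copy is newer. Because the decision is per document, re-running after
# an interruption picks up exactly the keys that were never finished.
stale_keys = [d["key"]
              for d in source.find({}, ["key", "last_updated"])
              if d["key"] not in target_lu
              or d["last_updated"] > target_lu[d["key"]]]

criteria = {"key": {"$in": stale_keys}}
```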

@shyamd
Contributor

shyamd commented Aug 28, 2019

Unfortunately, we have to deal with this actively by making sure our get_items doesn't stall between cursor calls. The best way is to group by key, fetch the data with a new cursor, and yield it from get_items. Not pretty, but pymongo doesn't give us much of a choice.
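
A sketch of that pattern: enumerate the keys up front, then open a short-lived cursor per key inside get_items, so no cursor sits idle while process_item runs (names hypothetical):

```python
from pymongo import MongoClient

source = MongoClient()["builder_db"]["source"]  # hypothetical names

def get_items():
    # distinct() materializes the key list client-side, so no server-side
    # cursor is held open across the slow processing of each item.
    for key in source.distinct("key"):
        # A fresh cursor per key, exhausted immediately into a list, so it
        # cannot time out between yields no matter how slow process_item is.
        yield list(source.find({"key": key}))
```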

shyamd closed this as completed Aug 28, 2019