Cursor timeouts common #35

Closed

mkhorton opened this issue Aug 22, 2018 · 8 comments

Comments

@mkhorton
Member

mkhorton commented Aug 22, 2018

Anybody else having this issue? It seems to happen if a batch of process_item calls takes too long. It's possible to pass the no_cursor_timeout=True kwarg to Mongolike.query, but I'm not sure whether that should be done by default.

I'm not sure what the default timeout is, but we could also catch this exception with a recommendation to the user on how to increase the timeout...

Edit: I'm also not sure whether the server will respect an increased timeout even if we set it in maggma.
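
For reference, a minimal sketch of what that kwarg does at the plain pymongo level (the database/collection names and the process_item stub are hypothetical, not maggma's actual plumbing):

```python
from pymongo import MongoClient

def process_item(doc):
    """Stand-in for a slow per-document build step."""

client = MongoClient()
coll = client["builder_db"]["source"]  # hypothetical names

# no_cursor_timeout=True asks the server not to reap the cursor after its
# idle timeout (10 minutes by default). Such cursors must be closed
# explicitly or they hold server resources, hence the try/finally.
cursor = coll.find({}, no_cursor_timeout=True)
try:
    for doc in cursor:
        process_item(doc)
finally:
    cursor.close()
```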

@montoyjh
Contributor

Yep, this is occasionally an issue for me as well. It's also a pain if you use a command cursor (e.g. for aggregation pipelines), because it's harder to ensure that command cursors don't time out.
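
To illustrate the asymmetry at the pymongo level: find() accepts no_cursor_timeout, but aggregate() does not, so about the only client-side lever for a CommandCursor is batch size (names hypothetical):

```python
from pymongo import MongoClient

coll = MongoClient()["builder_db"]["source"]  # hypothetical names

# find() can opt out of the server's idle timeout entirely...
find_cursor = coll.find({}, no_cursor_timeout=True)

# ...but aggregate() has no such kwarg. Smaller batches mean more frequent
# getMore round-trips, each of which resets the server's idle clock, so
# the cursor is less likely to sit untouched past the timeout.
pipeline = [{"$group": {"_id": "$key", "docs": {"$push": "$$ROOT"}}}]
agg_cursor = coll.aggregate(pipeline, batchSize=100, allowDiskUse=True)
```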

@mkhorton
Member Author

Could an alternative option be to just re-run get_items()? If the builder has been written appropriately, it should be harmless to generate a new cursor.
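
A rough sketch of what runner-side retry logic could look like, assuming an idempotent builder with maggma's get_items/process_item/update_targets interface (the retry loop itself is hypothetical, not an existing maggma feature):

```python
from pymongo.errors import CursorNotFound

def run_builder(builder, max_retries=3):
    """Re-call get_items() whenever the server reaps the cursor.

    Only safe if the builder is idempotent: re-processing an item
    that was already handled must be harmless.
    """
    for _ in range(max_retries):
        try:
            for item in builder.get_items():
                builder.update_targets([builder.process_item(item)])
            return
        except CursorNotFound:
            continue  # get_items() will hand back a fresh cursor
    raise RuntimeError("builder still timing out after %d retries" % max_retries)
```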

@montoyjh
Contributor

Would that not break incremental builds? Presumably, if documents have been added to the target collection with the current timestamp, the last_updated filters would exclude all of the documents from the initial cursor upon re-execution. Also, it seems like most cursor timeouts would probably recur on a re-run.

@mkhorton
Member Author

Yeah, I see that. It depends on how the incremental builds are structured, I guess -- but doesn't this also mean that any interrupted incremental build will require a complete re-build? What if a build in a cron job fails silently one day? This seems like a recipe for something bad to happen.

You could store the initial cursor query, but that feels a bit hacky (and it wouldn't prevent the interrupted-build issue either, just the cursor timeouts).

@montoyjh
Contributor

Agreed, incremental builds based on date filters have this general problem at present (and have for a long time). I've thought a bit about how to make this more robust, but haven't implemented anything yet. One idea would be to build into a temporary staging collection and then copy to the target collection, which would presumably only fail if the copying step itself were interrupted (rather than anywhere in the process_item -> update_targets pipeline). This puts a bit more of a space load on the database, though, and could still fail if the final copy is interrupted.

I think there's probably a more effective strategy via some additional logic in the build bookkeeping or a validation step: perhaps all of the new documents in a collection get an "incremental_validation" field (or something similar) that is only set collection-wide once an incremental build completes.
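
A minimal sketch of that marker idea with plain pymongo (the incremental_validation field and collection names are hypothetical, not an implemented maggma feature):

```python
from datetime import datetime, timezone
from pymongo import MongoClient

target = MongoClient()["builder_db"]["target"]  # hypothetical names
build_started = datetime.now(timezone.utc)

# ... the incremental build runs here, writing documents without the marker ...

# Stamp the documents only after the build finishes cleanly. A later run
# can then treat unstamped documents as leftovers of an interrupted build
# and redo them, instead of trusting a collection-wide last_updated cutoff.
target.update_many(
    {"last_updated": {"$gte": build_started},
     "incremental_validation": {"$exists": False}},
    {"$set": {"incremental_validation": True}},
)
```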

@mkhorton
Member Author

mkhorton commented Aug 22, 2018

Well, MongoDB 4.0 supports multi-document ACID transactions, so in principle you could wrap all of the update_targets writes into a single session/transaction. This is a very Mongo-specific feature, however, so it probably wouldn't play nicely with maggma's Stores abstraction(?)
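
For concreteness, the MongoDB 4.0 transaction API in pymongo (3.7+) looks roughly like this; everything written in the session commits or aborts as one unit (the names and the built_docs iterable are hypothetical):

```python
from pymongo import MongoClient

client = MongoClient()  # transactions require a replica set
target = client["builder_db"]["target"]  # hypothetical names
built_docs = []  # hypothetical: documents produced by process_item

with client.start_session() as session:
    with session.start_transaction():
        # All writes sharing this session commit or abort together, so an
        # interrupted build leaves the target collection untouched.
        for doc in built_docs:
            target.replace_one({"key": doc["key"]}, doc,
                               upsert=True, session=session)
```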

I've thought about a secondary bookkeeping collection too, which has the benefit of making it easy to display builder analytics, but that's another layer we'd have to implement.

I agree a staging collection could work, but it's a bit unfortunate -- e.g. will the staging collection need to be cleaned/emptied automatically after failed builds? What happens if someone specifies the wrong staging collection and accidentally wipes data? etc. I prefer the "incremental_validation" field idea, in that it's the easiest to implement and quite lightweight -- or perhaps a builder_in_progress field that gets set to True at the start and to False when the build completes gracefully... Still not a big fan, but it might be the best we can do right now :/

@dwinston
Member

I recently added a get_criteria helper to support more robust incremental building, and I use it for XAS building in emmet by subclassing MapBuilder. Additional indexes are required, but builder runs should be more resilient to interruption. A builder whose get_items uses get_criteria should be amenable to @mkhorton's suggestion of having the runner catch cursor timeout errors and re-call the builder's get_items. The basic idea is that last-updated filtering is done at the level of individual documents (uniquely identified by key) rather than at the collection/store level -- this is why additional (compound) indexes are needed on the stores. @montoyjh I'd be particularly interested to hear whether this approach works for your command-cursor-based builder(s) because, as you mentioned, guarding against timeouts there is more involved.
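
The helper itself lives in maggma, but the per-document idea is roughly this: compare source and target timestamps key by key instead of applying one collection-wide cutoff (a simplified sketch, not the actual get_criteria implementation; names hypothetical):

```python
from pymongo import MongoClient

db = MongoClient()["builder_db"]  # hypothetical names
source, target = db["source"], db["target"]

# Compound (key, last_updated) indexes make these projections cheap, e.g.:
#   source.create_index([("key", 1), ("last_updated", -1)])

target_lu = {d["key"]: d["last_updated"]
             for d in target.find({}, ["key", "last_updated"])}

# A key needs rebuilding if it is absent from the target or the source
# copy is newer. Because the decision is per document, re-running after
# an interruption picks up exactly the keys that were never finished.
stale_keys = [d["key"]
              for d in source.find({}, ["key", "last_updated"])
              if d["key"] not in target_lu
              or d["last_updated"] > target_lu[d["key"]]]

criteria = {"key": {"$in": stale_keys}}
```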

@shyamd
Contributor

shyamd commented Aug 28, 2019

Unfortunately, we have to deal with this actively by making sure our get_items doesn't stall between cursor calls. The best way is to group by key, fetch the data with a new cursor, and yield it from get_items. Not pretty, but pymongo doesn't give us much of a choice.
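
A sketch of that pattern: enumerate the keys up front, then open a short-lived cursor per key inside get_items, so no cursor sits idle while process_item runs (names hypothetical):

```python
from pymongo import MongoClient

source = MongoClient()["builder_db"]["source"]  # hypothetical names

def get_items():
    # distinct() materializes the key list client-side, so no server-side
    # cursor is held open across the slow processing of each item.
    for key in source.distinct("key"):
        # A fresh cursor per key, exhausted immediately into a list, so it
        # cannot time out between yields no matter how slow process_item is.
        yield list(source.find({"key": key}))
```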

shyamd closed this as completed Aug 28, 2019