Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Research: Investigate SQLite/SQLAlchemy locking & session termination behavior #1108

Open
eloquence opened this issue Jun 9, 2020 · 7 comments
Labels
database SQLite, SQLAlchemy, and data model research

Comments

@eloquence
Copy link
Member

eloquence commented Jun 9, 2020

In today's client architecture discussion, we agreed that it would be useful to dedicate some time to investigating the behavior of and interaction between SQLite and SQLAlchemy, in order to approve our ability to reason about certain categories of errors. Key questions include:

  • What is the transactional locking behavior of SQLite driver in Python vs. what SQLAlchemy is doing?
  • Are there cases where SQLite/SQLAlchemy will terminate a transaction, e.g., 5 second timeout?

On the first point, we're interested in this to avoid issues like #888 in future. We're also investigating changes to session management to mitigate race conditions and crashes; understanding this behavior better will help us avoid new regressions.

@eloquence
Copy link
Member Author

We're adding this to the current sprint, with a 4 hour timebox. @rmol will do an initial technical spike to provoke locking issues, and share some findings with the rest of the team for additional joint learning and investigation.

@rmol
Copy link
Contributor

rmol commented Jun 24, 2020

I've put together a few tests to illustrate some of the things you can trip over with SQLAlchemy, SQLite, and threads. That gist can be dropped into the securedrop-client tests directory and run with make test TESTS="tests/test_db.py".

Things I think these tests show:

  • I don't think SQLAlchemy is doing anything to exacerbate transactional issues; it's just not doing anything about the underlying problems documented in the SQLite Python driver. Otherwise the locking behavior is pretty much as documented for SQLite; you could replicate the contention shown here by running the SQLite CLI tool on the same database in two different terminal sessions.
  • SQLAlchemy's handling of related objects can be a little tricky, but I don't think this has bitten us like the write lock timeouts have, and there are some suggested solutions.
  • It's pretty clear that @creviera's work on more tightly-scoped sessions is a good idea.

Things I think are worth further investigation:

@eloquence
Copy link
Member Author

Thanks @rmol for the research & write-up. :) Additional comments here welcome; otherwise let's touch base in a future tech mtg on follow-up issues we may want to create. Leaving this issue open until said follow-up work is tracked and cross-referenced.

@sssoleileraaa
Copy link
Contributor

I really appreciate how you used test code to document and communicate the issues. It's so clear!

Why is the query needed in add_source to trigger contention? I suspect this is due to the second-guessing done around transaction boundaries.

https://gist.github.com/rmol/f8dc3278d95b853d834057028d7a3b5b#file-test_db-py-L32 <- so when this line is removed the issue goes away? interesting...

Do we want to adopt one of the suggested solutions to problems when handling collections? Or would it be better to reduce or eliminate collections held, and just fetch them more often as needed, in a fresh session?

Just to double check, is the problem you're referring to when handling collections the sqlalchemy.exc.InvalidRequestError that we have to catch when trying to access a collection attribute on the deleted object? If so, I don't see a solution to that problem listed there, which is why I've been leaning towards the second solution you mentioned: eliminate collections held, and just fetch them more often as needed, in a fresh session.

@rmol
Copy link
Contributor

rmol commented Jun 24, 2020

https://gist.github.com/rmol/f8dc3278d95b853d834057028d7a3b5b#file-test_db-py-L32 <- so when this line is removed the issue goes away? interesting...

Yeah. 🤔

Just to double check, is the problem you're referring to when handling collections the sqlalchemy.exc.InvalidRequestError that we have to catch when trying to access a collection attribute on the deleted object?

I didn't test that, so never got that error -- test_collection_and_deletion manipulates a boolean attribute of the deleted object in the collection with no indication anything's wrong. I'd like to see if it's different for a collection of related objects, when we pick this up again.

@sssoleileraaa
Copy link
Contributor

i added this as a discussion item on the project board to help us keep track of ongoing questions and this as a knowledge sharing opportunity

@cfm cfm added the database SQLite, SQLAlchemy, and data model label Apr 20, 2022
@sssoleileraaa
Copy link
Contributor

@rocodes - throwing this on your radar since you're thinking about database improvements in the client

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
database SQLite, SQLAlchemy, and data model research
Projects
None yet
Development

No branches or pull requests

4 participants