Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Generator in _quarantine_media_txn() #17813

Merged
merged 2 commits into from
Oct 31, 2024

Conversation

realtyem
Copy link
Contributor

@realtyem realtyem commented Oct 10, 2024

when updating the remote_media_cache table

Context: matrix-org/synapse#15439 (#15439)

Pull Request Checklist

@CLAassistant
Copy link

CLAassistant commented Oct 10, 2024

CLA assistant check
All committers have signed the CLA.

@realtyem realtyem marked this pull request as ready for review October 10, 2024 17:44
@realtyem realtyem requested a review from a team as a code owner October 10, 2024 17:44
@anoadragon453
Copy link
Member

@realtyem do you have a reference for why providing generators to txn.executemany or txn.execute_batch may be a bad idea?

@realtyem
Copy link
Contributor Author

@realtyem do you have a reference for why providing generators to txn.executemany or txn.execute_batch may be a bad idea?

matrix-org/synapse#15439

Is a lot of the basis. From that, I distill the TLDR as "Generators may cause inadvertent data loss during a db query retry by exhausting an iterable"

@realtyem
Copy link
Contributor Author

realtyem commented Oct 15, 2024

The comments at spot seem to be relevant. Although this set of PR's may not necessarily be (part of|used by) this infrastructure, still seems prudent to not use generators as a good habit/safety precaution. It may be they end up used in the future, so don't let our tomorrow selves find the foot gun. (And if they do get used in this infrastructure, then is the 'inspection' infrastructure in this function working?) (edit: yes it's working as intended)

"""Start a new database transaction with the given connection.
Note: The given func may be called multiple times under certain
failure modes. This is normally fine when in a standard transaction,
but care must be taken if the connection is in `autocommit` mode that
the function will correctly handle being aborted and retried half way
through its execution.
Similarly, the arguments to `func` (`args`, `kwargs`) should not be generators,
since they could be evaluated multiple times (which would produce an empty
result on the second or subsequent evaluation). Likewise, the closure of `func`
must not reference any generators. This method attempts to detect such usage
and will log an error.

@realtyem
Copy link
Contributor Author

I believe I misunderstood your question, my apologizes.

In and of itself, there is no reason I know of why you can not use a Generator to produce the arguments needed for a call to executemany(), for either the sqlite3 library or psycopg2.

However, coming shortly in the future is a bit of a rework for a fair chunk of the database interfacing code and some tightening up of the Typing around it would be a really really good idea. I propose a continuation of work done previously(linked in the issue above, but several links deep into it, I believe). It would make the arguments used for executemany() and execute_batch() need to be a Sequence as it's outer container instead of an Iterable. There is some thoughts around using a Collection instead, but I believe that using a grouping of arguments that enforces 'deterministic order' to be the correct way forward.

Since that is as simple as a List or, preferably, a Tuple, I believe it is not an especially hard hill to climb.

@MadLittleMods
Copy link
Contributor

MadLittleMods commented Oct 31, 2024

This seems fine to do given Cursor.executemany(..., parameters: Sequence[SQLQueryParameters]), a type further down the line of LoggingTransaction.executemany(...)

def executemany(
self, sql: str, parameters: Sequence[SQLQueryParameters]
) -> Any: ...

And the goal is to add proper type hints to LoggingTransaction.executemany(...). We can see these generator lints when trying to add the types, see matrix-org/synapse#15439 (comment)

@MadLittleMods MadLittleMods merged commit 7987d5e into element-hq:develop Oct 31, 2024
39 checks passed
MadLittleMods pushed a commit that referenced this pull request Oct 31, 2024
MadLittleMods pushed a commit that referenced this pull request Oct 31, 2024
MadLittleMods pushed a commit that referenced this pull request Oct 31, 2024
MadLittleMods pushed a commit that referenced this pull request Oct 31, 2024
MadLittleMods pushed a commit that referenced this pull request Oct 31, 2024
devonh pushed a commit that referenced this pull request Nov 5, 2024
…17890)

Use more correct changelog entries for refactoring `Generator` usage

 - #17813
 - #17814
 - #17815
 - #17816
 - #17817

### Pull Request Checklist

<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->

* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
  - Use markdown where necessary, mostly for `code blocks`.
  - End with either a period (.) or an exclamation mark (!).
  - Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants