refactor(backends): clean up resources produced by memtable
#10055
@@ -382,31 +384,27 @@ def create_table(
else:
temp_name = name

table = sg.table(temp_name, catalog=database, quoted=quoted)
target = sge.Schema(this=table, expressions=column_defs)
table_expr = sg.table(temp_name, catalog=database, quoted=quoted)
I changed these variable names because rebinding the name (table = <not the memtable>)
drops the last reference to the memtable and causes it to get GC'd. I gave another example of this in the PR description.
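To illustrate the rebinding problem, here is a minimal sketch using only the standard library. `FakeMemtable` and the `dropped` list are hypothetical stand-ins, not ibis APIs; the sketch assumes cleanup is attached with `weakref.finalize` and relies on CPython's reference counting to run the finalizer as soon as the last reference goes away.

```python
import gc
import weakref

dropped = []  # names whose cleanup callback has run (hypothetical bookkeeping)


class FakeMemtable:
    """Hypothetical stand-in for an in-memory table; not an ibis class."""

    def __init__(self, name):
        self.name = name
        # The callback must not hold a reference to `self`,
        # otherwise the object would never become collectable.
        weakref.finalize(self, dropped.append, name)


def rebinding_example():
    dropped.clear()
    table = FakeMemtable("memtable_1")
    # Reusing the name `table` drops the only reference to the memtable,
    # so its finalizer runs here in CPython.
    table = "some other table expression"
    gc.collect()  # only matters on interpreters without reference counting
    return list(dropped)


def distinct_name_example():
    dropped.clear()
    table = FakeMemtable("memtable_2")
    # A separate name keeps the memtable alive.
    table_expr = "some other table expression"
    gc.collect()
    while_alive = list(dropped)
    del table  # now the memtable can be collected
    gc.collect()
    return while_alive, list(dropped)
```

With the second form, cleanup only happens once the memtable itself goes out of scope, which is the behavior the rename is meant to preserve.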
Still need to add at least one cross-backend test to verify the behavior.
fb642a4 to c6c0677
I'll split out the faster memtable checks into a separate PR once #10053 is merged.
Going to split this out into in-memory versus not-in-memory backends, since it is more important to clean up the former aggressively.
009261a to 837e3d2
a4530d1 to 2c5f4dc
After hours of trying to understand what is happening with MySQL, I was able to get a reproducer. I can only reproduce this with a Python 3.12 environment.
Still not sure why executing SQL in a finalizer causes the socket on the connection to get set to
We won't be able to prevent the buildup of disk space when constructing large memtables in a loop, but we won't keep all the storage around when the session ends.
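One way to read this trade-off is sketched below, using the standard library's `sqlite3` as a stand-in for a real backend. The `SessionCleanup` class and its method names are hypothetical and are not the code in this PR; the sketch assumes that tables are only recorded when created and are dropped in one pass when the session closes, instead of in per-object finalizers.

```python
import sqlite3


class SessionCleanup:
    """Hypothetical helper: defers dropping memtable storage until close()."""

    def __init__(self, con):
        self.con = con
        self.pending = []  # table names to drop when the session ends

    def create_memtable(self, name, rows):
        # Storage accumulates for every call; nothing is dropped here.
        self.con.execute(f'CREATE TABLE "{name}" (a INTEGER)')
        self.con.executemany(
            f'INSERT INTO "{name}" VALUES (?)', [(r,) for r in rows]
        )
        self.pending.append(name)

    def close(self):
        # Drop everything that was registered during the session.
        while self.pending:
            name = self.pending.pop()
            self.con.execute(f'DROP TABLE IF EXISTS "{name}"')


def table_names(con):
    query = "SELECT name FROM sqlite_master WHERE type = 'table'"
    return sorted(row[0] for row in con.execute(query))
```

Creating memtables in a loop leaves all of them in place until `close()` is called, which mirrors the disk-space caveat described above.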
8bc8b6a to 22cec69
I know I said I would break out the in-memory backends into their own PR, but that would require no-op-ing the non-in-memory backends and then changing that code again. That doesn't seem worth the trouble, given the hopefully manageable number of changes here.
A couple of questions/suggestions around error handling, but otherwise LGTM.
Presuming tests pass, LGTM.
Running the clouds locally now.
…session dataset usage
a98a0f5 to b1aab14
BigQuery is good:
Snowflake is good:
Looks like the new test might be a bit flaky on bigquery: https://github.com/ibis-project/ibis/actions/runs/10798011642/job/29950561913
Blah, okay, looking into it.
Replaces pymysql with mysqlclient, mostly out of frustration with bizarre GC behavior discovered during #10055. I think this is probably a breaking change due to some changes in how types are inferred for JSON, INET and UUID types.
BREAKING CHANGE: Ibis now uses the `MySQLdb` driver. You may need to install MySQL client libraries to **build** the extension.
Description of changes
PR on top of #10053 to clean up temporary tables produced by `ibis.memtable`.
The approach is a generalization of the one taken in #10042.
The main caveat here, which I don't think is a blocker, is that the following
code will start to fail:
Issues closed
Closes #10044.