Python model specific session handling to prevent using invalid sessions #547

Merged Jan 12, 2024 (7 commits)


@rcypher-databricks rcypher-databricks commented Jan 10, 2024

Resolves:
ES-985588: dbt-databricks data models timeout

Description

A long-running Python model can leave a connection/session idle for long enough that it times out and is closed.
Subsequent use of the connection results in an 'Invalid Sessionhandle' error.

Cause

Non-Python models are executed as SQL statements, so even when a statement takes a long time to run, the connection is in active use and does not time out on the server. For a Python model a connection is acquired, but the model itself is run using RPC calls. If the Python model takes a long time to run, the session can therefore time out on the back end.
The code that cleans up idle connections did not catch this because the last-used time is set when a connection is released after running a model. So even though running the Python model didn't use the connection, as far as the cleanup code was concerned it had been in active use.
There are two cases where the invalid connection would be used:

  • Depending on the Python model, additional SQL may need to be run before the model is considered complete. This is an attempt to use the connection between its acquire and release for the Python model run.
  • After the Python model has finished, the connection is re-used by another model.
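
To make the cause concrete, here is a minimal sketch of how stamping the last-used time on release can hide an expired session from an idle check. `NaiveConnection`, `is_idle`, and the 600-second limit are hypothetical names and values chosen for illustration; they are not taken from the adapter's code.

```python
from typing import Optional


class NaiveConnection:
    """Hypothetical pooled connection; not the adapter's actual class."""

    def __init__(self, last_used_time: Optional[float] = None) -> None:
        self.last_used_time = last_used_time

    def release(self, now: float) -> None:
        # The release time is stamped unconditionally, whether or not
        # any SQL was sent over the session while it was held.
        self.last_used_time = now

    def is_idle(self, now: float, max_idle: float) -> bool:
        # A connection counts as idle once it has gone unused for
        # longer than max_idle seconds.
        if self.last_used_time is None:
            return False
        return now - self.last_used_time > max_idle


MAX_IDLE = 600.0  # assumed idle limit in seconds, for illustration only

conn = NaiveConnection(last_used_time=0.0)  # last real SQL activity at t=0
# A Python model now runs over RPC for an hour without touching the session.
conn.release(now=3600.0)

# The idle check sees a freshly used connection, although the session
# has been untouched for an hour and may already be closed server-side.
looks_idle = conn.is_idle(now=3601.0, max_idle=MAX_IDLE)
```

In this sketch `looks_idle` is false, so a cleanup pass would keep the connection and the next user would hit the expired session.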

Fix

Updated the DatabricksDBTConnection class to track the language of the model being executed. If the language is Python, the last-used time is not updated when the connection is released, reflecting the fact that a Python model isn't necessarily using the connection.
This change allows the idle cleanup code to correctly clean up the idle connection.
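
The idea behind the fix can be sketched as follows. This is an illustration under assumptions, not the code from this PR: `TrackedConnection` and its `acquire`, `use`, `release`, and `is_idle` methods are hypothetical, and the real DatabricksDBTConnection class may be structured differently.

```python
from typing import Optional


class TrackedConnection:
    """Hypothetical connection that remembers the language of the model using it."""

    def __init__(self) -> None:
        self.last_used_time: Optional[float] = None
        self.language: Optional[str] = None

    def acquire(self, language: str) -> None:
        # Record which kind of model is about to hold the connection.
        self.language = language

    def use(self, now: float) -> None:
        # Called when SQL is actually sent over the session.
        self.last_used_time = now

    def release(self, now: float) -> None:
        # Only non-Python models refresh the timestamp on release. A Python
        # model may not have touched the session at all while holding it.
        if self.language != "python":
            self.last_used_time = now
        self.language = None

    def is_idle(self, now: float, max_idle: float) -> bool:
        if self.last_used_time is None:
            return False
        return now - self.last_used_time > max_idle


MAX_IDLE = 600.0  # assumed idle limit in seconds, for illustration only

py_conn = TrackedConnection()
py_conn.use(now=0.0)             # last real SQL activity at t=0
py_conn.acquire("python")
py_conn.release(now=3600.0)      # timestamp stays at 0.0
py_idle = py_conn.is_idle(now=3601.0, max_idle=MAX_IDLE)

sql_conn = TrackedConnection()
sql_conn.use(now=0.0)
sql_conn.acquire("sql")
sql_conn.release(now=3600.0)     # timestamp refreshed to 3600.0
sql_idle = sql_conn.is_idle(now=3601.0, max_idle=MAX_IDLE)
```

With the timestamp left alone after a Python model, the idle check reports the connection as idle and a cleanup pass can replace it before it is re-used, while connections released by SQL models behave as before.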

Checklist

  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-databricks next" section.

Overriding close() in connection manager to catch/absorb errors on closing a session, for example closing a session that has timed out on the back end.
Signed-off-by: Raymond Cypher <[email protected]>
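
A rough sketch of what absorbing errors on close can look like, assuming a handle whose `close()` raises once the server has dropped the session. `FakeHandle`, `SessionClosedError`, and `close_quietly` are hypothetical names for illustration; the connection manager's real override is not shown in this conversation.

```python
import logging

logger = logging.getLogger(__name__)


class SessionClosedError(Exception):
    """Hypothetical error raised when the server has already dropped the session."""


class FakeHandle:
    """Hypothetical session handle used only to exercise the sketch."""

    def __init__(self, expired: bool) -> None:
        self.expired = expired
        self.closed = False

    def close(self) -> None:
        if self.expired:
            raise SessionClosedError("Invalid Sessionhandle")
        self.closed = True


def close_quietly(handle: FakeHandle) -> bool:
    """Close the handle, logging failures instead of propagating them.

    Returns True if the close succeeded and False if an error was absorbed.
    """
    try:
        handle.close()
        return True
    except Exception as exc:
        # The session is unusable either way, so the error is logged
        # rather than allowed to fail the surrounding run.
        logger.warning("Ignoring error while closing session: %s", exc)
        return False


expired_result = close_quietly(FakeHandle(expired=True))
healthy_result = close_quietly(FakeHandle(expired=False))
```

Catching the broad `Exception` here is a deliberate choice for the sketch: a failed close of an already-dead session carries no useful recovery path, so logging it is enough.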
benc-db
benc-db previously approved these changes Jan 10, 2024
Fixed up type annotations and removed a duplicate function definition.
Signed-off-by: Raymond Cypher <[email protected]>
@rcypher-databricks rcypher-databricks changed the title Stale sessions Python model specific session handling to prevent using invalid sessions Jan 11, 2024
@rcypher-databricks rcypher-databricks merged commit 212d046 into main Jan 12, 2024
18 checks passed
@rcypher-databricks rcypher-databricks deleted the stale_sessions branch January 22, 2024 18:58