feat: refactor all `get_sqla_engine` to use contextmanager in codebase #21943

hughhhh · 2022-10-26T18:08:15Z

SUMMARY

In this PR, we are moving get_sqla_engine to be used as a context manager throughout the entire codebase. This means that any time we want to instantiate an engine, we'll have to use a with ___ as ___ block. For example:

with get_sqla_engine() as engine: 
     # use engine ...

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

Has associated issue:
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

codecov · 2022-10-28T18:38:56Z

Codecov Report

Merging #21943 (89020b5) into master (736b534) will increase coverage by 0.96%.
The diff coverage is 38.54%.

@@            Coverage Diff             @@
##           master   #21943      +/-   ##
==========================================
+ Coverage   67.09%   68.06%   +0.96%     
==========================================
  Files        1827     1827              
  Lines       69866    72646    +2780     
  Branches     7548     7548              
==========================================
+ Hits        46875    49444    +2569     
- Misses      21035    21246     +211     
  Partials     1956     1956

Flag	Coverage Δ
hive	`53.46% <19.79%> (+0.62%)`	⬆️
javascript	`53.72% <ø> (ø)`
mysql	`79.38% <38.54%> (+0.97%)`	⬆️
postgres	`79.43% <38.54%> (+0.96%)`	⬆️
presto	`53.37% <19.79%> (+0.62%)`	⬆️
python	`82.40% <38.54%> (+0.82%)`	⬆️
sqlite	`77.87% <38.54%> (+0.93%)`	⬆️
unit	`51.40% <5.20%> (+0.19%)`	⬆️


Impacted Files	Coverage Δ
superset/datasets/commands/importers/v1/utils.py	`57.64% <0.00%> (-0.69%)`	⬇️
superset/db_engine_specs/gsheets.py	`75.91% <0.00%> (ø)`
superset/examples/bart_lines.py	`0.00% <0.00%> (ø)`
superset/examples/birth_names.py	`71.42% <0.00%> (ø)`
superset/examples/country_map.py	`0.00% <0.00%> (ø)`
superset/examples/energy.py	`0.00% <0.00%> (ø)`
superset/examples/flights.py	`0.00% <0.00%> (ø)`
superset/examples/long_lat.py	`0.00% <0.00%> (ø)`
superset/examples/multiformat_time_series.py	`0.00% <0.00%> (ø)`
superset/examples/paris.py	`0.00% <0.00%> (ø)`
... and 38 more


hughhhh · 2022-11-02T17:03:52Z

/testenv up

github-actions · 2022-11-02T19:36:58Z

@hughhhh Ephemeral environment spinning up at http://34.222.64.144:8080. Credentials are admin/admin. Please allow several minutes for bootstrapping and startup.

betodealmeida · 2022-11-05T22:50:54Z

superset/db_engine_specs/base.py

+        with database.get_sqla_engine_with_context(
+            schema=schema, source=source
+        ) as engine:
+            return engine


This won't work; the context manager will be exited before the function returns. It's equivalent to:

with database.get_sqla_engine_with_context(...) as engine:
    a = engine
return a

So should I just return database._get_sqla_engine()?

If you do that, then the SSH tunneling wouldn't be set up in places that call DBEngineSpec.get_engine(), defeating the purpose of the changes in this PR. You need to convert this into a context manager as well (and update everything downstream), so that the setup/teardown works as expected.

You can play with a simple Python script to understand how this works:

from contextlib import contextmanager

@contextmanager
def get_sqla_engine_with_context():
    print("enabling ssh tunnel")
    yield 42
    print("disabling ssh tunnel")

def get_engine():
    with get_sqla_engine_with_context() as engine:
        return engine

print("start")
engine = get_engine()
print("I have the engine, I can now work on it")
print("end")

Running the code above prints:

start
enabling ssh tunnel
disabling ssh tunnel
I have the engine, I can now work on it
end
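Continuing the same toy script, here is a minimal sketch of the suggested fix: get_engine becomes a context manager itself, so the tunnel stays open while the caller works with the engine. The `events` list and the `run()` helper are illustrative additions, and the try/finally around the yield is an assumption about how teardown might be guarded; none of this is the actual Superset implementation.

```python
from contextlib import contextmanager

events = []  # records the order of setup, use, and teardown


@contextmanager
def get_sqla_engine_with_context():
    events.append("enabling ssh tunnel")
    try:
        yield 42  # stand-in for a real engine
    finally:
        # runs when the caller's with block exits, even on error
        events.append("disabling ssh tunnel")


@contextmanager
def get_engine():
    # yield instead of return, so the inner context stays open
    with get_sqla_engine_with_context() as engine:
        yield engine


def run():
    events.clear()
    events.append("start")
    with get_engine() as engine:
        events.append("I have the engine, I can now work on it")
    events.append("end")
    return list(events)


for line in run():
    print(line)
```

With this shape, "disabling ssh tunnel" is recorded only after the caller has finished using the engine.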


betodealmeida

I think we should add a get_raw_connection context manager (that uses closing), it should simplify the code a lot. Could be done in this PR or a separate one.

We should also make the use of engine.raw_connection().execute() vs. engine.execute() more consistent, standardizing on the latter, but we can do that in a separate PR.

betodealmeida · 2022-11-10T22:11:23Z

superset/connectors/sqla/utils.py

+        with dataset.database.get_sqla_engine_with_context(
+            schema=dataset.schema
+        ) as engine:
+            with closing(engine.raw_connection()) as conn:


We don't need to do this now, but eventually we could have a database.get_raw_connection context manager, to simplify things a little bit. This way we could rewrite this as:

with dataset.database.get_raw_connection(schema=dataset.schema) as conn:
    cursor = conn.cursor()
    ...

And the implementation of get_raw_connection() would take care of closing the connection:

@contextmanager
def get_raw_connection(...):
    with get_sqla_engine_with_context(...) as engine:
        with closing(engine.raw_connection()) as conn:
            yield conn
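A runnable sketch of the same idea, for experimenting outside Superset. `FakeEngine` and the argument-free signatures are stand-ins invented for this example (the real methods would take schema, source, and so on); an in-memory sqlite3 connection plays the role of the raw DBAPI connection.

```python
import sqlite3
from contextlib import closing, contextmanager


class FakeEngine:
    """Hypothetical stand-in for a SQLAlchemy engine."""

    def raw_connection(self):
        return sqlite3.connect(":memory:")


@contextmanager
def get_sqla_engine_with_context():
    # a real implementation would set up and tear down the SSH tunnel here
    yield FakeEngine()


@contextmanager
def get_raw_connection():
    with get_sqla_engine_with_context() as engine:
        # closing() calls conn.close() when the block exits
        with closing(engine.raw_connection()) as conn:
            yield conn


def run_query():
    with get_raw_connection() as conn:
        cursor = conn.cursor()
        cursor.execute("SELECT 1 + 1")
        value = cursor.fetchone()[0]
    return value, conn  # conn is already closed at this point


value, conn = run_query()
print(value)
```

Callers get a single with block, and the connection is closed for them on exit.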

betodealmeida · 2022-11-10T22:14:30Z

superset/databases/commands/test_connection.py

+                        int(
+                            app.config[
+                                "TEST_DATABASE_CONNECTION_TIMEOUT"
+                            ].total_seconds()
+                        ),


Nit, I'd compute this outside the with block just to improve readability:

timeout = app.config["TEST_DATABASE_CONNECTION_TIMEOUT"].total_seconds()
with database.get_sqla_engine_with_context() as engine:
    try:
        alive = func_timeout(int(timeout), ping, args=(engine,))

Also, func_timeout should take floats. :-P
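A small standard-library illustration of the float remark: timedelta.total_seconds() returns a float, and wrapping it in int() drops the fractional part. The 1.5-second value is arbitrary and chosen only for the example; it is not Superset's configured timeout.

```python
from datetime import timedelta

# total_seconds() returns a float
timeout = timedelta(seconds=1, milliseconds=500).total_seconds()

# int() truncates toward zero, losing the half second
truncated = int(timeout)

print(timeout, truncated)
```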

betodealmeida · 2022-11-10T22:16:49Z

superset/databases/commands/validate.py

+            if not alive:
+                raise DatabaseOfflineError(
+                    SupersetError(
+                        message=__("Database is offline."),
+                        error_type=SupersetErrorType.GENERIC_DB_ENGINE_ERROR,
+                        level=ErrorLevel.ERROR,
+                    ),
+                )


We can dedent this block, no?

superset/datasets/commands/importers/v1/utils.py

betodealmeida · 2022-11-10T22:29:30Z

superset/db_engine_specs/base.py

-            to_sql_kwargs["method"] = "multi"
+        with cls.get_engine(database) as engine:
+            if engine.dialect.supports_multivalues_insert:
+                to_sql_kwargs["method"] = "multi"


I think we need to improve this a little bit... here we're building an engine just to check an attribute in the dialect, which means we're setting up and tearing down an SSH connection just to read an attribute. :-(

Maybe we should add a get_dialect method to the DB engine spec, that builds the engine without the context manager:

@classmethod
def get_dialect(cls, database, schema, source):
    engine = database.get_sqla_engine(schema=schema, source=source)
    return engine.dialect

Then when we only need the dialect we can call this method, which is cheaper.

In this case we still need the engine, though, so it makes more sense to use get_engine rather than just the dialect:

https://github.com/apache/superset/pull/21943/files/7ce583678e1770d472527abb8270dd22e666b9c0#diff-2e62d64ef1113e48efdfeb2acbaa522fca13e49e6a00c2cfd4f74efc4ae1b45cR916

Ah, good point, I missed that.

betodealmeida · 2022-11-10T22:30:54Z

superset/db_engine_specs/bigquery.py

+        with cls.get_engine(database) as engine:
+            to_gbq_kwargs = {
+                "destination_table": str(table),
+                "project_id": engine.url.host,


I wonder if we should add an attribute to DB engine specs annotating whether they support SSH tunnel or not? BigQuery, e.g., will probably never support it.

created a ticket for this

betodealmeida · 2022-11-10T22:33:05Z

superset/db_engine_specs/hive.py

@@ -205,7 +203,8 @@ def df_to_sql(
            if table_exists:
                raise SupersetException("Table already exists")
        elif to_sql_kwargs["if_exists"] == "replace":
-            engine.execute(f"DROP TABLE IF EXISTS {str(table)}")
+            with cls.get_engine(database) as engine:
+                engine.execute(f"DROP TABLE IF EXISTS {str(table)}")


It's interesting that sometimes we use engine.raw_connection().execute, and other times we use engine.execute. Ideally we should standardize on the latter wherever possible, since it's more concise.

betodealmeida · 2022-11-10T22:37:05Z

superset/examples/paris.py

-    engine = database.get_sqla_engine()
-    schema = inspect(engine).default_schema_name
-    table_exists = database.has_table_by_name(tbl_name)
+    with database.get_sqla_engine_with_context() as engine:


We really should convert these old examples into the new format (YAML + CSV)...

betodealmeida · 2022-11-10T23:12:35Z

superset/models/core.py

+            yield self._get_sqla_engine(schema=schema, nullpool=nullpool, source=source)
        except Exception as ex:
-            raise self.db_engine_spec.get_dbapi_mapped_exception(ex)
+            raise ex


You don't need try/except here if you're raising all exceptions.
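To see why the try/except is redundant, here is a minimal sketch comparing two toy context managers. The names `with_reraise`, `without_try`, and `outcome` are made up for this example, and the yielded string stands in for a real engine.

```python
from contextlib import contextmanager


@contextmanager
def with_reraise():
    try:
        yield "engine"
    except Exception as ex:
        raise ex  # re-raises the same exception unchanged


@contextmanager
def without_try():
    yield "engine"  # exceptions from the with body propagate on their own


def outcome(context_manager):
    """Raise inside the with block and report what reaches the caller."""
    try:
        with context_manager():
            raise ValueError("boom")
    except ValueError as ex:
        return str(ex)
    return "no exception"


print(outcome(with_reraise), outcome(without_try))
```

Both variants surface the same ValueError to the caller, so the shorter one is enough when no mapping or cleanup is done.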

betodealmeida · 2022-11-10T23:17:27Z

tests/integration_tests/sqllab_tests.py

@@ -733,7 +733,7 @@ def test_execute_sql_statements(self, mock_execute_sql_statement, mock_get_query
        mock_query = mock.MagicMock()
        mock_query.database.allow_run_async = False
        mock_cursor = mock.MagicMock()
-        mock_query.database.get_sqla_engine.return_value.raw_connection.return_value.cursor.return_value = (
+        mock_query.database.get_sqla_engine_with_context.return_value.__enter__.return_value.raw_connection.return_value.cursor.return_value = (


You can replace all intermediary .return_value with () (but not the last):

Suggested change

mock_query.database.get_sqla_engine_with_context.return_value.__enter__.return_value.raw_connection.return_value.cursor.return_value = (

mock_query.database.get_sqla_engine_with_context().__enter__().raw_connection().cursor.return_value = (
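A short standalone check of the shorthand, using only unittest.mock. Calling a MagicMock returns its return_value, so the two spellings configure the same object; the keyword argument passed in the with statement is arbitrary and only shows that call arguments do not change which return_value is used.

```python
from unittest import mock

mock_query = mock.MagicMock()
mock_cursor = mock.MagicMock()

# Intermediate calls stand in for .return_value; the last one must stay
# an attribute because it is the target of the assignment.
mock_query.database.get_sqla_engine_with_context().__enter__().raw_connection().cursor.return_value = (
    mock_cursor
)

# Code under test would do something like this:
with mock_query.database.get_sqla_engine_with_context(schema="main") as engine:
    cursor = engine.raw_connection().cursor()

same_target = (
    mock_query.database.get_sqla_engine_with_context()
    is mock_query.database.get_sqla_engine_with_context.return_value
)
print(cursor is mock_cursor, same_target)
```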


betodealmeida

Looks good, there's just one place that needs to be fixed.

betodealmeida · 2022-11-14T21:58:27Z

superset/datasets/commands/importers/v1/utils.py

+        with database.get_sqla_engine_with_context() as engine:
+            connection = engine
+
+            df.to_sql(


This will only run in the else block, but it needs to run in the if database.sqlalchemy_uri == current_app.config.get("SQLALCHEMY_DATABASE_URI"): block as well.

betodealmeida

github-actions · 2022-11-15T18:45:38Z

Ephemeral environment shutdown and build artifacts deleted.


init

158da8d

pull-request-size bot added the size/L label Oct 26, 2022

update all the examples

face73f

pull-request-size bot added size/XL and removed size/L labels Oct 26, 2022

change remaining bits

95d079e

hughhhh marked this pull request as ready for review October 28, 2022 17:50

Merge branch 'master' into ref-get-sqla-engine-2

87c0d79

hughhhh added 5 commits October 28, 2022 16:20

fix confict

11b240b

fix conflict

1bfdbda

setup return value for contextmanager

4146d5a

updates test

54fc147

fix linting

fdc6ca3

AAfghahi approved these changes Nov 2, 2022


renaming function

66c0801

pull-request-size bot added size/XXL and removed size/XL labels Nov 3, 2022

fix test

1f9ec5e

betodealmeida requested changes Nov 5, 2022


hughhhh added 8 commits November 7, 2022 14:01

fix get engine to return contextmanager

8811a99

why

82d7532

yerp

1f829ac

update typing

d53d116

update comment

752161d

Merge branch 'master' of https://github.com/apache/superset into ref-get-sqla-engine-2

0ac6fb1

fix pylint

31f3c1d

last one

e089a8d

hughhhh requested a review from betodealmeida November 8, 2022 22:01

Merge branch 'master' into ref-get-sqla-engine-2

7ce5836

betodealmeida requested changes Nov 10, 2022


hughhhh added 2 commits November 14, 2022 14:49

address all concerns

b05f0e8

Merge branch 'ref-get-sqla-engine-2' of https://github.com/apache/superset into ref-get-sqla-engine-2

12b05bd

betodealmeida requested changes Nov 14, 2022


add to else

89020b5

betodealmeida approved these changes Nov 15, 2022


hughhhh merged commit e23efef into master Nov 15, 2022

diegomedina248 pushed a commit to preset-io/superset that referenced this pull request Dec 3, 2022

feat: refactor all get_sqla_engine to use contextmanager in codebase (apache#21943)

9c06aed

john-bodley added a commit that referenced this pull request Dec 15, 2022

fix(hive): Fix regression from #21943

f7afd8b

john-bodley mentioned this pull request Dec 15, 2022

fix(hive): Fix regression from #21943 #22431

Merged

9 tasks

john-bodley added a commit that referenced this pull request Dec 15, 2022

fix(hive): Fix regression from #21943 (#22431)

4f9c2c8

john-bodley added a commit to airbnb/superset-fork that referenced this pull request Dec 15, 2022

fix(hive): Fix regression from apache#21943 (apache#22431)

8d778f1

This was referenced Jan 3, 2023

chore: Migrate /superset/search_queries to API v1 #22579

Merged

chore: Migrate /superset/queries/<last_updated_ms> to API v1 #22611

Merged

john-bodley added a commit to airbnb/superset-fork that referenced this pull request Jan 17, 2023

fix(hive): Fix regression from apache#21943 (apache#22431)

422c233

mistercrunch added the 🚢 2.1.3 label Feb 18, 2024

mistercrunch added 🏷️ bot, 🚢 2.1.0 and removed 🚢 2.1.3 labels Mar 13, 2024

mistercrunch deleted the ref-get-sqla-engine-2 branch March 26, 2024 16:15
