feat!: pass datasource_type and datasource_id to form_data #19981
Conversation
Force-pushed from 51c96ba to 026c064
Force-pushed from 026c064 to 70339a3
Codecov Report
@@ Coverage Diff @@
## master #19981 +/- ##
==========================================
- Coverage 66.47% 66.36% -0.12%
==========================================
Files 1726 1726
Lines 64767 64853 +86
Branches 6828 6828
==========================================
- Hits 43055 43038 -17
- Misses 19977 20080 +103
Partials 1735 1735
Force-pushed from eeec7eb to 7e2331b
@@ -23,7 +23,8 @@
 @dataclass
 class CommandParameters:
     actor: User
-    dataset_id: int = 0
+    datasource_type: str = ""
+    datasource_id: int = 0
I'm not sure why we have datasource_id as a required field with a default of 0, so I kept the same pattern and set datasource_type to an empty string. It almost seems like we should have two different classes here, one for when a key exists and one for when it doesn't, so that we can be more explicit about what is required in each case.
I think it's a good idea to break it into multiple classes 👍🏼
I agree, but if it's OK, I'll put this into a different PR, since it's currently one class and that change is outside this scope.
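For reference, the split discussed above might look like this sketch (class and field names are invented, not from the PR): one parameter class for creating a new entry, where the datasource is required, and one for operating on an existing entry, where the cache key is required instead.

```python
from dataclasses import dataclass

# Hypothetical split of CommandParameters into two explicit classes.
@dataclass
class CreateFormDataParameters:
    # Creating an entry requires knowing the datasource.
    datasource_id: int
    datasource_type: str
    chart_id: int = 0

@dataclass
class ExistingFormDataParameters:
    # Operating on an existing entry requires the key instead.
    key: str
    chart_id: int = 0
```

With required fields and no sentinel defaults, forgetting to pass the datasource becomes a TypeError at construction time rather than a silent `0` / `""`.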
 if state["owner"] != actor.get_user_id():
     raise TemporaryCacheAccessDeniedError()
 tab_id = self._cmd_params.tab_id
 contextual_key = cache_key(
-    session.get("_id"), tab_id, dataset_id, chart_id
+    session.get("_id"), tab_id, datasource_id, chart_id
Wouldn't we need to add datasource_type here? Otherwise we would get the same cache hit for the same datasource_id but a different datasource_type.
Suggested change:
-    session.get("_id"), tab_id, datasource_id, chart_id
+    session.get("_id"), tab_id, datasource_id, datasource_type, chart_id
yes, thanks for catching!
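To illustrate the point raised above, here is a simplified stand-in for the cache_key helper (not Superset's actual implementation): unless the datasource type is one of the hashed parts, two datasources sharing an id but differing in type collide on the same key.

```python
import hashlib

def cache_key(*parts) -> str:
    # Hash every contextual part in order; any differing component,
    # including the datasource type, yields a distinct key.
    md5 = hashlib.md5()
    for part in parts:
        md5.update(str(part).encode("utf-8"))
    return md5.hexdigest()

# Same session, tab, id, and chart, but a different type,
# must produce different keys.
key_table = cache_key("session-id", 1, 42, None, "table")
key_query = cache_key("session-id", 1, 42, None, "query")
assert key_table != key_query
```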
"chart_id": chart_id, | ||
"form_data": INITIAL_FORM_DATA, | ||
} | ||
resp = client.post("api/v1/explore/form_data", json=payload) | ||
print(resp.data) |
remove this
ty
Force-pushed from f6e162c to d6228fb
Force-pushed from dcba962 to c17eef2
Thanks for the PR @eschutho! I left some comments as a result of a first-pass review.
 const sliceId = getUrlParam(URL_PARAMS.sliceId);
-if (!datasetId && !sliceId) {
+if (!datasourceId && !sliceId && !datasourceType) {
We need to consider old URLs that use the datasetId. Maybe read the old parameter, assign it to datasourceId, and set datasourceType to table.
We also need to change the alert message 'The URL is missing the dataset_id or slice_id parameters.' to also include the dataset_type.
Thanks, sounds good. I think if we set a default for the type, then we won't be able to validate whether it's missing.
superset/explore/utils.py (Outdated)

        return check_dataset_access(datasource_id)
    raise DatasourceNotFoundValidationError


def check_dataset_access(dataset_id: int) -> Optional[bool]:
The check_dataset_access function is not being used anymore, right? If so, can we remove it?
superset/explore/utils.py (Outdated)

from superset.views.base import is_user_admin
from superset.views.utils import is_owner


def check_datasource_access(datasource_id: int, datasource_type: str) -> Optional[bool]:
    if datasource_id:
        if datasource_type == DatasourceType.TABLE:
I'm not sure if check_dataset_access depends on the datasource_type. Shouldn't the check occur independently of the datasource type?
Right now, it's not dependent because there's only one type, but when we start adding other types, we'll do something like this:
def check_query_access(query_id: int) -> Optional[bool]:
    if query_id:
        query = QueryDAO.find_by_id(query_id)
        if query:
            can_access_datasource = security_manager.raise_for_access(query=query)
            if can_access_datasource:
                return True
    raise QueryNotFoundError
I removed this code because the query functionality isn't ready yet, but I think it may make sense to have it. I'll add it back in.
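Putting the two checks together, the dispatch by type might look like this sketch. The DAO-backed checks are replaced with trivial stand-ins so the structure is runnable on its own; the real versions would hit the database and the security manager.

```python
from enum import Enum
from typing import Optional

class DatasourceType(str, Enum):
    TABLE = "table"
    QUERY = "query"

# Simplified stand-ins for the real DAO- and security-manager-backed checks.
def check_dataset_access(dataset_id: int) -> Optional[bool]:
    return True

def check_query_access(query_id: int) -> Optional[bool]:
    return True

def check_datasource_access(datasource_id: int, datasource_type: str) -> Optional[bool]:
    # Dispatch on the datasource type; more branches can be added
    # as new datasource types land under SIP-68.
    if datasource_id:
        if datasource_type == DatasourceType.TABLE:
            return check_dataset_access(datasource_id)
        if datasource_type == DatasourceType.QUERY:
            return check_query_access(datasource_id)
    raise ValueError("datasource not found")
```

Because DatasourceType mixes in str, the incoming plain string "table" compares equal to DatasourceType.TABLE directly.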
logger = logging.getLogger(__name__)

CACHE_IMPORT_PATH = "superset.extensions.metastore_cache.SupersetMetastoreCache"


class ExploreFormDataCache(Cache):
    def get(self, *args: Any, **kwargs: Any) -> Optional[Union[str, Markup]]:
        cache = self.cache.get(*args, **kwargs)
There's a problem here with old keys: the contextual_key is now being generated with the datasource_type, but the old keys didn't include it, so lookups will miss the cache and return None for old keys.
OK, I added a check using the old keys for update and delete as well. I don't really think there's a good way to remap these, unfortunately. The cache persistence is configurable, but the default is 7 days; maybe we can delete this fallback code after three months or so.
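The fallback described above can be sketched as follows, assuming a dict-backed cache and an invented get_with_fallback helper (the real code goes through Superset's cache manager): try the new key format first, then fall back to the legacy key that was generated before datasource_type existed.

```python
import hashlib
from typing import Optional

def cache_key(*parts) -> str:
    # Simplified stand-in: hash all contextual parts in order.
    md5 = hashlib.md5()
    for part in parts:
        md5.update(str(part).encode("utf-8"))
    return md5.hexdigest()

def get_with_fallback(cache: dict, session_id, tab_id,
                      datasource_id, chart_id, datasource_type) -> Optional[dict]:
    # New format includes the datasource type.
    new_key = cache_key(session_id, tab_id, datasource_id, chart_id, datasource_type)
    if new_key in cache:
        return cache[new_key]
    # Legacy format: same parts minus the type. Entries written
    # before this PR can still be read until they expire.
    old_key = cache_key(session_id, tab_id, datasource_id, chart_id)
    return cache.get(old_key)
```

Writes always use the new format, so the legacy branch becomes dead code once the old entries age out of the cache.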
@@ -64,7 +65,7 @@ def run(self) -> Optional[str]:
 # Generate a new key if tab_id changes or equals 0
 tab_id = self._cmd_params.tab_id
 contextual_key = cache_key(
-    session.get("_id"), tab_id, dataset_id, chart_id
+    session.get("_id"), tab_id, datasource_id, chart_id, datasource_type
It will miss the cache here for old keys because they were generated without the datasource_type. We can add logic to also query for the old format, or assume that new keys will be created and think about how to clean up the old entries.
So currently on update, if the first key can't be found, it will create a new key and return it. That seems OK to me. Do you think that could work for this temporary cache?
@@ -67,65 +67,91 @@ def dataset_id() -> int:
    return dataset.id


@pytest.fixture
Maybe have one fixture that returns the datasource with both id and type?
yeah, good call
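The combined fixture might take a shape like this sketch (the Datasource dataclass and values are invented for illustration; in the suite the factory would be decorated with @pytest.fixture and would build a real dataset):

```python
from dataclasses import dataclass

@dataclass
class Datasource:
    # One object carrying both values, replacing the separate
    # dataset_id / datasource_type fixtures.
    id: int
    type: str

def datasource() -> Datasource:
    # Stand-in for the pytest fixture body; tests would request
    # `datasource` and read datasource.id and datasource.type.
    return Datasource(id=1, type="table")
```

A single fixture keeps the id and type paired, so a test can't accidentally combine an id from one fixture with a type from another.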
from superset.extensions import cache_manager
from superset.utils.core import DatasourceType
Can we add a test that checks for retrieval using keys that were generated without a datasource type?
yeah, got it on line 38. 👍
@@ -84,7 +88,8 @@ export const URL_PARAMS = {
 export const RESERVED_CHART_URL_PARAMS: string[] = [
   URL_PARAMS.formDataKey.name,
   URL_PARAMS.sliceId.name,
-  URL_PARAMS.datasetId.name,
+  URL_PARAMS.datasourceId.name,
+  URL_PARAMS.datasourceType.name,
@michael-s-molina this could be an old URL, too? I'll add datasetId back in here.
Thanks for the review and added context @michael-s-molina. I made some small changes based on the feedback, but I still have to come back to this and add in the check-access query logic that I removed.
Force-pushed from 0027b4a to 2bfdb1b
Force-pushed from ce09ce1 to 0deb744
 if state["owner"] != get_owner(actor):
     raise TemporaryCacheAccessDeniedError()
 tab_id = self._cmd_params.tab_id
 contextual_key = cache_key(
-    session.get("_id"), tab_id, dataset_id, chart_id
+    session.get("_id"), tab_id, datasource_id, chart_id, datasource_type
Todo: I still need to write up something here for deleting old keys, unless it's OK that they just expire on their own. The user will get a 404, though.
Since we are generating new keys for misses, I think it's fine to let the keys expire.
Yeah, let's do that; it's only for delete and update. Unless anyone else has thoughts.
assert isinstance(command.run(), str)


# TODO
I'd like to finish writing up these tests, too.
This is mostly done. I just found one other thing last minute that I'd like to fix. I'll still welcome any feedback.
🚢
Two things I want to call out, though. First, we should have regression testing to make sure this works as intended and we aren't deploying huge bugs into the explore view.
Second, is it possible for us to add some statsd metrics to track hits vs. misses with the cache? I think this will help us understand how the logic is affecting the end user when pulling data. (This can be another ticket.)
/testenv up
@jinghua-qa Ephemeral environment spinning up at http://54.213.76.156:8080. Credentials are
Testing is mostly done. I tested regression for the following behavior and did not find major issues in explore, but found one issue with chart share in a dashboard: Screen.Recording.2022-06-01.at.12.53.19.PM.mov
Thanks @jinghua-qa, I didn't know about that feature! I checked it on my local and then on the ephemeral environment, and I see the highlighting. Is this the expected behavior? Screen.Recording.2022-06-02.at.9.15.38.AM.mov
Force-pushed from 38ec60f to 917dcf4
🚢 🚢 🚢
Ephemeral environment shutdown and build artifacts deleted.
SUMMARY
With SIP68 we will be deprecating the ConnectorRegistry and instead having a fixed set of datasources. We currently pass both a dataset_id and a datasource_type in the form_data, but we only pass the dataset_id to the API itself. In cases where there is no form data, we usually default to a "table" datasource, but this PR allows us to be more flexible about having different types of datasources in the future and changes the API to pass in both the datasource_type and the id. We are introducing the intent to start using more datasources in SIP81.
The goal with the cache keys and temporary explore state is to be able to read from the existing format (dataset_id) but write to the new format (datasource_id and datasource_type). Any existing keys in the old format would default to a type of 'table'.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
No visual changes
TESTING INSTRUCTIONS
Existing tests have been updated, and a few new tests were added to account for the existing form_data cache structure.
To test, you should be able to go through the entire explore flow without any issues.
ADDITIONAL INFORMATION