Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: pass datasource_type and datasource_id to form_data #19981

Merged
merged 5 commits into from
Jun 2, 2022

Conversation

eschutho
Copy link
Member

@eschutho eschutho commented May 6, 2022

SUMMARY

With SIP68 we will be deprecating the ConnectorRegistry and instead having a fixed set of datasources. We currently pass both a dataset_id and datasource_type in the form_data, but we are only passing in the dataset_id to the api itself. In cases where there is no form data, we usually default to a "table" datasource, but this PR allows us to be more flexible about having different types of datasources in the future and changes the api to pass in both the datasource_type and id. We are introducing the intent to start using more datasources in SIP81.

The goal with the cache keys and temporary explore state is to be able to read from the existing format (dataset_id) but write to the new format (datasource_id and datasource_type). Any existing keys in the old format would default to a type of 'table'.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

No visual changes

TESTING INSTRUCTIONS

Existing tests have been updated and a few new tests added to account for existing form_data cache structure.
To test, you should be able to go through the entire explore flow without any issues.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@superset-github-bot superset-github-bot bot added the Superset-Community-Partners Preset community partner program participants label May 6, 2022
@eschutho eschutho added risk:breaking-change Issues or PRs that will introduce breaking changes and removed Superset-Community-Partners Preset community partner program participants labels May 6, 2022
@eschutho eschutho marked this pull request as draft May 6, 2022 19:45
@eschutho eschutho force-pushed the elizabeth/form-data-api-update branch 2 times, most recently from 51c96ba to 026c064 Compare May 6, 2022 21:39
@eschutho eschutho marked this pull request as ready for review May 6, 2022 21:41
@eschutho eschutho force-pushed the elizabeth/form-data-api-update branch from 026c064 to 70339a3 Compare May 6, 2022 21:45
@codecov
Copy link

codecov bot commented May 6, 2022

Codecov Report

Merging #19981 (079747f) into master (74c5479) will decrease coverage by 0.11%.
The diff coverage is 86.07%.

❗ Current head 079747f differs from pull request most recent head 917dcf4. Consider uploading reports for the commit 917dcf4 to get more accurate results

@@            Coverage Diff             @@
##           master   #19981      +/-   ##
==========================================
- Coverage   66.47%   66.36%   -0.12%     
==========================================
  Files        1726     1726              
  Lines       64767    64853      +86     
  Branches     6828     6828              
==========================================
- Hits        43055    43038      -17     
- Misses      19977    20080     +103     
  Partials     1735     1735              
Flag Coverage Δ
hive ?
mysql 82.23% <94.11%> (+0.03%) ⬆️
postgres 82.30% <94.11%> (+0.03%) ⬆️
presto ?
python 82.38% <94.11%> (-0.29%) ⬇️
sqlite 82.03% <94.11%> (?)
unit ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../src/dashboard/components/gridComponents/Chart.jsx 60.60% <ø> (ø)
...rc/explore/components/ExploreChartHeader/index.jsx 51.35% <ø> (ø)
...re/components/controls/DatasourceControl/index.jsx 72.83% <0.00%> (ø)
...mponents/useExploreAdditionalActionsMenu/index.jsx 62.92% <ø> (ø)
...perset-frontend/src/views/CRUD/chart/ChartCard.tsx 47.61% <ø> (ø)
superset-frontend/src/views/CRUD/utils.tsx 64.80% <ø> (ø)
superset/charts/schemas.py 99.35% <ø> (ø)
superset/connectors/base/models.py 86.74% <ø> (ø)
superset/examples/country_map.py 0.00% <0.00%> (ø)
superset/examples/deck.py 0.00% <0.00%> (ø)
... and 57 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 74c5479...917dcf4. Read the comment docs.

@eschutho eschutho force-pushed the elizabeth/form-data-api-update branch 2 times, most recently from eeec7eb to 7e2331b Compare May 7, 2022 00:18
@@ -23,7 +23,8 @@
@dataclass
class CommandParameters:
actor: User
dataset_id: int = 0
datasource_type: str = ""
datasource_id: int = 0
Copy link
Member Author

@eschutho eschutho May 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we have datasource_id as required with a default to 0, so I kept with the same pattern and set datasource_type to an empty string. It almost seems like we should have two different classes here when a key exists and when it doesn't so that we can be more explicit about what is required in each case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good idea to break it into multiple classes 👍🏼

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but if it's ok, I'll put this into a different PR since it's currently one class, and outside this scope.

if state["owner"] != actor.get_user_id():
raise TemporaryCacheAccessDeniedError()
tab_id = self._cmd_params.tab_id
contextual_key = cache_key(
session.get("_id"), tab_id, dataset_id, chart_id
session.get("_id"), tab_id, datasource_id, chart_id
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't we need to add datasource_type or we would get the same cache hit for datasource_id but different datasource_type?

Suggested change
session.get("_id"), tab_id, datasource_id, chart_id
session.get("_id"), tab_id, datasource_id, datasource_id, chart_id

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, thanks for catching!

"chart_id": chart_id,
"form_data": INITIAL_FORM_DATA,
}
resp = client.post("api/v1/explore/form_data", json=payload)
print(resp.data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ty

@eschutho eschutho force-pushed the elizabeth/form-data-api-update branch 4 times, most recently from f6e162c to d6228fb Compare May 10, 2022 00:40
@eschutho eschutho closed this May 10, 2022
@eschutho eschutho reopened this May 10, 2022
@eschutho eschutho force-pushed the elizabeth/form-data-api-update branch 2 times, most recently from dcba962 to c17eef2 Compare May 11, 2022 16:38
Copy link
Member

@michael-s-molina michael-s-molina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @eschutho! I left some comments as a result of a first-pass review.

const sliceId = getUrlParam(URL_PARAMS.sliceId);
if (!datasetId && !sliceId) {
if (!datasourceId && !sliceId && !datasourceType) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to consider old URLs that use the datasetId. Maybe read the old parameter and assign it to datasourceId and set datasourceType to table.

We also need to change the alert message 'The URL is missing the dataset_id or slice_id parameters.' to also include the dataset_type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, sounds good. I think we set a default for type then we won't be able to validate if it's missing.

return check_dataset_access(datasource_id)
raise DatasourceNotFoundValidationError


def check_dataset_access(dataset_id: int) -> Optional[bool]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check_dataset_access is not being used anymore, right? If so, can we remove it?

from superset.views.base import is_user_admin
from superset.views.utils import is_owner


def check_datasource_access(datasource_id: int, datasource_type: str) -> Optional[bool]:
if datasource_id:
if datasource_type == DatasourceType.TABLE:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if check_dataset_access depends on the datasource_type. Shouldn’t the check occur independently of the data source type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, it's not dependent because there's only one type, but when we start adding other types, we'll do something like this:

def check_query_access(query_id: int) -> Optional[bool]:
    if query_id:
        query = QueryDAO.find_by_id(query_id)
        if query:
            can_access_datasource = security_manager.raise_for_access(
                query=query)
            if can_access_datasource:
                return True
    raise QueryNotFoundError

Copy link
Member Author

@eschutho eschutho May 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this code because the query functionality isn't ready yet, but I think it may make sense to have it. I'll add it back in.

@@ -23,7 +23,8 @@
@dataclass
class CommandParameters:
actor: User
dataset_id: int = 0
datasource_type: str = ""
datasource_id: int = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good idea to break it into multiple classes 👍🏼


logger = logging.getLogger(__name__)

CACHE_IMPORT_PATH = "superset.extensions.metastore_cache.SupersetMetastoreCache"


class ExploreFormDataCache(Cache):
def get(self, *args: Any, **kwargs: Any) -> Optional[Union[str, Markup]]:
cache = self.cache.get(*args, **kwargs)
Copy link
Member

@michael-s-molina michael-s-molina May 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a problem here with old keys because the contextual_key is now being generated with the datasource_type but the old keys didn't have it so it will miss the cache and return None for old keys.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I added a check using the old keys for update and delete as well. I don't really think there's a good way to remap these, unfortunately. The cache persistence is configurable but the default is 7 days.. maybe we can delete this fallback code after three months or so.

@@ -64,7 +65,7 @@ def run(self) -> Optional[str]:
# Generate a new key if tab_id changes or equals 0
tab_id = self._cmd_params.tab_id
contextual_key = cache_key(
session.get("_id"), tab_id, dataset_id, chart_id
session.get("_id"), tab_id, datasource_id, chart_id, datasource_type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will miss the cache here for old keys because they were generated without the datasource_type. We can add logic to also query for the old format or assume that new keys will be created and think about how to clean the old entries.

Copy link
Member Author

@eschutho eschutho May 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So currently on update, if the first key can't be found, it will create a new key and return it. That seems ok to me.. do you think that could work for this temporary cache?

@@ -67,65 +67,91 @@ def dataset_id() -> int:
return dataset.id


@pytest.fixture
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe have one fixture that returns the datasource with both id and type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, good call


from superset.extensions import cache_manager
from superset.utils.core import DatasourceType

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test that checks for retrieval using keys that were generated without a datasource type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, got it on line 38. 👍

@@ -84,7 +88,8 @@ export const URL_PARAMS = {
export const RESERVED_CHART_URL_PARAMS: string[] = [
URL_PARAMS.formDataKey.name,
URL_PARAMS.sliceId.name,
URL_PARAMS.datasetId.name,
URL_PARAMS.datasourceId.name,
URL_PARAMS.datasourceType.name,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michael-s-molina this could be an old URL, too? I'll datasetId back in here..

@eschutho
Copy link
Member Author

Thanks for the review and added context @michael-s-molina. I made some small changes based on the feedback, but still have to come back to this and add in the check access query logic that I removed.

@eschutho eschutho force-pushed the elizabeth/form-data-api-update branch from 0027b4a to 2bfdb1b Compare May 26, 2022 23:39
scripts/tests/run.sh Outdated Show resolved Hide resolved
@eschutho eschutho force-pushed the elizabeth/form-data-api-update branch 2 times, most recently from ce09ce1 to 0deb744 Compare May 27, 2022 01:02
if state["owner"] != get_owner(actor):
raise TemporaryCacheAccessDeniedError()
tab_id = self._cmd_params.tab_id
contextual_key = cache_key(
session.get("_id"), tab_id, dataset_id, chart_id
session.get("_id"), tab_id, datasource_id, chart_id, datasource_type
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: I still need to write up something here for deleting old keys. Unless it's ok that it just expires on it's own. The user will get a 404 though.

Copy link
Member

@hughhhh hughhhh May 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are generating new keys for misses, i think it's fine to let the keys expire

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's do that, it's only for delete and update. Unless anyone else has thoughts.

assert isinstance(command.run(), str)


# TODO
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to finish writing up these tests, too.

@eschutho
Copy link
Member Author

This is mostly done.. I just found one other thing last minute that I'd like to fix. I'll still welcome any feedback.

Copy link
Member

@hughhhh hughhhh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢

2 things i want to call out though is having a regression testing to make sure this works intended and we aren't deploying an huge bugs into the explore view.

Also is possible for us to add some statsd metrics to track hit vs. misses with the cache. I think this will help us understand how the logic is effecting the end user with pulling data. (this can be another ticket)

@jinghua-qa
Copy link
Member

/testenv up

@github-actions
Copy link
Contributor

@jinghua-qa Ephemeral environment spinning up at http://54.213.76.156:8080. Credentials are admin/admin. Please allow several minutes for bootstrapping and startup.

@jinghua-qa
Copy link
Member

jinghua-qa commented Jun 1, 2022

Testing are mostly done, tested regression for the following behavior:
1, go from list view to explore and run a chart,
2, change the visualization and filters and run (update chart) for different chart type and then refresh the page.
3, check that the form fields for chart updates
4, Save chart, and add it to a dashboard

did not find major issues in explore, found 1 issue with chart share in dashboard:
1, when i use the permalink of chart share to enter dashboard, the chart is not highlighted, i think this is the PR issue since i checked master, master have the chart highlighted effect

Screen.Recording.2022-06-01.at.12.53.19.PM.mov

.

@eschutho
Copy link
Member Author

eschutho commented Jun 2, 2022

Thanks @jinghua-qa, I didn't know about that feature! I checked it on my local and then on the ephemeral and I see the highlighting.. is this the expected behavior?

Screen.Recording.2022-06-02.at.9.15.38.AM.mov

@eschutho eschutho force-pushed the elizabeth/form-data-api-update branch from 38ec60f to 917dcf4 Compare June 2, 2022 17:49
@hughhhh
Copy link
Member

hughhhh commented Jun 2, 2022

🚢 🚢 🚢

@eschutho eschutho merged commit 32bb1ce into apache:master Jun 2, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Jun 2, 2022

Ephemeral environment shutdown and build artifacts deleted.

philipher29 pushed a commit to ValtechMobility/superset that referenced this pull request Jun 9, 2022
)

* pass datasource_type and datasource_id to form_data

* add datasource_type to delete command

* add datasource_type to delete command

* fix old keys implementation

* add more tests
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 2.0.0 labels Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels preset-io risk:breaking-change Issues or PRs that will introduce breaking changes size/XXL 🚢 2.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants