fix: set correct schema on config import #16041

Merged: betodealmeida merged 12 commits into apache:master from the load_examples_bug branch on Nov 4, 2021

@betodealmeida (Member) commented Aug 3, 2021

SUMMARY

When we import examples, the schema is set to null, even in databases that support schemas and have a default schema. This is problematic because users might then create duplicate physical datasets, which are usually not allowed, resulting in, e.g.:

  • examples.NULL.birth_names (actually in the default "public" schema, for Postgres)
  • examples.public.birth_names

These duplicate datasets have caused problems in the past because they break the assumption that a physical dataset is unique for a given database, schema, and table name.

This PR changes the load-examples command and the import mechanism to explicitly set the schema to the database's default schema when none is set.
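A minimal sketch of the rule described above, assuming dataset configs are plain dicts and that the caller already knows the database's default schema (with SQLAlchemy, for instance, `sqlalchemy.inspect(engine).default_schema_name` reports it, typically "public" on Postgres). The name `apply_default_schema` is hypothetical and is not the helper this PR adds.

```python
from typing import Any, Dict, Optional


def apply_default_schema(
    config: Dict[str, Any], default_schema: Optional[str]
) -> Dict[str, Any]:
    """Return a copy of a dataset config with a missing schema filled in.

    Hypothetical helper: an explicit schema is always kept, and a database
    without a default schema (default_schema=None) leaves the value as None.
    The input dict is not mutated.
    """
    result = dict(config)
    if result.get("schema") is None:
        result["schema"] = default_schema
    return result


# A YAML-derived example config with no schema, imported into Postgres.
users = {"table_name": "users", "schema": None}
print(apply_default_schema(users, "public"))  # {'table_name': 'users', 'schema': 'public'}
```

Keeping an explicit schema untouched means only configs that omit it are affected, which matches the "when none is set" condition.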

There's a tricky case that this PR needs to handle: a user creates a dataset that duplicates one of the new examples defined in YAML files. These example datasets have fixed UUIDs, so the following can happen:

  1. Admin runs superset load-examples.
  2. Dataset users is created with table_name=users, schema=null and uuid=7195db6b-2d17-7619-b7c7-26b15378df8c.
  3. Later, a user creates a physical dataset pointing to users. When adding a dataset, we force the user to select a schema, so this dataset has table_name=users, schema=public (in Postgres) and a random UUID. Normally you can't create duplicate physical datasets, but because the schema is technically different, this works.
  4. Admin runs superset load-examples again, after merging this PR.
  5. The importer matches the users dataset to the one with the NULL schema, based on the UUID. It tries to update that dataset, setting its schema to "public".
  6. This fails because there's already a dataset with the same database_id, table_name and schema (the one created by the user).
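The failure in steps 2 to 6 can be reproduced in miniature with Python's built-in sqlite3 module. The `datasets` table below is a simplified stand-in, not Superset's real metadata schema; it only assumes the uniqueness rule from step 6 on (database_id, schema, table_name). The example's NULL schema differs from the user's "public" schema, so both rows satisfy the constraint until the importer tries to fill in the schema.

```python
import sqlite3
import uuid

# UUID of the example "users" dataset, as given in step 2.
EXAMPLE_UUID = "7195db6b-2d17-7619-b7c7-26b15378df8c"


def reimport_conflicts() -> bool:
    """Replay steps 2-6; return True if filling in the schema is rejected."""
    conn = sqlite3.connect(":memory:")
    try:
        # Simplified stand-in for the datasets table (not Superset's schema).
        conn.execute(
            """
            CREATE TABLE datasets (
                uuid TEXT PRIMARY KEY,
                database_id INTEGER NOT NULL,
                schema TEXT,
                table_name TEXT NOT NULL,
                UNIQUE (database_id, schema, table_name)
            )
            """
        )
        insert = "INSERT INTO datasets VALUES (?, ?, ?, ?)"
        # Step 2: load-examples creates the dataset with a NULL schema.
        conn.execute(insert, (EXAMPLE_UUID, 1, None, "users"))
        # Step 3: a user adds the same table with schema "public" and a random
        # UUID; this succeeds because the schema values differ.
        conn.execute(insert, (str(uuid.uuid4()), 1, "public", "users"))
        # Step 5: the importer matches on UUID and sets the default schema.
        try:
            conn.execute(
                "UPDATE datasets SET schema = ? WHERE uuid = ?",
                ("public", EXAMPLE_UUID),
            )
        except sqlite3.IntegrityError:
            # Step 6: (database_id, schema, table_name) would now be duplicated.
            return True
        return False
    finally:
        conn.close()


print(reimport_conflicts())  # True
```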

This is a rare case, but it can happen. I added a workaround so that when this happens, the duplicate datasets are not updated. This could potentially be problematic in the future: if we add more columns to a dataset and have charts use those columns, those charts would fail to render because the dataset wouldn't be updated.
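The workaround can be sketched with plain Python objects instead of the ORM. `Dataset`, `import_dataset`, and the in-memory registry keyed by UUID are hypothetical names for illustration. The sketch only assumes what is described above: datasets are matched by UUID, the default schema fills a missing one, and an update that would collide with another dataset's (database_id, schema, table_name) is skipped.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class Dataset:
    uuid: str
    database_id: int
    schema: Optional[str]
    table_name: str


def import_dataset(
    registry: Dict[str, Dataset], config: Dataset, default_schema: Optional[str]
) -> str:
    """Create or update `config` in `registry`; return what happened."""
    # Fill in the database's default schema when the config has none.
    if config.schema is None:
        config = replace(config, schema=default_schema)
    key = (config.database_id, config.schema, config.table_name)
    # Another dataset (different UUID) already points at this physical table?
    clash = any(
        (d.database_id, d.schema, d.table_name) == key and d.uuid != config.uuid
        for d in registry.values()
    )
    if clash:
        # Workaround: leave the existing dataset untouched instead of failing.
        return "skipped"
    status = "updated" if config.uuid in registry else "created"
    registry[config.uuid] = config
    return status


# Steps 2-3: an example with a NULL schema plus the user's duplicate.
registry = {
    "ex": Dataset("ex", 1, None, "users"),
    "mine": Dataset("mine", 1, "public", "users"),
}
# Step 5: re-importing the example would clash, so it is skipped.
outcome = import_dataset(registry, Dataset("ex", 1, None, "users"), "public")
print(outcome, registry["ex"].schema)  # skipped None
```

The skipped example keeps its NULL schema, which is exactly the trade-off noted above: any later changes to that dataset's definition would not be applied.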

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TESTING INSTRUCTIONS

  1. Run superset load-examples on master.
  2. Datasets have no schema.
  3. Run superset load-examples on this branch.
  4. Datasets have a schema where appropriate (for PostgreSQL, "public").

The same applies to clean installations. I tested with PostgreSQL, MySQL, and SQLite.
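To check step 4 without clicking through the UI, one could list datasets whose schema is still NULL. The sketch below assumes a metadata table named `tables` with `table_name` and `schema` columns (meant to mirror Superset's physical-dataset table; verify the names against your version). It builds a throwaway SQLite stand-in so it runs on its own; the helper name `datasets_without_schema` is hypothetical.

```python
import sqlite3
from typing import List


def datasets_without_schema(conn: sqlite3.Connection) -> List[str]:
    """Return names of datasets whose schema is still NULL, sorted."""
    rows = conn.execute(
        "SELECT table_name FROM tables WHERE schema IS NULL ORDER BY table_name"
    )
    return [name for (name,) in rows]


# Throwaway stand-in for a metadata database after running load-examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tables (table_name TEXT, schema TEXT)")
conn.executemany(
    "INSERT INTO tables VALUES (?, ?)",
    [("birth_names", "public"), ("users", None)],
)
print(datasets_without_schema(conn))  # ['users']
```

Against a real Postgres-backed examples database, this list would be expected to be non-empty on master and empty on this branch.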

ADDITIONAL INFORMATION

  • Has associated issue: Example datasets created with incorrect schema #16051
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

codecov bot commented Aug 3, 2021

Codecov Report

Merging #16041 (2e15b0f) into master (bea8502) will decrease coverage by 0.12%.
The diff coverage is 33.82%.

❗ Current head 2e15b0f differs from pull request most recent head 00cb584. Consider uploading reports for the commit 00cb584 to get more accurate results

@@            Coverage Diff             @@
##           master   #16041      +/-   ##
==========================================
- Coverage   77.08%   76.95%   -0.13%     
==========================================
  Files        1037     1037              
  Lines       55647    55690      +43     
  Branches     7608     7608              
==========================================
- Hits        42898    42859      -39     
- Misses      12499    12581      +82     
  Partials      250      250              
Flag Coverage Δ
hive 81.43% <33.82%> (-0.12%) ⬇️
mysql ?
postgres ?
presto 81.73% <33.82%> (-0.12%) ⬇️
python 82.22% <33.82%> (-0.26%) ⬇️
sqlite 81.54% <33.82%> (-0.12%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
superset/connectors/sqla/models.py 87.39% <ø> (ø)
superset/examples/bart_lines.py 25.00% <25.00%> (-2.59%) ⬇️
superset/examples/country_map.py 23.25% <25.00%> (-1.75%) ⬇️
superset/examples/energy.py 24.44% <25.00%> (-1.75%) ⬇️
superset/examples/flights.py 17.14% <25.00%> (-1.61%) ⬇️
superset/examples/long_lat.py 22.22% <25.00%> (-1.59%) ⬇️
superset/examples/multiformat_time_series.py 15.68% <25.00%> (-0.99%) ⬇️
superset/examples/paris.py 24.13% <25.00%> (-2.79%) ⬇️
superset/examples/random_time_series.py 17.94% <25.00%> (-1.50%) ⬇️
superset/examples/sf_population_polygons.py 24.13% <25.00%> (-2.79%) ⬇️
... and 18 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update bea8502...00cb584.

@eschutho (Member) left a comment

Any way in the future to DRY up some of the duplication?

@junlincc (Member) commented:

@betodealmeida Hi Beto!
Do we have a plan to merge this PR?

@betodealmeida force-pushed the load_examples_bug branch 3 times, most recently from d20ac3c to 525fb5f, November 1, 2021 21:48
@betodealmeida force-pushed the load_examples_bug branch 2 times, most recently from 6454c83 to 1e77343, November 2, 2021 01:47
@betodealmeida force-pushed the load_examples_bug branch 2 times, most recently from 83c131e to 2e15b0f, November 2, 2021 04:12
@betodealmeida force-pushed the load_examples_bug branch 16 times, most recently from 83fff75 to af4e3c8, November 4, 2021 16:59
@betodealmeida merged commit 1fbce88 into apache:master Nov 4, 2021
@eschutho added the v1.4 label Dec 10, 2021
eschutho pushed a commit that referenced this pull request Dec 10, 2021
* fix: set correct schema on config import

* Fix lint

* Fix test

* Fix tests

* Fix another test

* Fix another test

* Fix base test

* Add helper function

* Fix examples

* Fix test

* Fix test

* Fixing more tests

(cherry picked from commit 1fbce88)
AAfghahi pushed a commit that referenced this pull request Jan 10, 2022
@mistercrunch added the 🍒 1.4.0, 🍒 1.4.1, 🍒 1.4.2, 🏷️ bot, and 🚢 1.5.0 labels Mar 13, 2024
Labels: 🏷️ bot, size/L, v1.4, 🍒 1.4.0, 🍒 1.4.1, 🍒 1.4.2, 🚢 1.5.0

5 participants