Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(key_value): use longblob on mysql #19805

Merged
merged 2 commits into from
Apr 22, 2022

Conversation

villebro
Copy link
Member

@villebro villebro commented Apr 21, 2022

SUMMARY

On Postgres LongBinary equates to BYTEA which can store 1 Gb values. However, it turns out LongBinary without a specified length creates BLOB on MySQL, which is limited to 64 kb. To make sure the value column on the key_value table is as wide on MySQL as it is on Postgres, we specify the length to ensure that an appropriately sized column is created.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@villebro villebro requested a review from a team as a code owner April 21, 2022 14:06
@iercan
Copy link
Contributor

iercan commented Apr 21, 2022

I've confirmed it works. Thanks

@codecov
Copy link

codecov bot commented Apr 21, 2022

Codecov Report

Merging #19805 (c190253) into master (3db4a1c) will decrease coverage by 12.60%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           master   #19805       +/-   ##
===========================================
- Coverage   66.54%   53.94%   -12.61%     
===========================================
  Files        1692     1692               
  Lines       64807    64807               
  Branches     6661     6661               
===========================================
- Hits        43129    34961     -8168     
- Misses      19978    28146     +8168     
  Partials     1700     1700               
Flag Coverage Δ
hive 52.87% <ø> (ø)
mysql ?
postgres ?
presto 52.72% <ø> (ø)
python 56.73% <ø> (-25.64%) ⬇️
sqlite ?
unit 47.95% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/utils/dashboard_import_export.py 0.00% <0.00%> (-100.00%) ⬇️
superset/key_value/commands/upsert.py 0.00% <0.00%> (-89.59%) ⬇️
superset/key_value/commands/update.py 0.00% <0.00%> (-89.37%) ⬇️
superset/key_value/commands/delete.py 0.00% <0.00%> (-85.30%) ⬇️
superset/key_value/commands/delete_expired.py 0.00% <0.00%> (-80.77%) ⬇️
superset/dashboards/commands/importers/v0.py 14.79% <0.00%> (-75.15%) ⬇️
superset/datasets/commands/importers/v0.py 24.82% <0.00%> (-68.80%) ⬇️
superset/databases/commands/test_connection.py 31.42% <0.00%> (-68.58%) ⬇️
superset/datasets/commands/update.py 25.88% <0.00%> (-68.24%) ⬇️
superset/datasets/commands/create.py 30.18% <0.00%> (-67.93%) ⬇️
... and 267 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3db4a1c...c190253. Read the comment docs.

@ktmud
Copy link
Member

ktmud commented Apr 21, 2022

Should we add a new migration for users who already ran this migration? I'm also wondering whether it's worth creating a new type like MediumText and update key_value/models to point to this type as well. It's risky when the declarative model is out of sync with the actual db schema.

@villebro
Copy link
Member Author

villebro commented Apr 21, 2022

Should we add a migration for existing users?

@ktmud I was wondering the same, and I unfortunately don't have a good answer. @iercan who caught this and was kind enough to raise it already did a manual migration (see https://apache-superset.slack.com/archives/CH307T4JG/p1650463618742279?thread_ts=1650031133.456189&cid=CH307T4JG ), and I kinda feel like orgs that are running off of master might need a different process for these types of migration improvements. The problem is, if I make a new migration for this, it'll be hell cherrying it into all the various release branches. But I'm open to suggestions.

@ktmud
Copy link
Member

ktmud commented Apr 21, 2022

Hmm... since the migration is not that complicated I guess it'd be okay to ask users to do it manually. For those who choose to deploy on master, such risks are implied.

That said, do you mind changing the solution to follow the same convention like MediumText? We'd need to update the type here as well.

@ktmud
Copy link
Member

ktmud commented Apr 21, 2022

It seems another solution to this is to specify a length for LargeBinary. MySQL will automatically choose whether to use MEDIUMBLOB or LOBGBLOB depending on the length specified.

Would that be preferred?

@villebro
Copy link
Member Author

It seems another solution to this is to specify a length for LargeBinary. MySQL will automatically choose whether to use MEDIUMBLOB or LOBGBLOB depending on the length specified.

Would that be preferred?

I think that's perfect 👍 I considered adding a types.py under superset/migrations, but I ended up settling for just specifying length=2**31 (2 Gb).

I wrote the following test script:

import sqlalchemy as sa
from sqlalchemy.schema import CreateTable
from sqlalchemy.types import LargeBinary

tbl = sa.Table('tbl', sa.MetaData(), sa.Column("value", sa.LargeBinary(length=2**31)))
mysql_engine = sa.create_engine('mysql://uid:pwd@localhost')
pg_engine = sa.create_engine('postgresql://uid:pwd@localhost')

print("-- MySQL:")
print(CreateTable(tbl).compile(pg_engine))
print("-- Postgres:")
print(CreateTable(tbl).compile(mysql_engine))

which results in this:

-- MySQL:

CREATE TABLE tbl (
	value BYTEA
)


-- Postgres:

CREATE TABLE tbl (
	value BLOB(2147483648)
)

And when the MySQL one is executed, it results in LONGBLOB (@zhaoyongjie was kind enough to test this as I didn't have a MySQL instance handy)

@villebro villebro force-pushed the villebro/longblob branch from 0ddc75a to 9c1a218 Compare April 22, 2022 06:34
@villebro villebro force-pushed the villebro/longblob branch from 9c1a218 to c190253 Compare April 22, 2022 06:35
Copy link
Member

@zhaoyongjie zhaoyongjie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested on MySQL 5.7 and 8.0, and the schema of key_value is expected.
image

@villebro villebro merged commit a1bd5b2 into apache:master Apr 22, 2022
@villebro villebro deleted the villebro/longblob branch April 22, 2022 10:12
villebro added a commit that referenced this pull request Apr 25, 2022
* fix(key_value): use longblob on mysql

* set length

(cherry picked from commit a1bd5b2)
hughhhh pushed a commit to hve-labs/superset that referenced this pull request May 11, 2022
* fix(key_value): use longblob on mysql

* set length
philipher29 pushed a commit to ValtechMobility/superset that referenced this pull request Jun 9, 2022
* fix(key_value): use longblob on mysql

* set length
@mistercrunch mistercrunch added 🍒 1.5.0 🍒 1.5.1 🍒 1.5.2 🍒 1.5.3 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels labels Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels lts-v1 preset-io size/XS 🍒 1.5.0 🍒 1.5.1 🍒 1.5.2 🍒 1.5.3 🚢 2.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants