Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sqllab): table list exceeds MAX_TABLE_NAMES #21262

Closed

Conversation

justinpark
Copy link
Member

@justinpark justinpark commented Aug 30, 2022

SUMMARY

Since max_tables logic multiplies (would have intended to or) max_items by the current size, it always returns entire list.
(i.e. config["MAX_TABLE_NAMES"] = 1000 and len(tables) = 40000 then max_tables = 40000 * 1000 = 40,000,000)
The previous logic also cuts out only if substr passed.
This commit always follows the MAX_TABLE_NAMES for both cases. (must set MAX_TABLE_NAMES to 0 if full table list needed)

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TESTING INSTRUCTIONS

Set MAX_TABLE_NAMES to a small number
Go to Sqllab and check the count of table list

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API
    cc: @john-bodley @ktmud

@codecov
Copy link

codecov bot commented Aug 30, 2022

Codecov Report

Merging #21262 (44f3bd3) into master (05354a9) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master   #21262   +/-   ##
=======================================
  Coverage   66.43%   66.43%           
=======================================
  Files        1784     1784           
  Lines       68185    68174   -11     
  Branches     7265     7265           
=======================================
- Hits        45298    45293    -5     
+ Misses      21018    21012    -6     
  Partials     1869     1869           
Flag Coverage Δ
hive 53.03% <0.00%> (+0.01%) ⬆️
mysql 80.82% <100.00%> (+0.01%) ⬆️
postgres 80.87% <100.00%> (+0.01%) ⬆️
presto 52.93% <0.00%> (+0.01%) ⬆️
python 81.35% <100.00%> (+0.01%) ⬆️
sqlite 79.39% <100.00%> (+0.01%) ⬆️
unit 50.65% <0.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/views/core.py 75.85% <100.00%> (+0.25%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Member

@john-bodley john-bodley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regrettably I don't think your logic is correct, nor whether fixing the bug is actually desirable, i.e., people likely aren't aware that truncating even exists.

if total_items and substr_parsed:
max_tables = max_items * len(tables) // total_items
max_views = max_items * len(views) // total_items
max_tables = max_items or len(tables)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justinpark this logic isn't quite right, the // ensures that max_tables + max_views = max_items.

Furthermore this endpoint is only invoked here which implies that substr_parsed is never truthy and the MAX_TABLE_NAMES config doesn't actually get invoked.

I think we should actually refactor this method to remove the substr_parsed logic and deprecate the MAX_TABLE_NAMES, especially given that it's non deterministic in terms of how it truncates the list.

Furthermore it seems like schema_parsed is always truthy (assuming I'm reading the frontend code correctly), i.e., per line #212 currentSchema is truthy and never "undefined" which simplifies the logic and thus large swaths of this method can be refactored.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@john-bodley you're right. it used to allow the empty schema_parsed option before (#1466) but no longer allowed as far as the current FE logic exists.
And MAX_TABLE_NAMES never invoked since substr_parsed is never truthy.
I hope to restore MAX_TABLE_NAMES option to fix our edge cases (>100k).
I posted updates to make max_items = max_tables + max_views

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current FE doesn't pass substr_parsed but I'll make FE changes to use substr_parsed for typeahead on truncated list

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants