-
Notifications
You must be signed in to change notification settings - Fork 14k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUGFIX]: JavaScripts max int is 2^53 - 1, longs are bigger #4005
Conversation
81ab7e7
to
b5a1f67
Compare
I think #3188 is related... |
I think using the BigInt package https://www.npmjs.com/package/big-integer might be smarter. Also DRY being applied would be nice and a test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully BigInt is coming in JS:
https://github.com/tc39/proposal-bigint
superset/viz.py
Outdated
@@ -935,7 +946,10 @@ def to_series(self, df, classed='', title_suffix=''): | |||
ys = series[name] | |||
if df[name].dtype.kind not in 'biufc': | |||
continue | |||
series_title = name | |||
if isinstance(name, (list, tuple)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This hunk looks unrelated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually related. When you group by a numeric entity you can run into the same problem that the ID is to large to be displayed correctly in the legend or pop over. However we can just treat everything in the title a string since we can be sure its not used for anything mathematical.
for k, v in list(d.items()): | ||
# if an int is too big for Java Script to handle | ||
# convert it to a string | ||
if isinstance(v, int): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about moving the constant and the loop in an helper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, def handle_js_int_overflow(data)
or something
@bolkedebruin @xrmx I looked at the various BigInt packages in JS. The problem is that I don't think d3 will handle BigInts correctly, which is why I'm converting large numbers to string only in cases where I'm pretty sure we are not going to do any computations with them, specifically:
If a value is supposed to be displayed/ingested in a visualization (e.g. line char, pie chart, ...) it is usually fine for it to loose precision since you usually don't care about the exact number anymore. An alternative to what I did in this PR would be to define that all integers for the table visualizations are always converted to a string (regardless of their size) and then fix up the sorting logic for sorting on the client side. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a tough one to work around of... while your fix seems like it would work for dimensions, what happens with overflowing metrics?
superset/viz.py
Outdated
@@ -935,7 +946,10 @@ def to_series(self, df, classed='', title_suffix=''): | |||
ys = series[name] | |||
if df[name].dtype.kind not in 'biufc': | |||
continue | |||
series_title = name | |||
if isinstance(name, (list, tuple)): | |||
series_title = [str(title) for title in name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does that affect bar chart sorting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should not since we are treating the title as string in the JS anyways.
@mistercrunch A couple screenshots how everything behaves: |
09a72c2
to
31c03d6
Compare
superset/viz.py
Outdated
records=df.to_dict(orient='records'), | ||
columns=list(df.columns), | ||
) | ||
|
||
for d in data.get('records', dict()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's also put this into a BaseViz.handle_js_int_overflow
method
6f9cce5
to
2ee6990
Compare
Any updates on this besides a rebase (and a possible test fix)? |
2ee6990
to
1c89942
Compare
…ently used as IDs this is a hacky fix.
1c89942
to
f43ad4f
Compare
Codecov Report
@@ Coverage Diff @@
## master #4005 +/- ##
==========================================
+ Coverage 71.22% 71.23% +0.01%
==========================================
Files 190 190
Lines 14880 14900 +20
Branches 1098 1098
==========================================
+ Hits 10598 10614 +16
- Misses 4279 4283 +4
Partials 3 3
Continue to review full report at Codecov.
|
Twitter uses very long IDs for entities. This causes issues because Java Script can not handle these long IDs and will round them by treating them as a float.
This is a very hack fix, that will just convert every integer that is longer than the longest integer supported by JS to a string for both SQL Lab as well as the default table visualization.
This works since both python and JS are dynamically typed, but its not great.
Any suggestions how we should handle this correctly?