[BUGFIX]: JavaScripts max int is 2^53 - 1, longs are bigger #4005

Merged
2 commits merged on Apr 4, 2018


fabianmenges
Contributor

Twitter uses very long IDs for entities. This causes issues because JavaScript cannot represent these long IDs exactly: it treats them as floats and rounds them.

This is a very hacky fix that converts every integer larger than the largest integer JS can represent exactly to a string, for both SQL Lab and the default table visualization.
This works since both Python and JS are dynamically typed, but it's not great.

Any suggestions on how we should handle this correctly?
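
To illustrate the rounding described above: JavaScript numbers are IEEE 754 doubles, and Python's float uses the same format, so a round trip through float shows which integers would survive in JS. This is only a minimal sketch; the helper name survives_double_round_trip is hypothetical and not part of this PR.

```python
JS_MAX_SAFE_INTEGER = 2 ** 53 - 1  # largest integer a double represents without gaps


def survives_double_round_trip(value):
    """Return True if the int is unchanged after conversion to a 64-bit float."""
    return int(float(value)) == value


# Integers up to 2^53 - 1 are exact; just above that, odd values get rounded.
print(survives_double_round_trip(JS_MAX_SAFE_INTEGER))      # True
print(survives_double_round_trip(JS_MAX_SAFE_INTEGER + 2))  # False: 2^53 + 1 rounds to 2^53
print(survives_double_round_trip(2 ** 63 - 1))              # False: max signed 64-bit long
```

Any value for which this returns False would be displayed incorrectly if sent to the browser as a JSON number.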

@fabianmenges fabianmenges force-pushed the fmenges/hacky_js_long_fix branch 2 times, most recently from 81ab7e7 to b5a1f67 Compare December 5, 2017 02:51
@rumbin
Contributor

rumbin commented Dec 5, 2017

I think #3188 is related...

@bolkedebruin
Contributor

I think using the BigInt package https://www.npmjs.com/package/big-integer might be smarter. Applying DRY and adding a test would also be nice.

Contributor

@xrmx xrmx left a comment

Hopefully BigInt is coming in JS:
https://github.com/tc39/proposal-bigint

superset/viz.py Outdated
@@ -935,7 +946,10 @@ def to_series(self, df, classed='', title_suffix=''):
    ys = series[name]
    if df[name].dtype.kind not in 'biufc':
        continue
    series_title = name
    if isinstance(name, (list, tuple)):
Contributor

This hunk looks unrelated

Contributor Author

This is actually related. When you group by a numeric entity, you can run into the same problem: the ID is too large to be displayed correctly in the legend or popover. However, we can just treat everything in the title as a string, since we can be sure it's not used for anything mathematical.

for k, v in list(d.items()):
    # if an int is too big for Java Script to handle
    # convert it to a string
    if isinstance(v, int):
Contributor

What about moving the constant and the loop into a helper?

Contributor Author

Sure, will do.

Member

Good idea, def handle_js_int_overflow(data) or something
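
A minimal sketch of what the helper suggested in this thread might look like. It assumes the payload is a dict whose 'records' key holds a list of row dicts (as in the hunk under review); the constant name JS_MAX_INTEGER and the exact signature are assumptions, not necessarily what was merged.

```python
# Assumed constant name; the value is 2^53 - 1, JavaScript's Number.MAX_SAFE_INTEGER.
JS_MAX_INTEGER = 2 ** 53 - 1


def handle_js_int_overflow(data):
    """Stringify ints in data['records'] that JavaScript cannot represent exactly."""
    for record in data.get('records', []):
        # Iterate over a copy of the items so values can be reassigned safely.
        for key, value in list(record.items()):
            # abs() covers large negative values as well as large positive ones.
            if isinstance(value, int) and abs(value) > JS_MAX_INTEGER:
                record[key] = str(value)
    return data


payload = {'records': [{'tweet_id': 2 ** 63 - 1, 'retweets': 5, 'lang': 'en'}]}
print(handle_js_int_overflow(payload))
```

Only oversized integers are touched, so small values remain numbers and can still be sorted or summed on the client.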

@fabianmenges
Contributor Author

@bolkedebruin @xrmx I looked at the various BigInt packages in JS. The problem is that I don't think d3 will handle BigInts correctly, which is why I'm converting large numbers to string only in cases where I'm pretty sure we are not going to do any computations with them, specifically:

  • The legend of line charts
  • Any table visualization (SQL lab or table slice)

If a value is supposed to be displayed/ingested in a visualization (e.g. line chart, pie chart, ...), it is usually fine for it to lose precision, since you usually don't care about the exact number anymore.
Until JS supports a real bigint and all of our visualization libraries support it, we will probably have to convert big integers to strings on a visualization-by-visualization basis.

An alternative to what I did in this PR would be to define that all integers for the table visualizations are always converted to a string (regardless of their size) and then fix up the sorting logic for sorting on the client side.
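
One way the sorting fix for that alternative might look, sketched in Python for consistency with the rest of this PR rather than in JS. It assumes the integers arrive as base-10 strings without leading zeros or a '+' sign; compare_int_strings is a hypothetical name, and the idea is to order the strings numerically without ever parsing them into a lossy number type.

```python
from functools import cmp_to_key


def compare_int_strings(a, b):
    """Compare two base-10 integer strings numerically without parsing them."""
    neg_a, neg_b = a.startswith('-'), b.startswith('-')
    if neg_a != neg_b:
        return -1 if neg_a else 1  # any negative sorts before any non-negative
    digits_a, digits_b = a.lstrip('-'), b.lstrip('-')
    if len(digits_a) != len(digits_b):
        # Without leading zeros, more digits means a larger magnitude.
        result = -1 if len(digits_a) < len(digits_b) else 1
    else:
        # Equal length: lexicographic order matches numeric order.
        result = (digits_a > digits_b) - (digits_a < digits_b)
    # For two negatives, the larger magnitude is the smaller number.
    return -result if neg_a else result


ids = ['10', '9', '-5', '-40', '9223372036854775807', '9223372036854775806']
print(sorted(ids, key=cmp_to_key(compare_int_strings)))
```

A plain string sort would place '10' before '9', which is the sorting breakage this alternative would have to address.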

Member

@mistercrunch mistercrunch left a comment

This is a tough one to work around... While your fix seems like it would work for dimensions, what happens with overflowing metrics?

superset/viz.py Outdated
@@ -935,7 +946,10 @@ def to_series(self, df, classed='', title_suffix=''):
    ys = series[name]
    if df[name].dtype.kind not in 'biufc':
        continue
    series_title = name
    if isinstance(name, (list, tuple)):
        series_title = [str(title) for title in name]
Member

does that affect bar chart sorting?

Contributor Author

It should not since we are treating the title as string in the JS anyways.
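
A minimal sketch of the title conversion discussed in this thread, assuming the goal is to stringify every part of a series title while preserving its container type (the second commit in this PR is titled "Keep tuple as tuple"). The function name stringify_series_title is hypothetical.

```python
def stringify_series_title(name):
    """Convert a series title (scalar, list, or tuple) to string form."""
    if isinstance(name, list):
        return [str(part) for part in name]
    if isinstance(name, tuple):
        # Keep tuples as tuples so downstream code sees the same container type.
        return tuple(str(part) for part in name)
    return str(name)


# A group-by on a long numeric ID plus a second dimension (values are made up).
print(stringify_series_title((2 ** 63 - 1, 'en')))
```

Because titles are only used as labels, converting them to strings does not change any computed values.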

@fabianmenges
Contributor Author

fabianmenges commented Dec 5, 2017

@mistercrunch A couple of screenshots showing how everything behaves:
SQL Lab:
You can see that the bigint is rendered correctly when it's a dimension; if you put a sum around it, it would still work.
[screenshot: screen shot 2017-12-05 at 2 25 37 pm]
Table Viz:
Grouped by time and summed, the result shows a loss of precision:
[screenshot: screen shot 2017-12-05 at 2 23 14 pm]
Non-grouped, the precision is maintained:
[screenshot: screen shot 2017-12-05 at 2 24 31 pm]
Bar Charts:
[screenshot: screen shot 2017-12-05 at 2 43 01 pm]

superset/viz.py Outdated
    records=df.to_dict(orient='records'),
    columns=list(df.columns),
)

for d in data.get('records', dict()):
Member

let's also put this into a BaseViz.handle_js_int_overflow method

@fabianmenges fabianmenges changed the title [BUGFIX]: Java scripts max int is 2^53 - 1, longs are bigger [BUGFIX]: JavaScripts max int is 2^53 - 1, longs are bigger Jan 9, 2018
@fabianmenges fabianmenges force-pushed the fmenges/hacky_js_long_fix branch 2 times, most recently from 6f9cce5 to 2ee6990 Compare January 16, 2018 14:41
@jeffreythewang
Contributor

jeffreythewang commented Mar 13, 2018

Any updates on this besides a rebase (and a possible test fix)?

@codecov-io

Codecov Report

Merging #4005 into master will increase coverage by 0.01%.
The diff coverage is 83.33%.

@@            Coverage Diff             @@
##           master    #4005      +/-   ##
==========================================
+ Coverage   71.22%   71.23%   +0.01%     
==========================================
  Files         190      190              
  Lines       14880    14900      +20     
  Branches     1098     1098              
==========================================
+ Hits        10598    10614      +16     
- Misses       4279     4283       +4     
  Partials        3        3
Impacted Files           Coverage Δ
superset/utils.py        87.85% <100%> (+0.02%) ⬆️
superset/viz.py          78.43% <80%> (-0.03%) ⬇️
superset/dataframe.py    96.55% <87.5%> (-0.95%) ⬇️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 97afcd5...f43ad4f. Read the comment docs.

@mistercrunch mistercrunch merged commit 9a79d33 into apache:master Apr 4, 2018
michellethomas pushed a commit to michellethomas/panoramix that referenced this pull request May 24, 2018

* [BUGFIX]: Java scripts max int is 2^53 - 1 longs are bigger and frequently used as IDs this is a hacky fix.

* Keep tuple as tuple
timifasubaa pushed a commit to timifasubaa/incubator-superset that referenced this pull request May 31, 2018
wenchma pushed a commit to wenchma/incubator-superset that referenced this pull request Nov 16, 2018
@mistercrunch mistercrunch added 🏷️ bot and 🚢 0.25.0 labels Feb 27, 2024