Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: properly handle large numbers in responses #74

Merged
merged 6 commits into from
Sep 14, 2018

Conversation

ofrobots
Copy link
Contributor

By default axios would parse strings as JSON [1]. The strings returned
by the metadata service are not JSON. This causes strings with numbers
to be parsed as JavaScript Numbers which loses precision. Strings should
be left unperturbed.

1: https://github.com/axios/axios/blob/9005a54a8b42be41ca49a31dcfda915d1a91c388/lib/defaults.js#L60

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Aug 24, 2018
@codecov
Copy link

codecov bot commented Aug 24, 2018

Codecov Report

Merging #74 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@          Coverage Diff          @@
##           master    #74   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files           1      1           
  Lines          50     55    +5     
  Branches       12     14    +2     
=====================================
+ Hits           50     55    +5
Impacted Files Coverage Δ
src/index.ts 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ec1655d...4653333. Read the comment docs.

@JustinBeckwith
Copy link
Contributor

It looks like this change would stop JSON.parse for all API calls, is that right? I have a few thoughts on that:

  1. I think we could just set responseType to text on the request object to get the same effect instead of overriding transformResponse:
    https://github.com/axios/axios#request-config

  2. Wouldn't this be a pretty hairy breaking change? If we're returning objects for most calls today, wouldn't we suddenly be returning strings instead?

@ofrobots
Copy link
Contributor Author

I think we could just set responseType to text on the request object to get the same effect

Nope. This does not work. Axios always does a JSON.parse for strings.

Wouldn't this be a pretty hairy breaking change? If we're returning objects for most calls today, wouldn't we suddenly be returning strings instead?

There is potential to be breaking, if there are users who expect a JS object to be returned. I am not aware of any such place in the users of this module in the client libraries though. AFAIK, all users expect strings today. Users will have to JSON.parse themselves if they want an object.

I'm fine with marking this semver major.

@ofrobots
Copy link
Contributor Author

An alternative thing to do here would be to look at the response content-type and do JSON.parsing only if it is json. I expect that feature belongs in axios rather than here, but I will entertain counter arguments.

@JustinBeckwith
Copy link
Contributor

I think doing the JSON.parse-ing if provided with the content-type is the safest thing to do here. At a glance, there are multiple places in the auth library where we expect to get objects back. I have to think this would break others pretty hard as well.

@ofrobots
Copy link
Contributor Author

Thanks for showing me the example of users expecting objects. I've added selective parsing heregit. PTAL.

@stephenplusplus
Copy link
Contributor

So the metadata server only sometimes responds in JSON? Shouldn't it be consistent, or is this more complicated than it seems?

@ofrobots ofrobots changed the title fix: metadata response strings are not JSON fix: metadata response strings are always JSON Aug 24, 2018
@JustinBeckwith
Copy link
Contributor

Given our history with the metadata server (aye, it be a fickle beast), I would feel a lot better if we could validate it actually returns the correct contentType header in both GCE and GCF. Sorry for asking you do more work @ofrobots, this library has always scared the crap out of me.

JustinBeckwith
JustinBeckwith previously approved these changes Aug 24, 2018
@ofrobots
Copy link
Contributor Author

@stephenplusplus documented on this page: https://cloud.google.com/compute/docs/storing-retrieving-metadata

Specifically:

By default, recursive contents are returned in JSON format. If you want to return these contents in text format, append the alt=text query parameter.

In theory we should be able to pass alt=json and we will be guaranteed that the value returned is always JSON. The problem is that the metadata service is not always returning valid JSON.

Example:

$ curl -H 'Metadata-Flavor: Google' 'http://metadata.google.internal/computeMetadata/v1/instance/id?alt=json'
4520031799277581759

4520031799277581759 is not valid JSON. Parsing it will lose precision. I guess this should be opened as a bug against the metadata service.

@stephenplusplus
Copy link
Contributor

Ah, interesting. Is that the only scenario-- out of bounds integers?

@ofrobots
Copy link
Contributor Author

Sorry for asking you do more work @ofrobots, this library has always scared the crap out of me.

I'm off on vacation in a coupla hours. I'll go ahead and assign it to you @JustinBeckwith 😜.

@JustinBeckwith
Copy link
Contributor

NOT IT! NOT IT! 🤣 No problem. Enjoy vacation!

@stephenplusplus
Copy link
Contributor

Please correct me if I'm wrong:

If we ask for a specific value that is an integer, it has the potential to be out of bounds. It would be returned un-stringified, like: 4520031799277581759. We try to parse that, and we lose precision: JSON.parse(4520031799277581759) === 4520031799277582000

If we request several values from the metadata server, it will return it as key-value JSON response. But, isn't the same problem possible?

{
  someNumber: 4520031799277581759,
  someBool: true,
  etc: '...'
}

In other words, I'm wondering if we're always at risk of losing precision when we are returned a non-stringified Number value.

@ofrobots
Copy link
Contributor Author

ofrobots commented Sep 5, 2018

@stephenplusplus You're right. Recursive queries return JSON by default. Those would still be invalid JSON. This needs to be fixed in the metadata service itself. I have opened a bug for that.

In the meanwhile I think this works around the issue in the common case where folks are not doing recursive queries?

@stephenplusplus
Copy link
Contributor

Is a fix possible for the metadata service, though? If it converts number values to strings, I'm not sure how we would we know the difference when returning it to the user. Or how they would know it's out of bounds or not, and how to handle that.

In the meanwhile I think this works around the issue in the common case where folks are not doing recursive queries?

My only concern is that this PR leaves things in a bit of a jumbled state. It will return a string for all returned single-property values, even where other data types are expected. I believe that we used to return the parsed value, i.e. true and now we will return "true", right?

The jumbled part comes in where we won't JSON.parse single properties, but we will JSON.parse when the user requests multiple properties.

I'm thinking that the user probably wants JSON.parse most of the time, since to use the values returned from the metadata service in any meaningful way, they would have to do the same on their end. From that assumption (correct if wrong), maybe allowing the user to specify the transformResponse function as an option, or something like skipParse: false would be a cleaner way to go.

If we just need an immediate fix, and this is no big deal, 'cause only we use this thing, then YOLO.

@ofrobots
Copy link
Contributor Author

ofrobots commented Sep 6, 2018

Okay, I did some more research. It seems that as far as RFC 4627 is concerned, large numbers are valid JSON. The spec doesn't specify any precision for numbers. This means that using JSON.parse is unsafe unless the API guarantees that it will return large numbers as strings (or otherwise JSON.parseable data type). I do think that the metadata service should be returning strings for the instance/id rather than numbers, but strictly speaking, what is being returned today is valid JSON.

I pushed another commit that takes another approach to the problem. We now use json-bigint to parse JSON. This would have no change to the current behaviour if the numbers are representable as JavaScript numbers. For larger numbers, it returns a BigNumber.

This would be a semver major change, but only for the cases where we already has precision loss.

WDYT?

@ofrobots ofrobots dismissed JustinBeckwith’s stale review September 6, 2018 21:55

Implementation has change substantially.

@ofrobots ofrobots added the semver: major Hint for users that this is an API breaking change. label Sep 6, 2018
@codecov
Copy link

codecov bot commented Sep 6, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@c8d71fd). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master      #74   +/-   ##
=========================================
  Coverage          ?   98.18%           
=========================================
  Files             ?        1           
  Lines             ?       55           
  Branches          ?       13           
=========================================
  Hits              ?       54           
  Misses            ?        1           
  Partials          ?        0
Impacted Files Coverage Δ
src/index.ts 98.18% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c8d71fd...8005dcc. Read the comment docs.

@ofrobots
Copy link
Contributor Author

ofrobots commented Sep 6, 2018

For reference: https://www.npmjs.com/package/bignumber.js

@ofrobots ofrobots changed the title fix: metadata response strings are always JSON fix: properly handle large numbers in responses Sep 6, 2018
@ofrobots ofrobots assigned ofrobots and unassigned JustinBeckwith Sep 6, 2018
@stephenplusplus
Copy link
Contributor

It's hard to believe how perfectly this module targets what we're doing here. The only thing that could be difficult for a user is receiving a String sometimes and a Number others. If we were writing docs to show how number values will come through, (which, maybe we should), would we show: "if (string) ... else if (number) ..."?

For comparison, in Spanner, when the user or API provides a number that is out of bounds, we throw when the user attempts to use it for anything other than logging / simply passing the value around:

> i = new Int('4520031799277581759')
Int { value: '4520031799277581759' }
> i.value
'4520031799277581759'
> i + 3
Error: Integer 4520031799277581759 is out of bounds.
    at Int.valueOf (repl:1:184)

https://github.com/googleapis/nodejs-spanner/blob/382375cd89d63c3e0b751cce45bd0780fa3aa72f/src/codec.js#L56

Here's the same pattern with BigInt:

> b = bigint.parse('{"value":4520031799277581759}')
{ value: BigNumber { s: 1, e: 18, c: [ 45200, 31799277581759 ] } }
> b.value
BigNumber { s: 1, e: 18, c: [ 45200, 31799277581759 ] }
> b.value.valueOf()
'4520031799277581759'
> b.value + 3
'45200317992775817593'

@ofrobots
Copy link
Contributor Author

So logging on GCE is simply not working correctly at the moment because of this bug. See googleapis/nodejs-logging-bunyan#148. We need to figure out a solution.

For comparison, in Spanner, when the user or API provides a number that is out of bounds, we throw when the user attempts to use it for anything other than logging / simply passing the value around

The problem here is that we have very little semantic knowledge about the value and their types. instance/id happens to be a number-like value, but it is an identifier rather than an actual number on which one may do arithmetic. There may be other values that are legitmately numbers – we don't know apriori. The users of this library do have semantic information. We don't.

Based on that I argue that it is reasonable for our contract to be: for values that are expected to be returned as a number, users will get either a number or BigInt depending on the actual value returned by the API. The user – based on their understanding of the semantics of the value – will need to deal with it appropriately.

@stephenplusplus
Copy link
Contributor

I think that's reasonable. Could you just leave a note about that in the readme?

By default axios would parse strings as JSON [1]. The strings returned
by the metadata service are not JSON. This causes strings with numbers
to be parsed as JavaScript Numbers which loses precision. Strings should
be left unperturbed.

1: https://github.com/axios/axios/blob/9005a54a8b42be41ca49a31dcfda915d1a91c388/lib/defaults.js#L60
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes This human has signed the Contributor License Agreement. semver: major Hint for users that this is an API breaking change.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants