Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GET on datasets created with fillvalue=np.NaN give invalid JSON #87

Closed
loichuder opened this issue Apr 29, 2021 · 6 comments
Closed

GET on datasets created with fillvalue=np.NaN give invalid JSON #87

loichuder opened this issue Apr 29, 2021 · 6 comments

Comments

@loichuder
Copy link
Contributor

loichuder commented Apr 29, 2021

Say a dataset was created by h5py in the following fashion:

h5file.create_dataset("name", shape=[5], fillvalue=np.NaN)

The dataset info created by GET_Dataset produces then a JSON with a field:
"creationProperties": {"fillValue": NaN}.

Unquoted NaN is not supported by the JSON spec so this cannot be parsed:

JSON.parse("{\"fillValue\": NaN}")
// Uncaught SyntaxError: Unexpected token N in JSON at position 14

The presence of this unsupported value is explained the JSON serialization which is made by aiohttp.web.json_response which in turn use json.dumps. The default behaviour of json.dumps is to allow unquoted NaN.

A possible way to solve this would be to parametrize aiohttp.web.json_response to quote NaN or to send null instead:

JSON.parse("{\"fillValue\": \"NaN\"}")
// { fillValue: 'NaN' }

Edit: This issue also pops when getting data that contains NaN

@jreadey
Copy link
Member

jreadey commented May 22, 2021

I've added a query option "ignore_nan" for GET values that will return null rather than NaN in the json response. Try out the nanvalue branch. Commit is here: 66a0dc0.

Haven't added support for sql query operations or attribute data yet, but let me know if this resolves the immediate issue.

BTW - Is there a mean to use binary responses from javascript? (guess you would need the javascript equivalent of numpy). Binary vs JSON would be more efficient esp. for larger responses.

@loichuder
Copy link
Contributor Author

loichuder commented May 25, 2021

I've added a query option "ignore_nan" for GET values that will return null rather than NaN in the json response. Try out the nanvalue branch. Commit is here: 66a0dc0.

Haven't added support for sql query operations or attribute data yet, but let me know if this resolves the immediate issue.

Yup, it fixes the issue for NaN/Infinity in dataset data. However, as you said, NaN in attribute data or in fillValue (which get serialized in the metadata) will still produce invalid JSON.

BTW - Is there a mean to use binary responses from javascript? (guess you would need the javascript equivalent of numpy). Binary vs JSON would be more efficient esp. for larger responses.

This is something that we are actively exploring right now as it would indeed be more efficient and solve such issues.

For the record, in h5web, we are using ndarray as the "JS numpy" which make projects like js-numpy-parser especially interesting to us.

@jreadey
Copy link
Member

jreadey commented May 27, 2021

I've updated the branch to return the string "nan" in the fillValue for GET dataset. As with values, you'll need to add "ignore_nan" in the query option. See the commit here: 641eb6b.

I'll look at attribute support next.

@jreadey
Copy link
Member

jreadey commented May 30, 2021

And added attribute support here: e844162.

I think this should be it for supporting possible NaN data responses.

@jreadey
Copy link
Member

jreadey commented Jun 22, 2021

Merged to master. Will close issue - please reopen if you run into any problems.

@jreadey jreadey closed this as completed Jun 22, 2021
@loichuder
Copy link
Contributor Author

Sorry for not following this up earlier. It works fine 👍 !

The only missing piece would be that the JSON is still invalid when fillValue is set to (-)Infinity.

To be completely upfront, we changed our approach on h5web: we search for NaN/Infinity in the text response to replace them by their string equivalent before JSON deserialization. This allows to have valid JSON and still distinguish between NaN and Infinity, which is something we cannot do with ignore_nan as it converts both to null.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants