Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test visualisations for datasets with NaN #641

Open
loichuder opened this issue Apr 29, 2021 · 9 comments
Open

Test visualisations for datasets with NaN #641

loichuder opened this issue Apr 29, 2021 · 9 comments
Labels
epic Issue that will need to be split up later on

Comments

@loichuder
Copy link
Member

Working on #639 made me wonder if h5web is really robust for datasets containing NaN.

I think we should test the fetching and visualizations for:

  • a scalar dataset containing NaN
  • an array dataset containing NaN and regular values
  • an array dataset containing only NaNs
@loichuder loichuder self-assigned this Apr 29, 2021
@loichuder
Copy link
Member Author

This poses an issue at fetching: NaN and Infinityare not supported by the JSON format. There are several ways of deal with such values.

The default behaviour or the json Python serializer is to replace Infinity and NaN by their JS equivalent: https://docs.python.org/3/library/json.html#json.dump. Unfortunately, this produces invalid JSON as in HSDS: HDFGroup/hsds#87.

A common workaround is to serialize Infinity and NaN as null. This is the approach currently in use in jupyterlab_hdf and orjson. While the JSON is valid, we cannot distinguish between NaN and Infinity in h5web. This also means that strictly speaking, the typing of our ndarray<number> is not valid as we could get ndarray<number | null> instead. I started some work on #642 but we wish to instead support NaN and Infinity explicitly and be able to distinguish the two.

The only approach I see to enable this is to serialize NaN and Infinity as strings as suggested in jupyterlab/jupyterlab-hdf5#22 (proposed implementation in jupyterlab/jupyterlab-hdf5#97). Note the use of orjson (that looks very promising for performance reasons) excludes this approach (jupyterlab/jupyterlab-hdf5#98).

@loichuder
Copy link
Member Author

A common workaround is to serialize Infinity and NaN as null. This is the approach currently in use in jupyterlab_hdf and orjson.

Update: this behaviour was implemented in HSDS when using ignore_nan: HDFGroup/hsds#87

@loichuder
Copy link
Member Author

Here is the current status for our three providers:

HSDS

When fetching

  • As we don't use the ignore_nan parameter, we receive invalid JSON containing NaN and (-)Infinity
  • We transform this invalid JSON in the Provider by replacing the invalid values by their string counterpart "NaN" and "(-)Infinity":
    return JSON.parse(data.replaceAll(/(NaN|-?Infinity)/gu, '"$&"'));

When displaying

  • ✔️ The MatrixVis and display of attributes work fine
  • ✔️ The HeatmapVis works for custom domains
  • ScalarVis shows 0 for these values
  • LineVis show a blank canvas even for slices containing only valid values
  • ❌ Computation of the domain in the HeatmapVis yields NaN or Infinity values

The biggest issue here in my opinion is that NaN/Infinity are not explicitly handled in the domain computation. Implementing this specific handling could solve the LineVis and HeatmapVis issues.

Jupyter/h5grove

When fetching

  • NaN and (-)Infinity are serialized as null
  • No transformation occurs in the Provider

When displaying

  • ✔️ The LineVis do not show points for these values
  • ❌ Attributes get displayed as null
  • ❌ The null value gets interpreted as 0 in most numeric contexts: value display in the tooltip, domain computation, colormap mapping, MatrixVis...
  • ❌ Sole exception of the rule above: ScalarVis throws with val is null

Issues stems from the fact that null is not expected in the typings nor in the numeric contexts. As a short-term fix, we could add a transformation in the Provider from null to NaN to leverage the NaN handling we will implement for HSDS. The long-term fix would be to request data in binary rather than JSON.

@loichuder
Copy link
Member Author

HSDS

#775 fixed

  • LineVis show a blank canvas even for slices containing only valid values
  • Computation of the domain in the HeatmapVis yields NaN or Infinity values

The only remaining issue is ScalarVis shows 0 for these values but after investigation, this is a general issue of HSDS for numeric scalar datasets: HDFGroup/hsds#100.

Jupyter/h5grove

Issues were not fixed. The implementation #776 that transforms null in NaN implied a dent on performance that was too important.

@axelboc
Copy link
Contributor

axelboc commented Oct 26, 2021

Pasting this from #817:

h5grove

When fetching

  • ✔️ NaN and (-)Infinity values are conserved in the payload and interpreted as their JS counterpart in the Provider

When displaying

  • ✔️ The LineVis do not show points for these values
  • ✔️ MatrixVis, ScalarVis and the tooltip display the correct value NaN/Infinity
  • ❌ Attributes get still displayed as null as the binary format is only for dataset values

@axelboc
Copy link
Contributor

axelboc commented Oct 26, 2021

I think the next step would be to allow fetching attribute values as binary in H5Grove.

The current API endpoint of H5Grove returns a JSON dictionary, even when fetching a single attribute, so we would need to change this. We could of course add a new endpoint that returns the value of a single attribute as binary, but a better solution may be to keep the current endpoint and its batching capability, and use the multipart/* MIME type.

This page gives an example of a multipart/form-data response that includes both JSON and binary data: https://blog.marcinbudny.com/2014/02/sending-binary-data-along-with-rest-api.html

@loichuder
Copy link
Member Author

Note that HDFGroup/hsds#100 is now fixed so that NaN/Infinity support is complete in HSDS !

At a price at a replaceAll in the Provider:

return JSON.parse(data.replaceAll(/(NaN|-?Infinity)/gu, '"$&"'));

Only the support of NaN/Infinity for attributes in h5grove is left: I am curious to try multipart/form-data !

@axelboc
Copy link
Contributor

axelboc commented Oct 27, 2021

Excellent!! Do we then have to pass a reviver to the JSON.parse() call to transform "NaN" and "(-)Infinity" strings back to JS "numbers"?

@loichuder
Copy link
Member Author

Excellent!! Do we then have to pass a reviver to the JSON.parse() call to transform "NaN" and "(-)Infinity" strings back to JS "numbers"?

It sort of works when keeping these as strings but it perhaps would be cleaner indeed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
epic Issue that will need to be split up later on
Projects
None yet
Development

No branches or pull requests

2 participants