Error using Report.save() when using Pandas Int64 vs np.int64 #937

azamdin23 · 2024-01-03T21:41:51Z

Description

I am getting a TypeError: Object of type NAType is not JSON serializable exception when I try to save report results to JSON using Report.save() (the same happens with Report.save_html()).
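The exception is raised by the standard library json encoder, which has no rule for the pd.NA scalar that pandas' nullable dtypes use for missing values. The sketch below reproduces the message without evidently, using only pandas and the stdlib:

```python
import json

import pandas as pd

# Nullable Int64 columns store missing values as the pd.NA singleton (type NAType).
nullable = pd.Series([1, None], dtype="Int64")
missing = nullable.iloc[1]
print(missing is pd.NA)  # True

message = None
try:
    # The default JSONEncoder has no conversion for NAType, so it raises TypeError.
    json.dumps({"value": missing})
except TypeError as exc:
    message = str(exc)
print(message)  # Object of type NAType is not JSON serializable
```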

The issue occurs when I have categorical features based on columns that use pandas' own nullable Int64 dtype. Everything works fine if I use NumPy's int64 dtype instead. The difference is that the pandas dtype is nullable, which can be useful. It doesn't seem to matter whether my dataframe actually contains null values; I always get the serialisation error.
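Until the encoder handles pd.NA, one workaround consistent with this observation is to cast nullable integer columns back to NumPy dtypes before calling Report.run(). The helper below is hypothetical (it is not part of evidently). It uses int64 when a column has no missing values and falls back to float64 (NaN for missing) otherwise; note that float64 turns category values such as 0/1 into 0.0/1.0.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype
from pandas.api.types import is_integer_dtype


def to_numpy_int_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: replace pandas nullable integer columns (Int64, Int32, ...)
    with NumPy dtypes so that no pd.NA scalars can reach the JSON encoder."""
    out = df.copy()
    for col in out.columns:
        dtype = out[col].dtype
        # Nullable integer dtypes are pandas extension dtypes that also count as integer.
        if isinstance(dtype, ExtensionDtype) and is_integer_dtype(dtype):
            if out[col].isna().any():
                # NumPy integers cannot represent missing values; float64 stores them as NaN.
                out[col] = out[col].astype("float64")
            else:
                out[col] = out[col].astype("int64")
    return out


df = pd.DataFrame({
    "no_nulls": pd.Series([0, 1], dtype="Int64"),
    "with_nulls": pd.Series([1, None], dtype="Int64"),
    "floats": [0.5, 1.5],
})
converted = to_numpy_int_dtypes(df)
print(converted.dtypes)
```

In the reproducible example, calling df = to_numpy_int_dtypes(df) before data_drift_report.run(...) would give categories_notok the same int64 dtype as categories_ok, the configuration the report already saves successfully.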

Expected behaviour

I should be able to save reports without error. The report itself seems to be computed successfully; it just can't be serialised to a file.

Reproducible example

import pandas as pd
import numpy as np
from traceback import format_exc
from evidently.pipeline.column_mapping import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

random_ints = np.random.choice([0, 1], size=100)
data = {
    "categories_notok": pd.Series(random_ints, dtype=pd.Int64Dtype()),
    "categories_ok": pd.Series(random_ints, dtype=np.int64),
}
df = pd.DataFrame(data)

print("First run the report on the 'categorical_notok' column, a stacktrace will be printed")
column_mapping = ColumnMapping(numerical_features=[], categorical_features=["categories_notok"])
data_drift_report = Report(metrics=[DataDriftPreset()])
data_drift_report.run(
    reference_data=df,
    current_data=df,
    column_mapping=column_mapping,
)
try:
    data_drift_report.save("this_wont_save.json")
    print("saved")
except TypeError as exc:
    print(format_exc())
    print("not saved")

print("\nIf I just specify the 'categorical_ok' column it works fine")
column_mapping = ColumnMapping(numerical_features=[], categorical_features=["categories_ok"])
data_drift_report = Report(metrics=[DataDriftPreset()])
data_drift_report.run(
    reference_data=df,
    current_data=df,
    column_mapping=column_mapping,
)
data_drift_report.save("this_will_save.json")
print("saved")

The output is:

First run the report on the 'categorical_notok' column, a stacktrace will be printed
Traceback (most recent call last):
  File "/tmp/ipykernel_1766/1107039489.py", line 11, in <module>
    data_drift_report.save("this_wont_save.json")
  File "/home/ec2-user/SageMaker/data-monitoring-test/.venv/lib/python3.9/site-packages/evidently/suite/base_suite.py", line 475, in save
    self._get_snapshot().save(filename)
  File "/home/ec2-user/SageMaker/data-monitoring-test/.venv/lib/python3.9/site-packages/evidently/suite/base_suite.py", line 398, in save
    json.dump(self.dict(), f, indent=2, cls=NumpyEncoder)
  File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/__init__.py", line 179, in dump
    for chunk in iterable:
  File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  [Previous line repeated 2 more times]
  File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 439, in _iterencode
    yield from _iterencode(o, _current_indent_level)
  File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 429, in _iterencode
    yield from _iterencode_list(o, _current_indent_level)
  File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/home/ec2-user/SageMaker/data-monitoring-test/.venv/lib/python3.9/site-packages/evidently/utils/numpy_encoder.py", line 54, in default
    return json.JSONEncoder.default(self, o)
  File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type NAType is not JSON serializable

not saved

If I just specify the 'categorical_ok' column it works fine
saved

Additional info

Pandas version: 2.1.4
Numpy version: 1.26.3
Evidently version: 0.4.13


emeli-dral added the bug label (Something isn't working) Jan 4, 2024

Liraim added a commit that referenced this issue Jan 5, 2024

#937: Add check for pandas null in json conversion.

d52981d

Liraim mentioned this issue Jan 5, 2024

#937: Add check for pandas null in json conversion. #938

Merged

Liraim added a commit that referenced this issue Jan 5, 2024

#937: Check that object isn't sequence before checking for null.

c209184

emeli-dral closed this as completed in #938 Jan 9, 2024

emeli-dral pushed a commit that referenced this issue Jan 9, 2024

#937: Add check for pandas null in json conversion. (#938)

7a3f5d4

* #937: Add check for pandas null in json conversion. * #937: Check that object isn't sequence before checking for null.
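The two commit messages describe the shape of the fix: treat pandas missing-value scalars as JSON null, but only run the null check on objects that are not sequences, because pd.isna() on an array-like returns an element-wise result rather than a single bool. The encoder below is a minimal sketch of that idea, not evidently's actual NumpyEncoder; the class name NullAwareEncoder is hypothetical.

```python
import json
from collections.abc import Sequence

import numpy as np
import pandas as pd


class NullAwareEncoder(json.JSONEncoder):
    """Hypothetical encoder handling NumPy values plus pandas missing-value scalars."""

    def default(self, o):
        # NumPy scalars are not JSON types; convert them to Python numbers.
        if isinstance(o, np.integer):
            return int(o)
        if isinstance(o, np.floating):
            return float(o)
        # Arrays, Series and Index objects become lists; the encoder then revisits
        # each element, so any pd.NA inside them comes back through default().
        if isinstance(o, (np.ndarray, pd.Series, pd.Index)):
            return o.tolist()
        # Only ask pandas about missingness for non-sequence objects: pd.isna() on a
        # list or array returns an element-wise result, not a single bool.
        if not isinstance(o, Sequence) and pd.isna(o):
            return None  # pd.NA, pd.NaT, ... are written as JSON null
        # Anything else is still unsupported and raises TypeError as before.
        return super().default(o)


payload = {"a": pd.NA, "b": np.int64(3), "c": pd.Series([1, None], dtype="Int64")}
print(json.dumps(payload, cls=NullAwareEncoder))  # {"a": null, "b": 3, "c": [1, null]}
```

Unsupported types such as set still fall through to the base class and raise TypeError, so the null check only widens what the encoder accepts for missing values.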
