Error using Report.save() when using Pandas Int64 vs np.int64 #937

Closed
azamdin23 opened this issue Jan 3, 2024 · 0 comments · Fixed by #938
Labels: bug (Something isn't working)

@azamdin23

Description

I get a TypeError: Object of type NAType is not JSON serializable exception when trying to save report results to JSON using Report.save(). The same error occurs with Report.save_html().

The issue occurs when my categorical features are based on columns with pandas' own Int64 dtype. Everything works fine if I use NumPy's int64 dtype instead. The difference is that the pandas dtype is nullable, which can be useful. It doesn't seem to matter whether my dataframe actually contains null values; I always get the serialisation error.
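For context, here is a minimal sketch of the underlying serialisation failure, independent of Evidently. Pandas' nullable Int64 dtype marks missing values with the pd.NA singleton, and Python's standard json module has no encoding for it. The sketch introduces a missing value explicitly; it does not show where the NA comes from inside the report when the data has no nulls.

```python
import json

import pandas as pd

# A nullable Int64 series; the missing entry is stored as the pd.NA singleton.
s = pd.Series([0, 1, None], dtype="Int64")
missing = s[2]
print(s.dtype)           # Int64
print(missing is pd.NA)  # True

# The stdlib encoder does not know how to serialise pd.NA.
error_message = None
try:
    json.dumps({"value": missing})
except TypeError as exc:
    error_message = str(exc)
print(error_message)     # Object of type NAType is not JSON serializable
```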

Expected behaviour

I should be able to save reports without error. The report itself seems to be computed successfully; it just can't be serialised to a file.

Reproducible example

```python
import pandas as pd
import numpy as np
from traceback import format_exc
from evidently.pipeline.column_mapping import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# The same 0/1 values stored twice: once as pandas' nullable Int64,
# once as NumPy int64. Neither column contains missing values.
random_ints = np.random.choice([0, 1], size=100)
data = {
    "categories_notok": pd.Series(random_ints, dtype=pd.Int64Dtype()),
    "categories_ok": pd.Series(random_ints, dtype=np.int64),
}
df = pd.DataFrame(data)

print("First run the report on the 'categorical_notok' column, a stacktrace will be printed")
column_mapping = ColumnMapping(numerical_features=[], categorical_features=["categories_notok"])
data_drift_report = Report(metrics=[DataDriftPreset()])
data_drift_report.run(
    reference_data=df,
    current_data=df,
    column_mapping=column_mapping,
)
try:
    # Raises TypeError: Object of type NAType is not JSON serializable
    data_drift_report.save("this_wont_save.json")
    print("saved")
except TypeError:
    print(format_exc())
    print("not saved")

print("\nIf I just specify the 'categorical_ok' column it works fine")
column_mapping = ColumnMapping(numerical_features=[], categorical_features=["categories_ok"])
data_drift_report = Report(metrics=[DataDriftPreset()])
data_drift_report.run(
    reference_data=df,
    current_data=df,
    column_mapping=column_mapping,
)
# The NumPy int64 column serialises without error.
data_drift_report.save("this_will_save.json")
print("saved")
```

The output is:

```
First run the report on the 'categorical_notok' column, a stacktrace will be printed
Traceback (most recent call last):
  File "/tmp/ipykernel_1766/1107039489.py", line 11, in <module>
    data_drift_report.save("this_wont_save.json")
  File "/home/ec2-user/SageMaker/data-monitoring-test/.venv/lib/python3.9/site-packages/evidently/suite/base_suite.py", line 475, in save
    self._get_snapshot().save(filename)
  File "/home/ec2-user/SageMaker/data-monitoring-test/.venv/lib/python3.9/site-packages/evidently/suite/base_suite.py", line 398, in save
    json.dump(self.dict(), f, indent=2, cls=NumpyEncoder)
  File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/__init__.py", line 179, in dump
    for chunk in iterable:
  File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  [Previous line repeated 2 more times]
  File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 439, in _iterencode
    yield from _iterencode(o, _current_indent_level)
  File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 429, in _iterencode
    yield from _iterencode_list(o, _current_indent_level)
  File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/home/ec2-user/SageMaker/data-monitoring-test/.venv/lib/python3.9/site-packages/evidently/utils/numpy_encoder.py", line 54, in default
    return json.JSONEncoder.default(self, o)
  File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type NAType is not JSON serializable

not saved

If I just specify the 'categorical_ok' column it works fine
saved
```
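Since the NumPy int64 column saves fine, one possible user-side workaround on Evidently versions without the fix is to convert nullable integer columns to NumPy dtypes before calling Report.run(). This is a pandas-only sketch, not part of Evidently; the helper name to_numpy_ints is hypothetical, and it assumes the integer values fit in int64.

```python
import numpy as np
import pandas as pd


def to_numpy_ints(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with pandas nullable integer
    columns (e.g. Int64) converted to NumPy dtypes. Assumes values fit in int64."""
    out = df.copy()
    for col in out.columns:
        dtype = out[col].dtype
        # Nullable integer dtypes are pandas extension dtypes with an integer kind;
        # plain NumPy int64 columns are left untouched.
        is_nullable_int = isinstance(dtype, pd.api.extensions.ExtensionDtype) and (
            pd.api.types.is_integer_dtype(dtype)
        )
        if not is_nullable_int:
            continue
        if out[col].isna().any():
            # NumPy integers cannot hold missing values, so fall back to float64 with NaN.
            out[col] = out[col].astype("float64")
        else:
            out[col] = out[col].astype("int64")
    return out


df = pd.DataFrame({
    "categories_notok": pd.Series([0, 1, 1, 0], dtype=pd.Int64Dtype()),
    "with_missing": pd.Series([0, None, 1, 0], dtype=pd.Int64Dtype()),
})
converted = to_numpy_ints(df)
print(converted.dtypes)
```

Columns that do contain missing values end up as float64, so a categorical feature built from them would have categories such as 0.0 and 1.0; whether that matters depends on the metrics in the report.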

Additional info

Pandas version: 2.1.4
Numpy version: 1.26.3
Evidently version: 0.4.13

emeli-dral added the bug label on Jan 4, 2024
emeli-dral pushed a commit that referenced this issue on Jan 9, 2024:
* #937: Add check for pandas null in json conversion.

* #937: Check that object isn't sequence before checking for null.
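The two commit messages describe the shape of the fix: treat pandas null scalars specially during JSON conversion, but only after ruling out sequences, because pd.isnull() on a list or array returns an element-wise result rather than a single bool. The sketch below illustrates that idea with a standalone json.JSONEncoder subclass. The class name NullSafeEncoder is hypothetical and this is not Evidently's NumpyEncoder; it uses pd.api.types.is_scalar as the guard and maps nulls to JSON null, both of which are assumptions that may differ from the actual patch.

```python
import json

import numpy as np
import pandas as pd


class NullSafeEncoder(json.JSONEncoder):
    """Hypothetical encoder illustrating the fix; not Evidently's NumpyEncoder."""

    def default(self, o):
        # json calls default() only for objects it cannot encode natively.
        if isinstance(o, np.integer):
            return int(o)
        if isinstance(o, np.floating):
            return float(o)
        if isinstance(o, np.ndarray):
            return o.tolist()
        # Guard first: pd.isnull() on a container returns an element-wise array,
        # whose truth value is ambiguous, so only scalars are tested for null-ness.
        if pd.api.types.is_scalar(o) and pd.isnull(o):
            return None  # assumption: pd.NA / pd.NaT are written as JSON null
        return super().default(o)


payload = {"category": pd.NA, "count": np.int64(3), "values": [pd.NA, 1]}
print(json.dumps(payload, cls=NullSafeEncoder))
# {"category": null, "count": 3, "values": [null, 1]}
```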