I am getting a `TypeError: Object of type NAType is not JSON serializable` exception when trying to save report results to JSON using Report.save() (and also when using Report.save_html()).
The issue occurs when I have categorical features based on columns that use pandas' own Int64 dtype. Everything works fine with numpy's int64 dtype instead. The difference is that the pandas dtype is nullable, which can be useful. It doesn't seem to matter whether my dataframe actually contains null values, though: I always get the serialisation error.
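The error message itself can be reproduced without Evidently, because the standard `json` module has no encoding for `pd.NA`. Below is a minimal sketch, assuming that a `pd.NA` value ends up somewhere in the dict that the report serialises (I have not traced where). `NAEncoder` is a hypothetical name used for illustration only; it is not part of Evidently.

```python
import json

import pandas as pd


class NAEncoder(json.JSONEncoder):
    """Hypothetical encoder that maps pandas' NA scalar to JSON null."""

    def default(self, o):
        # pd.NA is a singleton, so an identity check is enough
        if o is pd.NA:
            return None
        return super().default(o)


payload = {"values": [1, pd.NA]}

# The plain encoder rejects pd.NA with the same message as in this issue
try:
    json.dumps(payload)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# With the custom encoder, pd.NA is written as null
encoded = json.dumps(payload, cls=NAEncoder)
print(error_message)
print(encoded)
```

Something along these lines in the encoder used by `save()` might be enough, but that is a guess on my side.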
Expected behaviour
I should be able to save reports without error. The report itself already seems to be computed successfully; it just cannot be serialised to a file.
Reproducible example
import pandas as pd
import numpy as np
from traceback import format_exc
from evidently.pipeline.column_mapping import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

random_ints = np.random.choice([0, 1], size=100)
data = {
    "categories_notok": pd.Series(random_ints, dtype=pd.Int64Dtype()),
    "categories_ok": pd.Series(random_ints, dtype=np.int64),
}
df = pd.DataFrame(data)

print("First run the report on the 'categorical_notok' column, a stacktrace will be printed")
column_mapping = ColumnMapping(numerical_features=[], categorical_features=["categories_notok"])
data_drift_report = Report(metrics=[DataDriftPreset()])
data_drift_report.run(
    reference_data=df,
    current_data=df,
    column_mapping=column_mapping,
)
try:
    data_drift_report.save("this_wont_save.json")
    print("saved")
except TypeError as exc:
    print(format_exc())
    print("not saved")

print("\nIf I just specify the 'categorical_ok' column it works fine")
column_mapping = ColumnMapping(numerical_features=[], categorical_features=["categories_ok"])
data_drift_report = Report(metrics=[DataDriftPreset()])
data_drift_report.run(
    reference_data=df,
    current_data=df,
    column_mapping=column_mapping,
)
data_drift_report.save("this_will_save.json")
print("saved")
The output is:
First run the report on the 'categorical_notok' column, a stacktrace will be printed
Traceback (most recent call last):
File "/tmp/ipykernel_1766/1107039489.py", line 11, in <module>
data_drift_report.save("this_wont_save.json")
File "/home/ec2-user/SageMaker/data-monitoring-test/.venv/lib/python3.9/site-packages/evidently/suite/base_suite.py", line 475, in save
self._get_snapshot().save(filename)
File "/home/ec2-user/SageMaker/data-monitoring-test/.venv/lib/python3.9/site-packages/evidently/suite/base_suite.py", line 398, in save
json.dump(self.dict(), f, indent=2, cls=NumpyEncoder)
File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/__init__.py", line 179, in dump
for chunk in iterable:
File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 431, in _iterencode
yield from _iterencode_dict(o, _current_indent_level)
File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
yield from chunks
File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
yield from chunks
File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 325, in _iterencode_list
yield from chunks
File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
yield from chunks
File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
yield from chunks
File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
yield from chunks
[Previous line repeated 2 more times]
File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 439, in _iterencode
yield from _iterencode(o, _current_indent_level)
File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 429, in _iterencode
yield from _iterencode_list(o, _current_indent_level)
File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 325, in _iterencode_list
yield from chunks
File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 438, in _iterencode
o = _default(o)
File "/home/ec2-user/SageMaker/data-monitoring-test/.venv/lib/python3.9/site-packages/evidently/utils/numpy_encoder.py", line 54, in default
return json.JSONEncoder.default(self, o)
File "/home/ec2-user/anaconda3/envs/39/lib/python3.9/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type NAType is not JSON serializable
not saved
If I just specify the 'categorical_ok' column it works fine
saved
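Since the numpy-typed column saves fine, a possible stopgap is to cast nullable integer columns to numpy dtypes before running the report. This is only a sketch: `cast_nullable_ints` is a hypothetical helper of mine, not an Evidently function, and I have only confirmed the plain numpy int64 case shown above. Columns that really contain missing values are cast to float64 (so NA becomes NaN), because numpy int64 cannot hold nulls; narrower nullable types such as Int32 are widened to int64.

```python
import pandas as pd


def cast_nullable_ints(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which pandas nullable integer columns use numpy dtypes."""
    out = df.copy()
    for name in out.columns:
        col = out[name]
        is_extension = isinstance(col.dtype, pd.api.extensions.ExtensionDtype)
        if is_extension and pd.api.types.is_integer_dtype(col.dtype):
            # int64 cannot represent missing values, so fall back to float64
            target = "float64" if col.isna().any() else "int64"
            out[name] = col.astype(target)
    return out


example = pd.DataFrame(
    {
        "no_nulls": pd.Series([0, 1, 1], dtype=pd.Int64Dtype()),
        "with_nulls": pd.Series([0, None, 1], dtype=pd.Int64Dtype()),
        "untouched": [0.5, 1.5, 2.5],
    }
)
converted = cast_nullable_ints(example)
print(converted.dtypes)
```

The converted frame would then be passed as `reference_data` and `current_data` to `Report.run()` in place of the original one. This loses the nullable dtype, so it is a workaround rather than a fix.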
Additional info
Pandas version: 2.1.4
Numpy version: 1.26.3
Evidently version: 0.4.13