Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Handling of deprecated np.bool_ is not ideal #1006

Closed
1 of 3 tasks
Andrew-S-Rosen opened this issue Oct 26, 2024 · 8 comments
Closed
1 of 3 tasks

[Bug]: Handling of deprecated np.bool_ is not ideal #1006

Andrew-S-Rosen opened this issue Oct 26, 2024 · 8 comments
Labels

Comments

@Andrew-S-Rosen
Copy link
Member

Andrew-S-Rosen commented Oct 26, 2024

Code snippet

# pip install maggma ase==3.23.0 pymatgen==2024.10.25 numpy==1.26.4
from ase.build import bulk
from pymatgen.io.ase import AseAtomsAdaptor
from maggma.stores import MemoryStore

atoms = bulk("Cu")
structure = AseAtomsAdaptor.get_structure(atoms)
print(type(structure[0].lattice.pbc[0]))  # np.bool_

store = MemoryStore()
with store as s:
    s.update(structure, key="charge")

What happened?

When trying to store a document with a numpy 1.x np.bool_ entry, the underlying bson dependency raises an InvalidDocument error and maggma can't store the document in the connected store. Some sanitization should likely be done to address this.

Hmm, but the following works okay, so there is something in-between here:

with store as s:
    s.update({"test": 1, "val": (np.bool_)}, key="test")

Version

0.70.0

Which OS?

  • MacOS
  • Windows
  • Linux

Log output

Cell In[1], line 11
      9 store = MemoryStore()
     10 with store as s:
---> 11     s.update(structure, key="charge")

File ~\miniconda\envs\test\Lib\site-packages\maggma\stores\mongolike.py:395, in MongoStore.update(self, docs, key)
    393 if len(requests) > 0:
    394     try:
--> 395         self._collection.bulk_write(requests, ordered=False)
    396     except (OperationFailure, DocumentTooLarge) as e:
    397         if self.safe_update:

File ~\miniconda\envs\test\Lib\site-packages\mongomock\collection.py:1852, in Collection.bulk_write(self, requests, ordered, bypass_document_validation, session)
   1850 for operation in requests:
   1851     operation._add_to_bulk(bulk)
-> 1852 return BulkWriteResult(bulk.execute(), True)

File ~\miniconda\envs\test\Lib\site-packages\mongomock\collection.py:331, in BulkOperationBuilder.execute(self, write_concern)
    329 exec_name = execute_func.__name__
    330 try:
--> 331     op_result = execute_func()
    332 except WriteError as error:
    333     result['writeErrors'].append({
    334         'index': index,
    335         'code': error.code,
    336         'errmsg': str(error),
    337     })

File ~\miniconda\envs\test\Lib\site-packages\mongomock\collection.py:157, in BulkWriteOperation.register_update_op.<locals>.exec_update()
    156 def exec_update():
--> 157     result = collection._update(spec=selector, document=document,
    158                                 multi=multi, upsert=self.is_upsert,
    159                                 **extra_args)
    160     ret_val = {}
    161     if result.get('upserted'):

File ~\miniconda\envs\test\Lib\site-packages\mongomock\collection.py:896, in Collection._update(self, spec, document, upsert, manipulate, multi, check_keys, hint, session, collation, let, array_filters, **kwargs)
    894     if not check_keys:
    895         _validate_data_fields(document)
--> 896     BSON.encode(document, check_keys=check_keys)
    897 existing_document.update(self._internalize_dict(document))
    898 if existing_document['_id'] != _id:

File ~\miniconda\envs\test\Lib\site-packages\bson\__init__.py:1429, in BSON.encode(cls, document, check_keys, codec_options)
   1403 @classmethod
   1404 def encode(
   1405     cls: Type[BSON],
   (...)
   1408     codec_options: CodecOptions[Any] = DEFAULT_CODEC_OPTIONS,
   1409 ) -> BSON:
   1410     """Encode a document to a new :class:`BSON` instance.
   1411
   1412     A document can be any mapping type (like :class:`dict`).
   (...)
   1427        Replaced `uuid_subtype` option with `codec_options`.
   1428     """
-> 1429     return cls(encode(document, check_keys, codec_options))

File ~\miniconda\envs\test\Lib\site-packages\bson\__init__.py:1050, in encode(document, check_keys, codec_options)
   1047 if not isinstance(codec_options, CodecOptions):
   1048     raise _CODEC_OPTIONS_TYPE_ERROR
-> 1050 return _dict_to_bson(document, check_keys, codec_options)

InvalidDocument: cannot encode object: True, of type: <class 'numpy.bool_'>
@Andrew-S-Rosen
Copy link
Member Author

Andrew-S-Rosen commented Oct 26, 2024

Looks like a similar error occurred here: materialsproject/fireworks#522.

@rkingsbury
Copy link
Collaborator

Data is passed through monty.json.jsanitize before being ent to pymongo / bson, and bson itself does not convert numpy types. So I'm thinking the best fix here is to update jsanitize to cast custom numpy types, including np.int32, np.int64, and np.bool_ to native python types.

What do you think?

@Andrew-S-Rosen
Copy link
Member Author

I also think this is the right way to go! Thanks for the suggestion! I will close this issue when I have a moment to open a proper monty issue.

@rkingsbury
Copy link
Collaborator

I also think this is the right way to go! Thanks for the suggestion! I will close this issue when I have a moment to open a proper monty issue.

Thanks @Andrew-S-Rosen ! Interestingly, the existing maggma tests pass when I add all the numpy custom types, including np.bool_ (see #1007 ) , so I'm not sure what is going on here. But explicit handling in jsanitize can't hurt (unless there are users that want to serialize and then recover numpy types other than bool). np.bool_ is deprecated but the int32/int64/float32/float64 are not.

@Andrew-S-Rosen
Copy link
Member Author

I will have to drill down a bit to see how to make this more of a minimal example than I have. Perhaps it is something to do with how it is nested in the data structure. Glad the tests pass though!

@rkingsbury
Copy link
Collaborator

Ah, I guess there's a bigger problem here - I realized that our last Auto Dependency Upgrades PR actually downgraded numpy to 1.26. (See #1004). So the tests probably pass b/c the CI is installing numpy 1.26?

@Andrew-S-Rosen
Copy link
Member Author

In my case, the issue only appears on numpy 1.x! 😅

@Andrew-S-Rosen
Copy link
Member Author

Closed because this is a monty issue: materialsvirtuallab/monty#726.

@Andrew-S-Rosen Andrew-S-Rosen closed this as not planned Won't fix, can't repro, duplicate, stale Nov 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants