Support for ObjectId in to_builtins/convert #516

Closed
lorien opened this issue Aug 14, 2023 · 4 comments · Fixed by #517

Comments


lorien commented Aug 14, 2023

Description

Values of the ObjectId type are used as the primary key in every collection in a MongoDB database.

From the MongoDB documentation:

ObjectIds are small, likely unique, fast to generate, and ordered. ObjectId values are 12 bytes in length, consisting of:

  • A 4-byte timestamp, representing the ObjectId's creation, measured in seconds since the Unix epoch.
  • A 5-byte random value generated once per process. This random value is unique to the machine and process.
  • A 3-byte incrementing counter, initialized to a random value.

For timestamp and counter values, the most significant bytes appear first in the byte sequence (big-endian). This is unlike other BSON values, where the least significant bytes appear first (little-endian).
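As a rough illustration of the layout quoted above, the sketch below splits a 24-character hex ObjectId into its three parts using only the Python standard library. The helper name `split_object_id` is made up for this example and is not part of bson or msgspec; the sample value is the `_id` printed in the example session in this issue.

```python
from datetime import datetime, timezone


def split_object_id(hex_id: str):
    """Split a 24-character hex ObjectId into (created, random_value, counter).

    Hypothetical helper for illustration only; not a bson or msgspec API.
    """
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")

    # Bytes 0-3: creation time in seconds since the Unix epoch, big-endian.
    seconds = int.from_bytes(raw[0:4], "big")
    created = datetime.fromtimestamp(seconds, tz=timezone.utc)

    # Bytes 4-8: the 5-byte per-process random value, kept as raw bytes.
    random_value = raw[4:9]

    # Bytes 9-11: the 3-byte incrementing counter, big-endian.
    counter = int.from_bytes(raw[9:12], "big")

    return created, random_value, counter


created, random_value, counter = split_object_id("64da6aabb1143f2d43f999f3")
print(created.isoformat(), random_value.hex(), counter)
```

In practice `bson.objectid.ObjectId` already exposes the creation time through its `generation_time` attribute, so a helper like this is only useful for seeing how the bytes are laid out.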

If an integer value is used to create an ObjectId, the integer replaces the timestamp.

In MongoDB, each document stored in a collection requires a unique _id field that acts as a primary key. If an inserted document omits the _id field, the MongoDB driver automatically generates an ObjectId for the _id field.

At the moment it is not possible to load an object from MongoDB (as a Python dict) and convert it into a Struct instance, because there is no support for the ObjectId type.

Here is a simple example that shows how a record created in a MongoDB collection gets an ObjectId field.

>>> from pymongo import MongoClient
>>> db = MongoClient()["test"]
>>> db.animal.insert_one({"name": "wolf"})
<pymongo.results.InsertOneResult object at 0x7f7063e3a4d0>
>>> record = db.animal.find_one({})
>>> print(record)
{'_id': ObjectId('64da6aabb1143f2d43f999f3'), 'name': 'wolf'}
>>> print(type(record["_id"]))
<class 'bson.objectid.ObjectId'>

ObjectID documentation:

Owner

jcrist commented Aug 14, 2023

Seems to work fine for me:

In [1]: import bson

In [2]: import msgspec

In [3]: class Ex(msgspec.Struct):
   ...:     _id: bson.objectid.ObjectId
   ...:     x: int
   ...:     y: str
   ...: 

In [4]: msg = {"_id": bson.objectid.ObjectId(), "x": 1, "y": "two"}

In [5]: msg
Out[5]: {'_id': ObjectId('64da70341916b9d3c85d516d'), 'x': 1, 'y': 'two'}

In [6]: type(msg["_id"])
Out[6]: bson.objectid.ObjectId

In [7]: msgspec.convert(msg, type=Ex)
Out[7]: Ex(_id=ObjectId('64da70341916b9d3c85d516d'), x=1, y='two')

Can you write out an example struct definition and dict input showing the issue? Something that doesn't require instantiating a mongo client that I can run locally.

Author

lorien commented Aug 14, 2023

Hmm. Actually, I did not try it. I was under the impression that in a Struct I can only use the types described at https://jcristharif.com/msgspec/supported-types.html

Owner

jcrist commented Aug 14, 2023

There's a line at the bottom of the list saying

Additional types may be supported through extensions.

If you have concrete suggestions for how we can make this clearer, please let me know (or better yet, submit a PR :)).

When using msgspec.convert, custom types like ObjectId that are returned by the wrapped protocol (bson in this case) will "just work". Custom types that need to be coerced from a builtin type (e.g. parsing a str into some object) will require a dec_hook.


You're right though, currently there's no way to get to_builtins to emit an ObjectId; you'd have to coerce it to a str with enc_hook=str.

In [15]: id = bson.objectid.ObjectId()

In [16]: msgspec.to_builtins({"_id": id}, enc_hook=str)
Out[16]: {'_id': '64da73f09b6133c8e60b2ea3'}

If this is insufficient, we can add a way to support to_builtins passing through protocol-specific types like ObjectId.

Author

lorien commented Aug 15, 2023

Thank you :)
