MongoDB source unable to parse datetimes with years > 9999 #10079

Closed
luksfarris opened this issue Mar 19, 2024 · 0 comments · Fixed by #10110
Labels
bug Bug report

luksfarris commented Mar 19, 2024

Describe the bug

This is a classic Python error: if the year of a MongoDB DateTime is larger than datetime.MAXYEAR (9999), the pymongo driver raises an error. This is a super easy fix, and the stack trace hints at it:

bson.errors.InvalidBSON: year 275760 is out of range (Consider Using CodecOptions(datetime_conversion=DATETIME_AUTO) or MongoClient(datetime_conversion='DATETIME_AUTO')). See: https://pymongo.readthedocs.io/en/stable/examples/datetimes.html#handling-out-of-range-datetimes
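The root cause can be illustrated with the standard library alone. The following is a minimal sketch, assuming the stored value is a count of milliseconds since the Unix epoch (which is how BSON represents dates) and that the offending document holds the largest date JavaScript can represent, which falls in year 275760. The names `millis_to_datetime`, `fits_in_python_datetime`, and `JS_MAX_DATE_MS` are made up for this illustration and are not part of pymongo or DataHub.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)

# Assumption for illustration: the largest date JavaScript can represent,
# 8.64e15 ms after the epoch, which lands in year 275760.
JS_MAX_DATE_MS = 8_640_000_000_000_000


def millis_to_datetime(millis: int) -> datetime.datetime:
    """Convert milliseconds since the epoch to a naive datetime."""
    return EPOCH + datetime.timedelta(milliseconds=millis)


def fits_in_python_datetime(millis: int) -> bool:
    """Return True if the timestamp is representable as datetime.datetime."""
    try:
        millis_to_datetime(millis)
    except OverflowError:
        # datetime cannot go past datetime.MAXYEAR (9999).
        return False
    return True


if __name__ == "__main__":
    print(datetime.MAXYEAR)
    print(fits_in_python_datetime(0))
    print(fits_in_python_datetime(JS_MAX_DATE_MS))
```

Any value that fails this check cannot be returned as a plain `datetime.datetime`, which is why the driver needs a different conversion mode for such documents.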

To Reproduce
Steps to reproduce the behavior:

  1. Create a mongodb document with a datetime field with year > 9999
  2. Ingest data from this MongoDB collection
  3. See error

Expected behavior
DataHub should be able to parse documents with unreasonably large years.

Additional context

I'm on version 0.13.1.

I think the fix would be changing datahub/ingestion/source/mongodb.py line 285, and adding the following:

self.mongo_client = pymongo.MongoClient(self.config.connect_uri, datetime_conversion='DATETIME_AUTO', **options)  # type: ignore

Sorry that I don't have time to submit a PR right now. If nobody gets to this first, I can try doing it over the weekend. Thank you for DataHub, I really love it.

@luksfarris luksfarris added the bug Bug report label Mar 19, 2024
@luksfarris luksfarris changed the title from "MongoDB source unable to parse large timestamps" to "MongoDB source unable to parse datetimes with years > 9999" Mar 19, 2024