-
Notifications
You must be signed in to change notification settings - Fork 826
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bugfix/correctly share session handler across ingest docs #1806
Conversation
how does one test this locally? I'm on your branch and added a log here and set num_processes to 1 in the test. the output showed we were creating the session handle several times over:
|
@ryannikolaidis Since Roman is in the google_drive connector file. Can we remove |
yea, probably okay to just cut a quick branch to remove that to keep it clean |
@ryannikolaidis When you tested did you also have the changes from the serialization fix? I'm still seeing the top 5 *** create session handle *** But the ones below that I'm not seeing because the serialization fix is working correctly once it goes to the partition phase. |
@potter-potter right, but still should only see 2 in that, correct? (that was what my eyes were on here) |
Yep. exactly. So it seems to not be working yet. |
So why does it call update_source_metadata before it calls I'm guessing it calls update_source_metadata right when it instantiates the GoogleDriveIngestDoc (for file in files) and at that point it can't see the other object's session_handle. But I haven't fully checked this out. |
919cd9a
to
5473138
Compare
FYI, part of the issue here was that source metadata was being fetched as part of serializing the ingest docs after the doc factory step in the pipeline. This PR now depends on a fix in the serializing PR: #1800. Will revisit this PR and rerun validation after that gets merged into main. |
Just merged in Main into my local branch. Definitely an improvement! We are still grabbing credentials on the partition though. 3 times (1 per file). And the _source_metadata comes in as None. |
dabef64
to
6e858a1
Compare
After rebasing and running this again, I only say one session handle created during the google drive ingest test. |
I did what I believe was the same as you with the Rebasing. I assume Rebasing from main. I put print statement in
And then a breakpoint to check if _source_metadata was there: ingest/interfaces.py
And I still get the 3 calls for credits during partitioning. (in addition to the 2 expected ones for listing and downloading) |
89ad964
to
4ac35b9
Compare
hmm, running locally I'm hitting:
|
I got the same thing when I tried yesterday. "TypeError: cannot pickle '_cffi_backend.FFI' object" |
I've tried running this locally on my mac as well as on a fresh rocky image that is supported by this repo and was not able to reproduce this pickling issue. |
Which python version? I'll try that one... |
@potter-potter |
@ryannikolaidis @rbiseck3 Well I ran it with 3.10.12 and:
|
Worked fine in python 3.8.18 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Working well for me.
4ac35b9
to
6a13ecb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blocking until we can resolve noted issue running on mac
Python 3.10.11. |
still seeing same issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, looks like not related to version. created a fresh virtual env but still on 3.10.11 and now it passes. I guess worth noting internally if other folks run into this that clean env seems to do the trick
6a13ecb
to
49e19ce
Compare
49e19ce
to
42f0148
Compare
Description
Fix session handler