bugfix/correctly share session handler across ingest docs #1806

Merged
merged 2 commits into from
Oct 31, 2023

Conversation

rbiseck3
Contributor

@rbiseck3 rbiseck3 commented Oct 19, 2023

Description

Fix session handler

@ryannikolaidis
Contributor

how does one test this locally? I'm on your branch and added a log here and set num_processes to 1 in the test. the output showed we were creating the session handle several times over:


2023-10-19 17:08:13,769 MainProcess INFO     running pipeline: DocFactory -> Reader -> Partitioner -> Copier with config: {"reprocess": true, "verbose": true, "work_dir": "/Users/ryannikolaidis/Development/unstructured/unstructured/test_unstructured_ingest/workdir/google-drive", "output_dir": "/Users/ryannikolaidis/Development/unstructured/unstructured/test_unstructured_ingest/structured-output/google-drive", "num_processes": 1, "raise_on_error": false}
2023-10-19 17:08:15,358 MainProcess INFO     Running doc factory to generate ingest docs. Source connector: {"processor_config": {"reprocess": true, "verbose": true, "work_dir": "/Users/ryannikolaidis/Development/unstructured/unstructured/test_unstructured_ingest/workdir/google-drive", "output_dir": "/Users/ryannikolaidis/Development/unstructured/unstructured/test_unstructured_ingest/structured-output/google-drive", "num_processes": 1, "raise_on_error": false}, "read_config": {"download_dir": "/Users/ryannikolaidis/Development/unstructured/unstructured/test_unstructured_ingest/download/google-drive", "re_download": false, "preserve_downloads": true, "download_only": false, "max_docs": null}, "connector_config": {"drive_id": "1OQZ66OHBE30rNsNa7dweGLfRmXvkT_jr", "service_account_key": "/var/folders/6z/chjl3j496jqgl_ztrs4063hm0000gn/T/tmp.ICVOOM6v", "extension": null, "recursive": false}}
***** create session handle ****
***** create session handle ****
***** create session handle ****
***** create session handle ****
2023-10-19 17:08:17,314 MainProcess INFO     processing 3 docs via 1 processes
2023-10-19 17:08:17,315 MainProcess INFO     Calling Reader with 3 docs
2023-10-19 17:08:17,315 MainProcess INFO     Running source node to download data associated with ingest docs
***** create session handle ****
2023-10-19 17:08:18,319 MainProcess DEBUG    Creating directory: /Users/ryannikolaidis/Development/unstructured/unstructured/test_unstructured_ingest/download/google-drive
2023-10-19 17:08:18,320 MainProcess DEBUG    File downloaded: /Users/ryannikolaidis/Development/unstructured/unstructured/test_unstructured_ingest/download/google-drive/117qrVqiCoR5EjYMsDHGdy3UMkEtKr9Q8-test-drive-doc.docx.
2023-10-19 17:08:19,172 MainProcess DEBUG    File downloaded: /Users/ryannikolaidis/Development/unstructured/unstructured/test_unstructured_ingest/download/google-drive/1SpQuE7jHz9nMt5hfQXsiok1SgIdRYX5o-fake.docx.
2023-10-19 17:08:20,204 MainProcess DEBUG    File downloaded: /Users/ryannikolaidis/Development/unstructured/unstructured/test_unstructured_ingest/download/google-drive/1cTKXAreuj-wYmL38nFnqKvz3X8UKcaMC-foo.txt.
2023-10-19 17:08:20,204 MainProcess INFO     Calling Partitioner with 3 docs
2023-10-19 17:08:20,204 MainProcess INFO     Running partition node to extract content from json files. Config: {"pdf_infer_table_structure": false, "skip_infer_table_types": null, "strategy": "hi_res", "ocr_languages": null, "encoding": null, "fields_include": ["element_id", "text", "type", "metadata", "embeddings"], "flatten_metadata": false, "metadata_exclude": ["coordinates", "filename", "file_directory", "metadata.data_source.date_processed", "metadata.last_modified", "metadata.detection_class_prob", "metadata.parent_id", "metadata.category_depth", "metadata.data_source.version"], "metadata_include": [], "partition_endpoint": "https://api.unstructured.io/general/v0/general", "partition_by_api": false, "api_key": null}, partition kwargs: {}]
2023-10-19 17:08:20,204 MainProcess INFO     Creating /Users/ryannikolaidis/Development/unstructured/unstructured/test_unstructured_ingest/workdir/google-drive/partitioned
2023-10-19 17:08:20,205 MainProcess INFO     Processing /Users/ryannikolaidis/Development/unstructured/unstructured/test_unstructured_ingest/download/google-drive/117qrVqiCoR5EjYMsDHGdy3UMkEtKr9Q8-test-drive-doc.docx
2023-10-19 17:08:20,205 MainProcess DEBUG    Using local partition
***** create session handle ****
2023-10-19 17:08:20,880 MainProcess INFO     writing partitioned content to /Users/ryannikolaidis/Development/unstructured/unstructured/test_unstructured_ingest/workdir/google-drive/partitioned/3d4482a8a65f553f0bca5de9f62ddccc.json
2023-10-19 17:08:20,881 MainProcess INFO     Processing /Users/ryannikolaidis/Development/unstructured/unstructured/test_unstructured_ingest/download/google-drive/1SpQuE7jHz9nMt5hfQXsiok1SgIdRYX5o-fake.docx
2023-10-19 17:08:20,881 MainProcess DEBUG    Using local partition
***** create session handle ****
2023-10-19 17:08:21,374 MainProcess INFO     writing partitioned content to /Users/ryannikolaidis/Development/unstructured/unstructured/test_unstructured_ingest/workdir/google-drive/partitioned/6af41eb2d29c3ebcab0a8c8fc88966f0.json
2023-10-19 17:08:21,376 MainProcess INFO     Processing /Users/ryannikolaidis/Development/unstructured/unstructured/test_unstructured_ingest/download/google-drive/1cTKXAreuj-wYmL38nFnqKvz3X8UKcaMC-foo.txt
2023-10-19 17:08:21,376 MainProcess DEBUG    Using local partition
***** create session handle ****
2023-10-19 17:08:21,806 MainProcess INFO     writing partitioned content to /Users/ryannikolaidis/Development/unstructured/unstructured/test_unstructured_ingest/workdir/google-drive/partitioned/c561720a56acdd84c03351220712995e.json
2023-10-19 17:08:21,806 MainProcess INFO     Calling Copier with 3 docs
2023-10-19 17:08:21,806 MainProcess INFO     Running copy node to move content to desired output location
2023-10-19 17:08:21,806 MainProcess INFO     Copying /Users/ryannikolaidis/Development/unstructured/unstructured/test_unstructured_ingest/workdir/google-drive/partitioned/3d4482a8a65f553f0bca5de9f62ddccc.json -> /Users/ryannikolaidis/Development/unstructured/unstructured/test_unstructured_ingest/structured-output/google-drive/117qrVqiCoR5EjYMsDHGdy3UMkEtKr9Q8-test-drive-doc.docx.json
2023-10-19 17:08:21,807 MainProcess INFO     Copying /Users/ryannikolaidis/Development/unstructured/unstructured/test_unstructured_ingest/workdir/google-drive/partitioned/6af41eb2d29c3ebcab0a8c8fc88966f0.json -> /Users/ryannikolaidis/Development/unstructured/unstructured/test_unstructured_ingest/structured-output/google-drive/1SpQuE7jHz9nMt5hfQXsiok1SgIdRYX5o-fake.docx.json
2023-10-19 17:08:21,808 MainProcess INFO     Copying /Users/ryannikolaidis/Development/unstructured/unstructured/test_unstructured_ingest/workdir/google-drive/partitioned/c561720a56acdd84c03351220712995e.json -> /Users/ryannikolaidis/Development/unstructured/unstructured/test_unstructured_ingest/structured-output/google-drive/1cTKXAreuj-wYmL38nFnqKvz3X8UKcaMC-foo.txt.json

@potter-potter
Contributor

@ryannikolaidis Since Roman is in the google_drive connector file, can we remove write_result on line 222, or should we make that a different story?

@ryannikolaidis
Contributor

@ryannikolaidis Since Roman is in the google_drive connector file, can we remove write_result on line 222, or should we make that a different story?

yea, probably okay to just cut a quick branch to remove that to keep it clean

@potter-potter
Contributor

potter-potter commented Oct 20, 2023

@ryannikolaidis When you tested did you also have the changes from the serialization fix?

I'm still seeing the top 5 *** create session handle ***

But the ones below that I'm not seeing because the serialization fix is working correctly once it goes to the partition phase.

@ryannikolaidis
Contributor

ryannikolaidis commented Oct 20, 2023

@ryannikolaidis When you tested did you also have the changes from the serialization fix?

I'm still seeing the top 5 *** create session handle ***

But the ones below that I'm not seeing because the serialization fix is working correctly once it goes to the partition phase.

@potter-potter right, but still should only see 2 in that, correct? (that was what my eyes were on here)

@potter-potter
Contributor

@ryannikolaidis When you tested did you also have the changes from the serialization fix?
I'm still seeing the top 5 *** create session handle ***
But the ones below that I'm not seeing because the serialization fix is working correctly once it goes to the partition phase.

@potter-potter right, but still should only see 2 in that, correct? (that was what my eyes were on here)

Yep. exactly. So it seems to not be working yet.

@potter-potter
Contributor

potter-potter commented Oct 20, 2023

@ryannikolaidis When you tested did you also have the changes from the serialization fix?
I'm still seeing the top 5 *** create session handle ***
But the ones below that I'm not seeing because the serialization fix is working correctly once it goes to the partition phase.

@potter-potter right, but still should only see 2 in that, correct? (that was what my eyes were on here)

Yep. exactly. So it seems to not be working yet.

So why does it call update_source_metadata before it calls get_file (this is where it is not using the session correctly), and then call update_source_metadata again when get_file fetches the file (it uses the session correctly here)? You would think we could skip the earlier call.

I'm guessing it calls update_source_metadata right when it instantiates the GoogleDriveIngestDoc (for file in files) and at that point it can't see the other object's session_handle. But I haven't fully checked this out.
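To make that guess concrete, here is a minimal sketch of the difference between each doc building its own session handle and all docs receiving one shared handle. `FakeSession`, `Doc`, and `make_docs` are hypothetical stand-ins for illustration, not the classes in the connector, and the counter only models what the `create session handle` log lines were counting.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-ins; not the classes from the unstructured codebase.
created: List["FakeSession"] = []  # records every session creation


class FakeSession:
    def __init__(self) -> None:
        created.append(self)


@dataclass
class Doc:
    file_id: str
    session: Optional[FakeSession] = None

    def get_session(self) -> FakeSession:
        # Falls back to building a new session when none was handed in.
        if self.session is None:
            self.session = FakeSession()
        return self.session


def make_docs(file_ids: List[str], share: bool) -> List[Doc]:
    # When sharing, one session is created up front and passed to every doc.
    shared = FakeSession() if share else None
    return [Doc(file_id=f, session=shared) for f in file_ids]
```

With `share=False` every doc that touches its session creates a new one; with `share=True` the count stays at one no matter how many docs there are.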

@rbiseck3 rbiseck3 force-pushed the roman/session-handler-fix branch from 919cd9a to 5473138 Compare October 20, 2023 12:41
@rbiseck3
Contributor Author

rbiseck3 commented Oct 20, 2023

FYI, part of the issue here was that source metadata was being fetched as part of serializing the ingest docs after the doc factory step in the pipeline. This PR now depends on a fix in the serializing PR: #1800. Will revisit this PR and rerun validation after that gets merged into main.
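As a rough illustration of that failure mode (not the code from this PR or from #1800), the sketch below shows how serializing through a lazy property can trigger a fetch as a side effect, while serializing the cached field does not. `LazyDoc`, `fetch_calls`, and both `to_dict_*` methods are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

fetch_calls: List[str] = []  # hypothetical counter standing in for remote calls


@dataclass
class LazyDoc:
    name: str
    _source_metadata: Optional[dict] = None

    @property
    def source_metadata(self) -> dict:
        # Lazily "fetches" metadata on first access.
        if self._source_metadata is None:
            fetch_calls.append(self.name)
            self._source_metadata = {"name": self.name}
        return self._source_metadata

    def to_dict_eager(self) -> dict:
        # Touching the property forces a fetch while serializing.
        return {"name": self.name, "source_metadata": self.source_metadata}

    def to_dict_passive(self) -> dict:
        # Reading the cached field serializes whatever is there (maybe None).
        return {"name": self.name, "source_metadata": self._source_metadata}
```

The passive variant never reaches out to the source, at the cost of emitting `None` when nothing has been fetched yet.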

@potter-potter
Contributor

Just merged main into my local branch. Definitely an improvement!
We're no longer grabbing multiple credentials on the download.

We are still grabbing credentials on the partition though. 3 times (1 per file). And the _source_metadata comes in as None.

@rbiseck3 rbiseck3 force-pushed the roman/session-handler-fix branch 2 times, most recently from dabef64 to 6e858a1 Compare October 25, 2023 14:46
@rbiseck3
Contributor Author

After rebasing and running this again, I only see one session handle created during the google drive ingest test.

@potter-potter
Contributor

potter-potter commented Oct 25, 2023

After rebasing and running this again, I only see one session handle created during the google drive ingest test.

I believe I did the same rebase as you (from main, I assume).

I put a print statement in connector/google_drive.py:

    try:
        print("******* GRABBING CREDS ************")
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = key_path

And then a breakpoint to check if _source_metadata was there:

ingest/interfaces.py

    def partition_file(
        self,
        partition_config: PartitionConfig,
        **partition_kwargs,
    ) -> t.List[Element]:
        breakpoint()
        if not partition_config.partition_by_api:

And I still get the 3 calls for credentials during partitioning (in addition to the 2 expected ones for listing and downloading).

@rbiseck3 rbiseck3 force-pushed the roman/session-handler-fix branch 2 times, most recently from 89ad964 to 4ac35b9 Compare October 26, 2023 21:32
@ryannikolaidis
Contributor

hmm, running locally I'm hitting:

cannot pickle '_cffi_backend.FFI' object
Traceback (most recent call last):
  File "/Users/ryannikolaidis/Development/unstructured/unstructured/unstructured/ingest/pipeline/source.py", line 39, in run
    for k, v in doc.to_dict().items():
  File "/Users/ryannikolaidis/Development/unstructured/unstructured/unstructured/ingest/interfaces.py", line 257, in to_dict
    as_dict = _asdict(self, encode_json=encode_json)
  File "/Users/ryannikolaidis/.pyenv/versions/unstructured/lib/python3.10/site-packages/dataclasses_json/core.py", line 393, in _asdict
    value = _asdict(
  File "/Users/ryannikolaidis/.pyenv/versions/unstructured/lib/python3.10/site-packages/dataclasses_json/core.py", line 393, in _asdict
    value = _asdict(
  File "/Users/ryannikolaidis/.pyenv/versions/unstructured/lib/python3.10/site-packages/dataclasses_json/core.py", line 411, in _asdict
    return copy.deepcopy(obj)
  File "/Users/ryannikolaidis/.pyenv/versions/3.10.11/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/Users/ryannikolaidis/.pyenv/versions/3.10.11/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/Users/ryannikolaidis/.pyenv/versions/3.10.11/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/Users/ryannikolaidis/.pyenv/versions/3.10.11/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/Users/ryannikolaidis/.pyenv/versions/3.10.11/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/Users/ryannikolaidis/.pyenv/versions/3.10.11/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/Users/ryannikolaidis/.pyenv/versions/3.10.11/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/Users/ryannikolaidis/.pyenv/versions/3.10.11/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/Users/ryannikolaidis/.pyenv/versions/3.10.11/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/Users/ryannikolaidis/.pyenv/versions/3.10.11/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/Users/ryannikolaidis/.pyenv/versions/3.10.11/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/Users/ryannikolaidis/.pyenv/versions/3.10.11/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/Users/ryannikolaidis/.pyenv/versions/3.10.11/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/Users/ryannikolaidis/.pyenv/versions/3.10.11/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/Users/ryannikolaidis/.pyenv/versions/3.10.11/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/Users/ryannikolaidis/.pyenv/versions/3.10.11/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/Users/ryannikolaidis/.pyenv/versions/3.10.11/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/Users/ryannikolaidis/.pyenv/versions/3.10.11/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/Users/ryannikolaidis/.pyenv/versions/3.10.11/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/Users/ryannikolaidis/.pyenv/versions/3.10.11/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/Users/ryannikolaidis/.pyenv/versions/3.10.11/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/Users/ryannikolaidis/.pyenv/versions/3.10.11/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/Users/ryannikolaidis/.pyenv/versions/3.10.11/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/Users/ryannikolaidis/.pyenv/versions/3.10.11/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/Users/ryannikolaidis/.pyenv/versions/3.10.11/lib/python3.10/copy.py", line 161, in deepcopy
    rv = reductor(4)
TypeError: cannot pickle '_cffi_backend.FFI' object
2023-10-27 12:41:37,766 SpawnPoolWorker-3 ERROR    failed to get data associated with source doc: {'processor_config': {'reprocess': True, 'verbose': True, 'work_dir': '/Users/ryannikolaidis/.cache/unstructured/ingest/pipeline', 'output_dir': 'structured-output', 'num_processes': 6, 'raise_on_error': False}, 'read_config': {'download_dir': '/Users/ryannikolaidis/.cache/unstructured/ingest/google_drive/df0a1d5b59', 're_download': False, 'preserve_downloads': True, 'download_only': False, 'max_docs': None}, 'connector_config': {'drive_id': '1OQZ66OHBE30rNsNa7dweGLfRmXvkT_jr', 'service_account_key': '/Users/ryannikolaidis/.ssh/google-cloud-unstructured-ingest-test-d4fc30286d9d.json', 'extension': None, 'recursive': False}, '_source_metadata': None, '_date_processed': None, '_session_handle': None, 'meta': {'mimeType': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'id': '1SpQuE7jHz9nMt5hfQXsiok1SgIdRYX5o', 'name': 'fake.docx', 'download_dir': '/Users/ryannikolaidis/.cache/unstructured/ingest/google_drive/df0a1d5b59', 'download_filepath': '/Users/ryannikolaidis/.cache/unstructured/ingest/google_drive/df0a1d5b59/1SpQuE7jHz9nMt5hfQXsiok1SgIdRYX5o-fake.docx', 'output_dir': 'structured-output', 'output_filepath': '/Users/ryannikolaidis/Development/unstructured/unstructured/structured-output/1SpQuE7jHz9nMt5hfQXsiok1SgIdRYX5o-fake.docx'}, 'registry_name': 'google_drive', 'base_filename': '/1SpQuE7jHz9nMt5hfQXsiok1SgIdRYX5o-fake.docx', 'filename': '/Users/ryannikolaidis/.cache/unstructured/ingest/google_drive/df0a1d5b59/1SpQuE7jHz9nMt5hfQXsiok1SgIdRYX5o-fake.docx', '_output_filename': '/Users/ryannikolaidis/Development/unstructured/unstructured/structured-output/1SpQuE7jHz9nMt5hfQXsiok1SgIdRYX5o-fake.docx.json', 'record_locator': {'drive_id': '1OQZ66OHBE30rNsNa7dweGLfRmXvkT_jr', 'file_id': '1SpQuE7jHz9nMt5hfQXsiok1SgIdRYX5o'}}, cannot pickle '_cffi_backend.FFI' object
2023-10-27 12:41:37,783 MainProcess INFO     No files to run partition over

@potter-potter
Contributor

I got the same thing when I tried yesterday. "TypeError: cannot pickle '_cffi_backend.FFI' object"
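For anyone reading the traceback: the failure is `copy.deepcopy` being applied, during dict conversion, to a value that cannot be pickled. The sketch below reproduces that general shape with the standard library only, using `threading.Lock` as a stand-in for the `_cffi_backend.FFI` object and `dataclasses.asdict` in place of `dataclasses_json`. `Handle`, `Doc`, and `asdict_fails` are hypothetical, and skipping the field in `to_dict` is one possible workaround, not necessarily what this branch does.

```python
import dataclasses
import threading
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Handle:
    # Stand-in for a session handle wrapping an object that cannot be pickled.
    client: Any = field(default_factory=threading.Lock)


@dataclass
class Doc:
    name: str
    handle: Handle = field(default_factory=Handle)

    def to_dict(self) -> dict:
        # Skip the handle so serialization never deep-copies it.
        return {
            f.name: getattr(self, f.name)
            for f in dataclasses.fields(self)
            if f.name != "handle"
        }


def asdict_fails(doc: Doc) -> bool:
    # dataclasses.asdict deep-copies leaf values, which fails for the lock.
    try:
        dataclasses.asdict(doc)
    except TypeError:
        return True
    return False
```

This only illustrates the mechanism behind the error message; it does not explain why the same branch behaves differently across environments.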

@rbiseck3
Contributor Author

I've tried running this locally on my mac as well as on a fresh rocky image that is supported by this repo and was not able to reproduce this pickling issue.

@potter-potter
Contributor

I've tried running this locally on my mac as well as on a fresh rocky image that is supported by this repo and was not able to reproduce this pickling issue.

Which python version? I'll try that one...

@rbiseck3
Contributor Author

@potter-potter Python 3.10.12 on my mac, Python 3.10.13 on the rocky image.

@potter-potter
Contributor

@ryannikolaidis @rbiseck3 Well I ran it with 3.10.12 and:

  1. It ran with no pickling errors
  2. It only grabbed credentials 2 times, as expected: once for listing ids, once for downloading

I'm gonna check it out a little more.
I'll also try again with the latest 3.8

@potter-potter
Contributor

potter-potter commented Oct 27, 2023

Worked fine in python 3.8.18
Worked fine in python 3.8.15
@ryannikolaidis what version of python were you using when you had the error?

Contributor

@potter-potter potter-potter left a comment

Working well for me.

@rbiseck3 rbiseck3 force-pushed the roman/session-handler-fix branch from 4ac35b9 to 6a13ecb Compare October 30, 2023 15:18
Contributor

@ryannikolaidis ryannikolaidis left a comment

blocking until we can resolve noted issue running on mac

@ryannikolaidis
Contributor

ryannikolaidis commented Oct 30, 2023

Worked fine in python 3.8.18
Worked fine in python 3.8.15
@ryannikolaidis what version of python were you using when you had the error?

Python 3.10.11.

@ryannikolaidis
Contributor

still seeing same issue.

Contributor

@ryannikolaidis ryannikolaidis left a comment

okay, looks like not related to version. created a fresh virtual env but still on 3.10.11 and now it passes. I guess worth noting internally if other folks run into this that clean env seems to do the trick

@rbiseck3 rbiseck3 force-pushed the roman/session-handler-fix branch from 6a13ecb to 49e19ce Compare October 30, 2023 17:56
@rbiseck3 rbiseck3 enabled auto-merge October 30, 2023 20:44
@rbiseck3 rbiseck3 added this pull request to the merge queue Oct 30, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 30, 2023
@rbiseck3 rbiseck3 added this pull request to the merge queue Oct 30, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 30, 2023
@rbiseck3 rbiseck3 added this pull request to the merge queue Oct 30, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 30, 2023
@rbiseck3 rbiseck3 force-pushed the roman/session-handler-fix branch from 49e19ce to 42f0148 Compare October 31, 2023 11:44
@rbiseck3 rbiseck3 temporarily deployed to ci October 31, 2023 11:44 — with GitHub Actions Inactive
@rbiseck3 rbiseck3 enabled auto-merge October 31, 2023 11:44
@rbiseck3 rbiseck3 temporarily deployed to ci October 31, 2023 11:46 — with GitHub Actions Inactive
@rbiseck3 rbiseck3 added this pull request to the merge queue Oct 31, 2023
Merged via the queue into main with commit 963ac35 Oct 31, 2023
45 checks passed
@rbiseck3 rbiseck3 deleted the roman/session-handler-fix branch October 31, 2023 12:58