Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use pickle's loads/dumps as a faster deepcopy #5281

Merged
merged 22 commits into from
Jun 25, 2024

Conversation

kenodegard
Copy link
Contributor

@kenodegard kenodegard commented Apr 12, 2024

Description

Attempting to use pickle instead of deepcopy for a faster copy.

Split from:

Checklist - did you ...

  • Add a file to the news directory (using the template) for the next release's release notes?
  • Add / update necessary tests?
  • Add / update outdated documentation?

@conda-bot conda-bot added the cla-signed [bot] added once the contributor has signed the CLA label Apr 12, 2024
Copy link

codspeed-hq bot commented Apr 12, 2024

CodSpeed Performance Report

Merging #5281 will improve performances by 66.17%

Comparing kenodegard:pickle-vs-deepcopy (e5318de) with main (fbbc298)

Summary

⚡ 1 improvements
✅ 2 untouched benchmarks

Benchmarks breakdown

Benchmark main kenodegard:pickle-vs-deepcopy Change
test_pin_subpackage_benchmark 12.2 s 7.4 s +66.17%

@kenodegard kenodegard marked this pull request as ready for review April 12, 2024 20:14
@kenodegard kenodegard requested a review from a team as a code owner April 12, 2024 20:14
Copy link
Contributor

@beckermr beckermr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only comment here is as last time. If we encounter a data structure that does not pickle, we will get an error. It might be worth making a separate helper function with a try except block falling back to deepcopy.

conda_build/config.py Outdated Show resolved Hide resolved
conda_build/config.py Outdated Show resolved Hide resolved
conda_build/config.py Outdated Show resolved Hide resolved
@beckermr
Copy link
Contributor

pre-commit.ci autofix

@beckermr
Copy link
Contributor

beckermr commented Jun 21, 2024

@kenodegard @dholth Are we worried about errors where something cannot be pickled or shall we merge this one?

The test failures appear to be unrelated warnings appearing w/ python 3.12:

DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.

I don't follow it totally but it appears to happen in extractions of packages.

@minrk
Copy link
Contributor

minrk commented Jun 21, 2024

Are we worried about errors where something cannot be pickled

Since this is currently only applied to variant objects and I think variants are necessarily dicts of lists, strings and numbers, I don't think this is likely to come up.

beckermr
beckermr previously approved these changes Jun 22, 2024
Copy link
Contributor

@beckermr beckermr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a news item and then merge!

@isuruf isuruf added this to the 24.7.x milestone Jun 22, 2024
beckermr
beckermr previously approved these changes Jun 22, 2024
@jaimergp
Copy link
Contributor

DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.

I don't follow it totally but it appears to happen in extractions of packages.

Yep, this is the tarfile usage in conda-package-streaming. Added conda/conda-package-streaming#87 to track. For now we can add a filterwarnings setting to pyproject.toml, I'd say. I think @zklaus was looking into it.

@beckermr
Copy link
Contributor

filterwarnings setting to pyproject.toml

do you have an example of this or a link to the docs?

I am not following.

@beckermr
Copy link
Contributor

pre-commit.ci autofix

pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
@beckermr beckermr enabled auto-merge (squash) June 25, 2024 11:22
@beckermr beckermr merged commit 3a09f13 into conda:main Jun 25, 2024
28 checks passed
@kenodegard kenodegard deleted the pickle-vs-deepcopy branch June 25, 2024 14:05
@minrk
Copy link
Contributor

minrk commented Jun 28, 2024

Thanks for this! I've been testing it out, and with this and #5384 rerender time is reduced by ~90%, from over 30 minutes to about 4 on some recipes with a large variant matrix (petsc4py).

FWIW, I spent a little time looking into making config.variant itself an immutable dict via deepfreeze since that's already computed for the hash anyway and would eliminate the need to copy variants altogether. But there are a few places where the variant is modified in place that make it a little bit tricky. Still doable and perhaps right thing to do, but not so simple. I think it also comes at an increased cost of computing list_of_dicts_to_dict_of_lists because there would end up being a lot of converting tuples to lists and back.

I also noticed in conda-forge/conda-smithy#1967 that deepfreeze is quite a lot more expensive than using HashableDict (deepfreeze takes ~10x the time of hash(HashableDict(dict))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed [bot] added once the contributor has signed the CLA
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

8 participants