Improve OmegaConfigLoader performance #4367

Merged on Jan 3, 2025 (9 commits)

@ravi-kumar-pilla ravi-kumar-pilla commented Dec 3, 2024

Description

Resolves #4322

Development notes

  • As mentioned by @noklam here, this PR applies a quick fix to _get_globals_value and _get_runtime_value: _globals and runtime_params are no longer re-created for each reference in the catalog.
  • As mentioned by @matthias here, the config merge and the underscore filter are now separate steps, which also improved performance.
  • This is a minor improvement to the performance of OmegaConfigLoader (OCL). The remaining bottleneck is the use of:
    OmegaConf.load
    OmegaConf.to_container
    OmegaConf.merge
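
The two changes described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code merged in this PR: the class `CachedGlobalsLoader`, its `build_count` counter, and the function `merge_then_filter` are hypothetical names, and a shallow `dict.update` stands in for the real OmegaConf merge.

```python
from typing import Any, Dict, List, Optional


class CachedGlobalsLoader:
    """Hypothetical stand-in for a config loader; not Kedro's actual class."""

    def __init__(self, raw_globals: Dict[str, Any]) -> None:
        self._raw_globals = raw_globals
        self._globals_cache: Optional[Dict[str, Any]] = None
        self.build_count = 0  # instrumentation for this example only

    def _build_globals(self) -> Dict[str, Any]:
        # Stands in for the expensive step (load, merge, convert to a container).
        self.build_count += 1
        return dict(self._raw_globals)

    def get_globals_value(self, key: str) -> Any:
        # Build the globals mapping once, then reuse it for every reference.
        if self._globals_cache is None:
            self._globals_cache = self._build_globals()
        return self._globals_cache[key]


def merge_then_filter(configs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge configs first, then drop underscore-prefixed keys in a second pass."""
    merged: Dict[str, Any] = {}
    for config in configs:
        merged.update(config)  # shallow merge (assumption); later configs win
    return {key: value for key, value in merged.items() if not key.startswith("_")}
```

With this shape, resolving many `${globals:...}` references triggers a single build of the globals mapping rather than one per reference.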

Test Data

# globals.yml

compression_type: '${random_choice: gzip, bz2, xz, None}'
random_seed: 42
separator: "${random_choice: ,, \t, |}"

# catalog.yml with variable number of entries

interpolated_dataset_3610:
  filepath: data/interpolated_dataset_3610.csv
  load_args:
    encoding: '${random_choice: utf-8, iso-8859-1, utf-16}'
    sep: ${globals:separator}
  save_args:
    compression: ${globals:compression_type}
    index: false
  type: pandas.CSVDataSet
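
A catalog with a variable number of entries like the one above could be generated with a small script. The following is a sketch only: `make_catalog_yaml` is a hypothetical helper, not the script used for this PR, and it assumes consecutive numeric suffixes. Note that `random_choice` is a custom resolver that is not shown in this thread, so it would need to be registered before the generated file can be resolved.

```python
def make_catalog_yaml(n_entries: int, start: int = 0) -> str:
    """Return catalog YAML text with n_entries datasets shaped like the sample entry."""
    # Doubled braces are str.format escapes and render as single braces.
    template = (
        "interpolated_dataset_{i}:\n"
        "  filepath: data/interpolated_dataset_{i}.csv\n"
        "  load_args:\n"
        "    encoding: '${{random_choice: utf-8, iso-8859-1, utf-16}}'\n"
        "    sep: ${{globals:separator}}\n"
        "  save_args:\n"
        "    compression: ${{globals:compression_type}}\n"
        "    index: false\n"
        "  type: pandas.CSVDataSet\n"
    )
    # Entries are separated by a blank line, numbered from `start`.
    return "\n".join(template.format(i=i) for i in range(start, start + n_entries))
```

Writing the returned text to a catalog file for several values of `n_entries` gives a series of inputs for comparing load times.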

Before:

[Screenshot: 10 datasets with variable interpolation]

After:

[Screenshots]


Observation:
There is a slight improvement in the performance as seen above

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

@astrojuanlu
Member

34 % reduction, not bad 👏🏼

Contributor

@ElenaKhaustova ElenaKhaustova left a comment

Implementation looks good to me, thanks @ravi-kumar-pilla!

@noklam
Contributor

noklam commented Jan 3, 2025

I have tested the PR and have the same findings: the resolver bottleneck has been removed by the caching mechanism implemented in this PR, but bottlenecks remain in OmegaConf itself.

I have reduced the issue to a single large config with no resolvers involved.
File: catalog_datasets_with_variables.yml.zip

It can be benchmarked with pyinstrument using a simple script like

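from omegaconf import OmegaConf
path = "catalog_datasets_with_variables.yml"  # assumed name of the file extracted from the zip above
OmegaConf.load(path)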

or

%%time
OmegaConf.load(path)

This takes significant time to run compared to yaml.safe_load, which is instant.
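
The comparison with yaml.safe_load can be made explicit with a small timing harness. This is a sketch, assuming PyYAML and omegaconf are installed; `best_of` and `compare_loaders` are hypothetical helper names, and the third-party imports are deferred into the function so the timing helper stays usable on its own.

```python
import time
from typing import Any, Callable, Dict


def best_of(fn: Callable[[], Any], repeats: int = 3) -> float:
    """Return the fastest wall-clock time in seconds over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare_loaders(path: str, repeats: int = 3) -> Dict[str, float]:
    """Time yaml.safe_load against OmegaConf.load on the same file."""
    # Deferred imports: both packages are assumed to be installed.
    import yaml
    from omegaconf import OmegaConf

    def load_with_yaml() -> Any:
        # Include file I/O so both loaders do comparable work.
        with open(path, encoding="utf-8") as handle:
            return yaml.safe_load(handle)

    return {
        "yaml.safe_load": best_of(load_with_yaml, repeats),
        "OmegaConf.load": best_of(lambda: OmegaConf.load(path), repeats),
    }
```

Calling `compare_loaders` with the path to the extracted config returns the best time per loader in seconds; taking the minimum over a few repeats reduces noise from caching and other processes.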

Contributor

@noklam noklam left a comment

Approved with comments to reference in the future. Also please update the release notes.

Signed-off-by: ravi_kumar_pilla <[email protected]>
@ravi-kumar-pilla ravi-kumar-pilla enabled auto-merge (squash) January 3, 2025 18:37
@ravi-kumar-pilla ravi-kumar-pilla merged commit 057de34 into main Jan 3, 2025
39 of 40 checks passed
@ravi-kumar-pilla ravi-kumar-pilla deleted the chore/improve-ocl branch January 3, 2025 18:38
Development

Successfully merging this pull request may close these issues.

Improve OmegaConfigLoader performance when global/variable interpolations are involved
4 participants