chunking for json normalization #914

Merged
merged 1 commit into from
Feb 5, 2024

Contributor

@kbuma kbuma commented Feb 5, 2024

Summary

Collections such as materials can end up with over 4 million columns if the whole collection is JSON-normalized at once. Attempting to process collections like this caused a memory error.

The solution is to split the collection into chunks, normalize and filter each chunk separately, and then combine the filtered chunks.
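
The approach can be sketched roughly as follows. This is a minimal illustration rather than the code in this PR: it assumes the normalization is done with `pandas.json_normalize`, and the helper names (`iter_chunks`, `normalize_in_chunks`), the `keep_columns` parameter, and the sample documents are all hypothetical.

```python
from itertools import islice
from typing import Any, Iterable, Iterator

import pandas as pd


def iter_chunks(
    docs: Iterable[dict[str, Any]], chunk_size: int
) -> Iterator[list[dict[str, Any]]]:
    """Yield successive lists of at most ``chunk_size`` documents."""
    iterator = iter(docs)
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk


def normalize_in_chunks(
    docs: Iterable[dict[str, Any]],
    keep_columns: list[str],
    chunk_size: int = 1000,
) -> pd.DataFrame:
    """Normalize and filter documents chunk by chunk, then combine the results."""
    frames = []
    for chunk in iter_chunks(docs, chunk_size):
        # Only this chunk is flattened, so the wide intermediate frame
        # covers the keys seen in the chunk, not the whole collection.
        flat = pd.json_normalize(chunk)
        # Filter right away so only the wanted columns are retained.
        present = [col for col in keep_columns if col in flat.columns]
        frames.append(flat[present])
    if not frames:
        return pd.DataFrame(columns=keep_columns)
    combined = pd.concat(frames, ignore_index=True)
    # Restore a fixed column order; columns never seen are filled with NaN.
    return combined.reindex(columns=keep_columns)


# Hypothetical sample documents: every record adds a new nested key,
# which is what makes a whole-collection normalization so wide.
docs = [
    {"material_id": "mp-1", "symmetry": {"crystal_system": "cubic"}, "extra": {"a": 1}},
    {"material_id": "mp-2", "symmetry": {"crystal_system": "hexagonal"}, "extra": {"b": 2}},
    {"material_id": "mp-3", "symmetry": {"crystal_system": "cubic"}, "extra": {"c": 3}},
]
result = normalize_in_chunks(
    docs, ["material_id", "symmetry.crystal_system"], chunk_size=2
)
```

Because each chunk is filtered before the next one is normalized, peak memory depends on the chunk size and the retained columns rather than on the total number of distinct keys in the collection.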

Checklist

  • Google format doc strings added.
  • Code linted with ruff. (For guidance in fixing rule violations, see rule list)
  • Type annotations included. Check with mypy.
  • Tests added for new features/fixes.
  • I have run the tests locally and they passed.

@munrojm munrojm merged commit 445df9f into materialsproject:main Feb 5, 2024
8 checks passed
@kbuma kbuma deleted the bugfix/opendata_chunk_write branch September 9, 2024 15:53

2 participants