Added pooch registries for data files #4098

santacodes · 2024-05-16T17:07:22Z

Description

Added pooch as a [dev] dependency and added a file to create and store data file registries for pybamm-data.
Added a data loader method that fetches data files from pybamm-data, stores them in a local cache folder, and returns their paths so that notebooks can load them wherever needed.
Removed data files from the local repo under pybamm/input.
Data files under pybamm/input/parameters have been neither removed nor included in the pooch registries.
Added documentation, tests and an example notebook for pybamm.DataLoader class.
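
To illustrate what a registry-backed loader involves, here is a minimal standard-library sketch: a registry maps file names to SHA-256 checksums, and a cached file is handed back only if it is registered and its hash matches. This is not pooch's or PyBaMM's implementation (pooch exposes the equivalent idea through `pooch.create` and `Pooch.fetch`); `REGISTRY`, `sha256_of`, `verify_cached_file`, and `example.csv` are hypothetical names used only for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_cached_file(cache_dir: Path, filename: str, registry: dict) -> Path:
    """Return the cached path if the file is registered and its hash matches."""
    if filename not in registry:
        raise NameError(f"{filename} is not in the registry")
    path = cache_dir / filename
    if sha256_of(path) != registry[filename]:
        raise ValueError(f"checksum mismatch for {filename}")
    return path


# Hypothetical demo: build a one-file cache and a matching registry.
cache_dir = Path(tempfile.mkdtemp())
(cache_dir / "example.csv").write_bytes(b"time,voltage\n0,4.2\n")
REGISTRY = {"example.csv": sha256_of(cache_dir / "example.csv")}

verified = verify_cached_file(cache_dir, "example.csv", REGISTRY)
```

The checksum step is what lets the data live outside the source tree while still being pinned to known contents.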

Type of change

Please add a line in the relevant section of CHANGELOG.md to document the change (include PR #) - note reverse order of PR #s. If necessary, also add to the list of breaking changes.

New feature (non-breaking change which adds functionality)
Optimization (back-end change that speeds up the code)
Bug fix (non-breaking change which fixes an issue)

Key checklist:

No style issues: $ pre-commit run (or $ nox -s pre-commit) (see CONTRIBUTING.md for how to set this up to run automatically when committing locally, in just two lines of code)
All tests pass: $ python run-tests.py --all (or $ nox -s tests)
The documentation builds: $ python run-tests.py --doctest (or $ nox -s doctests)

You can run integration tests, unit tests, and doctests together at once, using $ python run-tests.py --quick (or $ nox -s quick).

Further checks:

Code is commented, particularly in hard-to-understand areas
Tests added that prove fix is effective or that feature works

review-notebook-app · 2024-05-16T17:07:28Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

pybamm/pybammdata.py

pyproject.toml

agriyakhetarpal · 2024-05-16T18:11:25Z

Additionally, could you use the function you added in the failing tests as well?

codecov · 2024-05-18T18:27:52Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.54%. Comparing base (739a1c0) to head (92f988a).
Report is 2 commits behind head on develop.

❗ Current head 92f988a differs from pull request most recent head a7a4926

Please upload reports for the commit a7a4926 to get more accurate results.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #4098      +/-   ##
===========================================
- Coverage    99.55%   99.54%   -0.02%     
===========================================
  Files          287      288       +1     
  Lines        21753    21767      +14     
===========================================
+ Hits         21657    21668      +11     
- Misses          96       99       +3

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

agriyakhetarpal

Thanks for this, @santacodes! Could you fix the coverage by adding additional tests for the get_data method (to check the NameError that you raised), and for the checksum option in the show_registry method? @valentinsulzer, could you review this as well?
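
A rough sketch of the two tests requested here, written against a stand-in class so that it runs without network access. `FakeDataLoader` and its return values are assumptions for illustration and may differ from the real `pybamm.DataLoader`; in the actual test suite one would more likely use `pytest.raises` than the plain `try`/`except` shown.

```python
class FakeDataLoader:
    """Hypothetical stand-in for the loader; it performs no network access."""

    def __init__(self, files):
        self.files = dict(files)

    def show_registry(self, checksum=False):
        # Assumed behaviour: names only by default, name-to-hash mapping on request.
        return dict(self.files) if checksum else list(self.files)

    def get_data(self, filename):
        if filename not in self.files:
            raise NameError(f"{filename} is not registered")
        return filename


def test_show_registry_checksum_option():
    loader = FakeDataLoader({"a.csv": "sha256:abc"})
    assert loader.show_registry() == ["a.csv"]
    assert loader.show_registry(checksum=True) == {"a.csv": "sha256:abc"}


def test_get_data_unknown_file():
    loader = FakeDataLoader({"a.csv": "sha256:abc"})
    try:
        loader.get_data("missing.csv")
    except NameError:
        return
    raise AssertionError("NameError was not raised")
```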

Besides this, I would recommend building the source distribution and the wheels on your fork by triggering the manual job we have available, and also building locally with python -m build, to ensure that we aren't missing any files and that the sizes of the distributions are alright.
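
One possible way to check a built sdist for missing files and unexpected size is to list the archive contents with the standard library. The helper name `summarize_sdist` and the demo archive are hypothetical; in practice the path would point at a file produced by `python -m build` under `dist/`.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def summarize_sdist(sdist_path):
    """Return (total uncompressed bytes, sorted file names) for a .tar.gz archive."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        members = [m for m in archive.getmembers() if m.isfile()]
    return sum(m.size for m in members), sorted(m.name for m in members)


# Hypothetical demo archive standing in for a real sdist.
demo = Path(tempfile.mkdtemp()) / "demo-0.0.0.tar.gz"
payload = b"print('hello')\n"
with tarfile.open(demo, "w:gz") as archive:
    info = tarfile.TarInfo("demo-0.0.0/demo/__init__.py")
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))

total_bytes, names = summarize_sdist(demo)
```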

tests/unit/test_pybamm_data.py

tests/unit/test_simulation.py

pybamm/pybamm_data.py

Co-authored-by: Agriya Khetarpal <[email protected]>

agriyakhetarpal

Looks good to me now, thanks, @santacodes! I'll wait for some additional thoughts on this before we merge.

cc @cringeyburger @prady0t – this PR has removed (most of) the PyBaMM data files from the PyBaMM source code. Testing the built wheels using CIBW_TEST_COMMAND should now be a bit easier. I feel that the pytest --pyargs pybamm invocation would be better than a pybamm.test() one – that way we would not need to move the tests/ directory inside pybamm/

pyproject.toml

docs/source/examples/notebooks/models/Validating_mechanical_models_Enertech_DFN.ipynb

arjxn-py

I've no further concerns with this one, good work @santacodes.
@agriyakhetarpal are we good to merge this, or should we still seek one more review?

agriyakhetarpal · 2024-05-25T12:52:11Z

I think this is good to go on my end, but I just had a question about skipping tests without network access – @santacodes, the pytest decorator + no_internet_connection() we have will prevent the tests from failing, but this applies only to the unit and integration tests. Is there a way we can cleanly extend this behaviour to the example scripts and notebooks that use the data files specifically? It's not a big deal though, because we expect that most people have internet access nowadays one way or another.

santacodes · 2024-05-25T13:05:07Z

> I think this is good to go on my end, but I just had a question about skipping tests without network access – @santacodes, the pytest decorator + no_internet_connection() we have will prevent the tests from failing, but this applies only to the unit and integration tests. Is there a way we can cleanly extend this behaviour to the example scripts and notebooks that use the data files specifically? It's not a big deal though, because we expect that most people have internet access nowadays one way or another.

For notebooks and scripts, I am thinking we could add an error-handling method that raises an error. By default, pooch raises socket.gaierror when there is no network connection, as it cannot open a socket connection to the upstream URL.
The second option would be to check for an internet connection before actually fetching data files from upstream, and to warn the user about network connectivity.
I think both can be implemented, but I am not sure which would be better suited to scripts and notebooks specifically.
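
Both options can be sketched with the standard library alone. The first function probes connectivity before a fetch; the second wraps any fetch callable and converts a low-level failure into a clearer message. The default host and port, and the name `fetch_with_clear_error`, are assumptions for illustration, and this `no_internet_connection` is not necessarily the helper already in the repository.

```python
import socket


def no_internet_connection(host="github.com", port=443, timeout=3.0):
    """Return True when a TCP connection to host:port cannot be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return False
    except OSError:
        # socket.gaierror (failed name resolution) is a subclass of OSError.
        return True


def fetch_with_clear_error(fetch, filename):
    """Call fetch(filename), turning network failures into a readable error."""
    try:
        return fetch(filename)
    except OSError as error:
        raise ConnectionError(
            f"Could not download {filename}; check the network connection."
        ) from error
```

Checking first avoids a long traceback in a notebook, while wrapping the fetch keeps a single code path; which is preferable depends on how much logic should live inside the examples.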

agriyakhetarpal · 2024-05-25T13:32:49Z

> For notebooks and scripts, I am thinking we could add an error-handling method that raises an error. By default, pooch raises socket.gaierror when there is no network connection, as it cannot open a socket connection to the upstream URL. The second option would be to check for an internet connection before actually fetching data files from upstream, and to warn the user about network connectivity. I think both can be implemented, but I am not sure which would be better suited to scripts and notebooks specifically.

I thought of that, but both of these options should ideally stay outside of, and abstracted away from, the content of the notebooks and scripts, which isn't quite viable and seems over-engineered. I think we should skip them for now. We can help out on a per-user basis if someone tries to use nox -s scripts or nox -s examples without an internet connection. Another reason is that we do not include any of the example scripts or notebooks in the sdist or the wheel, which means that people have to come to GitHub or browse the online documentation to access them anyway, and that already requires an internet connection.

kratman

Minor changes to the tests

tests/unit/test_pybamm_data.py

kratman

For any other reviewers: I checked the link failure. It should start passing once this PR is merged. The link checker runs against the stable documentation, so it cannot see the new page.

Co-authored-by: Eric G. Kratz <[email protected]>

Changes made

agriyakhetarpal · 2024-05-27T14:41:25Z

Maybe we should refer to the reST-based heading for PyBaMM Data instead of adding an HTTP(S) link to our documentation? Otherwise, this failing link will keep failing for all the release candidates until we have the v24.5 release out – the link is pointing to stable, not latest. @santacodes, could you make this change in the notebook?

agriyakhetarpal · 2024-05-27T18:41:09Z

Changing to ../api/pybamm_data.rst won't help – could you add a heading under the PyBaMM Data page and use reST syntax to link to it? Something under this SO post should help

kratman

Marking @agriyakhetarpal's last comment as a change request to keep this from being merged until it is fixed:

> Changing to ../api/pybamm_data.rst won't help – could you add a heading under the PyBaMM Data page and use reST syntax to link to it? Something under this SO post should help

santacodes · 2024-05-29T16:55:08Z

> Changing to ../api/pybamm_data.rst won't help – could you add a heading under the PyBaMM Data page and use reST syntax to link to it? Something under this SO post should help

I tried referencing headers, but Sphinx doesn't render them as links no matter what, even when I reference pybamm_data.rst to point at the documentation header, because the two files are not in the same directory. It only works by referencing the notebook's path, as in my last commit, or by referencing docs/source/api/pybamm_data.rst:#pybamm.DataLoader. Both point to the same place, so I will leave the choice to you.

agriyakhetarpal · 2024-05-30T01:29:03Z

I think this should be okay, then, thanks for the explanation. Not sure if there is a better way to link them unless we switch to MyST completely for the codebase, which is something for a later time

Based on the reasoning provided we can go forward with this. The documentation is able to link to the requested page without issues

* added pooch
* removed getpath()
* added data_loader to notebooks and scripts
* removed data files in the registry
* added docs and implemented code review suggestions
* added example notebook, unit tests and edited changelog
* fixed pouch cell notebook
* Apply suggestions from code review
  Co-authored-by: Agriya Khetarpal <[email protected]>
* style: pre-commit fixes
* added more test coverage, fixed notebook and code review suggestions
* added bib references
* fixed doctest and script test errors
* fixed codacy warnings
* using sockets for checking network
* code review suggestions along with notebook update
* Apply suggestions from code review
  Co-authored-by: Eric G. Kratz <[email protected]>
* fixed RUF015
* changed link referenced and fixed lychee warning
* changed the data files link under input/ to pybamm-data repo

---------

Co-authored-by: Agriya Khetarpal <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ferran Brosa Planella <[email protected]>
Co-authored-by: Arjun Verma <[email protected]>
Co-authored-by: Eric G. Kratz <[email protected]>

santacodes and others added 6 commits May 15, 2024 22:36

added pooch

605d251

removed getpath()

a2f9e90

added data_loader to notebooks and scripts

508e419

Merge branch 'pybamm-team:develop' into data

7cf3293

removed data files in the registry

e4d7757

Merge branch 'data' of https://github.com/santacodes/PyBaMM into data

a84ae68

santacodes requested review from martinjrobins, Saransh-cpp, agriyakhetarpal, kratman, arjxn-py, a team, brosaplanella, DrSOKane, rtimms, valentinsulzer and TomTranter as code owners May 16, 2024 17:07

agriyakhetarpal reviewed May 16, 2024

santacodes and others added 3 commits May 18, 2024 13:13

Merge branch 'pybamm-team:develop' into data

d9ed5eb

added docs and implemented code review suggestions

53637f0

added example notebook, unit tests and edited changelog

064a639

santacodes and others added 2 commits May 19, 2024 00:06

Merge branch 'develop' into data

4648e48

fixed pouch cell notebook

2b99b92

agriyakhetarpal reviewed May 18, 2024

santacodes and others added 3 commits May 19, 2024 01:21

Apply suggestions from code review

e221bd2

Co-authored-by: Agriya Khetarpal <[email protected]>

style: pre-commit fixes

fd1aefe

added more test coverage, fixed notebook and code review suggestions

38e76ee

Merge branch 'develop' into data

ebd59a3

agriyakhetarpal approved these changes May 24, 2024

arjxn-py reviewed May 24, 2024

pyproject.toml

arjxn-py reviewed May 24, 2024

docs/source/examples/notebooks/models/Validating_mechanical_models_Enertech_DFN.ipynb (outdated)

Merge branch 'develop' into data

92f988a

arjxn-py approved these changes May 25, 2024

Merge branch 'develop' into data

2068032

kratman previously requested changes May 27, 2024

tests/unit/test_pybamm_data.py (outdated)

tests/unit/test_pybamm_data.py (outdated)

kratman reviewed May 27, 2024

santacodes and others added 2 commits May 27, 2024 19:18

Apply suggestions from code review

a29c775

Co-authored-by: Eric G. Kratz <[email protected]>

fixed RUF015

9bd31a8

changed link referenced and fixed lychee warning

8744591

agriyakhetarpal added 2 commits May 28, 2024 04:24

Merge branch 'develop' into data

948d466

Merge branch 'develop' into data

28562ca

kratman previously requested changes May 29, 2024

santacodes and others added 2 commits May 29, 2024 22:26

changed the data files link under input/ to pybamm-data repo

684129e

Merge branch 'develop' into data

a7a4926

kratman merged commit a553fe6 into pybamm-team:develop May 30, 2024
24 checks passed

santacodes mentioned this pull request May 31, 2024

Repository structure (file layouts, cookiecutter templating engines, distribution options) pybamm-team/pybamm-cookie#1

Open

santacodes deleted the data branch June 16, 2024 08:39
