Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/netcdf #53

Open
wants to merge 38 commits into
base: dev
Choose a base branch
from
Open

Feature/netcdf #53

wants to merge 38 commits into from

Conversation

bruvio
Copy link
Collaborator

@bruvio bruvio commented Mar 24, 2022

@bobturneruk @RyanJField

I drafted the PR as you suggested.

I wonder if you can suggest how to move with this?

at the moment I am reading the R implementation and, for now, translate read_array/write_array to python.

I would like to hear your view on this as I do not want to reinvent the wheel. For example, as I am not familiar with R, is there already a Python implementation for the R functions resolve_read/resolve_write?

@bruvio bruvio added the enhancement New feature or request label Mar 24, 2022
@RyanJField
Copy link
Collaborator

The code for resolve_read and write are in the link functions if you read the R link_read / write functions you can see the code was put into resolve read / write to avoid duplication with the read / write_ functions.

@bobturneruk
Copy link

Pasting these in here in case they're useful:

I guess they specify what read_array and write_array should do (in a HDF5 setup). I guess the strategy here is to sidestep HDF5 for python and go straight to netCDF?

@bruvio
Copy link
Collaborator Author

bruvio commented Mar 29, 2022

@RyanJField @bobturneruk at the moment (after refactorin the code to expose resolve_read and resolve_write as in R)
I get an error when executing

@bruvio
Copy link
Collaborator Author

bruvio commented Mar 29, 2022

@bobturneruk @RyanJField

as far as this stands I am getting an error when executing fair run command in this branch.

as I ssaid I just exposed (i.e. refactored) two functions resolve_read and resolve_write to be similar to the R implementation and then serve for future implementation of read/write netcdf.

I would like a new set of eyes to have a look into this error as it is not related to the cli,

$ fair run simpleModel/ext/SEIRSconfig.yaml 
Updating registry from simpleModel/ext/SEIRSconfig.yaml
Traceback (most recent call last):
  File "simpleModel/ext/SEIRSModelRun.py", line 39, in <module>
    simpleModel.SEIRS_Plot(sm, model_plot)
  File "/home/bruvio/Dropbox/work/UKAEA/pyDataPipeline/simpleModel/common/SEIRS_Plot.py", line 28, in SEIRS_Plot
    plt.savefig(save_location)
  File "/home/bruvio/Dropbox/work/UKAEA/pyDataPipeline/.venv/lib/python3.8/site-packages/matplotlib/pyplot.py", line 958, in savefig
    res = fig.savefig(*args, **kwargs)
  File "/home/bruvio/Dropbox/work/UKAEA/pyDataPipeline/.venv/lib/python3.8/site-packages/matplotlib/figure.py", line 3019, in savefig
    self.canvas.print_figure(fname, **kwargs)
  File "/home/bruvio/Dropbox/work/UKAEA/pyDataPipeline/.venv/lib/python3.8/site-packages/matplotlib/backend_bases.py", line 2259, in print_figure
    canvas = self._get_output_canvas(backend, format)
  File "/home/bruvio/Dropbox/work/UKAEA/pyDataPipeline/.venv/lib/python3.8/site-packages/matplotlib/backend_bases.py", line 2188, in _get_output_canvas
    raise ValueError(
ValueError: Format 'netcdf' is not supported (supported formats: eps, jpeg, jpg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff)

@RyanJField
Copy link
Collaborator

@bobturneruk @RyanJField

as far as this stands I am getting an error when executing fair run command in this branch.

as I ssaid I just exposed (i.e. refactored) two functions resolve_read and resolve_write to be similar to the R implementation and then serve for future implementation of read/write netcdf.

I would like a new set of eyes to have a look into this error as it is not related to the cli,

$ fair run simpleModel/ext/SEIRSconfig.yaml 
Updating registry from simpleModel/ext/SEIRSconfig.yaml
Traceback (most recent call last):
  File "simpleModel/ext/SEIRSModelRun.py", line 39, in <module>
    simpleModel.SEIRS_Plot(sm, model_plot)
  File "/home/bruvio/Dropbox/work/UKAEA/pyDataPipeline/simpleModel/common/SEIRS_Plot.py", line 28, in SEIRS_Plot
    plt.savefig(save_location)
  File "/home/bruvio/Dropbox/work/UKAEA/pyDataPipeline/.venv/lib/python3.8/site-packages/matplotlib/pyplot.py", line 958, in savefig
    res = fig.savefig(*args, **kwargs)
  File "/home/bruvio/Dropbox/work/UKAEA/pyDataPipeline/.venv/lib/python3.8/site-packages/matplotlib/figure.py", line 3019, in savefig
    self.canvas.print_figure(fname, **kwargs)
  File "/home/bruvio/Dropbox/work/UKAEA/pyDataPipeline/.venv/lib/python3.8/site-packages/matplotlib/backend_bases.py", line 2259, in print_figure
    canvas = self._get_output_canvas(backend, format)
  File "/home/bruvio/Dropbox/work/UKAEA/pyDataPipeline/.venv/lib/python3.8/site-packages/matplotlib/backend_bases.py", line 2188, in _get_output_canvas
    raise ValueError(
ValueError: Format 'netcdf' is not supported (supported formats: eps, jpeg, jpg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff)

You need to use the file_type from the write block:

file_type = write["file_type"]

@bruvio
Copy link
Collaborator Author

bruvio commented Mar 31, 2022

with commit 16b7a14 I started to implement writing data to netcdf as the example set out by Derek.
@bobturneruk @RyanJField fell free to have a look and comment.

@bobturneruk
Copy link

Looks to be shaping up well. I'll have a go at getting it running on my machine and comment more.

@bruvio
Copy link
Collaborator Author

bruvio commented Apr 3, 2022

up to now there are a couple of wrapper functions to write 1d data as f(x) ,2d as f(x,y) and 3d data as f(x,y,z). Also there are wrapper functions to create groups and nested groups. Group creation is idempotent so creating a group that is already populated does not raise errors or deletes data. I still have to write wrappers to read data from netcdf. And then if you agree we can move to write read_array and write_array function as in the R implementation.

@bruvio
Copy link
Collaborator Author

bruvio commented Apr 28, 2022

@RyanJField @richardreeve @B0SKAMP @bobturneruk
commit c979fbf contains a first implementation of write_array.
there are also tests (no edge cases for now)

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 17 Code Smells

No Coverage information No Coverage information
18.3% 18.3% Duplication

@bruvio
Copy link
Collaborator Author

bruvio commented Jun 22, 2022

@B0SKAMP @richardreeve @RyanJField @bobturneruk
Hi!
I refactored the code.
Hope meets the requirements.

let me know.

@bobturneruk bobturneruk requested a review from richardreeve June 23, 2022 10:24
@bobturneruk
Copy link

I'm not sufficiently sure of the current requirements to say if this meeting them. This will need attention from @richardreeve, I think. I can look at the code.

@richardreeve
Copy link
Member

Great, thanks @bruvio - I'll have some time to look at this tomorrow.

@richardreeve
Copy link
Member

Hi @bruvio - I (actually met up with!) and talked through this with @B0SKAMP today, and since he has a much clearer understanding of the details than me, he said he'd talk to you about it early next week. It sounds like between the two of you you're converging on a really nice solution.

@bruvio
Copy link
Collaborator Author

bruvio commented Jun 27, 2022

with last bunch of commits refactored more the code to prepare headers better, now also the data is set into the headers when calling prepare_headers. I also added a bit more refactoring here and there.

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 18 Code Smells

No Coverage information No Coverage information
26.3% 26.3% Duplication

@bruvio bruvio marked this pull request as ready for review July 26, 2022 12:01
@bruvio
Copy link
Collaborator Author

bruvio commented Jul 26, 2022

I think we can start reviewing this.

this is a copy paste of an email exchange:

Currently all data is written in one go. Another interface can be provided to write in slices.

What we need to agree on is the internal format of the netcdf file.

The naming format is "grp1/grp2/..../grp/varname

The root of the netcdf file should have a schema version attribute.

I create a netcdf dimension with the "_dim" suffix when creating a dimension. Then a netcdf variable with the dimension name which references the "_dim" suffix.

When an array is created I give the dimension names. From these a retrieve the netcdf "_dim" dimensions. A netcdf variable is created refering to the "_dim" dimensions

Finally as discussed there is a possibility to pass custom attributes.

@RyanJField @richardreeve @B0SKAMP

Derek is not able to see this, he might be able to comment as well?
Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants