Feature/netcdf #53

bruvio · 2022-03-24T14:31:24Z

I drafted the PR as you suggested.

I wonder if you can suggest how to move forward with this?

At the moment I am reading the R implementation and, for now, translating read_array/write_array to Python.

I would like to hear your view on this as I do not want to reinvent the wheel. For example, as I am not familiar with R, is there already a Python implementation for the R functions resolve_read/resolve_write?

RyanJField · 2022-03-24T15:13:23Z

The code for resolve_read and resolve_write is in the link functions. If you read the R link_read / link_write functions, you can see the code was put into resolve_read / resolve_write to avoid duplication with the read_ / write_ functions.

bobturneruk · 2022-03-28T13:25:47Z

Pasting these in here in case they're useful:

I guess they specify what read_array and write_array should do (in an HDF5 setup). I guess the strategy here is to sidestep HDF5 for Python and go straight to netCDF?

bruvio · 2022-03-29T16:03:55Z

@RyanJField @bobturneruk at the moment (after refactoring the code to expose resolve_read and resolve_write as in R)
I get an error when executing

bruvio · 2022-03-29T16:08:47Z

@bobturneruk @RyanJField

As this stands, I am getting an error when executing the fair run command in this branch.

As I said, I just exposed (i.e. refactored) two functions, resolve_read and resolve_write, to be similar to the R implementation and to serve the future implementation of read/write netcdf.

I would like a new set of eyes to have a look at this error, as it is not related to the CLI:

$ fair run simpleModel/ext/SEIRSconfig.yaml 
Updating registry from simpleModel/ext/SEIRSconfig.yaml
Traceback (most recent call last):
  File "simpleModel/ext/SEIRSModelRun.py", line 39, in <module>
    simpleModel.SEIRS_Plot(sm, model_plot)
  File "/home/bruvio/Dropbox/work/UKAEA/pyDataPipeline/simpleModel/common/SEIRS_Plot.py", line 28, in SEIRS_Plot
    plt.savefig(save_location)
  File "/home/bruvio/Dropbox/work/UKAEA/pyDataPipeline/.venv/lib/python3.8/site-packages/matplotlib/pyplot.py", line 958, in savefig
    res = fig.savefig(*args, **kwargs)
  File "/home/bruvio/Dropbox/work/UKAEA/pyDataPipeline/.venv/lib/python3.8/site-packages/matplotlib/figure.py", line 3019, in savefig
    self.canvas.print_figure(fname, **kwargs)
  File "/home/bruvio/Dropbox/work/UKAEA/pyDataPipeline/.venv/lib/python3.8/site-packages/matplotlib/backend_bases.py", line 2259, in print_figure
    canvas = self._get_output_canvas(backend, format)
  File "/home/bruvio/Dropbox/work/UKAEA/pyDataPipeline/.venv/lib/python3.8/site-packages/matplotlib/backend_bases.py", line 2188, in _get_output_canvas
    raise ValueError(
ValueError: Format 'netcdf' is not supported (supported formats: eps, jpeg, jpg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff)

RyanJField · 2022-03-29T17:27:12Z

You need to use the file_type from the write block:

pyDataPipeline/data_pipeline_api/link.py

Line 26 in 6da168b

file_type = write["file_type"]
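The traceback above shows matplotlib inferring the output format from the extension of save_location, which ended in netcdf. Below is a minimal sketch of the idea behind the fix, assuming the write block is a dict that carries a file_type key. The helper build_output_path, its arguments and the example write block are hypothetical and are not the code in link.py; only the write["file_type"] lookup comes from the thread.

```python
from pathlib import Path


def build_output_path(directory: str, stem: str, write: dict) -> Path:
    """Build an output path whose extension comes from the write block.

    Hypothetical helper for illustration; only the write["file_type"]
    lookup is taken from the thread.
    """
    file_type = write["file_type"]
    return Path(directory) / f"{stem}.{file_type}"


# A write block as it might look after parsing the config (assumed shape).
plot_write = {"data_product": "example/figure", "file_type": "png"}
plot_path = build_output_path("outputs", "figure", plot_write)
```

With the extension taken from each write block, a plot declared as png is no longer given a path ending in netcdf.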

bruvio · 2022-03-31T16:11:26Z

With commit 16b7a14 I started to implement writing data to netcdf, following the example set out by Derek.
@bobturneruk @RyanJField feel free to have a look and comment.

bobturneruk · 2022-04-01T07:58:05Z

Looks to be shaping up well. I'll have a go at getting it running on my machine and comment more.

bruvio · 2022-04-03T05:32:16Z

Up to now there are a couple of wrapper functions to write 1D data as f(x), 2D data as f(x,y) and 3D data as f(x,y,z). There are also wrapper functions to create groups and nested groups. Group creation is idempotent, so creating a group that is already populated neither raises errors nor deletes data. I still have to write wrappers to read data from netcdf. Then, if you agree, we can move on to writing the read_array and write_array functions as in the R implementation.
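To illustrate what idempotent group creation means here, this is a minimal sketch using an in-memory stand-in rather than a real netcdf file. The Group class and create_nested_group are hypothetical names for illustration and are not the wrappers in this branch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Group:
    """In-memory stand-in for a netcdf group (not the netCDF4 class)."""

    name: str
    groups: Dict[str, "Group"] = field(default_factory=dict)
    variables: Dict[str, List[float]] = field(default_factory=dict)


def create_nested_group(root: Group, path: str) -> Group:
    """Walk a path such as "a/b/c", creating only the groups that are missing."""
    current = root
    for part in filter(None, path.split("/")):
        # setdefault returns the existing child untouched if it is present.
        current = current.groups.setdefault(part, Group(part))
    return current


root = Group("/")
first = create_nested_group(root, "model/run1")
first.variables["infected"] = [1.0, 2.0, 3.0]

# Creating the same group again returns it with its data intact.
second = create_nested_group(root, "model/run1")
```

The second call finds both path components already present, so nothing is overwritten and no error is raised.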

bruvio · 2022-04-28T16:03:17Z

@RyanJField @richardreeve @B0SKAMP @bobturneruk
Commit c979fbf contains a first implementation of write_array.
There are also tests (no edge cases for now).

sonarqubecloud · 2022-04-29T15:52:22Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
17 Code Smells

No Coverage information
18.3% Duplication

bruvio · 2022-06-22T15:36:35Z

@B0SKAMP @richardreeve @RyanJField @bobturneruk
Hi!
I refactored the code.
I hope it meets the requirements.

Let me know.

bobturneruk · 2022-06-23T10:25:23Z

I'm not sufficiently sure of the current requirements to say whether this meets them. This will need attention from @richardreeve, I think. I can look at the code.

richardreeve · 2022-06-23T12:09:11Z

Great, thanks @bruvio - I'll have some time to look at this tomorrow.

richardreeve · 2022-06-24T21:39:20Z

Hi @bruvio - I actually met up with @B0SKAMP today and talked this through with him, and since he has a much clearer understanding of the details than I do, he said he'd talk to you about it early next week. It sounds like between the two of you you're converging on a really nice solution.

bruvio · 2022-06-27T13:30:13Z

With the last batch of commits I refactored the code further to prepare the headers better; the data is now also set into the headers when calling prepare_headers. I also did a bit more refactoring here and there.

sonarqubecloud · 2022-07-11T15:20:36Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
18 Code Smells

No Coverage information
26.3% Duplication

bruvio · 2022-07-26T12:04:44Z

I think we can start reviewing this.

This is a copy-paste of an email exchange:

Currently all data is written in one go. Another interface can be provided to write in slices.

What we need to agree on is the internal format of the netcdf file.

The naming format is "grp1/grp2/..../grp/varname".
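As a minimal sketch of how such a name could be taken apart, assuming "/" is the only separator and the last component is always the variable name (split_variable_path is a hypothetical helper, not a function from this branch):

```python
from typing import List, Tuple


def split_variable_path(path: str) -> Tuple[List[str], str]:
    """Split "grp1/grp2/varname" into (["grp1", "grp2"], "varname")."""
    # Ignore empty components from leading, trailing or doubled slashes.
    parts = [p for p in path.split("/") if p]
    if not parts:
        raise ValueError("path must contain at least a variable name")
    return parts[:-1], parts[-1]


groups, varname = split_variable_path("grp1/grp2/varname")
```

A bare name with no slashes yields an empty group list, which would place the variable at the root of the file.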

The root of the netcdf file should have a schema version attribute.

I create a netcdf dimension with the "_dim" suffix when creating a dimension, then a netcdf variable with the dimension name which references the "_dim" dimension.

When an array is created I give the dimension names. From these I retrieve the netcdf "_dim" dimensions. A netcdf variable is then created referring to the "_dim" dimensions.
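The "_dim" convention can be sketched as follows. This only computes the names and sizes the convention implies and does not touch a netcdf file; describe_layout, DIM_SUFFIX and the example dimension names are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Sequence

DIM_SUFFIX = "_dim"


def describe_layout(
    dimensions: Dict[str, Sequence[float]],
    array_name: str,
    array_dims: List[str],
) -> dict:
    """Return the netcdf dimensions and variables implied by the convention."""
    missing = [d for d in array_dims if d not in dimensions]
    if missing:
        raise KeyError(f"unknown dimension(s): {missing}")
    # Each user-facing dimension gets a netcdf dimension "<name>_dim" ...
    nc_dimensions = {
        name + DIM_SUFFIX: len(values) for name, values in dimensions.items()
    }
    # ... and a coordinate variable "<name>" defined on that dimension.
    nc_variables = {name: (name + DIM_SUFFIX,) for name in dimensions}
    # The array itself refers to the "_dim" dimensions, in the order given.
    nc_variables[array_name] = tuple(d + DIM_SUFFIX for d in array_dims)
    return {"dimensions": nc_dimensions, "variables": nc_variables}


layout = describe_layout(
    {"time": [0.0, 1.0, 2.0], "age": [10.0, 20.0]},
    array_name="array",
    array_dims=["time", "age"],
)
```

Keeping the dimension and its coordinate variable under different names lets the caller refer to "time" while the file stores the axis length under "time_dim".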

Finally as discussed there is a possibility to pass custom attributes.

@RyanJField @richardreeve @B0SKAMP

Derek is not able to see this; perhaps he could comment as well?
Thanks

bruvio added 2 commits March 24, 2022 10:27

setting up netcdf move

1170111

drafting util functions

8bf8dc5

bruvio added the enhancement New feature or request label Mar 24, 2022

bruvio requested review from bobturneruk and RyanJField March 24, 2022 14:31

bruvio added 3 commits March 24, 2022 21:59

refactoring to expose resolve_read and resolve_write like in R

2b0df63

SEIRS model at this stage fails [skipci]

850e811

[skip ci] fix mypy errors [skip ci]

2259ecc

Fix file_type

6da168b

bruvio added 4 commits March 29, 2022 18:34

[skip ci] drafting tests

e0af496

[skip ci] Merge branch 'feature/netcdf' of github.com:FAIRDataPipelin…

187209a

…e/pyDataPipeline into feature/netcdf

[skip ci] writing wrapper function to write netcdf data

0f537f8

testing wrappers to write variables into netcdf file

16b7a14

bruvio added 5 commits April 1, 2022 11:06

increase test coverage

eb4f62d

[skip ci] improve extract_id testing

cb8d728

[skip ci] added function to create nested group in netcdf dataset

6539d5c

refactoring tests to increase coverage

d6e69f6

remove unneeded checks

66b398d

bruvio added 4 commits April 3, 2022 08:54

string refactoring

04e5e5b

implemented wrapper function to write data with attributes

db06c64

test create_nd_variables_in_group_w_attribute

9a22d7c

increasing test coverage

4386110

bruvio added 5 commits April 28, 2022 20:16

array variable will be stored using default name array

6bd69b3

added more test code to prove update capabilities of calling write_ar…

660a297

…ray twice

code and tests refactoring

aef1e11

fix wrong assert in test

d76a759

refactoring test to fix windows CI

46a5be3

bruvio force-pushed the feature/netcdf branch from 0b1e3ba to ea3b3b7 Compare April 29, 2022 12:36

more refactoring to test for windows CI

48ab107

bruvio force-pushed the feature/netcdf branch from ea3b3b7 to 48ab107 Compare April 29, 2022 12:43

testing append new variable scenario to netcdf file

14a4045

bobturneruk mentioned this pull request May 9, 2022

Make main the default branch for consistency #55

Merged

2 tasks

bruvio added 5 commits June 15, 2022 11:10

refactor

4e70b8d

Merge branch 'dev' of github.com:FAIRDataPipeline/pyDataPipeline into…

fd5bdbd

… feature/netcdf

refactored code and split write array in two : prepare headers and wr…

0a03289

…ite data

fixed netcdf tests

08ff277

remove unused code

5d27bc1

bobturneruk requested a review from richardreeve June 23, 2022 10:24

bruvio added 3 commits June 27, 2022 11:51

refactored prepare headers to also set data variable

0b56506

refactoring

87d6ca0

refactoring

b8bbe74

refactoring according to new specifications and addying enum types

353a7af

bruvio marked this pull request as ready for review July 26, 2022 12:01
