[DATA REQUEST] Add COSIMA Panantarctic / GFDL_OM4 Builder & Data #175

anton-seaice · 2024-06-25T03:01:53Z

Description of the data product

Location of the data product on Gadi

Checklist

Add a "x" between the brackets to all that apply

This data product is stable (unlikely to change substantially or move)
This data product is of use to the broader community
This data product is documented: link
This data product is licensed under
Those who want to access this data can be added to the project that houses it

anton-seaice · 2024-06-25T03:08:00Z

Following on from COSIMA/cosima-recipes#369, I suggest adding OM4_025.JRA_RYF to the Intake catalog.

@dougiesquire - As this is a different model configuration, I guess this would require a new datastore "builder", so maybe it's not worth the effort? The runs are used in the COSIMA recipes to show examples of handling MOM6 data.

@adele-morrison - Are there companion runs to OM4_025.JRA_RYF which should also be added? Could you please help with the "Description of the data product" and "Location of the data product on Gadi" sections, so that I can then edit the original post?

anton-seaice · 2024-07-10T05:19:10Z

I tried using the access-om3 builder, and got these errors when using builder.parser:

{'INVALID_ASSET': '/g/data/ik11/outputs/mom6-om4-025/OM4_025.JRA_RYF/output000/19000101.ice_daily.nc', 'TRACEBACK': 'Traceback (most recent call last):\n File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-24.04/lib/python3.10/site-packages/access_nri_intake/source/builders.py", line 329, in parser\n raise ParserError(f"Cannot determine realm for file {file}")\naccess_nri_intake.source.builders.ParserError: Cannot determine realm for file /g/data/ik11/outputs/mom6-om4-025/OM4_025.JRA_RYF/output000/19000101.ice_daily.nc\n'}

{'INVALID_ASSET': '/g/data/ik11/outputs/mom6-om4-025/OM4_025.JRA_RYF/output000/19000101.ocean_daily.nc', 'TRACEBACK': 'Traceback (most recent call last):\n File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-24.04/lib/python3.10/site-packages/access_nri_intake/source/builders.py", line 329, in parser\n raise ParserError(f"Cannot determine realm for file {file}")\naccess_nri_intake.source.builders.ParserError: Cannot determine realm for file /g/data/ik11/outputs/mom6-om4-025/OM4_025.JRA_RYF/output000/19000101.ocean_daily.nc\n'}

dougiesquire · 2024-07-11T00:00:32Z

Ah, yet another permutation of file naming. It might be safest just to write a dedicated builder, which is straightforward. I guess it would be an Om4Builder?

Is this output structured in a similar way to the regional MOM6 output? If so, it may be worth thinking about writing a builder that handles both?
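
For illustration, the OM4 file names in the errors above follow a `<date>.<component>_<stream>.nc` shape, so a dedicated builder could read the realm from the name. The sketch below is only that: `parse_mom6_filename`, the regular expression and the `seaIce` label are assumptions made for this example, not code from access-nri-intake, and it only covers the file names quoted in this thread.

```python
import re
from pathlib import PurePosixPath

# Assumed shape of the OM4 output names seen above: <YYYYMMDD>.<component>_<stream>.nc
FILENAME_PATTERN = re.compile(
    r"^(?P<timestamp>\d{8})\.(?P<component>ocean|ice)_(?P<stream>\w+)\.nc$"
)

# Hypothetical mapping from the component in the file name to a catalog realm label
REALM_BY_COMPONENT = {"ocean": "ocean", "ice": "seaIce"}


def parse_mom6_filename(path):
    """Split an OM4-style file name into timestamp, realm and stream."""
    name = PurePosixPath(path).name
    match = FILENAME_PATTERN.match(name)
    if match is None:
        # Same wording as the error reported above
        raise ValueError(f"Cannot determine realm for file {path}")
    return {
        "timestamp": match["timestamp"],
        "realm": REALM_BY_COMPONENT[match["component"]],
        "stream": match["stream"],
    }


info = parse_mom6_filename(
    "/g/data/ik11/outputs/mom6-om4-025/OM4_025.JRA_RYF/output000/19000101.ice_daily.nc"
)
print(info)
```

Other naming schemes, such as the regional MOM6 output mentioned above, would likely need their own pattern.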

adele-morrison · 2024-07-17T00:28:09Z

Apologies for being slow. Yes, let's add the panan experiments to Intake. We'd still like to delete a bunch of the daily data for the 1/20th panan; is that OK to do after it's added to Intake? Once that frees up space on ol01, ideally I'd also like to move the 1/10th panan from ik11 to ol01. But the current locations are as follows:
/g/data/ol01/outputs/mom6-panan/panant-0025-zstar-ACCESSyr2/
and
/g/data/ol01/outputs/mom6-panan/panant-005-zstar-ACCESSyr2/
and
/g/data/ik11/outputs/mom6-panan/panant-01-zstar-ACCESSyr2/

dougiesquire · 2024-07-17T00:36:18Z

We'd still like to delete a bunch of the daily data for the 1/20th panan; is that OK to do after it's added to Intake?

I think if we know this is going to happen then it would be better to wait until it is done. We can get a Builder set up and ready to go though.

marc-white · 2024-08-20T01:47:08Z

@anton-seaice could you please add the precise location(s) of the data on Gadi?

anton-seaice · 2024-08-20T01:50:10Z

@adele-morrison is more on top of it than I am. Noting the comments above about possibly moving it.

/g/data/ik11/outputs/mom6-om4-025/OM4_025.JRA_RYF seems to be appropriate for getting some sample data, if that's what you are after?

adele-morrison · 2024-08-20T01:59:00Z

Yes that’s the right location. Would be great to get this in the catalog so we can keep switching all the COSIMA recipes over. What do you need in terms of documentation?

adele-morrison · 2024-08-20T02:00:38Z

I don’t think there are any plans to move OM4_025.JRA_RYF. The panan data location is still in flux. I will try to keep that moving forward.

marc-white · 2024-08-20T02:03:08Z

OK, I'll start taking a look at the current data structure and builders to see what needs to happen to get these data ingested. Stay tuned...

marc-white · 2024-08-20T04:37:11Z

The filenames all look pretty coherent, but there's a couple of things I haven't been able to work out on my own:

What is the 'static' frequency, e.g., 19000101.ocean_static.nc? I'm assuming this is some sort of snapshot - should this file be ingested?
There are 'scalar' versions of some files, e.g., 19000101.ocean_annual.nc and 19000101.ocean_scalar_annual.nc. Again, what do these represent, and should they be ingested?

minghangli-uni · 2024-08-20T05:09:10Z

What is the 'static' frequency, e.g., 19000101.ocean_static.nc

It contains fields that do not change in time, such as grid-related data. It is saved once per run.

19000101.ocean_annual.nc

contains annually-averaged 2d fields

19000101.ocean_scalar_annual.nc

contains annually-averaged 0d fields

anton-seaice · 2024-08-20T05:13:36Z

I think we want all of those files - there is a frequency = 'fx' for the static files which exists in OM2 and OM3 datastores (and maybe others)
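
A minimal sketch of how the static, annual and daily files discussed above could be given frequency labels. Only `fx` is taken from this thread; the `1day` and `1yr` labels, the token table and the helper name `frequency_from_filename` are assumptions for illustration, not the access-nri-intake implementation.

```python
import re

# "fx" for static files comes from the discussion above; the other labels are assumed
FREQUENCY_BY_TOKEN = {"static": "fx", "daily": "1day", "annual": "1yr"}


def frequency_from_filename(filename):
    """Return a frequency label based on tokens in the file name, or None."""
    for token in re.split(r"[._]", filename):
        if token in FREQUENCY_BY_TOKEN:
            return FREQUENCY_BY_TOKEN[token]
    return None


for name in (
    "19000101.ocean_static.nc",
    "19000101.ocean_annual.nc",
    "19000101.ocean_scalar_annual.nc",
    "19000101.ice_daily.nc",
):
    print(name, "->", frequency_from_filename(name))
```

Under this scheme the "scalar" files share a frequency with their non-scalar counterparts and differ only in the file identifier.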

marc-white · 2024-08-20T06:00:28Z

Ah yes, I've found the fx frequency down in the utils package - I might pull that out into a named variable so it's clearer

marc-white · 2024-08-21T05:04:57Z

Dumping this here so I can find it later (for building workable test data): https://stackoverflow.com/questions/15141563/python-netcdf-making-a-copy-of-all-variables-and-attributes-but-one
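
One possible way to build small test files is sketched below with xarray rather than the netCDF4 approach in the linked answer. The helper `shrink_dataset` and the synthetic dataset are hypothetical; the idea of keeping a single grid point along the spatial dimensions is an assumption based on the one-point test files shown later in this thread.

```python
import numpy as np
import xarray as xr


def shrink_dataset(ds, drop=(), keep_points=1, keep_dims=("time",)):
    """Drop unwanted variables and keep only the first points along most dimensions."""
    trimmed = ds.drop_vars(list(drop))
    indexers = {
        dim: slice(0, keep_points) for dim in trimmed.sizes if dim not in keep_dims
    }
    return trimmed.isel(indexers)


# Small synthetic stand-in for a model output file
full = xr.Dataset(
    {
        "siconc": (("time", "yT", "xT"), np.zeros((4, 3, 5), dtype="float32")),
        "sithick": (("time", "yT", "xT"), np.ones((4, 3, 5), dtype="float32")),
    },
    coords={"time": np.arange(4), "yT": np.arange(3.0), "xT": np.arange(5.0)},
)

small = shrink_dataset(full, drop=["sithick"])
print(dict(small.sizes))
```

The trimmed dataset keeps its variable and coordinate attributes, and could then be written out with `small.to_netcdf(...)`.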

marc-white · 2024-08-26T06:35:56Z

I now have what I think is a functional AccessOm4Builder that works on /g/data/ik11/outputs/mom6-om4-025/OM4_025.JRA_RYF. Are there other data locations that I should test it against as a check?

dougiesquire · 2024-08-26T06:44:45Z

@marc-white, we definitely don't want to call this AccessOm4Builder. The "OM4" data at /g/data/ik11/outputs/mom6-om4-025/OM4_025.JRA_RYF is from GFDL OM4 (I think - @adele-morrison can you confirm?), not an ACCESS model.

I'd suggest seeing if the data mentioned in this comment can use the same builder. If so, then we could possibly call the builder Mom6Builder

marc-white · 2024-08-28T00:31:00Z

/g/data/ol01/outputs/mom6-panan/panant-0025-zstar-ACCESSyr2/
and
/g/data/ol01/outputs/mom6-panan/panant-005-zstar-ACCESSyr2/
and
/g/data/ik11/outputs/mom6-panan/panant-01-zstar-ACCESSyr2/

I've updated the Builder to be able to read the filenames found in these directories. However, I've come across an interesting conundrum whilst trying to test the resulting catalog: the data in those three directories are, when ingested into the catalog, pretty much identical, to the point where I can't figure out how to, say, get only the data from 0025-zstar (without resorting to the obvious solution of building a catalog only from that directory). This is causing issues when forming a Dask array, because the catalog doesn't understand how to merge the files (I think it is ending up with three 'layers' of the same time series, and bombs out).

For the uninitiated like myself, what is the difference between these three runs, and how can I differentiate between them in an intake/access-nri-intake way?

dougiesquire · 2024-08-28T00:35:23Z

@marc-white, each of the experiments should be separate intake-esm datastores within the catalog.

marc-white · 2024-08-30T01:37:16Z

Hi @anton-seaice and @adele-morrison, I'm now at the point where I'm ready to try an all-up ingest of the data. However, the metadata.yaml for OM4_025.JRA_RYF is incomplete, and no metadata.yaml exists for the mom6-panan datasets. Could you please add one for each dataset? Instructions are here: https://access-nri-intake-catalog.readthedocs.io/en/latest/management/building.html#metadata-yaml-files

adele-morrison · 2024-08-30T03:31:15Z

I've updated the metadata.yaml for OM4_025.JRA_RYF. I think @AndyHoggANU ran it, so some of the entries are currently just me guessing what the simulation is.

We're not quite ready to add the panan simulations ending in zstar-ACCESSyr2 to Intake yet (as above), because we still need to delete a bunch of that data and shift the 1/10th deg to ol01.

But we could add /g/data/ik11/outputs/mom6-panan/panant-01-zstar-v13 and panant-01-hycom1-v13 to Intake now.

adele-morrison · 2024-08-30T03:34:29Z

@AndyHoggANU any chance you want to create the metadata.yaml files for panant-01-zstar-v13 and panant-01-hycom1-v13? Or @julia-neme perhaps you could do the panant-01-zstar-v13 one? That's what you used in your paper, right?

adele-morrison · 2024-08-30T05:57:34Z

I've confirmed with @AndyHoggANU and metadata.yaml for OM4_025.JRA_RYF is good to go.

marc-white · 2024-10-22T02:34:10Z

We're getting there! Just one more request of @julia-neme and @AndyHoggANU - please ensure your experiment metadata.yaml files contain a model description (Andy, I know at least one of yours already does). This should be a list of models the experiment used, e.g., based on the description of OM4_025.JRA_RYF, a suitable entry may be:

model:
- MOM6
- SIS2

(I know that model isn't listed as required, but it turns out it is if the data doesn't contain this information - see #223 )

Also, could I please confirm that this list of experiments that we want to ingest is the correct one (I know there's a lot of back-and-forth above about this):

/g/data/ik11/outputs/mom6-om4-025/OM4_025.JRA_RYF
/g/data/ik11/outputs/mom6-panan/panant-01-hycom1-v13
/g/data/ik11/outputs/mom6-panan/panant-01-zstar-v13

adele-morrison · 2024-10-22T03:45:36Z

@marc-white confirming that your list of 3 experiments above to ingest is correct. Thanks!

AndyHoggANU · 2024-10-22T09:07:51Z

OK, I added the model flag to the OM4 and hycom1 case. @julia-neme will need to do it for zstar.

marc-white · 2024-11-07T00:20:28Z

@julia-neme have you been able to make this update for the zstar data?

julia-neme · 2024-11-07T14:35:19Z

Yes, done now. Sorry I was on leave!

marc-white · 2024-11-14T06:02:12Z

@charles-turner-1 I'm having some troubles on the branch for this issue with the tests (see here, specifically tests/test_builders::test_parse_access_ncfile).

Each of the new MOM6 test files is missing one or more coordinates when doing the intake-esm-to-direct values comparison. I've had a look at a couple of the examples via PDB, and it seems like intake-esm is dropping the coords that are marked as only being the edges of data variables (e.g., z_i is being dropped from 19000101.ocean_annual_z.nc by intake-esm). Is this something you encountered in #222 or #232?

charles-turner-1 · 2024-11-14T06:24:04Z

Interesting - I don't remember this being an issue in access-nri-intake-catalog, but I do vaguely recall something similar to this happening with the tests in intake-esm. If you look here, I think it was necessary to request all variables known to be in the dataset when requesting coords.

I think that the varname parameter was being populated somewhere with variables indexed by the esm-datastore in our use case. Plausibly these edge coords are being dropped here for reasons related to this?

I think the call to open_esm_datastore should pass all the variables that the parser has found, taken from the variable column. My guess is that, for some reason, the parser didn't index all of these, and so intake-esm is throwing them away?

marc-white · 2024-11-15T05:51:48Z

@charles-turner-1 having stepped through the code line-by-line, I can see where, e.g., xTe and yTe are being dropped from 19000101.ice_daily: intake_esm/source.py. The offending line is:

ds = ds[variables]

Before this line, xTe and yTe still appear in the ds printout:

(Pdb) ds.coords
Coordinates:
  * xT       (xT) float64 8B 0.0
  * xTe      (xTe) float64 8B 0.0
  * yT       (yT) float64 8B 0.0
  * yTe      (yTe) float64 8B 0.0
  * time     (time) object 3kB 1900-01-01 12:00:00 ... 1900-12-31 12:00:00
  * nv       (nv) float64 16B 1.0 2.0

And then, after that line:

(Pdb) ds
<xarray.Dataset> Size: 20kB
Dimensions:     (time: 365, yT: 1, xT: 1, nv: 2)
Coordinates:
  * xT          (xT) float64 8B 0.0
  * yT          (yT) float64 8B 0.0
  * time        (time) object 3kB 1900-01-01 12:00:00 ... 1900-12-31 12:00:00
  * nv          (nv) float64 16B 1.0 2.0

I'm presuming the ds=ds[variables] line is down-selecting to only those coordinates that appear to be relevant to the variables requested. xTe and yTe are not listed as being axes for any of the variables, but they are edges for a couple of them:

(Pdb) ds.variables
Frozen({'xT': <xarray.IndexVariable 'xT' (xT: 1)> Size: 8B
array([0.])
Attributes:
    domain_decomposition:  [   1 1440    1 1440]
    units:                 degrees_E
    long_name:             T point nominal longitude
    axis:                  X
    edges:                 xTe, 'xTe': <xarray.IndexVariable 'xTe' (xTe: 1)> Size: 8B
array([0.])
Attributes:
    units:      degrees_E
    long_name:  T-cell edge nominal longitude
    axis:       X, 'yT': <xarray.IndexVariable 'yT' (yT: 1)> Size: 8B
array([0.])
Attributes:
    domain_decomposition:  [   1 1080    1 1080]
    units:                 degrees_N
    long_name:             T point nominal latitude
    axis:                  Y
    edges:                 yTe, 'yTe': <xarray.IndexVariable 'yTe' (yTe: 1)> Size: 8B

The question here is, should these edges values exist in the catalog?

charles-turner-1 · 2024-11-17T23:22:00Z

Okay, couple of things I can think of here.

It looks to me like xT, xTe, yT, and yTe are all scalar variables. I know that there's some additional logic that intake-esm performs related to scalar variables, just before dropping variables that haven't been requested - are you able to check whether using an array that's got two coordinate values in the xT, xTe, yT, and yTe dimensions stops intake-esm dropping them?
Even though intake-esm is dropping scalar variables here, this logic should be repeated by the test. It's strange that it isn't - is the test manipulating xTe and yTe in any way at all?

I think that in the context of the test, xTe and yTe shouldn't be contained in the final output. However, they should still make it into the catalog: the test assumes that we've requested a variable and filters accordingly. The logic being applied to xr_ds replicates the if requested_variables: ... block in intake_esm.source._open_dataset.

I think we might want to add additional tests which don't apply this logic?

marc-white · 2024-11-17T23:38:04Z

I'll try manipulating the test files and giving that a go.
The tests shouldn't be manipulating anything - it's simply your comparison test from Correctness checks for parse_access_ncfile #232 with new files added.

charles-turner-1 · 2024-11-17T23:47:45Z

Sorry, could have been clearer: I'd be curious to see whether xTe and yTe are altered by the test specifically here (starred lines):

    xr_ds = xr.open_dataset(file, **xarray_open_kwargs)

*   scalar_variables = [v for v in xr_ds.data_vars if len(xr_ds[v].dims) == 0]
*   xr_ds = xr_ds.set_coords(scalar_variables)

    xr_ds = xr_ds[expected.variable]

marc-white · 2024-11-17T23:50:07Z

I've already removed my stack-tracing so I can have a go at de-scalaring the scalar variables, but I didn't think so? From memory, the reason I dived down into the intake-esm code was because I couldn't find a line in the test itself where I lost those variables. I can double-check that later though.

marc-white · 2024-11-18T01:20:05Z

Sorry, could have been clearer: I'd be curious to see whether xTe and yTe are altered by the test specifically here (starred lines):

    xr_ds = xr.open_dataset(file, **xarray_open_kwargs)

*   scalar_variables = [v for v in xr_ds.data_vars if len(xr_ds[v].dims) == 0]
*   xr_ds = xr_ds.set_coords(scalar_variables)

    xr_ds = xr_ds[expected.variable]

I think you might have this backwards - xr_ds (which is the 'direct open' of the relevant file) contains the missing xTe and yTe coordinates, while ie_ds (the intake-esm open of the file) does not. A pdb inspection of xr_ds before your highlighted lines confirms that xTe and yTe are already there:

(Pdb) xr_ds
<xarray.Dataset> Size: 20kB
Dimensions:     (xT: 1, xTe: 1, yT: 1, yTe: 1, time: 365, nv: 2)
Coordinates:
  * xT          (xT) float64 8B 0.0
  * xTe         (xTe) float64 8B 0.0
  * yT          (yT) float64 8B 0.0
  * yTe         (yTe) float64 8B 0.0
  * time        (time) object 3kB 1900-01-01 12:00:00 ... 1900-12-31 12:00:00
  * nv          (nv) float64 16B 1.0 2.0
Data variables:
    siconc      (time, yT, xT) float32 1kB dask.array<chunksize=(365, 1, 1), meta=np.ndarray>
    sithick     (time, yT, xT) float32 1kB dask.array<chunksize=(365, 1, 1), meta=np.ndarray>
    average_T1  (time) datetime64[ns] 3kB dask.array<chunksize=(365,), meta=np.ndarray>
    average_T2  (time) datetime64[ns] 3kB dask.array<chunksize=(365,), meta=np.ndarray>
    average_DT  (time) timedelta64[ns] 3kB dask.array<chunksize=(365,), meta=np.ndarray>
    time_bnds   (time, nv) timedelta64[ns] 6kB dask.array<chunksize=(365, 2), meta=np.ndarray>

Interestingly, xTe isn't actually scalar - it's a single-element array:

(Pdb) xr_ds["xTe"]
<xarray.DataArray 'xTe' (xTe: 1)> Size: 8B
array([0.])
Coordinates:
  * xTe      (xTe) float64 8B 0.0
Attributes:
    units:      degrees_E
    long_name:  T-cell edge nominal longitude
    axis:       X

This is the case both before and after your starred lines.
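
The distinction can be reproduced on a synthetic dataset. This is a sketch under assumptions: the variable names `zero_d` and `one_elem` are made up, and the only line taken from the thread is the `len(...dims) == 0` test from the starred lines.

```python
import xarray as xr

# Synthetic dataset: one genuinely 0-dimensional variable, one length-1 array
ds = xr.Dataset(
    {
        "zero_d": xr.DataArray(1.0),     # no dimensions at all
        "one_elem": ("x", [1.0]),        # one dimension of length 1
    }
)

# Same test as the starred lines: only variables with no dimensions count as scalar
scalar_variables = [v for v in ds.data_vars if len(ds[v].dims) == 0]
print(scalar_variables)

# Promoting the scalar variables to coordinates leaves the length-1 array untouched
promoted = ds.set_coords(scalar_variables)
print(list(promoted.coords), list(promoted.data_vars))
```

A single-element array along its own dimension, like xTe in the test file, therefore does not match the scalar check.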

charles-turner-1 · 2024-11-18T01:44:21Z

Sorry, could have been clearer: I'd be curious to see whether xTe and yTe are altered by the test specifically here (starred lines):
    xr_ds = xr.open_dataset(file, **xarray_open_kwargs)

*   scalar_variables = [v for v in xr_ds.data_vars if len(xr_ds[v].dims) == 0]
*   xr_ds = xr_ds.set_coords(scalar_variables)

    xr_ds = xr_ds[expected.variable]

I think you might have this backwards - xr_ds (which is the 'direct open' of the relevant file) contains the missing xTe and yTe coordinates, while ie_ds (the intake-esm open of the file) does not. A pdb inspection of xr_ds before your highlighted lines confirms that xTe and yTe are already there:

Yeah, I think this is what I would expect - am I correctly understanding that the subsequent operations being applied to xr_ds aren't dropping these variables then?

(Pdb) xr_ds
<xarray.Dataset> Size: 20kB
Dimensions:     (xT: 1, xTe: 1, yT: 1, yTe: 1, time: 365, nv: 2)
Coordinates:
  * xT          (xT) float64 8B 0.0
  * xTe         (xTe) float64 8B 0.0
  * yT          (yT) float64 8B 0.0
  * yTe         (yTe) float64 8B 0.0
  * time        (time) object 3kB 1900-01-01 12:00:00 ... 1900-12-31 12:00:00
  * nv          (nv) float64 16B 1.0 2.0
Data variables:
    siconc      (time, yT, xT) float32 1kB dask.array<chunksize=(365, 1, 1), meta=np.ndarray>
    sithick     (time, yT, xT) float32 1kB dask.array<chunksize=(365, 1, 1), meta=np.ndarray>
    average_T1  (time) datetime64[ns] 3kB dask.array<chunksize=(365,), meta=np.ndarray>
    average_T2  (time) datetime64[ns] 3kB dask.array<chunksize=(365,), meta=np.ndarray>
    average_DT  (time) timedelta64[ns] 3kB dask.array<chunksize=(365,), meta=np.ndarray>
    time_bnds   (time, nv) timedelta64[ns] 6kB dask.array<chunksize=(365, 2), meta=np.ndarray>

Interestingly, xTe isn't actually scalar - it's a single-element array:

I think xarray stores scalars internally as single element arrays, & then handles scalars by checking for the dimensions that these variables depend on? I figure this is to make the internal data structures consistent - eg. this issue.

(Pdb) xr_ds["xTe"]
<xarray.DataArray 'xTe' (xTe: 1)> Size: 8B
array([0.])
Coordinates:
  * xTe      (xTe) float64 8B 0.0
Attributes:
    units:      degrees_E
    long_name:  T-cell edge nominal longitude
    axis:       X

This is the case both before and after your starred lines.

I don't think the operations in these lines would do anything to the xTe, yTe variables themselves - instead, they're just being set to coordinates. My understanding of how ds[variables] then works is that it will retain all the data variables in variables, all the coordinates that those data variables depend on, and any coordinate variables explicitly requested.
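
The selection behaviour described here can be checked on a synthetic dataset shaped like the ice_daily test file. This is a sketch, not the test from the branch: the dataset is constructed by hand and only the variable names are borrowed from the pdb output above.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in: edge coordinates live on their own dimensions (xTe, yTe)
ds = xr.Dataset(
    data_vars={
        "siconc": (("time", "yT", "xT"), np.zeros((3, 1, 1), dtype="float32")),
    },
    coords={
        "time": np.arange(3),
        "xT": ("xT", [0.0], {"edges": "xTe"}),
        "xTe": ("xTe", [0.0]),
        "yT": ("yT", [0.0], {"edges": "yTe"}),
        "yTe": ("yTe", [0.0]),
    },
)

# Selecting a data variable keeps only coordinates on that variable's dimensions
only_data = ds[["siconc"]]
print(sorted(only_data.coords))

# Naming the edge coordinates explicitly keeps them
with_edges = ds[["siconc", "xTe", "yTe"]]
print(sorted(with_edges.coords))
```

The `edges` attribute on xT and yT plays no part in the selection, which is consistent with xTe and yTe disappearing after ds = ds[variables].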

I think I've misunderstood the source of the error: are xTe and yTe being retained by

1. ie_ds and not xr_ds?
2. xr_ds and not ie_ds?

If I'm understanding correctly now and it's case 2., what have you requested as variables? eg. in your test parametrisation, what is in VARS here?

_AccessNCFileInfo(
path = None, # type: ignore,
...
variable = VARS,
...

marc-white · 2024-11-18T01:50:18Z

1. ie_ds and not xr_ds?

2. xr_ds and not ie_ds?

@charles-turner-1 it's case #2:

(Pdb) ie_ds
<xarray.Dataset> Size: 20kB
Dimensions:     (time: 365, yT: 1, xT: 1, nv: 2)
Coordinates:
  * xT          (xT) float64 8B 0.0
  * yT          (yT) float64 8B 0.0
  * time        (time) object 3kB 1900-01-01 12:00:00 ... 1900-12-31 12:00:00
  * nv          (nv) float64 16B 1.0 2.0
Data variables:
    average_DT  (time) timedelta64[ns] 3kB dask.array<chunksize=(365,), meta=np.ndarray>
    siconc      (time, yT, xT) float32 1kB dask.array<chunksize=(365, 1, 1), meta=np.ndarray>
    average_T1  (time) datetime64[ns] 3kB dask.array<chunksize=(365,), meta=np.ndarray>
    time_bnds   (time, nv) timedelta64[ns] 6kB dask.array<chunksize=(365, 2), meta=np.ndarray>
    average_T2  (time) datetime64[ns] 3kB dask.array<chunksize=(365,), meta=np.ndarray>
    sithick     (time, yT, xT) float32 1kB dask.array<chunksize=(365, 1, 1), meta=np.ndarray>
Attributes:
    file_format:      NETCDF4
    NumFilesInSet:    1
    title:            #
    grid_type:        regular
    grid_tile:        N/A
    intake_esm_vars:  ['average_DT', 'siconc', 'average_T1', 'time_bnds', 'av...
(Pdb) xr_ds
<xarray.Dataset> Size: 20kB
Dimensions:     (xT: 1, xTe: 1, yT: 1, yTe: 1, time: 365, nv: 2)
Coordinates:
  * xT          (xT) float64 8B 0.0
  * xTe         (xTe) float64 8B 0.0
  * yT          (yT) float64 8B 0.0
  * yTe         (yTe) float64 8B 0.0
  * time        (time) object 3kB 1900-01-01 12:00:00 ... 1900-12-31 12:00:00
  * nv          (nv) float64 16B 1.0 2.0
Data variables:
    siconc      (time, yT, xT) float32 1kB dask.array<chunksize=(365, 1, 1), meta=np.ndarray>
    sithick     (time, yT, xT) float32 1kB dask.array<chunksize=(365, 1, 1), meta=np.ndarray>
    average_T1  (time) datetime64[ns] 3kB dask.array<chunksize=(365,), meta=np.ndarray>
    average_T2  (time) datetime64[ns] 3kB dask.array<chunksize=(365,), meta=np.ndarray>
    average_DT  (time) timedelta64[ns] 3kB dask.array<chunksize=(365,), meta=np.ndarray>
    time_bnds   (time, nv) timedelta64[ns] 6kB dask.array<chunksize=(365, 2), meta=np.ndarray>
Attributes:
    file_format:    NETCDF4
    NumFilesInSet:  1
    title:          #
    grid_type:      regular
    grid_tile:      N/A

The test parametrization is as follows (I worked this out by adding what was required to make the first half of the test pass):

(
            builders.Mom6Builder,
            "mom6/output000/19000101.ice_daily.nc",
            _AccessNCFileInfo(
                path=None,  # type: ignore
                filename="19000101.ice_daily.nc",
                file_id="XXXXXXXX_ice_daily",
                filename_timestamp="19000101",
                frequency="subhr",
                start_date="1900-01-01, 00:00:00",
                end_date="1900-01-01, 00:00:00",
                variable=[
                    "xT",
                    "xTe",
                    "yT",
                    "yTe",
                    "time",
                    "nv",
                    "siconc",
                    "sithick",
                    "average_T1",
                    "average_T2",
                    "average_DT",
                    "time_bnds",
                ],
                variable_long_name=[
                    "T point nominal longitude",
                    "T-cell edge nominal longitude",
                    "T point nominal latitude",
                    "T-cell edge nominal latitude",
                    "time",
                    "vertex number",
                    "ice concentration",
                    "ice thickness",
                    "Start time for average period",
                    "End time for average period",
                    "Length of average period",
                    "time axis boundaries",
                ],
                variable_standard_name=[
                    "",
                    "",
                    "",
                    "",
                    "",
                    "",
                    "",
                    "",
                    "",
                    "",
                    "",
                    "",
                ],
                variable_cell_methods=[
                    "",
                    "",
                    "",
                    "",
                    "",
                    "",
                    "time: mean",
                    "time: mean",
                    "",
                    "",
                    "",
                    "",
                ],
                variable_units=[
                    "degrees_E",
                    "degrees_E",
                    "degrees_N",
                    "degrees_N",
                    "days since 1900-01-01 00:00:00",
                    "",
                    "0-1",
                    "m-ice",
                    "days since 1900-01-01 00:00:00",
                    "days since 1900-01-01 00:00:00",
                    "days",
                    "days",
                ],
            ),
        ),

This should be all committed in the 175 branch.

charles-turner-1 · 2024-11-18T01:52:10Z

Cool, I'll check out the branch & see if I can figure out what's up.

charles-turner-1 · 2024-11-18T02:43:40Z

@marc-white What version of intake-esm are you using to test against? I started digging into the tests - only to find they all mysteriously started passing.

In tests/conftest.py, I added some functionality which added xfails to these tests - but I hadn't written it as a regex so it didn't capture the additional cases by default. Unfortunately, lots of the test files don't have coordinate variables that are missed without coordinate discovery enabled, & so we can't easily use a regex to sort this.

Purely my cockup here - I really should have made reference to the conftest.py in that test - I've added it in the updated tests. Sorry for the holdup

charles-turner-1 · 2024-11-18T02:50:22Z

I've pushed a commit which fixes the test failures to the head of 175 & tests are all passing now.

If you can see a way of improving the way these dynamic xfails are handled, I think that would be a great shout - I'd rather not cause any more of these painful issues if we can avoid it. Once we get a new release of intake-esm out it shouldn't be an issue, but in the interim I can see this causing some more issues if we're not careful.
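
For readers unfamiliar with the pattern, a dynamic xfail driven by an environment variable might look roughly like the sketch below. It is not the project's tests/conftest.py: the list contents, the helper names and the default when XFAILS is unset are all assumptions. Only the XFAILS variable name and the pytest hook `pytest_collection_modifyitems` are real.

```python
import os

import pytest

# Hypothetical node-ID suffixes of parametrised cases known to fail against the
# released intake-esm; the real list lives in the project's tests/conftest.py.
KNOWN_XFAIL_SUFFIXES = ("test_parse_access_ncfile[example-case]",)


def xfails_enabled(environ=None):
    """Assume XFAILS=1 switches the expected failures on (default: off)."""
    environ = os.environ if environ is None else environ
    return environ.get("XFAILS", "0") == "1"


def should_xfail(nodeid, enabled, suffixes=KNOWN_XFAIL_SUFFIXES):
    """Explicit list lookup, since a regex cannot tell which files are affected."""
    return enabled and nodeid.endswith(suffixes)


def pytest_collection_modifyitems(config, items):
    """Standard pytest hook: mark the listed cases as expected failures."""
    enabled = xfails_enabled()
    for item in items:
        if should_xfail(item.nodeid, enabled):
            item.add_marker(pytest.mark.xfail(reason="needs unreleased intake-esm"))
```

With an explicit list like this, every new parametrised case that hits the same intake-esm behaviour has to be added by hand, which is the pitfall described above.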

marc-white · 2024-11-18T03:11:38Z

Ah right, so we are actually expecting to see unused coordinates thrown away. Gotcha.

I think there is a wider thing to consider here though. From what I can tell, the coordinates that have been thrown away in this instance are 'edge' coordinates. They seem to define the edge behaviour of the main coordinates. However, those coordinates are used. Therefore, should these 'second-order' coordinates be something we should be preserving?

(Also, looks like I'm currently on intake-esm=2023.11.10.)

charles-turner-1 · 2024-11-18T03:35:28Z

I think there is a wider thing to consider here though. From what I can tell, the coordinates that have been thrown away in this instance are 'edge' coordinates. They seem to define the edge behaviour of the main coordinates. However, those coordinates are used. Therefore, should these 'second-order' coordinates be something we should be preserving?

Yeah, I think we probably do want to keep these hanging around. Are you able to open an issue about this on intake-esm - I'll start looking into it more closely once I get done with this E2E test.

(Also, looks like I'm currently on intake-esm=2023.11.10.)

Cool, so I think if you were to create a new environment & install the access-nri-intake-catalog package, then pip install git+https://github.com/intake/intake-esm && XFAILS=0 pytest tests should give you 320 passed tests, whereas pip install intake-esm && XFAILS=1 pytest tests should give you 313 passed tests & 7 xfailed.

Once we get a new release of intake-esm out, we should be able to get rid of the xfails completely. What this means in practical terms for the point above is that if the edges / second-order coordinates have been erroneously chucked away, the user will at least be able to directly search for them.

But to answer the question more directly, yes, I think the desired behaviour is probably that there should be some sort of 'coordinate tree' traversal, rather than just going up a single level & including those coordinates.
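
A rough sketch of what such a traversal could look like, using plain dictionaries in place of a dataset. Everything here is hypothetical: `coordinate_closure`, the two lookup tables and the choice to follow only the `edges` attribute are assumptions for illustration, not intake-esm or xarray behaviour.

```python
from collections import deque


def coordinate_closure(variables, dims_by_name, attrs_by_name, link_attrs=("edges",)):
    """Collect dimensions plus any coordinates reachable through link attributes."""
    queue = deque(variables)
    seen = set(variables)
    ordered = []
    while queue:
        name = queue.popleft()
        # First level: the dimensions the variable is defined on
        neighbours = list(dims_by_name.get(name, ()))
        # Further levels: names referenced by attributes such as "edges"
        for key in link_attrs:
            neighbours.extend(str(attrs_by_name.get(name, {}).get(key, "")).split())
        for neighbour in neighbours:
            if neighbour not in seen:
                seen.add(neighbour)
                ordered.append(neighbour)
                queue.append(neighbour)
    return ordered


# Names and attributes taken from the ice_daily output shown earlier in the thread
dims_by_name = {
    "siconc": ("time", "yT", "xT"),
    "time": ("time",),
    "xT": ("xT",),
    "xTe": ("xTe",),
    "yT": ("yT",),
    "yTe": ("yTe",),
}
attrs_by_name = {"xT": {"edges": "xTe"}, "yT": {"edges": "yTe"}}

coords = coordinate_closure(["siconc"], dims_by_name, attrs_by_name)
print(coords)
```

Stopping after the first level would return only time, yT and xT, which matches the behaviour reported in this thread.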

marc-white · 2024-11-18T04:22:19Z

Interesting, I'm now getting two test failures on the Gadi access-med-0.6 environment that I don't get on my laptop from the MOM6 data, despite having the same intake/intake-esm. I might generate a PR just to see what the CI comes up with...

anton-seaice · 2024-11-19T04:19:11Z

I think there is a wider thing to consider here though. From what I can tell, the coordinates that have been thrown away in this instance are 'edge' coordinates. They seem to define the edge behaviour of the main coordinates. However, those coordinates are used. Therefore, should these 'second-order' coordinates be something we should be preserving?

I think as long as you can request the 'edge' coordinates from the catalog, then it's sufficient. They don't need to be attached automatically.

anton-seaice added the data request Add data to the catalog label Jun 25, 2024

review-notebook-app bot mentioned this issue Jul 1, 2024

Intake conversion Model_Agnostic_Analysis COSIMA/cosima-recipes#378

Draft

This was referenced Jul 16, 2024

Converting notebooks from COSIMA Cookbook to ACCESS-NRI intake catalog COSIMA/cosima-recipes#313

Open

Created neutral density example notebook COSIMA/cosima-recipes#417

Merged

anton-seaice mentioned this issue Aug 19, 2024

Intake conversion TemperatureSalinityDiagrams COSIMA/cosima-recipes#371

Open

marc-white self-assigned this Aug 20, 2024

rbeucher assigned charles-turner-1 Sep 12, 2024

marc-white mentioned this issue Oct 17, 2024

[BUG] Make model required for metadata.yaml files #223

Open

charles-turner-1 mentioned this issue Nov 7, 2024

Stable version numbering (ie 1.x.x) #230

Open

marc-white mentioned this issue Nov 7, 2024

Switch to a date-based catalog versioning system, and related updates #243

Merged

rbeucher added this to Model Evaluation & Diagnostics Nov 12, 2024

github-project-automation bot moved this to Todo in Model Evaluation & Diagnostics Nov 12, 2024

marc-white linked a pull request Nov 18, 2024 that will close this issue

Add MOM6 support (om4 025jra ryf) #258

Open

This was referenced Nov 18, 2024

Add MOM6 support (om4 025jra ryf) #258

Open

[BUG] Build function takes min/max version from live directory always, instead of build directory #263

Open
