Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

netcdf dimensions get messed up if they are out of order and if axis attributes are available #680

Open
dschlaep opened this issue Apr 17, 2024 · 12 comments

Comments

@dschlaep
Copy link

Function .get_dims() sorts dimensions according to the canonical axis order if "axis" attributes are provided by the netcdf.

However, this doesn't seem to work as expected if the dimensions in the netcdf are not already sorted. This seems to be caused by line 521 of file R/ncdf.R (https://github.com/r-spatial/stars/blob/f0156c9266ba16a8c4efc8da9702f053e98c871a/R/ncdf.R#L521C5-L521C48) which uses

  `dims` <- dims[match(dims$id, dim_matcher), ]`

instead of

  `dims <- dims[match(dim_matcher, dims$id, nomatch = 0L), ]`

(or similar).

It would be great if stars could read such netcdfs, thanks!

packageVersion("stars")
#> [1] '0.6.5'
packageVersion("ncmeta")
#> [1] '0.4.0'
packageVersion("sf")
#> [1] '1.0.16'

# Demonstration
dim_matcher <- c(1L, 0L, 3L)

dims <- data.frame(
  id = c(3L, 1L, 0L),
  name = c("time", "lon", "lat"),
  length = c(9, 2, 3),
  unlim = c(FALSE, FALSE, FALSE),
  coord_dim = c(TRUE, TRUE, TRUE),
  coord_var = c("time", "lon", "lat"),
  axis = c("T", "X", "Y")
)

# Existing implementation: axis order ends up as Y - T - X
dims[match(dims$id, dim_matcher), ]
#>   id name length unlim coord_dim coord_var axis
#> 3  0  lat      3 FALSE      TRUE       lat    Y
#> 1  3 time      9 FALSE      TRUE      time    T
#> 2  1  lon      2 FALSE      TRUE       lon    X

# Suggested fix: axis order is now X - Y - T
dims[match(dim_matcher, dims$id, nomatch = 0L), ]
#>   id name length unlim coord_dim coord_var axis
#> 2  1  lon      2 FALSE      TRUE       lon    X
#> 3  0  lat      3 FALSE      TRUE       lat    Y
#> 1  3 time      9 FALSE      TRUE      time    T


# Example with two netCDFs
tas <- array(
  data = rowSums(
    expand.grid(seq_len(9), 10 * (seq_len(2) - 1), 100 * (seq_len(3) - 1))
  ),
  dim = c(9, 2, 3)
)

# File1 has no "axis" attributes
file1 <- tempfile("tas_example_", fileext = ".nc")
nc <- RNetCDF::create.nc(file1)

id_lat <- RNetCDF::dim.def.nc(nc, "lat", 3)
iv_lat <- RNetCDF::var.def.nc(nc, "lat", "NC_FLOAT", id_lat)
RNetCDF::var.put.nc(nc, "lat", c(40, 45, 50))

id_lon <- RNetCDF::dim.def.nc(nc, "lon", 2)
iv_lon <- RNetCDF::var.def.nc(nc, "lon", "NC_FLOAT", id_lon)
RNetCDF::var.put.nc(nc, "lon", c(-100, -95))

id_bnds <- RNetCDF::dim.def.nc(nc, "bnds", 2)

id_time <- RNetCDF::dim.def.nc(nc, "time", 9)
iv_time <- RNetCDF::var.def.nc(nc, "time", "NC_INT", id_time)
RNetCDF::var.put.nc(nc, "time", 1:9)

iv_tas <- RNetCDF::var.def.nc(nc, "temperature", "NC_FLOAT", c(id_time, id_lon, id_lat))
RNetCDF::var.put.nc(nc, "temperature", tas)

RNetCDF::close.nc(nc)


# File2 has X, Y, T axis attributes
file2 <- tempfile("tas_example_", fileext = ".nc")
file.copy(from = file1, to = file2)
#> [1] TRUE
nc <- RNetCDF::open.nc(file2, write = TRUE)

RNetCDF::att.put.nc(nc, "lon", "axis", "NC_CHAR", "X")
RNetCDF::att.put.nc(nc, "lat", "axis", "NC_CHAR", "Y")
RNetCDF::att.put.nc(nc, "time", "axis", "NC_CHAR", "T")

RNetCDF::close.nc(nc)


# Read example files with `read_ncdf()`
stars::read_ncdf(file1) # Works well (no "axis" attributes -- `dim_matcher` is all NA)
#> no 'var' specified, using temperature
#> other available variables:
#>  lat, lon, time
#> Will return stars object with 54 cells.
#> Warning in .get_nc_projection(meta$attribute, rep_var, cv): No projection
#> information found in nc file.
#> stars object with 3 dimensions and 1 attribute
#> attribute(s):
#>              Min. 1st Qu. Median Mean 3rd Qu. Max.
#> temperature     1   15.25    110  110  204.75  219
#> dimension(s):
#>      from to offset delta x/y
#> time    1  9    0.5     1 [x]
#> lon     1  2 -102.5     5 [y]
#> lat     1  3   37.5     5

stars::read_ncdf(file2) # Doesn't work as expected: dimensions are messed up
#> no 'var' specified, using temperature
#> other available variables:
#>  lat, lon, time
#> Will return stars object with 54 cells.
#> Warning in .get_nc_time(dimensions, make_time, coord_var, rep_var, meta):
#> ignoring units of time dimension, not able to interpret

#> Warning in .get_nc_time(dimensions, make_time, coord_var, rep_var, meta): No
#> projection information found in nc file.
#> stars object with 3 dimensions and 1 attribute
#> attribute(s):
#>              Min. 1st Qu. Median Mean 3rd Qu. Max.
#> temperature     1   15.25    110  110  204.75  219
#> dimension(s):
#>      from to offset delta x/y
#> lat     1  9   37.5     5 [x]
#> time    1  2      1     1 [y]
#> lon     1  3 -102.5     5

# Clean up
unlink(file1)
unlink(file2)

Created on 2024-04-17 with reprex v2.1.0

@edzer
Copy link
Member

edzer commented Apr 18, 2024

Nice ;-)

We also have:

(x1 = stars::read_ncdf(file2))
# no 'var' specified, using temperature
# other available variables:
#  lat, lon, time
# Will return stars object with 54 cells.
# stars object with 3 dimensions and 1 attribute
# attribute(s):
#              Min. 1st Qu. Median Mean 3rd Qu. Max.
# temperature     1   15.25    110  110  204.75  219
# dimension(s):
#      from to offset delta x/y
# lat     1  9   37.5     5 [x]
# time    1  2      1     1 [y]
# lon     1  3 -102.5     5    
# Warning messages:
# 1: In .get_nc_time(dimensions, make_time, coord_var, rep_var, meta) :
#   ignoring units of time dimension, not able to interpret
# 2: In .get_nc_projection(meta$attribute, rep_var, cv) :
#   No projection information found in nc file.
(x2 = stars::read_stars(file2))
# stars object with 3 dimensions and 1 attribute
# attribute(s):
#                               Min. 1st Qu. Median Mean 3rd Qu. Max.
# tas_example_5cbe857ee56c4.nc     1   15.25    110  110  204.75  219
# dimension(s):
#     from to offset delta x/y
# x      1  9      0     1 [x]
# y      1  2      2    -1 [y]
# lat    1  3     40     5    
(x3 = stars::read_mdim(file2))
# stars object with 3 dimensions and 1 attribute
# attribute(s):
#              Min. 1st Qu. Median Mean 3rd Qu. Max.
# temperature     1   15.25    110  110  204.75  219
# dimension(s):
#      from to offset delta x/y
# time    1  9    0.5     1 [x]
# lon     1  2 -102.5     5 [y]
# lat     1  3     40     5    
identical(as.vector(x1[[1]]), as.vector(x2[[1]]))
# [1] FALSE
identical(as.vector(x1[[1]]), as.vector(x3[[1]]))
# [1] TRUE

@edzer
Copy link
Member

edzer commented Apr 20, 2024

But also

identical(as.vector(x1[[1]]), as.vector(x2[[1]][,2:1,]))
# [1] TRUE

so that is because GDAL reverts the y axis direction; and

stars::read_mdim(file2, raster = c("lon", "lat"))
# stars object with 3 dimensions and 1 attribute
# attribute(s):
#              Min. 1st Qu. Median Mean 3rd Qu. Max.
# temperature     1   15.25    110  110  204.75  219
# dimension(s):
#      from to offset delta x/y
# time    1  9    0.5     1    
# lon     1  2 -102.5     5 [x]
# lat     1  3     40     5 [y]

where offset is ok for lon, but not for time and lat.

@edzer
Copy link
Member

edzer commented Apr 20, 2024

@mdsumner or @dblodgett-usgs could one of you take a look at the .get_dims() issue?

@dblodgett-usgs
Copy link
Contributor

I think this is all good -- thanks for the thorough report @dschlaep !! I incorporated your reprex into a test and made the change.

Question for @edzer I'm looking at the print.dimensions method and tracked to here: https://github.com/r-spatial/stars/blob/main/R/dimensions.R#L658

For the base example (without the axis attribute) we have:

netcdf tas_example_4ed43eafb7d {
dimensions:
        lat = 3 ;
        lon = 2 ;
        bnds = 2 ;
        time = 9 ;
variables:
        float lat(lat) ;
        float lon(lon) ;
        int time(time) ;
        float temperature(lat, lon, time) ;
data:

 lat = 40, 45, 50 ;

 lon = -100, -95 ;

 time = 1, 2, 3, 4, 5, 6, 7, 8, 9 ;

 temperature =
  1, 2, 3, 4, 5, 6, 7, 8, 9,
  11, 12, 13, 14, 15, 16, 17, 18, 19,
  101, 102, 103, 104, 105, 106, 107, 108, 109,
  111, 112, 113, 114, 115, 116, 117, 118, 119,
  201, 202, 203, 204, 205, 206, 207, 208, 209,
  211, 212, 213, 214, 215, 216, 217, 218, 219 ;
}

In this example, there's nothing to go on to figure out that the 'lat' and 'lon' variables should be interpreted as latitude and longitude, so the [x] and [y] labels aren't right. If I add standard_name and units it still gets the axis labels wrong.

netcdf tas_example_4ed4569a6c6 {
dimensions:
        lat = 3 ;
        lon = 2 ;
        bnds = 2 ;
        time = 9 ;
variables:
        float lat(lat) ;
                lat:standard_name = "latidude" ;
                lat:units = "degrees" ;
        float lon(lon) ;
                lon:standard_name = "longitude" ;
                lon:units = "degrees" ;
        int time(time) ;
                time:standard_name = "time" ;
                time:units = "days since 1900-01-01" ;
        float temperature(lat, lon, time) ;
data:

 lat = 40, 45, 50 ;

 lon = -100, -95 ;

 time = 1, 2, 3, 4, 5, 6, 7, 8, 9 ;

 temperature =
  1, 2, 3, 4, 5, 6, 7, 8, 9,
  11, 12, 13, 14, 15, 16, 17, 18, 19,
  101, 102, 103, 104, 105, 106, 107, 108, 109,
  111, 112, 113, 114, 115, 116, 117, 118, 119,
  201, 202, 203, 204, 205, 206, 207, 208, 209,
  211, 212, 213, 214, 215, 216, 217, 218, 219 ;
}

I see:

stars object with 3 dimensions and 1 attribute
attribute(s):
             Min. 1st Qu. Median Mean 3rd Qu. Max.
temperature     1   15.25    110  110  204.75  219
dimension(s):
     from to         offset  delta x/y
time    1  9 1900-01-02 UTC 1 days [x]
lon     1  2         -102.5      5 [y]
lat     1  3           37.5      5    

Should the dimension order be returning in x, y, t order here since we know what the order is? I'm don't remember how many places in stars this assumption is being made vs doing a little more introspection to figure out what the x and y axes are.

@edzer edzer closed this as completed in ee95dc1 Apr 22, 2024
edzer added a commit that referenced this issue Apr 22, 2024
honor axis attribute if different from native order -- fixes #680
@edzer
Copy link
Member

edzer commented Apr 22, 2024

@dblodgett-usgs thanks!

If I add standard_name and units it still gets the axis labels wrong.

If you change latidude into latitude, we get

> stars::read_ncdf("x.nc")
no 'var' specified, using temperature
other available variables:
 lat, lon, time
Will return stars object with 54 cells.
No projection information found in nc file. 
 Coordinate variable units found to be degrees, 
 assuming WGS84 Lat/Lon.
stars object with 3 dimensions and 1 attribute
attribute(s):
             Min. 1st Qu. Median Mean 3rd Qu. Max.
temperature     1   15.25    110  110  204.75  219
dimension(s):
     from to         offset  delta         refsys x/y
lon     1  3         -102.5      5 WGS 84 (CRS84) [x]
lat     1  9           37.5      5 WGS 84 (CRS84) [y]
time    1  2 1900-01-02 UTC 1 days        POSIXct    

which gets the names right, but the dimensions wrong (lon should be 2, lat 3, time 9).

edzer added a commit that referenced this issue Apr 22, 2024
@edzer
Copy link
Member

edzer commented Apr 22, 2024

This adds some dim name matching to read_mdim():

stars::read_mdim("x.nc")
# stars object with 3 dimensions and 1 attribute
# attribute(s):
#              Min. 1st Qu. Median Mean 3rd Qu. Max.
# temperature     1   15.25    110  110  204.75  219
# dimension(s):
#      from to     offset  delta         refsys x/y
# time    1  9 1900-01-02 1 days           Date    
# lon     1  2     -102.5      5 WGS 84 (CRS84) [x]
# lat     1  3         40      5 WGS 84 (CRS84) [y]
stars::read_mdim("x.nc") |> aperm(c(2,3,1))
# stars object with 3 dimensions and 1 attribute
# attribute(s):
#              Min. 1st Qu. Median Mean 3rd Qu. Max.
# temperature     1   15.25    110  110  204.75  219
# dimension(s):
#      from to     offset  delta         refsys x/y
# lon     1  2     -102.5      5 WGS 84 (CRS84) [x]
# lat     1  3         40      5 WGS 84 (CRS84) [y]
# time    1  9 1900-01-02 1 days           Date    

dblodgett-usgs added a commit to dblodgett-usgs/stars that referenced this issue Apr 22, 2024
@dblodgett-usgs
Copy link
Contributor

@edzer -- see one more minor PR updating the test.

edzer added a commit that referenced this issue Apr 22, 2024
@edzer
Copy link
Member

edzer commented Apr 24, 2024

Still, read_ncdf gets the dimensions wrong:

     from to         offset  delta         refsys x/y
lon     1  3         -102.5      5 WGS 84 (CRS84) [x]
lat     1  9           37.5      5 WGS 84 (CRS84) [y]
time    1  2 1900-01-02 UTC 1 days        POSIXct    

@edzer edzer reopened this Apr 24, 2024
@dblodgett-usgs
Copy link
Contributor

Sorry, what do you see that's wrong there?

@edzer
Copy link
Member

edzer commented Apr 24, 2024

time should range from 1 to 9, lat from 1 to 3, lon from 1 to 2.

@edzer
Copy link
Member

edzer commented Apr 24, 2024

(note to self: read_mdim() gets the lat offset wrong)

edzer added a commit that referenced this issue Apr 24, 2024
@dblodgett-usgs
Copy link
Contributor

I see -- ok. I'll get in and see where that went wrong as soon as I can.

dblodgett-usgs added a commit to dblodgett-usgs/stars that referenced this issue Apr 24, 2024
dblodgett-usgs added a commit to dblodgett-usgs/stars that referenced this issue Apr 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants