# Proposal: Record time dimension in `stars` objects using `CFtime` package #674
Comments
Absolutely!
Do you also plan to provide

```r
structure(19804.5, class = "Date") |> as.POSIXct()
# [1] "2024-03-22 12:00:00 UTC"
```
First the good bits:

Assuming there will only be … There are likely going to be other areas where the … This supports conversion of a large proportion of NetCDF data sets out there, particularly re-analysis data such as ERA5 and other data sets of observational data. That includes climate projections that use the …

Next, the major issue:

The non-standard calendars are used for climate projections and similar data. Such data is not intended to be assessed for individual absolute values, and most certainly not for comparison of daily projections between different models, even when the calendars are the same. Projected data should also not be compared to observed data; they are not predictions, after all. So when the …

I realize that many users are looking for a way to compare data from different models and observations on a daily basis. I have just not heard a compelling case anywhere why that would be necessary or advisable, even for processing efficiencies - I am open to being convinced if you have a case to present. Rather than accommodating the default argument of "put it all in a data.frame to analyze all data in one go", I would propose to explain why that is a bad idea. In more practical terms, I will happily write an extensive vignette explaining the issue and demonstrating how to do it the right way. What I will not do is fudge calendars to enable bad analyses. Given this, please let me know if you prefer "black magic" or …
So, replacing … I see some parallel here with datum transformations, which are also approximate and black magic to most, but enable you to combine datasets associated with one datum with those associated with another. Approximate and all, but despite the black magic important to have, and part of accepted scientific practice (there's no alternative). I see some of the necessity of being able to convert the funny (year_360, year_365) calendars to regular types (…). How would you advise an econometrist or insurance person to combine economic forecasts associated with a regular …
I'm not sure that datum transformations compare, unless people start producing their own transformation parameters. MAUP is a better analogue in my opinion, in that people may combine or compare things in inappropriate ways. But let's rest that discussion for now. How about the following: …
I think that is a great idea!
Pinging @mdsumner and @dblodgett for information and possible feedback.
## Time in `stars`

`stars` currently uses a combination of `POSIXct`, `Date` and `PCICt` to represent the time dimension in raster data sets, predominantly (only?) those based on the NetCDF format. `stars` selects between these representations based on the "calendar" reported in the metadata of variables: "360_day" and "365_day" with `PCICt`, and the remaining ones with `POSIXct` or `Date`.

Time information in many data sets is based on the `udunits` format, which is used in both the COARDS and CF Metadata Conventions, in common use for climate and weather data and other environmental data sets. That format is parsed in multiple locations, such as `parse_netcdf_meta()` in dimensions.R, `read_mdim()` in mdim.R, and `.get_nc_time()` in ncdf.R, with several helper functions for interpretation and writing. There are some differences in how the parsing is performed across these functions. As I see it, there are several issues with this:

- Reading the same data set through different functions (`read_stars()`, `read_mdim()`, `read_ncdf()`) may result in different time representations.
- Time expressed in "days since ..." units is represented as `Date` in `read_mdim()`. The offset values, however, are doubles and any fractional part is dropped. This is not a hypothetical possibility; the CMIP5/6 standards require that the time coordinate is recorded for the middle of the observation period ("For time-mean data, a time coordinate value must be defined as the mid-point of the interval over which the average is computed. (More generally, this same rule applies whenever time-bounds are included: the time coordinate value should be the mean of the two time bounds.)"), so daily data is always recorded at 12 noon, monthly data at noon on day 15 for a month with 31 days, etc. There are also CMIP 6hr data sets that use "days since ..." with fractional offsets, so those data sets cannot be read with `read_mdim()` at all without loss of data integrity. Writing data back to file will result in a non-compliant time representation.
- In `POSIXct` and `PCICt` the time information is represented in seconds, which potentially leads to a loss of information. Upon writing a `stars` object to file, the time representation may be different from the source. In `add_units_attr()` in mdim.R, for instance, an effort is made to select a time unit that gives integer offset values. So a CF-compliant daily data set with "days since ..." and the time coordinate at noon will become "hours since ...".
- `POSIXct` and `PCICt` use `1970-01-01 00:00:00` as their origin and time information from the file is converted to that origin. With data representing periods prior to that origin the offsets are negative. This is a violation of the CMIP5/6 standards, which require that "all values of the time coordinate should be positive and monotonically increasing". Climate projection data uses 1850-01-01 as the start of the historical simulation period, so there is lots of data out there where this may produce non-compliant results. (I am not sure if the COARDS and CF Metadata Conventions themselves allow negative offsets.)
- Time information may not be identified at all in `read_ncdf()`. In `.get_nc_time()` time information is identified only if there is an "axis = T" attribute. While many data sets (but far from all) will indeed report axis attributes, this is not a requirement in the COARDS and CF Metadata Conventions. Both conventions state that "the attribute axis may be attached to a coordinate variable and given one of the values X, Y, Z or T". Further, "a reference time coordinate is identifiable from its units string alone" and "optionally, the time coordinate may be indicated additionally by providing the standard_name attribute with an appropriate value, and/or the axis attribute with the value T" (note the "may be"s). Checking just for the attribute "units = ..." is more reliable, as there is no default and no alternative encoding.
- The `julian` calendar is not supported by `POSIXct`, `Date` or `PCICt`.

In summary, `stars` is not consistent in how "time" is interpreted and represented, and there may be a loss of data integrity.
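To make this concrete, both the fractional-offset truncation and the negative-offset problems can be shown in plain base R (a quick illustration, not `stars` code):

```r
# Day 19804.5 in "days since 1970-01-01" is noon on 2024-03-22.
d <- 19804.5

# A Date prints without the half day; the noon coordinate is only
# recoverable while the fractional offset is retained:
as.POSIXct(structure(d, class = "Date"))         # "2024-03-22 12:00:00 UTC"

# Truncating the offset to whole days silently moves the time
# coordinate from noon to midnight:
as.POSIXct(structure(trunc(d), class = "Date"))  # "2024-03-22 UTC" (midnight)

# Data before the POSIXct origin gets negative offsets; the start of
# the CMIP historical period sits almost 3.8 billion seconds below zero:
as.numeric(as.POSIXct("1850-01-01", tz = "UTC")) # -3786825600
```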
## Proposal

`CFtime` is a package that supports management and interpretation of the "time" dimension as defined in the CF Metadata Conventions (and thus the COARDS convention). All 9 defined calendars are supported, as well as all reasonable units (second, minute, hour, day, month, year). The package is pure R and has no dependencies outside of default R packages. The package does not read or write files itself, and it thus works with any file driver that can produce the required dimension data and stores time information using the `udunits` format; `RNetCDF` and `ncdf4` both work, and the GDAL drivers for NetCDF (including `mdim`) can easily be made to work with `CFtime`.
I see several advantages to using `CFtime` in `stars`:

- `CFtime` is a relatively complete and accurate implementation of "time" information as defined by the CF Metadata Conventions (and COARDS), supporting all defined calendars and all reasonable time units.
- `CFtime` can handle time dimension information for `read_stars()`, `read_mdim()` and `read_ncdf()`, providing consistency to the user of `stars`.
- The dependency on `PCICt` can be dropped (see examples below).
- `CFtime` has logic to determine completeness of time series that are not numerically equidistant. Many CMIP5/6 data sets of monthly, seasonal or yearly data use a "days since ..." unit because the `udunits` definitions of units coarser than a day are not the same as those used in climate modeling, and use of the units "month" and "year" is discouraged. So monthly data are recorded as being between 29.5 and 31 units apart. `CFtime` will correctly interpret this as a complete time series, while `regular_intervals()` in dimensions.R reports it as irregular.
- `CFtime` supports merging of time information from multiple source files.
- `CFtime` can produce a logical vector for subsetting based on timestamps; this could support functions like `slice.stars()` using an `index` consisting of a character vector of length 1 or 2 with the (extreme) value(s of the range) to extract.
- `CFtime` can produce calendar-aware factors that can make a function like `aggregate.stars()` more versatile and user-friendly. This functionality goes beyond what `aggregate.stars()` (currently) does, particularly with regard to epoch-based factors. Exposing this `CFtime` functionality through `stars`, or extending `aggregate.stars()` with these options, would give users more modeling options.
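The monthly-spacing point needs nothing more than base R arithmetic to illustrate (month lengths for 2020, a leap year; a sketch, not `CFtime` code):

```r
# CMIP-style mid-month time coordinates in "days since 2020-01-01":
len <- c(31, 29, 31, 30, 31, 30)              # month lengths, Jan-Jun 2020
mid <- cumsum(c(0, head(len, -1))) + len / 2  # 15.5 45.5 75.5 106 136.5 167
diff(mid)                                     # 30 30 30.5 30.5 30.5
```

The series is a complete monthly record, yet a purely numeric equidistance check flags it as irregular because the gaps vary between 30 and 30.5 days.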
## Examples

Package `CFtime` is being integrated into package `ncmeta`; the dev version on GitHub already contains it. Time information is stored in the new `extended` attribute of the `nc_meta` object. In ncdf.R, the functions `.get_time_meta()`, `.get_nc_time()` and `make_cal_time2()` reduce to a much simpler form. In `read_mdim()`, the encapsulated function `create_units()` is similarly simplified. The same simplification can be made in `read_stars()`. Note that the call to `get_pcict()` is gone. The only `refsys` that will be created for the time dimension is `CFtime` (or something like that), making the code simpler in other places as well. If needed, a `CFtime` object can create a vector of `POSIXct` values if the calendar is compatible (`standard`, `gregorian`, `proleptic_gregorian`): `posix_time <- CFtimestamp(time, asPOSIX = TRUE)`.
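To sketch the direction (illustrative code, not the actual `stars` patch; the file name is made up), reading the time dimension with `RNetCDF` and handing it to `CFtime` could look like:

```r
library(RNetCDF)
library(CFtime)

nc <- open.nc("tasmax_day_1850-1879.nc")        # hypothetical file
units    <- att.get.nc(nc, "time", "units")     # e.g. "days since 1850-01-01"
calendar <- att.get.nc(nc, "time", "calendar")  # e.g. "360_day"
offsets  <- var.get.nc(nc, "time")
close.nc(nc)

# Calendar-aware time object, usable as the refsys of the dimension:
time <- CFtime(units, calendar, offsets)

# Only for compatible calendars, convert to POSIXct as shown above:
# posix_time <- CFtimestamp(time, asPOSIX = TRUE)
```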
## Implementation

Is there interest to evaluate the use of `CFtime` in `stars`?

If so, I can put in time. As the developer of `CFtime` I know that package inside and out, and I can make any requisite changes to smooth integration. For the basic read/write/print operations I can integrate that functionality fairly quickly. I will most likely need some support on extending functions like `slice()` and `aggregate()`, even if only in stress-testing modified code and sternly lecturing me on the tidyverse.

The `ncmeta` package will be updated in the near future. The `CFtime` package will likely need an update as well.

Any other areas of change that I have not identified could be included as well.
Thinking about a 2-3 month timeline to release to allow for additional features and stress-testing.
Happy to team up with other interested contributors.