Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Confusing documentation for creating a Scene without a filename or reader #2836

Closed
joleenf opened this issue Jun 23, 2024 · 6 comments · Fixed by #2868
Closed

Confusing documentation for creating a Scene without a filename or reader #2836

joleenf opened this issue Jun 23, 2024 · 6 comments · Fixed by #2868

Comments

@joleenf
Copy link
Contributor

joleenf commented Jun 23, 2024

The documentation in the Scene code indicates that a Scene object can be created without a filename or reader. The documentation refers to a Dataset, which looks like the nomenclature used to refer to an xarray Dataset. However, the "Dataset" in

     scn = Scene()
     scn['my_dataset'] = Dataset(my_data_array, **my_info)

Is referring to a Dataset object that preceded the use of xarray in SatPy, and the documentation itself should have been purged.

@djhoese
Copy link
Member

djhoese commented Jun 23, 2024

I'm really confused by that example code. The old Dataset didn't have anything to do with xarray, but this code is passing a DataArray apparently.

@joleenf
Copy link
Contributor Author

joleenf commented Jun 23, 2024

The example code is taken from the Scene documentation. I am also confused by the documentation code. Not certain if the "Dataset" referred to a Dataset from xarray, I used both

scn['my_dataset'] = Dataset(xr.DataArray, **my_info)

and
scn['my_dataset'] = xr.Dataset

with a dtype in the attrs among other things like the crs and a dataset name.

from satpy import Scene
new_scene = Scene()

# Load a Scene (not worrying about datetime for now...) from GOES-18 (west) and GOES-16 (east)
g16 = Scene(["clavrx_OR_ABI-L1b-RadC-M6C01_G16_s20231461636171.level2.nc"], reader="clavrx")
g18 = Scene(["clavrx_OR_ABI-L1b-RadC-M6C01_G18_s20231531721182.level2.nc"], reader="clavrx")

g16.load(["cloud_mask"])
g18.load(["cloud_mask"])

#  Save the area definition of the GOES-18 (west) satellite for later.
area_def18 = g18["cloud_mask"].attrs["area"]

# These global attributes are saved so that it is easier to record the original date of the projection information used.
c18_attrs = g16["cloud_mask"].attrs

# Resample the GOES-16 data into the GOES-18 projection.
g16_ol_in_g18space = g16.resample(area_def18, resampler="nearest")

# Mask the region where there is overlap.
g16_ol_in_g18space = xr.where(g16_ol_in_g18space["cloud_mask"] >= 0, 1, 0)

# Create a new scene.
new_scene["overlap_mask"] = Dataset(data_vars={'overlap': (['y', 'x'], g16_ol_in_g18space.data)},
                           coords=g18["cloud_mask"].coords,
                           attrs=dict(name="overlap_mask", long_name="overlap_mask", platform_name="GOES-18",
                                      start_time=c18_attrs["start_time"].isoformat(), flag_values=[0, 1],
                                      flag_meaning="0: No Overlap, 1: Overlap",
                                      dtype=g16_ol_in_g18space.data.dtype))

Creates a Scene and a dataset that looks kind of right?

new_scene["overlap_mask"]
Out[22]: 
<xarray.Dataset> Size: 60MB
Dimensions:    (y: 1500, x: 2500)
Coordinates:
    latitude   (y, x) float32 15MB dask.array<chunksize=(1500, 2500), meta=np.ndarray>
    longitude  (y, x) float32 15MB dask.array<chunksize=(1500, 2500), meta=np.ndarray>
    crs        object 8B PROJCRS["unknown",BASEGEOGCRS["unknown",DATUM["Unkno...
  * y          (y) float64 12kB 4.588e+06 4.586e+06 ... 1.586e+06 1.584e+06
  * x          (x) float64 20kB -2.504e+06 -2.502e+06 ... 2.502e+06 2.504e+06
Data variables:
    overlap    (y, x) int64 30MB dask.array<chunksize=(1500, 2500), meta=np.ndarray>
Attributes:
    name:           overlap_mask
    long_name:      overlap_mask
    platform_name:  GOES-18
    start_time:     2023-05-26T16:36:17.100000
    flag_values:    [0, 1]
    flag_meaning:   0: No Overlap, 1: Overlap
    dtype:          int64
    _satpy_id:      DataID(name='overlap_mask')

However,
new_scene.save_datasets(filename="test.nc")

raises an exception (I am assuming because I am using an xarray Dataset, rather than the Dataset that was intended in the documentation):

/Users/joleenf/Projects/polar2grid/satpy/satpy/writers/cf_writer.py:274: UserWarning: dtype int64 not compatible with CF-1.7.
  grouped_datasets, header_attrs = collect_cf_datasets(list_dataarrays=datasets,  # list of xr.DataArray
Traceback (most recent call last):
  File "/Users/joleenf/miniconda3/envs/satpy-dev/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3577, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-20-22641c74dec4>", line 1, in <module>
    new_scene.save_datasets(filename="test.nc")
  File "/Users/joleenf/Projects/polar2grid/satpy/satpy/scene.py", line 1293, in save_datasets
    return writer.save_datasets(dataarrays, compute=compute, **save_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/joleenf/Projects/polar2grid/satpy/satpy/writers/cf_writer.py", line 274, in save_datasets
    grouped_datasets, header_attrs = collect_cf_datasets(list_dataarrays=datasets,  # list of xr.DataArray
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/joleenf/Projects/polar2grid/satpy/satpy/cf/datasets.py", line 232, in collect_cf_datasets
    ds = _collect_cf_dataset(
         ^^^^^^^^^^^^^^^^^^^^
  File "/Users/joleenf/Projects/polar2grid/satpy/satpy/cf/datasets.py", line 137, in _collect_cf_dataset
    new_dataarray = make_cf_data_array(new_dataarray,
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/joleenf/Projects/polar2grid/satpy/satpy/cf/data_array.py", line 79, in make_cf_data_array
    dataarray = _preprocess_data_array_name(dataarray=dataarray,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/joleenf/Projects/polar2grid/satpy/satpy/cf/data_array.py", line 49, in _preprocess_data_array_name
    dataarray = dataarray.rename(new_name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/joleenf/miniconda3/envs/satpy-dev/lib/python3.11/site-packages/xarray/core/dataset.py", line 4282, in rename
    return self._rename(name_dict=name_dict, **names)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/joleenf/miniconda3/envs/satpy-dev/lib/python3.11/site-packages/xarray/core/dataset.py", line 4217, in _rename
    name_dict = either_dict_or_kwargs(name_dict, names, "rename")
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/joleenf/miniconda3/envs/satpy-dev/lib/python3.11/site-packages/xarray/namedarray/utils.py", line 193, in either_dict_or_kwargs
    raise ValueError(f"the first argument to .{func_name} must be a dictionary")
ValueError: the first argument to .rename must be a dictionary```

@djhoese
Copy link
Member

djhoese commented Jun 24, 2024

So a Scene should only ever contain DataArray objects except in I think a very small exception case (temporary container during MultiScene animation stuff). I mentioned on slack that there is a to_xarray method on the Scene which should give you a Dataset object that is xarray-savable with .to_netcdf() I think. That would be my first try for what you're doing, but regardless this documentation should be updated to use a DataArray and assigning to attrs= for the metadata unless I'm missing something.

@joleenf
Copy link
Contributor Author

joleenf commented Jun 24, 2024

Sorry, I had missed the "to_xarray" mention. to_xarray() works with to_netcdf() for me.

@joleenf
Copy link
Contributor Author

joleenf commented Jun 27, 2024

import xarray as xr

from satpy import Scene
from xarray import DataArray

g16 = Scene(["clavrx_OR_ABI-L1b-RadC-M6C01_G16_s20231461636171.level2.nc"], reader="clavrx")
g18 = Scene(["clavrx_OR_ABI-L1b-RadC-M6C01_G18_s20231531721182.level2.nc"], reader="clavrx")

g16.load(["cloud_mask"])
g18.load(["cloud_mask"])

area_def18 = g18["cloud_mask"].attrs["area"]

g16_ol_in_g18space = g16.resample(area_def18, resampler="nearest")

g18mask = Scene()
g18mask["overlap"] = DataArray(data=xr.where(g16_ol_in_g18space["cloud_mask"] >= 0, 1, 0),
                               dims=['y', 'x'],
                               attrs=dict(name="overlap_mask", long_name="G18FixedGrid_overlap_G16",
                                          platform_name="GOES-18",
                                          start_time=g16_ol_in_g18space["cloud_mask"].attrs["start_time"], flag_values=[0, 1],
                                          flag_meaning="0: No Overlap, 1: Overlap",
                                          area=area_def18))

g18mask.save_dataset(filename="{long_name}.nc", dataset_id="overlap")

works. So maybe it is just clarifying the old documentation?

@djhoese
Copy link
Member

djhoese commented Jun 27, 2024

Yeah it is likely as simple as changing Dataset to DataArray and using attrs= instead of ** on the info dictionary. Would you mind making a pull request?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants