Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot allocate very large (global ~1km) ESMF grid #29

Open
JiaweiZhuang opened this issue Aug 26, 2018 · 19 comments
Open

Cannot allocate very large (global ~1km) ESMF grid #29

JiaweiZhuang opened this issue Aug 26, 2018 · 19 comments
Labels

Comments

@JiaweiZhuang
Copy link
Owner

@sdeastham failed to use xESMF to regrid global ~1 km resolution data. The issue is that ESMPy cannot create very large grid object. A minimal example is:

import numpy as np
import ESMF

# not-so-large grid works
grid = ESMF.Grid(np.array([20000, 10000]), 
                 staggerloc=ESMF.StaggerLoc.CENTER, 
                 coord_sys=ESMF.CoordSys.SPH_DEG)

# larger grid breaks
grid = ESMF.Grid(np.array([20000, 15000]), 
                 staggerloc=ESMF.StaggerLoc.CENTER, 
                 coord_sys=ESMF.CoordSys.SPH_DEG)

The last command leads to TypeError: buffer is too small for requested array:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-18-e629ff74bc4d> in <module>()
      1 grid = ESMF.Grid(np.array([20000, 15000]), 
      2                  staggerloc=ESMF.StaggerLoc.CENTER,
----> 3                  coord_sys=ESMF.CoordSys.SPH_DEG)

~/Research/Computing/miniconda3/envs/sci/lib/python3.6/site-packages/ESMF/util/decorators.py in new_func(*args, **kwargs)
     62 
     63         esmp = esmpymanager.Manager(debug = False)
---> 64         return func(*args, **kwargs)
     65     return new_func
     66 

~/Research/Computing/miniconda3/envs/sci/lib/python3.6/site-packages/ESMF/api/grid.py in __init__(self, max_index, num_peri_dims, periodic_dim, pole_dim, coord_sys, coord_typekind, staggerloc, filename, filetype, reg_decomp, decompflag, is_sphere, add_corner_stagger, add_user_area, add_mask, varname, coord_names, tilesize, regDecompPTile, name)
    451         # Add coordinates if a staggerloc is specified
    452         if staggerloc is not None:
--> 453             self.add_coords(staggerloc=staggerloc, from_file=from_file)
    454 
    455         # Add items if they are specified, this is done after the

~/Research/Computing/miniconda3/envs/sci/lib/python3.6/site-packages/ESMF/api/grid.py in add_coords(self, staggerloc, coord_dim, from_file)
    797 
    798                 # and now for Python
--> 799                 self._allocate_coords_(stagger, from_file=from_file)
    800 
    801                 # set the staggerlocs to be done

~/Research/Computing/miniconda3/envs/sci/lib/python3.6/site-packages/ESMF/api/grid.py in _allocate_coords_(self, stagger, localde, from_file)
   1022         if (self.ndims == self.rank) or (self.ndims == 0):
   1023             for xyz in range(self.rank):
-> 1024                 self._link_coord_buffer_(xyz, stagger, localde)
   1025         # and this way if we have 1d coordinates
   1026         elif self.ndims < self.rank:

~/Research/Computing/miniconda3/envs/sci/lib/python3.6/site-packages/ESMF/api/grid.py in _link_coord_buffer_(self, coord_dim, stagger, localde)
   1072         lb, ub = ESMP_GridGetCoordBounds(self, staggerloc=stagger, localde=localde)
   1073 
-> 1074         gridCoordP = ndarray_from_esmf(data, self.type, ub-lb)
   1075 
   1076         # alias the coordinates to a grid property

~/Research/Computing/miniconda3/envs/sci/lib/python3.6/site-packages/ESMF/util/esmpyarray.py in ndarray_from_esmf(data, dtype, shape)
     37 
     38     esmfarray = np.ndarray(tuple(shape[:]), constants._ESMF2PythonType[dtype],
---> 39                            buffer, order="F")
     40 
     41     return esmfarray

TypeError: buffer is too small for requested array

@bekozi Any idea about this?

@JiaweiZhuang
Copy link
Owner Author

Also, what's the scientific use case for this @sdeastham? I thought that global ~1 km data are so rare right now, so didn't pay much attention to such a case (#26 (comment)). I know @bekozi is pretty interested in regridding ultra-high resolution data. Maybe a chance to collaborate.

@sdeastham
Copy link

sdeastham commented Aug 26, 2018 via email

@bekozi
Copy link

bekozi commented Aug 27, 2018

@sdeastham This grid is a good test case for ultra-high resolution conservative regridding. Right now, the workflow for regridding data at this resolution works with netCDF (I see the highest resolution grid is not in netCDF format - assuming this is the dataset you are interested). I am migrating the workflow to use xarray-dask/xESMF so GeoTiff will be supported. It's good to hear you have workarounds as it will take some time to get this working. If your timeline is more pressing for the high resolution, let me know...

Also thanks to @JiaweiZhuang for getting this on the tracker!

Note: It often makes sense to store high resolution sparse data like these in an unstructured format if netCDF is required.

@sdeastham
Copy link

sdeastham commented Aug 27, 2018 via email

@NicWayand
Copy link

Also hitting this limit working with some Nasa Ice Bridge data.
+1 for making this a priority. (Wish I new more about the ESMF backend to help...)

@bekozi
Copy link

bekozi commented Sep 13, 2018

@NicWayand thanks for the data link. I'm approaching something workable as example/dev code that can handle xarray datasets from netCDF or rasterio. Would you be interested in helping with the xESMF code in an example notebook? Basically, it would be the piece of code that takes source and destination xarray spatial chunks and regrids them. Hopefully it will be pretty straightforward. EDIT: It may be a little complex, as I'm still unsure how best to use dask.

@NicWayand
Copy link

NicWayand commented Sep 17, 2018

Hi @bekoi, I am interested in helping but not clear exactly what your asking for? Just an example using xesmf where this issue is hit?

@bekozi
Copy link

bekozi commented Sep 18, 2018

@NicWayand thanks and sorry for not being more clear. I've realized I won't have time to learn and use xESMF properly and was fishing for some expertise. I had some example code as part of #26, but that did not use xarray as the IO backend. The next version will use xarray exclusively for IO (allows GeoTiff, etc.).

I'm looking for help with the bit of code that regrids source/destination spatial chunks within a dask task. This is the location in the old code where the regridding would go. Again, this example will be evolving, but it is a similar approach at heart. However, I am not sure this is the right approach for using dask so am looking for ideas. With big gridded data, we want to avoid loading everything on the root task. Lazy loading may be the ticket, but I still think it would be best to isolate processing within tasks as opposed to having a bunch of computation on the root with communication.

Anyway, no pressure of course. I'll post another PR here, and we can go from there!

@ZhangAllen98
Copy link

ZhangAllen98 commented May 17, 2019

Is there any progress on this issue about the "TypeError: buffer is too small for requested array"?
I want to do a regridding work from a global ~1km resolution to a regional domain(maybe 9 or 3 or 1km resolution, the domain is only a small part compared to the global),but it failed.

@JiaweiZhuang
Copy link
Owner Author

I want to do a regridding work from a global ~1km resolution to a regional domain(maybe 9 or 3 or 1km resolution, the domain is only a small part compared to the global),but it failed.

Try cropping the global data (with ds.sel()/ds.isel()) before building the regridder? This will avoid creating large ESMF grid objects.

@JiaweiZhuang
Copy link
Owner Author

JiaweiZhuang commented May 17, 2019

As for TypeError: buffer is too small for requested array, I am still not sure what's happening.

From the source code addon/ESMPy/src/ESMF/util/esmpyarray.py, the function ndarray_from_esmf() is trying to create numpy array from ESMF Fortran array:

import ESMF.api.constants as constants
import numpy as np
import numpy.ma as ma
import ctypes as ct
import sys

def ndarray_from_esmf(data, dtype, shape):
    '''
    :param data: buffer of fortran allocated ESMF array
    :type data: ctypes void_p
    :param dtype: the type of the esmf buffer
    :type dtype: ESMF.TypeKind
    :param shape: N-D Python shape corresponding to 1D ESMF allocation
    :type shape: list or tuple
    :return: numpy array representing the data with dtype and shape
    '''
    # find the size of the local coordinates
    size = np.prod(shape[:]) * \
           np.dtype(constants._ESMF2PythonType[dtype]).itemsize

    # create a numpy array to point to the ESMF data allocation
    if sys.version_info[0] >= 3:
        buffer = ct.pythonapi.PyMemoryView_FromMemory
        buffer.restype = ct.py_object
        buffer = buffer(data, ct.c_int(size), 0x200)
    else:
        buffer = np.core.multiarray.int_asbuffer(
            ct.addressof(data.contents), size)


    esmfarray = np.ndarray(tuple(shape[:]), constants._ESMF2PythonType[dtype],
                           buffer, order="F")

    return esmfarray

The error happens in the call:

    esmfarray = np.ndarray(tuple(shape[:]), constants._ESMF2PythonType[dtype],
                           buffer, order="F")

The minimal code to get the same error is:

np.ndarray(shape=3, buffer=np.array([0.0, 1.0]), dtype=np.float64) 
# buffer size is smaller than the array size, leads to "buffer is too small for requested array"

It is probably because the earlier call buffer = buffer(data, ct.c_int(size), 0x200) doesn't allocate enough memory. buffer is a PyMemoryView_FromMemory() function, which takes three arguments:

  • data is the memory location allocated upstream and passed to ndarray_from_esmf(). It was created by a data = ESMP_GridGetCoordPtr(...) call from addon/ESMPy/src/ESMF/api/grid.py. Then it goes into deep Fortran code which I haven't had time digging into.
  • ct.c_int(size) is the requested memory in bytes. For the large grid shape (20000, 15000) used here, size == 20000*15000*8 == 2400000000
  • 0x200 (512) is simply the constant PyBUF_WRITE.

Maybe @bekozi can elaborate more on this. Probably some internal memory constraints in ESMF?

@ZhangAllen98
Copy link

By cropping it to a regional domain, it finally works.
By the way, it seems that for 'conservative' method, the 'lat_b','lon_b' are needed. But is there methods to create these two variables from the original gird.
For example, the grid information of input data after cropping is
Grid coordinates : 1 : lonlat : points=3240000 (1800x1800) lon : -94.9958 to -80.0042 by 0.00833333 degrees_east lat : 30.0042 to 44.9958 by 0.00833333 degrees_north
The resolution is about 1km, the unit of given interested variable is ton/cell.
(Here does cell mean about 1km*1km???)
While for the out grid, it is
# gridID 1 gridtype = curvilinear gridsize = 9801 xname = XLONG xlongname = longitude xunits = degree_east yname = XLAT ylongname = latitude yunits = degree_north xsize = 99 ysize = 99 xvalues = ... yvalues = ...
the resolution may be 9 km.
Thanks!

@JiaweiZhuang
Copy link
Owner Author

By the way, it seems that for 'conservative' method, the 'lat_b','lon_b' are needed. But is there methods to create these two variables from the original gird.

Right, currently the grid cell bounds (lon_b & lat_b) are separated/decoupled from lon & lat. Slicing doesn't automatically happen for "size N+1" coordinate variables. See #13 (comment).

@thomasastanley
Copy link

For what it's worth, this is the use case for which I installed xesmf: producing 1km global grids. These are slow with other methods.

raphaeldussin pushed a commit to raphaeldussin/xESMF that referenced this issue Jul 21, 2021
raphaeldussin added a commit to raphaeldussin/xESMF that referenced this issue Jul 21, 2021
@Jing25
Copy link

Jing25 commented Sep 15, 2021

Same issue when trying to downscale 1000x1000 1km data to 30000x30000 30m. Any updates?

@ManyAngledOne
Copy link

Just ran into this today when trying to regrid high resolution land use and vegetation density data.

@hxawax
Copy link

hxawax commented Mar 31, 2023

Same issue with a Field of dim 1193x843x8736 grid (X Y T) with a total size of 70286291712
Look like the constraint is on the Fortran side.

@dluks
Copy link

dluks commented Aug 7, 2023

I am also running into this issue when attempting to resample 0.25 deg data to 30 second grids (for use with a global ML model). With the wide use now of ML for producing global maps, I think that the need for handling high-resolution global grids is no longer a fringe case.

@sdeastham
Copy link

I think this repo is unfortunately dead and/or dormant. The sparselt (https://github.com/LiamBindle/sparselt) package by @LiamBindle may be a viable alternative for regridding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

10 participants