zarr reads on the client don't work on latest cluster -- killed worker #265

Closed
rabernat opened this issue May 18, 2018 · 9 comments

Comments

@rabernat
Member

When I try to load data from any of the zarr datasets on GCS, I get an error like this:

---------------------------------------------------------------------------
KilledWorker                              Traceback (most recent call last)
<ipython-input-6-0247f79e60da> in <module>()
      1 plt.rcParams['figure.figsize'] = (15, 8)
----> 2 ds.sla.sel(time='1982-08-07', method='nearest').plot()

/opt/conda/lib/python3.6/site-packages/xarray/plot/plot.py in __call__(self, **kwargs)
    372 
    373     def __call__(self, **kwargs):
--> 374         return plot(self._da, **kwargs)
    375 
    376     @functools.wraps(hist)

/opt/conda/lib/python3.6/site-packages/xarray/plot/plot.py in plot(darray, row, col, col_wrap, ax, rtol, subplot_kws, **kwargs)
    155     kwargs['ax'] = ax
    156 
--> 157     return plotfunc(darray, **kwargs)
    158 
    159 

/opt/conda/lib/python3.6/site-packages/xarray/plot/plot.py in newplotfunc(darray, x, y, figsize, size, aspect, ax, row, col, col_wrap, xincrease, yincrease, add_colorbar, add_labels, vmin, vmax, cmap, center, robust, extend, levels, infer_intervals, colors, subplot_kws, cbar_ax, cbar_kwargs, **kwargs)
    607 
    608         # Pass the data as a masked ndarray too
--> 609         zval = darray.to_masked_array(copy=False)
    610 
    611         _ensure_plottable(xval, yval)

/opt/conda/lib/python3.6/site-packages/xarray/core/dataarray.py in to_masked_array(self, copy)
   1453             Masked where invalid values (nan or inf) occur.
   1454         """
-> 1455         isnull = pd.isnull(self.values)
   1456         return np.ma.MaskedArray(data=self.values, mask=isnull, copy=copy)
   1457 

/opt/conda/lib/python3.6/site-packages/xarray/core/dataarray.py in values(self)
    402     def values(self):
    403         """The array's data as a numpy.ndarray"""
--> 404         return self.variable.values
    405 
    406     @values.setter

/opt/conda/lib/python3.6/site-packages/xarray/core/variable.py in values(self)
    385     def values(self):
    386         """The variable's data as a numpy.ndarray"""
--> 387         return _as_array_or_item(self._data)
    388 
    389     @values.setter

/opt/conda/lib/python3.6/site-packages/xarray/core/variable.py in _as_array_or_item(data)
    209     TODO: remove this (replace with np.asarray) once these issues are fixed
    210     """
--> 211     data = np.asarray(data)
    212     if data.ndim == 0:
    213         if data.dtype.kind == 'M':

/opt/conda/lib/python3.6/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
    490 
    491     """
--> 492     return array(a, dtype, copy=False, order=order)
    493 
    494 

/opt/conda/lib/python3.6/site-packages/dask/array/core.py in __array__(self, dtype, **kwargs)
   1190 
   1191     def __array__(self, dtype=None, **kwargs):
-> 1192         x = self.compute()
   1193         if dtype and x.dtype != dtype:
   1194             x = x.astype(dtype)

/opt/conda/lib/python3.6/site-packages/dask/base.py in compute(self, **kwargs)
    152         dask.base.compute
    153         """
--> 154         (result,) = compute(self, traverse=False, **kwargs)
    155         return result
    156 

/opt/conda/lib/python3.6/site-packages/dask/base.py in compute(*args, **kwargs)
    405     keys = [x.__dask_keys__() for x in collections]
    406     postcomputes = [x.__dask_postcompute__() for x in collections]
--> 407     results = get(dsk, keys, **kwargs)
    408     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    409 

/opt/conda/lib/python3.6/site-packages/distributed/client.py in get(self, dsk, keys, restrictions, loose_restrictions, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, **kwargs)
   2096             try:
   2097                 results = self.gather(packed, asynchronous=asynchronous,
-> 2098                                       direct=direct)
   2099             finally:
   2100                 for f in futures.values():

/opt/conda/lib/python3.6/site-packages/distributed/client.py in gather(self, futures, errors, maxsize, direct, asynchronous)
   1506             return self.sync(self._gather, futures, errors=errors,
   1507                              direct=direct, local_worker=local_worker,
-> 1508                              asynchronous=asynchronous)
   1509 
   1510     @gen.coroutine

/opt/conda/lib/python3.6/site-packages/distributed/client.py in sync(self, func, *args, **kwargs)
    613             return future
    614         else:
--> 615             return sync(self.loop, func, *args, **kwargs)
    616 
    617     def __repr__(self):

/opt/conda/lib/python3.6/site-packages/distributed/utils.py in sync(loop, func, *args, **kwargs)
    251             e.wait(10)
    252     if error[0]:
--> 253         six.reraise(*error[0])
    254     else:
    255         return result[0]

/opt/conda/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
    691             if value.__traceback__ is not tb:
    692                 raise value.with_traceback(tb)
--> 693             raise value
    694         finally:
    695             value = None

/opt/conda/lib/python3.6/site-packages/distributed/utils.py in f()
    236             yield gen.moment
    237             thread_state.asynchronous = True
--> 238             result[0] = yield make_coro()
    239         except Exception as exc:
    240             error[0] = sys.exc_info()

/opt/conda/lib/python3.6/site-packages/tornado/gen.py in run(self)
   1097 
   1098                     try:
-> 1099                         value = future.result()
   1100                     except Exception:
   1101                         self.had_exception = True

/opt/conda/lib/python3.6/site-packages/tornado/gen.py in run(self)
   1105                     if exc_info is not None:
   1106                         try:
-> 1107                             yielded = self.gen.throw(*exc_info)
   1108                         finally:
   1109                             # Break up a reference to itself

/opt/conda/lib/python3.6/site-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker)
   1383                             six.reraise(type(exception),
   1384                                         exception,
-> 1385                                         traceback)
   1386                     if errors == 'skip':
   1387                         bad_keys.add(key)

/opt/conda/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
    691             if value.__traceback__ is not tb:
    692                 raise value.with_traceback(tb)
--> 693             raise value
    694         finally:
    695             value = None

KilledWorker: ('zarr-30761e7bf266279b7254c29da310d0bd', 'tcp://10.20.144.7:43398')

😭

@mrocklin
Member

mrocklin commented May 18, 2018 via email

@rabernat
Member Author

rabernat commented May 18, 2018 via email

@mrocklin
Member

mrocklin commented May 18, 2018 via email

@rabernat
Member Author

Error from the worker:

distributed.protocol.pickle - INFO - Failed to deserialize b'\x80\x04\x95}\n\x00\x00\x00\x00\x00\x00\x8c\x14xarray.core.indexing\x94\x8c!ImplicitToExplicitIndexingAdapter\x94\x93\x94)\x81\x94}\x94(\x8c\x05array\x94h\x00\x8c\x17LazilyOuterIndexedArray\x94\x93\x94)\x81\x94}\x94(h\x05\x8c\x17xarray.coding.variables\x94\x8c\x19_ElementwiseFunctionArray\x94\x93\x94)\x81\x94}\x94(h\x05h\x07)\x81\x94}\x94(h\x05\x8c\x14xarray.backends.zarr\x94\x8c\x10ZarrArrayWrapper\x94\x93\x94)\x81\x94}\x94(\x8c\tdatastore\x94h\x11\x8c\tZarrStore\x94\x93\x94)\x81\x94}\x94(\x8c\x02ds\x94\x8c\x0ezarr.hierarchy\x94\x8c\x05Group\x94\x93\x94)\x81\x94(\x8c\rgcsfs.mapping\x94\x8c\x06GCSMap\x94\x93\x94)\x81\x94\x8c\ngcsfs.core\x94\x8c\rGCSFileSystem\x94\x93\x94)\x81\x94}\x94(\x8c\x07project\x94\x8c\rpangeo-181919\x94\x8c\x06access\x94\x8c\x0cfull_control\x94\x8c\x05scope\x94\x8c7https://www.googleapis.com/auth/devstorage.full_control\x94\x8c\x0bconsistency\x94\x8c\x04none\x94\x8c\x05token\x94N\x8c\x07session\x94\x8c\x1egoogle.auth.transport.requests\x94\x8c\x11AuthorizedSession\x94\x93\x94)\x81\x94}\x94(\x8c\x07headers\x94\x8c\x13requests.structures\x94\x8c\x13CaseInsensitiveDict\x94\x93\x94)\x81\x94}\x94\x8c\x06_store\x94\x8c\x0bcollections\x94\x8c\x0bOrderedDict\x94\x93\x94)R\x94(\x8c\nuser-agent\x94\x8c\nUser-Agent\x94\x8c\x16python-requests/2.18.4\x94\x86\x94\x8c\x0faccept-encoding\x94\x8c\x0fAccept-Encoding\x94\x8c\rgzip, 
deflate\x94\x86\x94\x8c\x06accept\x94\x8c\x06Accept\x94\x8c\x03*/*\x94\x86\x94\x8c\nconnection\x94\x8c\nConnection\x94\x8c\nkeep-alive\x94\x86\x94usb\x8c\x07cookies\x94\x8c\x10requests.cookies\x94\x8c\x11RequestsCookieJar\x94\x93\x94)\x81\x94}\x94(\x8c\x07_policy\x94\x8c\x0ehttp.cookiejar\x94\x8c\x13DefaultCookiePolicy\x94\x93\x94)\x81\x94}\x94(\x8c\x08netscape\x94\x88\x8c\x07rfc2965\x94\x89\x8c\x13rfc2109_as_netscape\x94N\x8c\x0chide_cookie2\x94\x89\x8c\rstrict_domain\x94\x89\x8c\x1bstrict_rfc2965_unverifiable\x94\x88\x8c\x16strict_ns_unverifiable\x94\x89\x8c\x10strict_ns_domain\x94K\x00\x8c\x1cstrict_ns_set_initial_dollar\x94\x89\x8c\x12strict_ns_set_path\x94\x89\x8c\x10_blocked_domains\x94)\x8c\x10_allowed_domains\x94N\x8c\x04_now\x94J\x93\x81\xffZub\x8c\x08_cookies\x94}\x94hkJ\x93\x81\xffZub\x8c\x04auth\x94N\x8c\x07proxies\x94}\x94\x8c\x05hooks\x94}\x94\x8c\x08response\x94]\x94s\x8c\x06params\x94}\x94\x8c\x06verify\x94\x88\x8c\x04cert\x94N\x8c\x08prefetch\x94N\x8c\x08adapters\x94hA)R\x94(\x8c\x08https://\x94\x8c\x11requests.adapters\x94\x8c\x0bHTTPAdapter\x94\x93\x94)\x81\x94}\x94(\x8c\x0bmax_retries\x94\x8c\x12urllib3.util.retry\x94\x8c\x05Retry\x94\x93\x94)\x81\x94}\x94(\x8c\x05total\x94K\x00\x8c\x07connect\x94N\x8c\x04read\x94\x89\x8c\x06status\x94N\x8c\x08redirect\x94N\x8c\x10status_forcelist\x94\x8f\x94\x8c\x10method_whitelist\x94(\x8c\x03GET\x94\x8c\x06DELETE\x94\x8c\x05TRACE\x94\x8c\x07OPTIONS\x94\x8c\x04HEAD\x94\x8c\x03PUT\x94\x91\x94\x8c\x0ebackoff_factor\x94K\x00\x8c\x11raise_on_redirect\x94\x88\x8c\x0fraise_on_status\x94\x88\x8c\x07history\x94)\x8c\x1arespect_retry_after_header\x94\x88ub\x8c\x06config\x94}\x94\x8c\x11_pool_connections\x94K\n\x8c\r_pool_maxsize\x94K\n\x8c\x0b_pool_block\x94\x89ub\x8c\x07http://\x94h\x7f)\x81\x94}\x94(h\x82h\x85)\x81\x94}\x94(h\x88K\x00h\x89Nh\x8a\x89h\x8bNh\x8cNh\x8d\x8f\x94h\x8fh\x96h\x97K\x00h\x98\x88h\x99\x88h\x9a)h\x9b\x88ubh\x9c}\x94h\x9eK\nh\x9fK\nh\xa0\x89ubu\x8c\x06stream\x94\x89\x8c\ttrust_env\x94\x88\x8c\rmax
_redirects\x94K\x1eub\x8c\x06method\x94\x8c\x0egoogle_default\x94\x8c\rcache_timeout\x94N\x8c\x0e_listing_cache\x94}\x94ub\x8c0pangeo-data/cm2.6/control/temp_salt_u_v-5day_avg\x94\x86\x94b\x8c\x00\x94\x88N\x88Nt\x94b\x8c\n_read_only\x94\x88\x8c\r_synchronizer\x94N\x8c\x06_group\x94h\xb2\x8c\x06writer\x94\x8c\x16xarray.backends.common\x94\x8c\x0bArrayWriter\x94\x93\x94)\x81\x94}\x94(\x8c\x07sources\x94]\x94\x8c\x07targets\x94]\x94\x8c\x04lock\x94\x89ub\x8c\rdelayed_store\x94Nub\x8c\rvariable_name\x94\x8c\x04salt\x94\x8c\x05shape\x94(M\xb4\x05K2M\x8c\nM\x10\x0et\x94\x8c\x05dtype\x94\x8c\x05numpy\x94\x8c\x05dtype\x94\x93\x94\x8c\x02f4\x94K\x00K\x01\x87\x94R\x94(K\x03\x8c\x01<\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94bub\x8c\x03key\x94h\x00\x8c\x0cBasicIndexer\x94\x93\x94)\x81\x94}\x94\x8c\x04_key\x94(\x8c\x08builtins\x94\x8c\x05slice\x94\x93\x94NNN\x87\x94R\x94h\xd8NNN\x87\x94R\x94h\xd8NNN\x87\x94R\x94h\xd8NNN\x87\x94R\x94t\x94sbub\x8c\x04func\x94\x8c\tfunctools\x94\x8c\x07partial\x94\x93\x94h\n\x8c\x0b_apply_mask\x94\x93\x94\x85\x94R\x94(h\xe7)}\x94(\x8c\x13encoded_fill_values\x94\x8f\x94(\x8c\x15numpy.core.multiarray\x94\x8c\x06scalar\x94\x93\x94h\xcdC\x04\xecx\xad\xe0\x94\x86\x94R\x94\x90\x8c\x12decoded_fill_value\x94G\x7f\xf8\x00\x00\x00\x00\x00\x00h\xc7h\xcduNt\x94b\x8c\x06_dtype\x94h\xcdubh\xd0h\xd2)\x81\x94}\x94h\xd5(h\xd8NNN\x87\x94R\x94h\xd8NNN\x87\x94R\x94h\xd8NNN\x87\x94R\x94h\xd8NNN\x87\x94R\x94t\x94sbub\x8c\x0bindexer_cls\x94h\x00\x8c\x0cOuterIndexer\x94\x93\x94ub.'
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/distributed/protocol/pickle.py", line 59, in loads
    return pickle.loads(x)
  File "/opt/conda/lib/python3.6/site-packages/zarr/hierarchy.py", line 273, in __setstate__
    self.__init__(*state)
  File "/opt/conda/lib/python3.6/site-packages/zarr/hierarchy.py", line 106, in __init__
    if contains_array(store, path=self._path):
  File "/opt/conda/lib/python3.6/site-packages/zarr/storage.py", line 72, in contains_array
    return key in store
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/mapping.py", line 87, in __contains__
    return self.gcs.exists(self._key_to_str(key))
  File "<decorator-gen-16>", line 2, in exists
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 795, in exists
    return bool(self.info(path))
  File "<decorator-gen-17>", line 2, in info
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 829, in info
    return self._get_object(path)
  File "<decorator-gen-3>", line 2, in _get_object
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 519, in _get_object
    bucket, key))
  File "<decorator-gen-2>", line 2, in _call
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 452, in _call
    meth = getattr(self.session, method)
AttributeError: 'NoneType' object has no attribute 'get'
distributed.protocol.core - CRITICAL - Failed to deserialize
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/distributed/protocol/core.py", line 119, in loads
    value = _deserialize(head, fs)
  File "/opt/conda/lib/python3.6/site-packages/distributed/protocol/serialize.py", line 158, in deserialize
    return f(header, frames)
  File "/opt/conda/lib/python3.6/site-packages/distributed/protocol/serialize.py", line 20, in <lambda>
    deserializers = {None: lambda header, frames: pickle.loads(b''.join(frames))}
  File "/opt/conda/lib/python3.6/site-packages/distributed/protocol/pickle.py", line 59, in loads
    return pickle.loads(x)
  File "/opt/conda/lib/python3.6/site-packages/zarr/hierarchy.py", line 273, in __setstate__
    self.__init__(*state)
  File "/opt/conda/lib/python3.6/site-packages/zarr/hierarchy.py", line 106, in __init__
    if contains_array(store, path=self._path):
  File "/opt/conda/lib/python3.6/site-packages/zarr/storage.py", line 72, in contains_array
    return key in store
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/mapping.py", line 87, in __contains__
    return self.gcs.exists(self._key_to_str(key))
  File "<decorator-gen-16>", line 2, in exists
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 795, in exists
    return bool(self.info(path))
  File "<decorator-gen-17>", line 2, in info
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 829, in info
    return self._get_object(path)
  File "<decorator-gen-3>", line 2, in _get_object
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 519, in _get_object
    bucket, key))
  File "<decorator-gen-2>", line 2, in _call
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 452, in _call
    meth = getattr(self.session, method)
AttributeError: 'NoneType' object has no attribute 'get'
distributed.worker - ERROR - Worker failed to read message. This will likely cause the cluster to fail.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/distributed/worker.py", line 1178, in compute_stream
    msgs = yield comm.read()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/conda/lib/python3.6/site-packages/distributed/comm/tcp.py", line 203, in read
    msg = yield from_frames(frames, deserialize=self.deserialize)
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 315, in wrapper
    yielded = next(result)
  File "/opt/conda/lib/python3.6/site-packages/distributed/comm/utils.py", line 75, in from_frames
    res = _from_frames()
  File "/opt/conda/lib/python3.6/site-packages/distributed/comm/utils.py", line 61, in _from_frames
    return protocol.loads(frames, deserialize=deserialize)
  File "/opt/conda/lib/python3.6/site-packages/distributed/protocol/core.py", line 119, in loads
    value = _deserialize(head, fs)
  File "/opt/conda/lib/python3.6/site-packages/distributed/protocol/serialize.py", line 158, in deserialize
    return f(header, frames)
  File "/opt/conda/lib/python3.6/site-packages/distributed/protocol/serialize.py", line 20, in <lambda>
    deserializers = {None: lambda header, frames: pickle.loads(b''.join(frames))}
  File "/opt/conda/lib/python3.6/site-packages/distributed/protocol/pickle.py", line 59, in loads
    return pickle.loads(x)
  File "/opt/conda/lib/python3.6/site-packages/zarr/hierarchy.py", line 273, in __setstate__
    self.__init__(*state)
  File "/opt/conda/lib/python3.6/site-packages/zarr/hierarchy.py", line 106, in __init__
    if contains_array(store, path=self._path):
  File "/opt/conda/lib/python3.6/site-packages/zarr/storage.py", line 72, in contains_array
    return key in store
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/mapping.py", line 87, in __contains__
    return self.gcs.exists(self._key_to_str(key))
  File "<decorator-gen-16>", line 2, in exists
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 795, in exists
    return bool(self.info(path))
  File "<decorator-gen-17>", line 2, in info
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 829, in info
    return self._get_object(path)
  File "<decorator-gen-3>", line 2, in _get_object
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 519, in _get_object
    bucket, key))
  File "<decorator-gen-2>", line 2, in _call
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 452, in _call
    meth = getattr(self.session, method)
AttributeError: 'NoneType' object has no attribute 'get'
distributed.worker - ERROR - 'NoneType' object has no attribute 'get'
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/distributed/worker.py", line 1178, in compute_stream
    msgs = yield comm.read()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/conda/lib/python3.6/site-packages/distributed/comm/tcp.py", line 203, in read
    msg = yield from_frames(frames, deserialize=self.deserialize)
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 315, in wrapper
    yielded = next(result)
  File "/opt/conda/lib/python3.6/site-packages/distributed/comm/utils.py", line 75, in from_frames
    res = _from_frames()
  File "/opt/conda/lib/python3.6/site-packages/distributed/comm/utils.py", line 61, in _from_frames
    return protocol.loads(frames, deserialize=deserialize)
  File "/opt/conda/lib/python3.6/site-packages/distributed/protocol/core.py", line 119, in loads
    value = _deserialize(head, fs)
  File "/opt/conda/lib/python3.6/site-packages/distributed/protocol/serialize.py", line 158, in deserialize
    return f(header, frames)
  File "/opt/conda/lib/python3.6/site-packages/distributed/protocol/serialize.py", line 20, in <lambda>
    deserializers = {None: lambda header, frames: pickle.loads(b''.join(frames))}
  File "/opt/conda/lib/python3.6/site-packages/distributed/protocol/pickle.py", line 59, in loads
    return pickle.loads(x)
  File "/opt/conda/lib/python3.6/site-packages/zarr/hierarchy.py", line 273, in __setstate__
    self.__init__(*state)
  File "/opt/conda/lib/python3.6/site-packages/zarr/hierarchy.py", line 106, in __init__
    if contains_array(store, path=self._path):
  File "/opt/conda/lib/python3.6/site-packages/zarr/storage.py", line 72, in contains_array
    return key in store
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/mapping.py", line 87, in __contains__
    return self.gcs.exists(self._key_to_str(key))
  File "<decorator-gen-16>", line 2, in exists
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 795, in exists
    return bool(self.info(path))
  File "<decorator-gen-17>", line 2, in info
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 829, in info
    return self._get_object(path)
  File "<decorator-gen-3>", line 2, in _get_object
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 519, in _get_object
    bucket, key))
  File "<decorator-gen-2>", line 2, in _call
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/core.py", line 452, in _call
    meth = getattr(self.session, method)
AttributeError: 'NoneType' object has no attribute 'get'
distributed.worker - INFO - Stopping worker at tcp://10.20.181.7:36747
distributed.nanny - INFO - Closing Nanny at 'tcp://10.20.181.7:45626'
distributed.dask_worker - INFO - End worker

@rabernat
Member Author

Ok, false alarm.

I discovered that there are different versions of gcsfs on notebook and worker. I'm guessing this was the cause of the problem. The worker has 0.1.0, which is what is installed explicitly from pip in both notebook and worker dockerfiles, but my notebook had 0.0.5. This is because I had an old user-space install of gcsfs. Oops.

Unless anyone else has a similar problem, I'm pretty sure this was unique to me.
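For anyone who lands here with the same symptom, a version mismatch like this can be checked directly. The sketch below is only an illustration: `installed_version` and `find_mismatches` are hypothetical helper names, `importlib.metadata` needs Python 3.8 or newer (the environment in this thread was Python 3.6), and the commented-out lines assume `client` is an already connected `distributed.Client`.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(notebook_version, worker_versions):
    """Return {worker_address: version} for every worker that differs from the notebook."""
    return {
        address: version
        for address, version in worker_versions.items()
        if version != notebook_version
    }


# Illustrative values taken from this thread: notebook on 0.0.5, worker on 0.1.0.
example = find_mismatches("0.0.5", {"tcp://10.20.181.7:36747": "0.1.0"})
print(example)

# With a live cluster (assumption: `client` is a connected distributed.Client):
#   worker_versions = client.run(installed_version, "gcsfs")
#   print(find_mismatches(installed_version("gcsfs"), worker_versions))
```

`Client.run` returns a dict keyed by worker address, which is why `find_mismatches` takes that shape. `client.get_versions(check=True)` is a related built-in check, though it only covers the packages that distributed itself tracks.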

@dcherian
Contributor

dcherian commented Jun 16, 2019

I'm having a very similar issue on ocean.pangeo.io.

I'm trying to use map_blocks to make a plot of each timestep of LLC4320 SST (so zarr). As a test, and to make sure that I wasn't having memory issues, I extracted a small amount of data to natl-sub.zarr (20x800x700).

It takes ages for the scheduler to map a task on 20 blocks. And then the cluster seems to die with no feedback on the diagnostics page. In the notebook I see

KilledWorker: ('zarr-a98aa4894e593ded18a80b1aa589d07d', <Worker 'tcp://10.32.11.3:41625', memory: 0, processing: 1>)

I found

pods = cluster.pods()
print(cluster.logs(pods[0]))

on another issue here and that showed me the following log:

distributed.nanny - INFO -         Start Nanny at: 'tcp://10.32.11.3:40661'
distributed.worker - INFO -       Start worker at:     tcp://10.32.11.3:41625
distributed.worker - INFO -          Listening to:     tcp://10.32.11.3:41625
distributed.worker - INFO -              nanny at:           10.32.11.3:40661
distributed.worker - INFO - Waiting to connect to:    tcp://10.32.1.220:37581
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -               Threads:                          2
distributed.worker - INFO -                Memory:                   11.50 GB
distributed.worker - INFO -       Local Directory: /home/jovyan/worker-5_ugem2a
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -         Registered to:    tcp://10.32.1.220:37581
distributed.worker - INFO - -------------------------------------------------
distributed.core - INFO - Starting established connection
distributed.protocol.pickle - INFO - Failed to deserialize b'\x80\x04\x95l\x02\x00\x00\x00\x00\x00\x00\x8c\x14xarray.core.indexing\x94\x8c!ImplicitToExplicitIndexingAdapter\x94\x93\x94)\x81\x94}\x94(\x8c\x05array\x94h\x00\x8c\x17LazilyOuterIndexedArray\x94\x93\x94)\x81\x94}\x94(h\x05\x8c\x14xarray.backends.zarr\x94\x8c\x10ZarrArrayWrapper\x94\x93\x94)\x81\x94}\x94(\x8c\tdatastore\x94h\n\x8c\tZarrStore\x94\x93\x94)\x81\x94}\x94(\x8c\x02ds\x94\x8c\x0ezarr.hierarchy\x94\x8c\x05Group\x94\x93\x94)\x81\x94(\x8c\x0czarr.storage\x94\x8c\x0eDirectoryStore\x94\x93\x94)\x81\x94}\x94\x8c\x04path\x94\x8c\x1a/home/jovyan/natl-sub.zarr\x94sb\x8c\x00\x94\x88N\x88Nt\x94b\x8c\n_read_only\x94\x88\x8c\r_synchronizer\x94N\x8c\x06_group\x94h \x8c\x15_consolidate_on_close\x94\x89ub\x8c\rvariable_name\x94\x8c\x03SST\x94\x8c\x05shape\x94K\x14M \x03M\xbc\x02\x87\x94\x8c\x05dtype\x94\x8c\x05numpy\x94\x8c\x05dtype\x94\x93\x94\x8c\x02f4\x94K\x00K\x01\x87\x94R\x94(K\x03\x8c\x01<\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94bub\x8c\x03key\x94h\x00\x8c\x0cBasicIndexer\x94\x93\x94)\x81\x94}\x94\x8c\x04_key\x94\x8c\x08builtins\x94\x8c\x05slice\x94\x93\x94NNN\x87\x94R\x94h;NNN\x87\x94R\x94h;NNN\x87\x94R\x94\x87\x94sbub\x8c\x0bindexer_cls\x94h\x00\x8c\x0cOuterIndexer\x94\x93\x94ub.'
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.6/site-packages/zarr/hierarchy.py", line 111, in __init__
    meta_bytes = store[mkey]
  File "/srv/conda/envs/notebook/lib/python3.6/site-packages/zarr/storage.py", line 728, in __getitem__
    raise KeyError(key)
KeyError: '.zgroup'

@rabernat Any ideas on what to do here?

@TomAugspurger
Member

@dcherian can you do a basic operation like .blocks[0].compute() (which should be small) on the Dask Array? If even that fails, it suggests that the block is too large for your worker instances.
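As a rough sanity check on the memory question, the size of the test subset can be worked out by hand. This is a back-of-the-envelope sketch using numbers from the post above: the 20x800x700 shape, a 4-byte float32 dtype (the `f4` visible in the pickled payload), and the 11.50 GB worker memory from the log, read here as decimal gigabytes. The chunking of the array is not stated in the thread, so the figure is for the whole array, which is an upper bound on any single block.

```python
from math import prod


def array_nbytes(shape, itemsize):
    """Number of bytes needed to hold a dense array of the given shape."""
    return prod(shape) * itemsize


shape = (20, 800, 700)   # natl-sub.zarr test subset
itemsize = 4             # float32
worker_memory = 11.5e9   # bytes, assuming the log's "GB" means 10**9 bytes

total = array_nbytes(shape, itemsize)
print(f"whole array: {total / 1e6:.1f} MB")
print(f"fraction of one worker's memory: {total / worker_memory:.4f}")
```

The whole array comes to about 45 MB, well under one percent of a single worker's memory, so under these assumptions block size alone looks like an unlikely cause for this test case.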

@rabernat
Member Author

@dcherian - these issues are hard to debug. If you want help, it's best to post a fully reproducible example. We all have access to the same data and environment on ocean.pangeo.io, so it's relatively easy for someone else to look into the problem.

@dcherian
Contributor

Thanks @rabernat & @TomAugspurger. .blocks[0].compute() is awesome for debugging.

It turns out that my issue stemmed from trying to pass an xarray Dataset.coords object to the function I was providing to map_blocks. Using Dataset.coords.to_dataset() worked really well. I guess this error was a bit of a red herring.
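Both worker logs in this thread fail inside `pickle.loads`, so one cheap local check before handing an object to the cluster is a pickle round-trip. The sketch below is an assumption-laden illustration, not a confirmed diagnosis of either case: `pickle_roundtrip_error`, `Session` and `FragileStore` are hypothetical names, and `FragileStore` merely imitates the pattern seen in the first traceback, where `__setstate__` re-runs `__init__` and that in turn needs a live session.

```python
import pickle


def pickle_roundtrip_error(obj):
    """Return None if `obj` survives pickle.dumps/loads, else the exception raised."""
    try:
        pickle.loads(pickle.dumps(obj))
    except Exception as exc:
        return exc
    return None


class Session:
    """Hypothetical stand-in for an HTTP session object."""

    def get(self, url):
        return url


class FragileStore:
    """Hypothetical object whose unpickling re-runs __init__ without its session."""

    def __init__(self, session):
        self.session = session
        # Raises AttributeError when session is None, like the worker traceback.
        self.get = getattr(self.session, "get")

    def __getstate__(self):
        # The live session is dropped from the pickled state.
        return (None,)

    def __setstate__(self, state):
        self.__init__(*state)


print(pickle_roundtrip_error({"a": 1}))
print(repr(pickle_roundtrip_error(FragileStore(Session()))))
```

A local round-trip only catches failures that reproduce in the notebook's own environment. It would not reveal differences between notebook and worker, such as the gcsfs version mismatch earlier in this thread or a path that exists only on the notebook's filesystem.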
