Memory leakage on code that was fine using a previous version #1945

Closed
simone-codeluppi opened this issue Apr 27, 2018 · 7 comments

@simone-codeluppi

simone-codeluppi commented Apr 27, 2018

Hi

Intro to the error
I have a list of images saved as .npy files. Each numpy array occupies about 300 MB when loaded in memory. I use client.map() to distribute the list of image paths to different workers, so each worker gets a small list of paths to different images. Each worker loads one image at a time, runs some filtering, and saves the output as a .npy file before loading the next image.

Packages:

distributed               1.21.6                   py36_0    conda-forge
dask                      0.17.2                     py_0    conda-forge
dask-core                 0.17.2                   py36_0
tornado                   5.0.2                    py36_0

Full list pasted below

Error:

distributed.worker - WARNING - Memory use is high but worker has no data to store to disk.  Perhaps some other process is leaking memory?  Process memory: 3.04 GB -- Worker memory limit: 4.29 GB

distributed.nanny - WARNING - Worker exceeded 95% memory budget.  Restarting
tornado.application - ERROR - Exception in callback <bound method Nanny.memory_monitor of <Nanny: tcp://127.0.0.1:59274, threads: 1>>
Traceback (most recent call last):
  File "/Users/simcod/anaconda3/envs/latest_env/lib/python3.6/site-packages/psutil/_psosx.py", line 348, in catch_zombie
    yield
  File "/Users/simcod/anaconda3/envs/latest_env/lib/python3.6/site-packages/psutil/_psosx.py", line 387, in _get_pidtaskinfo
    ret = cext.proc_pidtaskinfo_oneshot(self.pid)
ProcessLookupError: [Errno 3] No such process

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/simcod/anaconda3/envs/latest_env/lib/python3.6/site-packages/tornado/ioloop.py", line 1208, in _run
    return self.callback()
  File "/Users/simcod/anaconda3/envs/latest_env/lib/python3.6/site-packages/distributed/nanny.py", line 262, in memory_monitor
    memory = proc.memory_info().rss
  File "/Users/simcod/anaconda3/envs/latest_env/lib/python3.6/site-packages/psutil/_common.py", line 337, in wrapper
    return fun(self)
  File "/Users/simcod/anaconda3/envs/latest_env/lib/python3.6/site-packages/psutil/__init__.py", line 1048, in memory_info
    return self._proc.memory_info()
  File "/Users/simcod/anaconda3/envs/latest_env/lib/python3.6/site-packages/psutil/_psosx.py", line 330, in wrapper
    return fun(self, *args, **kwargs)
  File "/Users/simcod/anaconda3/envs/latest_env/lib/python3.6/site-packages/psutil/_psosx.py", line 456, in memory_info
    rawtuple = self._get_pidtaskinfo()
  File "/Users/simcod/anaconda3/envs/latest_env/lib/python3.6/site-packages/psutil/_common.py", line 337, in wrapper
    return fun(self)
  File "/Users/simcod/anaconda3/envs/latest_env/lib/python3.6/site-packages/psutil/_psosx.py", line 387, in _get_pidtaskinfo
    ret = cext.proc_pidtaskinfo_oneshot(self.pid)
  File "/Users/simcod/anaconda3/envs/latest_env/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/simcod/anaconda3/envs/latest_env/lib/python3.6/site-packages/psutil/_psosx.py", line 361, in catch_zombie
    raise AccessDenied(proc.pid, proc._name)
psutil._exceptions.AccessDenied: psutil.AccessDenied (pid=10678)

I have run the same code on the same images before, both on the same computer and on another one, and I didn't get a memory error. If I am not wrong, the images should not stay in memory after processing. If I change the number of cores I still get the error.

I got this error when I tried a fresh installation on a new computer.

Thanks for the help!
Simone

These are the settings of the conda env on the computer where the code works:

bkcharts                  0.2                      py36_0
bokeh                     0.12.7                   py36_0
bzip2                     1.0.6                    vc14_1  [vc14]  conda-forge
ca-certificates           2017.11.5                     0    conda-forge
certifi                   2016.2.28                py36_0
click                     6.7                      py36_0
cloudpickle               0.4.0                    py36_0
cycler                    0.10.0                   py36_0
dask                      0.15.2                   py36_0
decorator                 4.1.2                    py36_0
distributed               1.18.1                   py36_0
freetype                  2.5.5                    vc14_2  [vc14]
h5py                      2.7.1                     <pip>
heapdict                  1.0.0                    py36_1
icu                       58.2                     vc14_0  [vc14]  conda-forge
jinja2                    2.9.6                    py36_0
jpeg                      9b                       vc14_2  [vc14]  conda-forge
libpng                    1.6.34                   vc14_0  [vc14]  conda-forge
libtiff                   4.0.6                    vc14_3  [vc14]
locket                    0.2.0                    py36_1
loompy                    1.1.0                     <pip>
markupsafe                1.0                      py36_0
matplotlib                2.0.2               np113py36_0
mkl                       2017.0.3                      0
mpmath                    0.19                     py36_1
msgpack-python            0.4.8                    py36_0
nd2reader                 2.1.3                     <pip>
networkx                  1.11                     py36_0
numpy                     1.13.1                   py36_0
olefile                   0.44                     py36_0
openssl                   1.0.2m                   vc14_0  [vc14]  conda-forge
pandas                    0.20.3                   py36_0
partd                     0.3.8                    py36_0
pillow                    4.2.1                    py36_0
pip                       9.0.1                    py36_1
psutil                    5.2.2                    py36_0
pyparsing                 2.2.0                    py36_0
pyqt                      5.6.0                    py36_2
python                    3.6.2                         0
python-dateutil           2.6.1                    py36_0
pytz                      2017.2                   py36_0
pywavelets                0.5.2               np113py36_0
pyyaml                    3.12                     py36_0
requests                  2.14.2                   py36_0
scikit-image              0.13.0              np113py36_0
scipy                     0.19.1              np113py36_0
setuptools                36.4.0                   py36_1
sip                       4.18                     py36_0
six                       1.10.0                   py36_0
sortedcontainers          1.5.7                    py36_0
sympy                     1.1.1                    py36_0
tblib                     1.3.2                    py36_0
toolz                     0.8.2                    py36_0
tornado                   4.5.2                    py36_0
typing                    3.6.2                     <pip>
vc                        14                            0
vs2015_runtime            14.0.25420                    0
wheel                     0.29.0                   py36_0
wincertstore              0.2                      py36_0
xmltodict                 0.11.0                    <pip>
zict                      0.1.2                    py36_0
zlib                      1.2.11                   vc14_0  [vc14]  conda-forge

These are the settings of the env where the code breaks:

# Name                    Version                   Build  Channel
alabaster                 0.7.10                    <pip>
appnope                   0.1.0                    py36_0    conda-forge
Babel                     2.5.3                     <pip>
backcall                  0.1.0                      py_0    conda-forge
blas                      1.1                    openblas    conda-forge
bleach                    2.1.3                      py_0    conda-forge
bokeh                     0.12.15                  py36_0
ca-certificates           2018.4.16                     0    conda-forge
certifi                   2018.4.16                py36_0    conda-forge
chardet                   3.0.4                     <pip>
click                     6.7              py36hec950be_0
cloudpickle               0.5.2                    py36_1
cycler                    0.10.0           py36hfc81398_0
cytoolz                   0.9.0.1          py36h1de35cc_0
dask                      0.17.2                     py_0    conda-forge
dask-core                 0.17.2                   py36_0
decorator                 4.3.0                    py36_0
distributed               1.21.6                   py36_0    conda-forge
docutils                  0.14                      <pip>
entrypoints               0.2.3                    py36_1    conda-forge
freetype                  2.8                  h12048fb_1
h5py                      2.7.1            py36h39cdac5_0
hdf5                      1.10.1               ha036c08_1
heapdict                  1.0.0                    py36_2
html5lib                  1.0.1                      py_0    conda-forge
idna                      2.6                       <pip>
imageio                   2.3.0                    py36_0
imagesize                 1.0.0                     <pip>
intel-openmp              2018.0.0                      8
ipykernel                 4.8.2                    py36_0    conda-forge
ipympl                    0.1.1                    py36_0    conda-forge
ipython                   6.3.1                    py36_0    conda-forge
ipython_genutils          0.2.0                    py36_0    conda-forge
ipywidgets                7.2.1                    py36_1    conda-forge
jedi                      0.12.0                   py36_0    conda-forge
jinja2                    2.10             py36hd36f9c5_0
jpeg                      9b                   he5867d9_2
jsonschema                2.6.0                    py36_1    conda-forge
jupyter_client            5.2.3                    py36_0    conda-forge
jupyter_core              4.4.0                      py_0    conda-forge
jupyterlab                0.32.0                   py36_1    conda-forge
jupyterlab_launcher       0.10.5                   py36_0    conda-forge
kiwisolver                1.0.1            py36h792292d_0
libcxx                    4.0.1                h579ed51_0
libcxxabi                 4.0.1                hebd6815_0
libedit                   3.1                  hb4e282d_0
libffi                    3.2.1                h475c297_4
libgfortran               3.0.1                h93005f0_2
libpng                    1.6.34               he12f830_0
libsodium                 1.0.16                        0    conda-forge
libtiff                   4.0.9                h0dac147_0
locket                    0.2.0            py36hca03003_1
markupsafe                1.0              py36h3a1e703_1
matplotlib                2.2.2            py36ha7267d0_0
mistune                   0.8.3                    py36_1    conda-forge
mkl                       2018.0.2                      1
mkl_fft                   1.0.1            py36h917ab60_0
mkl_random                1.0.1            py36h78cc56f_0
mpi4py                    3.0.0                     <pip>
mpmath                    1.0.0                     <pip>
msgpack-python            0.5.6            py36h04f5b5a_0
nbconvert                 5.3.1                      py_1    conda-forge
nbformat                  4.4.0                    py36_0    conda-forge
ncurses                   6.0                  hd04f020_2
nd2reader                 2.1.3                     <pip>
networkx                  2.1                      py36_0
nodejs                    9.11.1                        0    conda-forge
notebook                  5.4.1                    py36_0    conda-forge
numpy                     1.14.2          py36_blas_openblas_200  [blas_openblas]  conda-forge
olefile                   0.45.1                   py36_0
openblas                  0.2.20                        7    conda-forge
openssl                   1.0.2o                        0    conda-forge
packaging                 17.1                     py36_0
pandas                    0.22.0           py36h0a44026_0
pandoc                    2.1.3                         0    conda-forge
pandocfilters             1.4.2                    py36_0    conda-forge
parso                     0.2.0                      py_0    conda-forge
partd                     0.3.8            py36hf5c4cb8_0
pexpect                   4.5.0                    py36_0    conda-forge
pickleshare               0.7.4                    py36_0    conda-forge
pillow                    5.1.0            py36hfcce615_0
pip                       10.0.1                    <pip>
pip                       9.0.3                    py36_0
pkginfo                   1.4.2                     <pip>
prompt_toolkit            1.0.15                   py36_0    conda-forge
psutil                    5.4.5            py36h1de35cc_0
ptyprocess                0.5.2                    py36_0    conda-forge
pygments                  2.2.0                    py36_0    conda-forge
pyparsing                 2.2.0            py36hb281f35_0
python                    3.6.5                hc167b69_1
python-dateutil           2.7.2                    py36_0
pytz                      2018.4                   py36_0
pywavelets                0.5.2            py36h2710a04_0
pyyaml                    3.12             py36h2ba1e63_1
pyzmq                     17.0.0                   py36_4    conda-forge
readline                  7.0                  hc1231fa_4
requests                  2.18.4                    <pip>
requests-toolbelt         0.8.0                     <pip>
ruamel.yaml               0.15.37                   <pip>
scikit-image              0.13.1           py36h1de35cc_1
scikit-learn              0.19.1          py36_blas_openblas_201  [blas_openblas]  conda-forge
scipy                     1.0.1           py36_blas_openblas_200  [blas_openblas]  conda-forge
send2trash                1.5.0                      py_0    conda-forge
setuptools                39.0.1                   py36_0
simplegeneric             0.8.1                    py36_0    conda-forge
six                       1.11.0           py36h0e22d5e_1
snowballstemmer           1.2.1                     <pip>
sortedcontainers          1.5.10                   py36_0
Sphinx                    1.7.4                     <pip>
sphinx-rtd-theme          0.3.0                     <pip>
sphinxcontrib-websupport  1.0.1                     <pip>
sqlite                    3.23.1               hf1716c9_0
sympy                     1.1.1                     <pip>
tblib                     1.3.2            py36hda67792_0
terminado                 0.8.1                    py36_0    conda-forge
testpath                  0.3.1                    py36_0    conda-forge
tk                        8.6.7                h35a86e2_3
toolz                     0.9.0                    py36_0
tornado                   5.0.2                    py36_0
tqdm                      4.23.1                    <pip>
traitlets                 4.3.2                    py36_0    conda-forge
twine                     1.11.0                    <pip>
urllib3                   1.22                      <pip>
wcwidth                   0.1.7                    py36_0    conda-forge
webencodings              0.5.1                    py36_0    conda-forge
wheel                     0.31.0                   py36_0
widgetsnbextension        3.2.1                    py36_0    conda-forge
xmltodict                 0.11.0                    <pip>
xz                        5.2.3                h727817e_4
yaml                      0.1.7                hc338f04_2
zeromq                    4.2.5                         1    conda-forge
zict                      0.1.3            py36h71da714_0
zlib                      1.2.11               hf3cbc9b_2
@mrocklin
Member

Hi @simone-codeluppi thank you for submitting an issue. If you are able to produce a minimal reproducible example that generates the memory leak that you're observing that would be very helpful.

@simone-codeluppi
Author

Hi
I made a simplified version of the code where the images are loaded and filtered. I uploaded some images in a zip file to my Google Drive to provide a real-case scenario (link to files). The script requires distributed, skimage, numpy, argparse and glob. Sorry if it is still slightly complicated, and thanks for the help!

The script is:

import numpy as np
import glob
from distributed import Client
from skimage import img_as_float,filters
import argparse

def load_img(img_path):
    img_stack = np.load(img_path)
    img_stack = img_as_float(img_stack)
    # Clean the image from the background
    img_stack = img_stack-filters.gaussian(img_stack,sigma=(2,100,100))
    # Remove the negative values        
    img_stack[img_stack<0] = 0
    # Flatten the image
    filtered_image = np.amax(img_stack,axis=0)
    # in the real code I save the filtered_image in as .npy in a folder

def test_issue():
    parser = argparse.ArgumentParser(description='test code')
    parser.add_argument('-path', help='path to the files')
    args = parser.parse_args()
    fdir = args.path
    files_list = glob.glob(fdir+'*.npy')

    # Start the distributed client
    ncores = 2
    client = Client(n_workers = ncores, threads_per_worker=1)

    futures_processes=client.map(load_img,files_list)

    client.gather(futures_processes)

    client.close()


if __name__ == "__main__":
    test_issue()

Save the script as test_issue.py and run it with python test_issue.py -path path-to-directory

@mrocklin
Member

mrocklin commented Apr 28, 2018 via email

@simone-codeluppi
Author

Hi,
sorry about that!
I have been running some extra tests on my laptop. The image filtering step img_stack = img_stack-filters.gaussian(img_stack,sigma=(2,100,100)) is needed to reproduce the error, and it can be mimicked by

img_stack2 = np.copy(img_stack)
img_stack = img_stack-img_stack2

Copying the array alone is not enough.

I think the issue has nothing to do with distributed but with the memory on my computer. I think the few extra arrays generated during filtering fill up the RAM assigned to each worker. I will test it again on the cluster we have in the lab, which has more RAM.
I am just puzzled because the same code previously processed the same dataset (on 2 cores) without issues. The only thing I noticed is that, whether I use 1, 2 or 3 cores, the max memory per worker is 4.29 GB (~1/4 of the total available RAM).

I apologise for wasting your time.

FYI: this is what we are using distributed for: http://linnarssonlab.org/osmFISH/

Here is the testing code I used:

import numpy as np
from distributed import Client
from skimage import img_as_float,filters
import argparse

def images_generator(fdir):
    Img = np.random.randint(0,65535,(40,2048,2048),dtype=np.uint16)
    files_list = []
    # In order to repeat the error you need at least 4 images
    for i in np.arange(1,5):
        img_name = fdir+str(i)+'.npy'
        np.save(img_name,Img)
        files_list.append(img_name)
    return files_list

def load_img(img_path):
    img_stack = np.load(img_path)
    img_stack = img_as_float(img_stack)
    # # Clean the image from the background
    # img_stack = img_stack-filters.gaussian(img_stack,sigma=(2,100,100))
    img_stack2= np.copy(img_stack)
    img_stack = img_stack-img_stack2

def test_issue():
    parser = argparse.ArgumentParser(description='test code')
    parser.add_argument('-path', help='path to the files')
    args = parser.parse_args()
    fdir = args.path
    # Create the img set
    files_list = images_generator(fdir)
    # Start the distributed client
    ncores = 2
    client = Client(n_workers = ncores, threads_per_worker=1)
    futures_processes=client.map(load_img,files_list)
    client.gather(futures_processes)
    client.close()


if __name__ == "__main__":
    test_issue()
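
A rough estimate of the peak memory in load_img above may help explain the restarts. This is a sketch under assumptions: only the array shape and dtype (from images_generator), the 4.29 GB worker limit, and the 95% restart threshold come from this thread. The assumption that exactly three float64 arrays coexist during the subtraction ignores any temporaries img_as_float may allocate and the baseline footprint of the Python process.

```python
# Back-of-the-envelope peak memory for load_img in the test script.
shape = (40, 2048, 2048)  # from images_generator

n_elements = 1
for dim in shape:
    n_elements *= dim

uint16_bytes = n_elements * 2   # array as stored on disk and loaded by np.load
float64_bytes = n_elements * 8  # after img_as_float (assumed float64 output)

# Assumption: during `img_stack = img_stack - img_stack2` three float64
# arrays are alive at once: img_stack, img_stack2 and the result.
peak_bytes = 3 * float64_bytes

worker_limit = 4.29e9                     # "Worker memory limit" from the log
restart_threshold = 0.95 * worker_limit   # nanny restarts above 95%
headroom = restart_threshold - peak_bytes

print(f"uint16 array:      {uint16_bytes / 1e9:.2f} GB")
print(f"float64 array:     {float64_bytes / 1e9:.2f} GB")
print(f"estimated peak:    {peak_bytes / 1e9:.2f} GB")
print(f"restart threshold: {restart_threshold / 1e9:.2f} GB")
print(f"headroom:          {headroom / 1e6:.0f} MB")
```

Under these assumptions the three float64 arrays alone come to about 4.03 GB, which leaves only about 49 MB below the restart threshold. The interpreter and imported libraries could be enough to tip a worker over.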

@mrocklin
Member

So perhaps you were close to your memory limit before, but now some internal change in scheduling is keeping around a few more intermediate variables, which puts you over the limit? You might consider using the diagnostic dashboard to help understand your memory performance over time. This might help to provide some insight.

My experience has been that many skimage functions use a lot of memory during execution (several times their input size). So if your images are quite large then presumably having a few of these functions running in parallel might cause problems. Just a thought though, I recommend watching the dashboard of the scheduler and perhaps one of the workers during execution to try to get a sense of what is going on.
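
One possible way to lower the peak, sketched with plain NumPy: write the subtraction and the clipping into the existing array through the out= argument instead of creating new full-size arrays. The function name subtract_background_inplace is hypothetical, and the background argument stands in for the output of filters.gaussian, which still needs its own allocation.

```python
import numpy as np


def subtract_background_inplace(img_stack, background):
    """Subtract background, clip negatives and flatten, reusing img_stack.

    img_stack must be a float array: it is overwritten in place, and an
    unsigned integer array would wrap around instead of going negative.
    """
    # Store the difference in img_stack itself (no new full-size array).
    np.subtract(img_stack, background, out=img_stack)
    # Replace negative values with 0, again in place.
    np.maximum(img_stack, 0, out=img_stack)
    # Maximum projection along the first axis; the result is one plane.
    return np.amax(img_stack, axis=0)


# Tiny stand-in for a (planes, rows, cols) image stack.
stack = np.array([[[1.0, 5.0]], [[4.0, 2.0]]])
background = np.full_like(stack, 3.0)
flattened = subtract_background_inplace(stack, background)
print(flattened)
```

With this approach only the image and the background are held at full size, rather than the image, the background and a separate result.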

@mrocklin
Member

Regarding http://linnarssonlab.org/osmFISH/ that looks like a very cool project.

You might be interested in this upcoming sprint on scaling scikit-image: https://scisprints.github.io/#may2-june-joint-scikit-learn-scikit-image-dask-sprint

And in particular I'd like to encourage you to engage on this issue asking for challenge problems in imaging: scisprints/2018_05_sklearn_skimage_dask#2

@simone-codeluppi
Author

Great!
Thanks a lot for the help!!
