Memory leakage on code that was fine using a previous version #1945
Hi @simone-codeluppi, thank you for submitting an issue. If you are able to produce a minimal reproducible example that generates the memory leak you're observing, that would be very helpful.
Hi
The script is:
import numpy as np
import glob
from distributed import Client
from skimage import img_as_float, filters
import argparse

def load_img(img_path):
    img_stack = np.load(img_path)
    img_stack = img_as_float(img_stack)
    # Clean the image from the background
    img_stack = img_stack - filters.gaussian(img_stack, sigma=(2, 100, 100))
    # Remove the negative values
    img_stack[img_stack < 0] = 0
    # Flatten the image
    filtered_image = np.amax(img_stack, axis=0)
    # In the real code filtered_image is saved as .npy in a folder
    # (see the sketch after this script)

def test_issue():
    parser = argparse.ArgumentParser(description='test code')
    parser.add_argument('-path', help='path to the files')
    args = parser.parse_args()
    fdir = args.path
    files_list = glob.glob(fdir + '*.npy')
    # Start the distributed client
    ncores = 2
    client = Client(n_workers=ncores, threads_per_worker=1)
    futures_processes = client.map(load_img, files_list)
    client.gather(futures_processes)
    client.close()

if __name__ == "__main__":
    test_issue()

Save the script as test_issue.py and run it with python test_issue.py -path path-to-directory.
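For reference, a minimal sketch of the saving step mentioned in the comment above; the helper name, output directory, and filename scheme are illustrative assumptions, not part of the original code:

import os
import numpy as np

def save_filtered(filtered_image, img_path, out_dir):
    # Hypothetical helper: write the flattened image under its source file name,
    # e.g. /data/img1.npy -> out_dir/img1.npy
    out_path = os.path.join(out_dir, os.path.basename(img_path))
    np.save(out_path, filtered_image)
    return out_path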
If you're able to create an example that doesn't require downloading data, that would be ideal. See
http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports
On Fri, Apr 27, 2018 at 4:21 PM, simone-codeluppi wrote:
Hi
I made a simplified version of the code where the images are loaded and filtered. I uploaded some images in a zip file on my Google Drive in order to have a real-case scenario (link to files: https://drive.google.com/file/d/1_wprhx0vfDpwBr-jr6u4RQe5EUNqxrDc/view?usp=sharing). The script requires distributed, skimage, numpy, argparse, and glob. Sorry if it is still slightly complicated, and thanks for the help!
Hi,
Copying the array is not enough. I think that the issue has nothing to do with distributed but with the memory on my computer: the few extra arrays generated during the filtering fill up the RAM assigned to the core (a rough size estimate is sketched after the script below). I will test it again using the cluster we have in the lab, which has more RAM. I apologise for wasting your time. FYI, this is what we are using distributed for: http://linnarssonlab.org/osmFISH/
Here is the testing code I used:
import numpy as np
from distributed import Client
from skimage import img_as_float, filters
import argparse

def images_generator(fdir):
    Img = np.random.randint(0, 65535, (40, 2048, 2048), dtype=np.uint16)
    files_list = []
    # In order to reproduce the error you need at least 4 images
    for i in np.arange(1, 5):
        img_name = fdir + str(i) + '.npy'
        np.save(img_name, Img)
        files_list.append(img_name)
    return files_list

def load_img(img_path):
    img_stack = np.load(img_path)
    img_stack = img_as_float(img_stack)
    # # Clean the image from the background
    # img_stack = img_stack - filters.gaussian(img_stack, sigma=(2, 100, 100))
    img_stack2 = np.copy(img_stack)
    img_stack = img_stack - img_stack2

def test_issue():
    parser = argparse.ArgumentParser(description='test code')
    parser.add_argument('-path', help='path to the files')
    args = parser.parse_args()
    fdir = args.path
    # Create the img set
    files_list = images_generator(fdir)
    # Start the distributed client
    ncores = 2
    client = Client(n_workers=ncores, threads_per_worker=1)
    futures_processes = client.map(load_img, files_list)
    client.gather(futures_processes)
    client.close()

if __name__ == "__main__":
    test_issue()
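As a rough back-of-the-envelope check (an added illustration, not from the thread; it assumes img_as_float converts the uint16 stack to float64, which is what skimage does for integer input), each (40, 2048, 2048) test image already implies multi-gigabyte transient allocations per worker:

import numpy as np

shape = (40, 2048, 2048)
n_elements = int(np.prod(shape))                                # 167,772,160 values
uint16_gb = n_elements * np.dtype(np.uint16).itemsize / 1e9     # ~0.34 GB as loaded
float64_gb = n_elements * np.dtype(np.float64).itemsize / 1e9   # ~1.34 GB after img_as_float
# img_stack, its copy, and the subtraction result can coexist briefly,
# so each task may peak at roughly three float64 stacks
print(uint16_gb, float64_gb, 3 * float64_gb)                    # ~0.34, ~1.34, ~4.0 GB

With two such workers running at once, a machine with 8-16 GB of RAM can plausibly hit its limit even without the gaussian filtering.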
So perhaps you were close to your memory limit before, and now some internal change in scheduling keeps around a few more intermediate variables, which puts you over the limit? You might consider using the diagnostic dashboard to understand your memory use over time. My experience has been that many skimage functions use a lot of memory during execution (several times their input size), so if your images are quite large then having a few of these functions running in parallel might cause problems. Just a thought, though; I recommend watching the dashboard of the scheduler, and perhaps of one of the workers, during execution to try to get a sense of what is going on.
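For example, a minimal sketch of that suggestion (the 4 GB figure is arbitrary and the port is the library default, not values from this thread): the local client accepts a per-worker memory_limit, and the scheduler's dashboard is normally served on port 8787 while the computation runs.

from distributed import Client

if __name__ == "__main__":
    # Two single-threaded workers, each with an explicit memory budget.
    # Watch worker memory on the dashboard (by default http://127.0.0.1:8787/status)
    # while client.map(...) is running to see where the peak occurs.
    client = Client(n_workers=2, threads_per_worker=1, memory_limit='4GB')
    print(client)  # the repr shows the scheduler address and worker count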
Regarding http://linnarssonlab.org/osmFISH/: that looks like a very cool project. You might be interested in this upcoming sprint on scaling scikit-image: https://scisprints.github.io/#may2-june-joint-scikit-learn-scikit-image-dask-sprint In particular, I'd like to encourage you to engage on this issue asking for challenge problems in imaging: scisprints/2018_05_sklearn_skimage_dask#2
Great!
Hi
Intro to the error
I have a list of images saved as .npy. Each numpy array occupies 300 MB when loaded in memory. I use client.map() to distribute the list of image paths to different workers, so each worker has a small list of paths to different images. Each worker loads one image at a time, runs some filtering, and saves the output as a .npy file before loading the next image.
Packages:
Error:
I have been running the same code on the same images, on the same computer and on another computer, and I didn't get a memory error before. If I am not wrong, the images should not stay in memory after processing. If I change the number of cores I still get the error.
I got this error when I tried a fresh installation on a new computer.
Thanks for the help!
Simone
These are the settings of the conda env on the computer where the code works:
These are the settings of the env where the code breaks: