Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

maxsize not being respected for process.map #55

Closed
robdmc opened this issue Sep 22, 2020 · 3 comments · Fixed by #59
Closed

maxsize not being respected for process.map #55

robdmc opened this issue Sep 22, 2020 · 3 comments · Fixed by #59

Comments

@robdmc
Copy link

robdmc commented Sep 22, 2020

Hello.
First of all. Let me just say that you changed my world yesterday when I found pypeln. I've wanted exactly this for a very long time. Thank you for writing it!!

Since I'm a brand new user, I might be misunderstanding, but I think I may have found a bug. I am running the following

  • conda python 3.6.8
  • pypeln==0.4.4
  • Running in Jupyter Lab with the following installed to view progress bars
pip install ipywidgets
jupyter labextension install @jupyter-widgets/jupyterlab-manager

Here is the code I am running

from tqdm.auto import tqdm
import pypeln as pyp
import time

in_list = list(range(300))
bar1 = tqdm(total=len(in_list), desc='stage1')
bar2 = tqdm(total=len(in_list), desc='stage2')
bar3 = tqdm(total=len(in_list), desc='stage3')

def func1(x):
    time.sleep(.01)
    bar1.update()
    return x

def func2(x):
    time.sleep(.2)
    return x
    
def func2_monitor(x):
    bar2.update()
    return x
    
def func3(x):
    time.sleep(.6)
    bar3.update()
    return x

(
    in_list
    | pyp.thread.map(func1, maxsize=1, workers=1)
    | pyp.process.map(func2, maxsize=1, workers=2)
    | pyp.thread.map(func2_monitor, maxsize=1, workers=1)
    | pyp.thread.map(func3, maxsize=1, workers=1)
    | list
    
);

This code runs stages while showing progress bars of when each node has processed data. Here is what I am seeing.

Screen Shot 2020-09-22 at 11 30 30 AM

It appears that the first stage is consuming the entire source without respecting the maxsize argument. If this is expected behavior, I would like to understand more.

Thank you.

@robdmc robdmc changed the title maxsize not being respected for processs.map maxsize not being respected for process.map Sep 22, 2020
@robdmc robdmc changed the title maxsize not being respected for process.map maxsize not being respected for process.map Sep 22, 2020
@cgarciae
Copy link
Owner

Hey @robdmc !

Sorry for the late response, for some reason I overlooked this issue. I will look into it, thanks for the detailed example!

@cgarciae
Copy link
Owner

cgarciae commented Oct 11, 2020

I think I found the culprit, when converting from one stage type to another there is an internal use of .to_iterable that wasn't taking into account the possibility of having a maxsize.

@cgarciae
Copy link
Owner

@robdmc Fixed in version 0.4.6, please update :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants