add: wrapper for run and run_thread #5

Closed
metaist opened this issue May 5, 2023 · 9 comments

Comments

Owner

metaist commented May 5, 2023

Given #4, long function names like endq_and_wait feel weird.

Owner Author

metaist commented May 5, 2023

What's the most common situation I find myself in:

Step 1: Create Queues
I usually make 1-3 queues.

  • 1 queue means I have a bunch of things that need to be transformed (typically files or names of things) and the output goes to disk in a way that ensures no two processes step on each other.
  • 2 queues is the most common. It usually means I have an input queue and an output queue and I'm going to want the results at the end.
  • 3 queues means I have two kinds of processing I'm doing, usually at different speeds (incidentally, this might be another case for #1, "add: support for threading": one queue for the processor-based work, one for the thread-based work, one for output).

Step 2: Create Workers
Currently, these are always lists of Process objects. If I have a many-readers-one-writer situation, then I might have a list of readers and one writer.

Step 3: Put Messages
Putting messages on the input queue gets the ball rolling.

Step 4: Wait
This is where I put a bunch of end messages on the queue (the same number as the number of workers) and wait for the workers to finish. Each worker that reads the end message stops processing and returns.

(Optional) Step 5: Post-processing
If the workers have been writing data to the output queue, here's where I'd do something with that data.
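
The five steps above can be sketched with only the standard library. This is a minimal illustration, not ezq's API: it uses `threading.Thread` and `queue.Queue` so that it behaves the same on every platform, and the names `END`, `square_worker`, and `run_pipeline` are made up for the example. With `Process` workers the shape is the same, but you would use `multiprocessing.Queue` and make sure the output queue is drained before joining.

```python
import queue
import threading

END = None  # hypothetical end message; real inputs must never be None


def square_worker(in_q: queue.Queue, out_q: queue.Queue) -> None:
    """Read numbers until the end message arrives; write their squares."""
    while True:
        msg = in_q.get()
        if msg is END:
            return
        out_q.put(msg * msg)


def run_pipeline(items, size=3):
    # Step 1: create queues (the common two-queue case).
    in_q, out_q = queue.Queue(), queue.Queue()

    # Step 2: create workers.
    workers = [
        threading.Thread(target=square_worker, args=(in_q, out_q), daemon=True)
        for _ in range(size)
    ]
    for w in workers:
        w.start()

    # Step 3: put messages.
    for item in items:
        in_q.put(item)

    # Step 4: one end message per worker, then wait.
    for _ in workers:
        in_q.put(END)
    for w in workers:
        w.join()

    # Step 5: post-processing. All workers have returned, so nothing
    # else can write to out_q and empty() is reliable here.
    results = []
    while not out_q.empty():
        results.append(out_q.get())
    return sorted(results)
```

Results are sorted because the order in which workers finish is not deterministic.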

Owner Author

metaist commented May 5, 2023

In many ways this issue is connected to #2 because this wrapper would probably also be a contextmanager.

Owner Author

metaist commented May 5, 2023

What might this wrapper look like:

q = ezq.Q()
with ProcessPool(f, q, other_arg, size=3) as workers:
    # something

Yeah, not loving it. The problem is that you'd have to have the input queue as the first arg by convention and it's a little too magical.

Owner Author

metaist commented May 5, 2023

Meh, I guess I'll live with the long name until I come up with a thing that properly unites the workers with the input queue.

metaist closed this as completed May 5, 2023
Owner Author

metaist commented May 5, 2023

Heh. I actually thought of a way to add this to #4: have the queue wrapper wait for workers.

q.wait(workers)
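
A rough sketch of how a queue wrapper could wait for workers, assuming a thin layer over `queue.Queue`. The class name `Q` echoes the `ezq.Q` used earlier, but the methods, the `END` sentinel, and the helper functions here are guesses for illustration, not the library's actual implementation.

```python
import queue
import threading

END = object()  # hypothetical sentinel; identity checks only work in one process


class Q:
    """Hypothetical queue wrapper that also knows how to stop workers."""

    def __init__(self):
        self._q = queue.Queue()

    def put(self, item):
        self._q.put(item)
        return self

    def __iter__(self):
        # Yield messages until an end message is read.
        while True:
            item = self._q.get()
            if item is END:
                return
            yield item

    def wait(self, workers):
        """Send one end message per worker, then join them all."""
        workers = list(workers)
        for _ in workers:
            self._q.put(END)
        for w in workers:
            w.join()
        return self


def collect(q, seen):
    """Example worker: record every message it reads."""
    for msg in q:
        seen.append(msg)


def demo():
    q, seen = Q(), []
    workers = [threading.Thread(target=collect, args=(q, seen)) for _ in range(2)]
    for w in workers:
        w.start()
    for n in range(5):
        q.put(n)
    q.wait(workers)  # stands in for a longer endq_and_wait-style call
    return sorted(seen)
```

Because the queue is first-in, first-out, every real message is consumed before any worker reads its end message.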

Owner Author

metaist commented May 9, 2023

Another thing I tried, but discarded:

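from multiprocessing import Process
from threading import Thread
from typing import Any, Union

class Worker:
    worker: Union[Process, Thread]

    def __init__(self, func, *args, thread=False, **kwargs):
        # Pick the backing primitive, then start it immediately.
        if thread:
            self.worker = Thread(daemon=True, target=func, args=args, kwargs=kwargs)
        else:
            self.worker = Process(daemon=True, target=func, args=args, kwargs=kwargs)
        self.worker.start()

    def __getattr__(self, name: str) -> Any:
        # Forward everything else (join, is_alive, ...) to the wrapped object.
        return getattr(self.worker, name)

    def __enter__(self):
        return self

    def __exit__(self, *_):
        # Blocks until the target returns; nothing here tells it to stop.
        self.worker.join()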

I may eventually make a wrapper like this, but I don't really see the benefit at the moment. There's not much value beyond the actual Process and Thread this class wraps. And the contextlib / with benefit is non-existent since you need some way to actually notify the worker to stop before you join.
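
To illustrate the point about `with`: one possible variation, assuming the wrapper is also handed its input queue, is to have `__exit__` send the end message before joining. Everything here (`QueueWorker`, the `in_q` parameter, the `END` sentinel) is hypothetical and thread-only, and it relies on the same queue-comes-first convention that felt too magical earlier in this thread.

```python
import queue
import threading

END = None  # hypothetical end message


class QueueWorker:
    """Hypothetical variant: the worker knows which queue feeds it."""

    def __init__(self, func, in_q, *args, **kwargs):
        self.in_q = in_q
        # The input queue is always passed to func as its first argument.
        self.worker = threading.Thread(
            daemon=True, target=func, args=(in_q, *args), kwargs=kwargs
        )
        self.worker.start()

    def __enter__(self):
        return self

    def __exit__(self, *_):
        self.in_q.put(END)  # notify the worker first...
        self.worker.join()  # ...then wait for it to return


def consume(in_q, out):
    """Example target: upper-case each message until the end message."""
    while (msg := in_q.get()) is not END:
        out.append(msg.upper())


def demo():
    in_q, out = queue.Queue(), []
    with QueueWorker(consume, in_q, out):
        in_q.put("a")
        in_q.put("b")
    return out
```

This only covers a single worker per queue; with several workers sharing one queue, each would need its own end message.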

Owner Author

metaist commented May 11, 2023

So now I think the value of this comes from having a level of indirection that is easier to manipulate in the future.

Perhaps I should name the class Task and rename "worker" to "task" everywhere. It's a little bit generic, but I'll test out how well the wording works.

metaist reopened this May 11, 2023
Owner Author

metaist commented May 17, 2023

I've been debating the name and chose to ask GPT-4.


metaist: For this library, I'm thinking of wrapping run and run_thread in a new class. I'm considering the names Worker and Task. Please provide pros and cons for each.

GPT: Sure, let's explore the pros and cons of each name:

  1. Worker:

Pros:

  • Industry Standard: The term Worker is widely recognized in parallel computing and is often used to describe a separate execution context that performs tasks concurrently. It could be a separate thread or process.
  • Intuitive: The term Worker communicates that it is responsible for executing some workload or task, which can make the code easier to understand.

Cons:

  • Ambiguity: In some contexts, Worker might be associated with other concepts, such as worker nodes in a distributed system, which could lead to confusion.

  2. Task:

Pros:

  • Clarity: Task communicates clearly that this is a piece of work to be done or undertaken. It emphasizes the work itself rather than the executor.
  • Broadly applicable: The term Task is applicable across various programming paradigms and doesn't necessarily imply a specific implementation, which could make it a more flexible choice.

Cons:

  • Misleading: In concurrent programming, a Task often refers to the job at hand that is to be processed, not the executor (thread or process). Using Task to refer to an executor might be misleading.

In conclusion, the decision should be based on how you want users to perceive the class. If you want to emphasize the entity that does the work, Worker is a better choice. If you want to highlight the work being done, you might prefer Task, but be aware of the potential for confusion. Remember, the name should convey the purpose of the class as clearly as possible to users.

metaist: What are some pros and cons of a name like Context?

GPT: Context is another potential name for the class. Here are some pros and cons:

Pros:

  1. General applicability: Context is a widely used term in programming and can apply to a variety of situations, making it a flexible choice.
  2. Conveys encapsulation: The term Context often implies some form of encapsulation or bundling of related data and operations, which might be accurate if your class is bundling together related methods and data.

Cons:

  1. Lack of specificity: Context is quite a general term and doesn't communicate what the class specifically does. It could be a context for anything, not necessarily a context for running tasks or workers, which might be confusing for users.
  2. Ambiguity: Context is used in many programming and software contexts (no pun intended), including context managers in Python, execution contexts in concurrency, and more. This broad usage could potentially lead to confusion.

Overall, Context could be a good choice if the class is meant to bundle together related methods and data. However, you might want to consider a more specific name if the class has a more specific purpose. For example, if the class is meant to manage workers, a name like WorkerContext could be more descriptive and less likely to cause confusion.

Owner Author

metaist commented May 17, 2023

Guess I'll go with Worker. Also, looks like I won't remove run and run_thread for now, but just have them return Worker objects.
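
A sketch of what that could look like, assuming `run` and `run_thread` take a function plus its arguments (their real signatures are not shown in this thread) and that `Worker` is a thin wrapper like the one from May 9. Treat the details as assumptions, not as the library's code.

```python
from multiprocessing import Process
from threading import Thread
from typing import Any, Union


class Worker:
    """Thin wrapper around a started Process or Thread (assumed shape)."""

    worker: Union[Process, Thread]

    def __init__(self, func, *args, thread=False, **kwargs):
        cls = Thread if thread else Process
        self.worker = cls(daemon=True, target=func, args=args, kwargs=kwargs)
        self.worker.start()

    def __getattr__(self, name: str) -> Any:
        # Guard against infinite recursion if `worker` was never set.
        if name == "worker":
            raise AttributeError(name)
        # Existing callers can keep using join(), is_alive(), and so on.
        return getattr(self.worker, name)


def run(func, *args, **kwargs) -> Worker:
    """Start func in a new process and return its Worker."""
    return Worker(func, *args, **kwargs)


def run_thread(func, *args, **kwargs) -> Worker:
    """Start func in a new thread and return its Worker."""
    return Worker(func, *args, thread=True, **kwargs)
```

Since attribute access is forwarded, code that treated the old return value as a `Process` or `Thread` would mostly keep working, apart from `isinstance` checks.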
