Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[architecture] Specify pipeline in task_decorator #111

Open
shouldsee opened this issue Mar 20, 2019 · 10 comments
Open

[architecture] Specify pipeline in task_decorator #111

shouldsee opened this issue Mar 20, 2019 · 10 comments

Comments

@shouldsee
Copy link

There is no reason to hard code the Pipeline to be main_pipeline. Can we please fix this?

task = main_pipeline._create_task(task_func)

@Acribbs
Copy link

Acribbs commented Mar 23, 2019

Sorry I'm new to looking at the code in this repository. To me calling the class and naming it main_pipeline at the beginning makes sense. Are you able to elaborate on your issue so I can understand it further? Do you suggest that the Pipeline class isn't declared as a global variable?

@shouldsee
Copy link
Author

@Acribbs This concerns class vs. instances. I am perfectly happy to define an instance as a global variable, but I don't see attaching a global dictionary to class definition as appropriate. If you want a global dictionray, why don't make it ruffus.main_pipeline={}

@Acribbs
Copy link

Acribbs commented Mar 23, 2019

Ah I see, interpreted your issue wrongly, thanks for the further explanation. What do you think @AndreasHeger ?

@shouldsee
Copy link
Author

shouldsee commented Mar 23, 2019 via email

@Acribbs
Copy link

Acribbs commented Mar 23, 2019

ok no worries, I'm looking at the code and a little confused. But iv only really started to understand ruffus recently so wasn't surprised, but if you say there may be a confusion then I will wait for your comments. thanks

@shouldsee
Copy link
Author

shouldsee commented Mar 23, 2019

Expanded snippet:

class task_decorator(object):

    """
        Forwards to functions within Task
    """

    def __init__(self, *decoratorArgs, **decoratorNamedArgs):
        """
            saves decorator arguments
        """
        self.args = decoratorArgs
        self.named_args = decoratorNamedArgs

    def __call__(self, task_func):
        """
            calls func in task with the same name as the class
        """
        # add task to main pipeline
        # check for duplicate tasks inside _create_task
        task = main_pipeline._create_task(task_func)

So we are talking about class task_decorator here. Say I instantiated a decorator by dec = task_decorate() and used it to decorate f = lambda x: 'hello', with g = dec(f). I would find myself unable to use anything other than main_pipeline. So if I created another pipeline pp1 = Pipeline(name='pp1') , there would be no way of using pp1 instead of main_pipeline.

Pipeline should be configurable at some step, either __init__ or __call__. A simplest edition would be

    def __call__(self, task_func, pipeline = None):
        """
            calls func in task with the same name as the class
        """
        if pipeline is None:
            pipeline = main_pipeline
        # add task to main pipeline
        # check for duplicate tasks inside _create_task
        task = pipeline._create_task(task_func)

Or

    def __init__(self, pipeline= None, *decoratorArgs, **decoratorNamedArgs):
        """
            saves decorator arguments
        """
        self.args = decoratorArgs
        self.named_args = decoratorNamedArgs
        if pipeline is None:
            pipeline = main_pipeline
        self.pipeline = pipeline
    def __call__(self, task_func,):
        """
            calls func in task with the same name as the class
        """
        # add task to main pipeline
        # check for duplicate tasks inside _create_task
        task = self.pipeline._create_task(task_func)        

@IanSudbery
Copy link
Contributor

IanSudbery commented Mar 25, 2019

This is almost certainly an incompatibility of the two modes of ruffus - the functional paradigm and the object oriented one.

Certainly in the functional paradigm (which is the one that I think most people use), there is no concept of main and non-main pipelines - a pipeline is a file. I don't know how many people use the object oriented interface, but I think part of the reason for it was to allow self contained sub pipelines.

If we wanted to allow this we should use the second format, because the once decorated, functions are never directly called.

However, having the pipeline as the first arguement would break all existing pipelines, so I suggest:

   def __init__(self, *decoratorArgs, **decoratorNamedArgs):
       """
           saves decorator arguments
       """
       self.args = decoratorArgs
       self.named_args = decoratorNamedArgs
       if pipeline in decoratorNameArgs:
           self.pipeline = decoratorNamedArgs['pipeline']
       else:
           self.pipeline = main_pipeline

   def __call__(self, task_func,):
       """
           calls func in task with the same name as the class
       """
       # add task to main pipeline
       # check for duplicate tasks inside _create_task
       task = self.pipeline._create_task(task_func)  

But it would need to be well documented somewhere.

@shouldsee
Copy link
Author

shouldsee commented Mar 25, 2019

This is almost certainly an incompatibility of the two modes of ruffus - the functional paradigm and the object oriented one.

Certainly in the functional paradigm (which is the one that I think most people use), there is no concept of main and non-main pipelines - a pipeline is a file. I don't know how many people use the object oriented interface, but I think part of the reason for it was to allow self contained sub pipelines.
...

@IanSudbery

I am not very familiar with how ruffus supposed to be used in general, but I would quite like to realise the functionality in my toy example, bascially factoring out the "pipeline building" from the "job mapping" as two separate processes.

My guess is that current architecture must have been adopted to simplify certain syntax that was commonly re-used, but might quite unfriendly to maintainer at my first glance... Apology if my guess is too wild here.

In terms of code, your proposal would trigger Name 'pipeline' not found, but we can use instead the dict.get method:

   def __init__(self, *decoratorArgs, **decoratorNamedArgs):
       """
           saves decorator arguments
       """
       self.args = decoratorArgs
       self.named_args = decoratorNamedArgs
       self.pipeline= decoratorNameArgs.get('pipeline', main_pipeline)

   def __call__(self, task_func,):
       """
           calls func in task with the same name as the class
       """
       # add task to main pipeline
       # check for duplicate tasks inside _create_task
       task = self.pipeline._create_task(task_func)  

@IanSudbery
Copy link
Contributor

IanSudbery commented Mar 25, 2019

I am not very familiar with how ruffus supposed to be used in general,

I'm not really sure about "supposed" to be used, only how i've seen people use it. Must people I know that use ruffus, would use the syntax in simple.py here.

If we are ever keen to seperate business logic from pipeline flow we do as in seperate_logic.py here.

Personally I think this syntax is less abstract, cleaner, more readable and more straight forwardly pythonic, without the nested function calls etc. For me this concreteness, and pure python approach is why I prefer ruffus to snakemake, but I admit this is purely a matter of personal taste

I see no reason not to enable separate pipelines to be specified, although I didn't think it was necessary to what you wanted to do. Mind you, my mind has always shruken away whenever I've tried to look too hard under the hood of ruffus.

The code I posted shouldn't throw NameError because it checks for it first, but i agree your suggestion is equivalent but cleaner.

@shouldsee
Copy link
Author

shouldsee commented Mar 25, 2019 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants