You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
At a project am working on, we are heavily utilizing the @requires decorator to ease parameter inheritance pain. This is a big project so we have split up our monolithic pipeline into smaller pipelines each being worked on by a different team. Some of these pipelines also derive from others using normal Python inheritance. Because of our use case (many interdependent pipelines and use of @requires) we have found ourselves in the situation where two or several tasks in the pipelines chain define the same parameter name in different contexts. Since the context is different the existing functionality of @requires (and @inherits) of silently masking a similarly named upstream parameter is undesirable. It would be better if Luigi could avoid this silent parameter masking or give a warning at the very least.
This problem that I have described, is what we are unofficially calling the "parameter collision" problem. There are two flavors of this problem that we have encountered;
In the first case (Case A), TaskC will silently mask param_a that had been passed down from the upstream TaskA. In the 2nd case (Case B), TaskZ will possibly also mask param_a from either TaskX or TaskY. Such a masking could have been unintentional and therefore it's dangerous for parameters to be silently overwritten (masked).
In a big project, with different teams working on different parts, it might be impossible to realize you have masked/overwritten a parameter that had been defined in an upstream task. Therefore, we need a mechanism for detecting and warning the developer when such a collision of parameter names occurs. We evaluated a few options but the most promising solution seems to be;
Modify the @requires and @inherits decorator to throw an exception when a parameter collision is detected. In addition, add a new argument, ignore_collisions=(), to allow excluding select parameters from the collision detection. The intention is that when a parameter collision is detected at a particular point in the task inheritance chain, the developer has the choice of either renaming one of the parameters or include it in the ignore_collisions list if the collision is desired. I have included a WIP pull request here.
Has anyone else encountered this problem and if so do you think the suggested solution is sufficient? If the answer to both of these questions is in the affirmative, I will rework the pull request to include tests and documentation.
The text was updated successfully, but these errors were encountered:
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If closed, you may revisit when your time allows and reopen! Thank you for your contributions.
I have added a PR (#3169) based on the work of @KenyaDonDraper that also adds a config switch so this functionality can be enabled without possibly affecting existing workflows.
At a project am working on, we are heavily utilizing the @requires decorator to ease parameter inheritance pain. This is a big project so we have split up our monolithic pipeline into smaller pipelines each being worked on by a different team. Some of these pipelines also derive from others using normal Python inheritance. Because of our use case (many interdependent pipelines and use of @requires) we have found ourselves in the situation where two or several tasks in the pipelines chain define the same parameter name in different contexts. Since the context is different the existing functionality of @requires (and @inherits) of silently masking a similarly named upstream parameter is undesirable. It would be better if Luigi could avoid this silent parameter masking or give a warning at the very least.
This problem that I have described, is what we are unofficially calling the "parameter collision" problem. There are two flavors of this problem that we have encountered;
Case A:
Case B:
In the first case (Case A), TaskC will silently mask
param_a
that had been passed down from the upstream TaskA. In the 2nd case (Case B), TaskZ will possibly also maskparam_a
from either TaskX or TaskY. Such a masking could have been unintentional and therefore it's dangerous for parameters to be silently overwritten (masked).In a big project, with different teams working on different parts, it might be impossible to realize you have masked/overwritten a parameter that had been defined in an upstream task. Therefore, we need a mechanism for detecting and warning the developer when such a collision of parameter names occurs. We evaluated a few options but the most promising solution seems to be;
ignore_collisions=()
, to allow excluding select parameters from the collision detection. The intention is that when a parameter collision is detected at a particular point in the task inheritance chain, the developer has the choice of either renaming one of the parameters or include it in the ignore_collisions list if the collision is desired. I have included a WIP pull request here.Has anyone else encountered this problem and if so do you think the suggested solution is sufficient? If the answer to both of these questions is in the affirmative, I will rework the pull request to include tests and documentation.
The text was updated successfully, but these errors were encountered: