A thread to discuss the new experimental pipeline.
A couple of minor suggestions:
1) Why use the name "pipe" when there is the more standard name "apply"?
Also, "pipe" implies that multiple functions are supposed to be applied to the data. Yet it's quite possible that a large share of "pipelines" will contain a single function, since that reduces the time spent copying data when using multiprocessing.
2) I find the usage pattern of the data argument to be... a bit raw/poorly defined/restrictive?
I understand that "data" is trying to solve the patterns where different steps of the pipeline must create and pass extra information besides the chunks themselves. But there are several issues with the current implementation:
2.1) Having an optional argument "data" is a major obstacle to creating reusable components, as they now have to come in two varieties: one taking a chunk as its argument, another taking (chunk, data).
2.2) More importantly, this extra "data" argument does not really solve the issue that, in complicated pipelines, different functions must be custom "fitted" to each other. There is no single "data" that functions can expect and pass downstream. Designing a library that would anticipate what kind of extra data is passed between functions is futile.
2.3) Finally, the only place where data is currently used is during balancing, where it stores filtered pixel counts. Correct me if I'm wrong, but in this case it's actually fine to modify chunks, since the downstream functions do not use the original weights! I'd say modifying chunks is great, because it enables combinatorial composition of filtering and computing functions w/o custom interfaces.
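A minimal sketch of point 2.3, using plain Python dicts as a hypothetical stand-in for a pixel chunk. The column names `bin1_id`, `bin2_id`, `count` and every function name here are assumptions for illustration, not the library's actual chunk format or API. Filters return a modified chunk (here a filtered copy; mutating in place would serve equally), a compute step reads only the chunk, and any combination of them chains without a separate `data` channel.

```python
from functools import reduce
from typing import Callable, Dict, List

# Hypothetical stand-in for a pixel chunk: equal-length columns keyed by name.
Chunk = Dict[str, list]


def _select(chunk: Chunk, keep: List[int]) -> Chunk:
    """Return a new chunk containing only the rows at the given indices."""
    return {name: [col[i] for i in keep] for name, col in chunk.items()}


def drop_diagonal(chunk: Chunk) -> Chunk:
    """Filter step (hypothetical): remove pixels on the main diagonal."""
    pairs = zip(chunk["bin1_id"], chunk["bin2_id"])
    return _select(chunk, [i for i, (b1, b2) in enumerate(pairs) if b1 != b2])


def drop_low_counts(chunk: Chunk, min_count: float = 2) -> Chunk:
    """Filter step (hypothetical): remove pixels with count below min_count."""
    return _select(chunk, [i for i, c in enumerate(chunk["count"]) if c >= min_count])


def marginals(chunk: Chunk, n_bins: int) -> List[float]:
    """Compute step: per-bin sums over whatever pixels survived filtering.

    Each pixel adds its count to both of its bins (upper-triangle storage assumed).
    """
    marg = [0.0] * n_bins
    for b1, b2, c in zip(chunk["bin1_id"], chunk["bin2_id"], chunk["count"]):
        marg[b1] += c
        marg[b2] += c
    return marg


def compose(*funcs: Callable) -> Callable:
    """Chain single-argument functions left to right: compose(f, g)(x) == g(f(x))."""
    return lambda x: reduce(lambda acc, f: f(acc), funcs, x)


chunk = {"bin1_id": [0, 0, 1, 1], "bin2_id": [0, 1, 1, 2], "count": [5, 3, 1, 4]}
# The same compute step works after any subset or ordering of the filters.
step = compose(drop_diagonal, drop_low_counts, lambda ch: marginals(ch, n_bins=3))
print(step(chunk))  # [3.0, 7.0, 4.0]
```

Because every filter has the same chunk-in, chunk-out signature, none of them needs a second `(chunk, data)` variant.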
My proposal:
drop prepare; if needed, developers themselves can design custom functions that take a chunk and output (chunk, extra_data).
it's okay to modify chunks, unless I am missing something big here.
use docs to teach developers that the functions in their pipelines can generate extra data and pass it downstream.
rename pipe -> apply