Support applying Lightweight Python components in Pipeline SDK #750
Comments
I also think a decorator will be the cleanest and clearest way of achieving this. This means that an We also need to maybe think about the eager execution interface: This is what I have now (calling
but it might be nicer to keep
That way we keep the pipeline definition as is and can do iterative development by calling certain components while creating it.
+1
Agree with keeping
One other thing is that if we want to support eager execution on different runners, we need to know the runner up front. In Apache Beam, for instance, you pass the runner to the pipeline when instantiating it.
I think this ticket can be limited to loading a Python component into a If we look at the
Since we keep the same So this ticket needs to be able to translate a Python component into a component spec, which consists of the following necessary elements:
Yes! This is very similar to what I wrote for the xmas demo: Minus the docker image (my demo has eager execution)
Fixes #751

This PR introduces functionality to infer the arguments from a `Component` class. The result is a dictionary with the argument names as keys, and `Argument` instances as values, which is the format of [`component_spec.args`](https://github.com/ml6team/fondant/blob/8e828441eec8ff91074e5c8ccf16fe405b719594/src/fondant/core/component_spec.py#L193).

We can leverage this behavior for Lightweight Python components as described in #750.

Did some TDD here, let me know if I missed any cases.
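To make the idea of inferring arguments from a class concrete, here is a minimal sketch based on `inspect.signature`. It is not the code from the PR: the `Argument` dataclass, the `infer_arguments` helper, and `ExampleComponent` are hypothetical stand-ins, and the choice to read the arguments from `__init__` is an assumption.

```python
import inspect
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Argument:
    """Hypothetical stand-in for the argument type stored in a component spec."""

    name: str
    type: Any
    optional: bool = False
    default: Any = None


def infer_arguments(component_cls: type) -> Dict[str, Argument]:
    """Build a name -> Argument mapping from the signature of `__init__` (assumed source)."""
    signature = inspect.signature(component_cls.__init__)
    arguments: Dict[str, Argument] = {}
    for name, param in signature.parameters.items():
        # Skip `self` and catch-all parameters, which are not user-facing arguments.
        if name == "self" or param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue
        has_default = param.default is not inspect.Parameter.empty
        annotated = param.annotation is not inspect.Parameter.empty
        arguments[name] = Argument(
            name=name,
            type=param.annotation if annotated else Any,
            optional=has_default,
            default=param.default if has_default else None,
        )
    return arguments


class ExampleComponent:
    """Hypothetical component used only to exercise the helper."""

    def __init__(self, *, batch_size: int, language: str = "en", **kwargs):
        self.batch_size = batch_size
        self.language = language


args = infer_arguments(ExampleComponent)
```

In this sketch an argument counts as optional whenever it has a default value; the actual implementation may apply different rules, for example for `Optional[...]` type hints.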
We want to be able to apply Lightweight Python components as part of a pipeline just like we do with docker components.
Lightweight Python components will require some additional arguments compared to docker components. I can think of two already:
- `image`: docker image to run the code in.
- `dependencies`: additional Python dependencies to install in the container before executing.

I see two options:
- Extend the `apply` (and `read` and `write`) methods on the dataset / pipeline to support these arguments. Since they are not relevant for docker components, we might then want to split this into two separate `apply` methods for clarity.

I think I would prefer the second option so we can keep a single `apply` interface.
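The decorator idea raised in the comments could look roughly like the sketch below, which attaches the `image` and `dependencies` settings to the component class so that a single `apply` entry point can accept both kinds of component. Everything here is hypothetical: the names `lightweight_component`, `LightweightSpec`, `AddOne`, and this `apply` function are illustrative only and are not part of the SDK.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LightweightSpec:
    """Hypothetical container for the extra settings of a Lightweight Python component."""

    image: str
    dependencies: List[str] = field(default_factory=list)


def lightweight_component(
    *, image: str = "python:3.10", dependencies: Optional[List[str]] = None
) -> Callable[[type], type]:
    """Hypothetical decorator that records the image and dependencies on the class."""

    def wrapper(cls: type) -> type:
        cls.lightweight_spec = LightweightSpec(
            image=image, dependencies=list(dependencies or [])
        )
        return cls

    return wrapper


@lightweight_component(image="python:3.11", dependencies=["pandas"])
class AddOne:
    """Toy component; the `transform` method name is an assumption."""

    def transform(self, values):
        return [value + 1 for value in values]


def apply(component) -> str:
    """Single entry point: dispatch on whether the decorator metadata is present."""
    spec = getattr(component, "lightweight_spec", None)
    if spec is None:
        # Anything without the metadata is treated as a reference to a docker component.
        return f"docker component: {component}"
    return f"lightweight component {component.__name__} on {spec.image}"
```

With this shape, the extra arguments live on the component rather than on `apply`, so docker components are unaffected and the pipeline definition stays the same for both kinds.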