-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update components to implement the PandasTransformComponent #203
Labels
Components
Implementation of components
Comments
RobbeSneyders
added a commit
that referenced
this issue
Jun 14, 2023
Fixes #183 There's some todo's left, we should - [ ] Look into the redefinition of the divisions after we clear them. Now that we take this out of the hands of the user, we should define which strategy we want to follow here. (#205) - [ ] Move to `hierarchical columns`. Pandas can work with hierarchical columns, which would be a lot nicer as a user interface. I want to check if I can make this work with Dask, and otherwise move the translation from underscored names to hierarchical columns and back at the level of the `PandasTransformComponent` (#204) - [ ] Update the reusable components to leverage the `PandasTransformComponent` (#203)
RobbeSneyders
moved this from Ready for development
to In Progress
in Fondant development
Jun 16, 2023
RobbeSneyders
added a commit
that referenced
this issue
Jun 20, 2023
…omponent (#219) Contributes to #203 This PR updates the reusable components of the stable diffusion / controlnet pipeline to use the new Pandas interface. I submit these separately since I've already tested them. Some simplifications are clear: - No more `map_partitions` needed for batching - No more `meta` needed for `apply` or similar methods Since we now set the `meta` of the `map_partitions` in the `PandasTransformComponent` based on the component spec, the returned dataframes are validated quite strictly. It does not allow any additional columns. The Pandas implementation can probably be improved further as I'm not the biggest Pandas pro.
#219 did this for all relevant components in the Controlnet and Stable Diffusion pipelines. We still need to do this for the others. |
2 tasks
github-project-automation
bot
moved this from In Progress
to Done
in Fondant development
Jul 18, 2023
satishjasthi
pushed a commit
to satishjasthi/fondant
that referenced
this issue
Jul 21, 2023
This PR follows up on the PoC presented in ml6team#268 --- Fixes ml6team#257 It splits the implementation and execution of components, this has some advantages: - Pandas components can use `__init__` instead of setup, which is probably more familiar to users - Other components can use `__init__` as well instead of receiving all arguments to their transform or equivalent method, aligning implementation of different component types - Component implementation and execution should be easier to test separately I borrowed the executor terminology from KfP. --- Fixes ml6team#203 Since I had to update all the components, I also switched some of them to subclass `PandasTransformComponent` instead of `DaskTransformComponent`. --- These changes open some opportunities for additional improvements, but I propose to tackle those as separate PRs as this PR is already quite huge due to all the changes to the components. - [ ] ml6team#300 - [ ] ml6team#301
Hakimovich99
pushed a commit
that referenced
this issue
Oct 16, 2023
Fixes #183 There's some todo's left, we should - [ ] Look into the redefinition of the divisions after we clear them. Now that we take this out of the hands of the user, we should define which strategy we want to follow here. (#205) - [ ] Move to `hierarchical columns`. Pandas can work with hierarchical columns, which would be a lot nicer as a user interface. I want to check if I can make this work with Dask, and otherwise move the translation from underscored names to hierarchical columns and back at the level of the `PandasTransformComponent` (#204) - [ ] Update the reusable components to leverage the `PandasTransformComponent` (#203)
Hakimovich99
pushed a commit
that referenced
this issue
Oct 16, 2023
…omponent (#219) Contributes to #203 This PR updates the reusable components of the stable diffusion / controlnet pipeline to use the new Pandas interface. I submit these separately since I've already tested them. Some simplifications are clear: - No more `map_partitions` needed for batching - No more `meta` needed for `apply` or similar methods Since we now set the `meta` of the `map_partitions` in the `PandasTransformComponent` based on the component spec, the returned dataframes are validated quite strictly. It does not allow any additional columns. The Pandas implementation can probably be improved further as I'm not the biggest Pandas pro.
Hakimovich99
pushed a commit
that referenced
this issue
Oct 16, 2023
This PR follows up on the PoC presented in #268 --- Fixes #257 It splits the implementation and execution of components, this has some advantages: - Pandas components can use `__init__` instead of setup, which is probably more familiar to users - Other components can use `__init__` as well instead of receiving all arguments to their transform or equivalent method, aligning implementation of different component types - Component implementation and execution should be easier to test separately I borrowed the executor terminology from KfP. --- Fixes #203 Since I had to update all the components, I also switched some of them to subclass `PandasTransformComponent` instead of `DaskTransformComponent`. --- These changes open some opportunities for additional improvements, but I propose to tackle those as separate PRs as this PR is already quite huge due to all the changes to the components. - [ ] #300 - [ ] #301
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
As follow up of #200, we should update our reusable transform components to leverage the
PandasTransformComponent
The text was updated successfully, but these errors were encountered: