Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update from_registry #224

Closed
wants to merge 1 commit into from
Closed

Update from_registry #224

wants to merge 1 commit into from

Conversation

NielsRogge
Copy link
Contributor

@NielsRogge NielsRogge commented Jun 22, 2023

Currently the from_registry method accepts both name and component_spec arguments, and as seen here both are provided.

However, the name is only being used in case the user doesn't specify a component_spec_path as seen here.

Hence - please correct me if I'm wrong - there are 3 ways to create a component:

  1. use a custom component spec:
generate_prompts_op = ComponentOp(
    component_spec_path="components/generate_prompts/fondant_component.yaml",
    arguments={"n_rows_to_load": None},
)
  1. use a reusable component:
generate_prompts_op = ComponentOp.from_registry(
   name="load_from_hf_hub",
   arguments={"n_rows_to_load": None},
)

now there's a third case, in which you want to use a reusable component (like load_from_hf_hub) but want to provide a custom component spec. I guess for this use case you can just use ComponentOp with your own spec that you overwrote from one of the existing ones, or do you need to use from_registry?

@RobbeSneyders
Copy link
Member

I think ideally there are only 2 cases:

  • You want to use a reusable component --> ComponentOp.from_registry(name=...)
  • You want to use a custom component --> ComponentOp(component_spec_path=...)
    • A custom component can either use a custom image, or a pre-built one

I think this should work out of the box except for the flag we currently set to indicate if it is a local component or not:

local_component: bool = True,

This is used by the DockerCompiler to check if it needs to add a build step for a component or not. So we would have to find another mechanism to check this. (@GeorgesLorre)

@GeorgesLorre
Copy link
Collaborator

hmmm:

I'm not attached to the current way (it has its downsides) so I'm fine with changing it but indeed we need to have a robust and transparent way of determining when to build what.

Braindump:

  • We let the user put '.' as the image in the component spec meaning we have to build it. For the kubeflow compiler we can then check for this.
  • We let the user specify a flag in the componentOP: development: bool = False and we use this to build these images. But then the user will have to remove these flags.

@RobbeSneyders
Copy link
Member

  • We let the user put '.' as the image in the component spec meaning we have to build it. For the kubeflow compiler we can then check for this.

How would we handle this in the Kubeflow compiler? I would think this breaks the non-development way of running the component.

  • We let the user specify a flag in the componentOP: development: bool = False and we use this to build these images. But then the user will have to remove these flags.

I don't mind making this explicit. The Kubeflow compiler could just ignore this or also build the images if we know which registry to push to.

Another option is that we expect a fixed directory structure for a component, where the Dockerfile is located at the same level as the component_spec.yaml, and just check if we can find the Dockerfile?

@GeorgesLorre
Copy link
Collaborator

I think we need to decide if we want to add build functionality to the kubeflow compiler aswel?

@RobbeSneyders
Copy link
Member

I think we need to decide if we want to add build functionality to the kubeflow compiler aswel?

Not sure if we want to do it right now, but it would be nice to have the option. I don't think this should be the deciding factor though, we can always ignore the mechanism for runners that don't support building.

The more I think about it, the more I think working with a fixed directory structure and detecting the DockerFile is the best approach.

@GeorgesLorre
Copy link
Collaborator

I think we need to decide if we want to add build functionality to the kubeflow compiler aswel?

Not sure if we want to do it right now, but it would be nice to have the option. I don't think this should be the deciding factor though, we can always ignore the mechanism for runners that don't support building.

The more I think about it, the more I think working with a fixed directory structure and detecting the DockerFile is the best approach.

Ok let's do this then, we can make it very explicit in the logs.

@RobbeSneyders
Copy link
Member

I summarized the discussion here in #251. Will close this for now.

@janvanlooyml6 janvanlooyml6 deleted the update_from_registry branch January 9, 2024 13:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants