Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't use from_registry for generic components #251

Closed
2 tasks done
RobbeSneyders opened this issue Jun 29, 2023 · 0 comments · Fixed by #285
Closed
2 tasks done

Don't use from_registry for generic components #251

RobbeSneyders opened this issue Jun 29, 2023 · 0 comments · Fixed by #285
Assignees
Labels
Core Core framework

Comments

@RobbeSneyders
Copy link
Member

RobbeSneyders commented Jun 29, 2023

Summary of discussion in #224

Currently the operation for a generic component (a reusable component that dynamically takes a component specification), needs to be created as follows:

component_op = ComponentOp.from_registry(
    name="generic_component",
    component_spec_path="components/custom_generic_component/fondant_component.yaml",
)

But the name isn't actually used. Instead the component specification contains a reference to the generic image that it should use. So ideally, we could create it as follows instead:

component_op = ComponentOp(
    component_spec_path="components/custom_generic_component/fondant_component.yaml",
)

One issue here is that we currently use the way a ComponentOp is initialized to decide if the local runner should add a build step for it. Since this looks like a custom component, it will try to build it, while there is no Dockerfile or code available at the specified location.

We should therefore find a different mechanism to decide if a component needs to be built. If we agree that a component should have a fixed directory structure, we can instead detect if a Dockerfile is present. We currently already use the same directory structure for each component, which is:

|- my_component
   |- src
      |- main.py
   |- Dockerfile
   |- fondant_component.yaml
   |- README.md
   |- requirements.txt

If we fix this structure, we can also deduct the location of the component specification and further simplify the definition of a component to:

component_op = ComponentOp(
    path="components/custom_generic_component",
)

Such a fixed directory structure also provides opportunities to implement a component generator / cookie-cutter in the future.

Tasks

Preview Give feedback
  1. Core
    RobbeSneyders
  2. Core
    RobbeSneyders
@RobbeSneyders RobbeSneyders converted this from a draft issue Jun 29, 2023
@RobbeSneyders RobbeSneyders added the Core Core framework label Jun 29, 2023
@RobbeSneyders RobbeSneyders moved this from Ready for development to In Progress in Fondant development Jul 9, 2023
@RobbeSneyders RobbeSneyders self-assigned this Jul 9, 2023
@RobbeSneyders RobbeSneyders moved this from In Progress to Validation in Fondant development Jul 10, 2023
RobbeSneyders added a commit that referenced this issue Jul 18, 2023
Fixes #251, #252, #253

Before this PR, the operation for a generic component (a reusable
component that dynamically takes a component specification), needed to
be created as follows:

```python
component_op = ComponentOp.from_registry(
    name="generic_component",
    component_spec_path="components/custom_generic_component/fondant_component.yaml",
)
```

But the provided name wasn't actually used, since the component
specification already contains a reference to the reusable image that it
should use. Now we can define both custom and generic components as
follows:

```python
component_op = ComponentOp(
    component_dir="components/custom_generic_component",
)
```

There is still a difference in how we want to handle custom and generic
components though. Custom components should be built by the local
runner, while generic components should not, since they use a reusable
image. To make this differentiation, we now simply check if a
`Dockerfile` is present in the provided `Component_dir`. This will be
the case for a custom component, but not for a generic component.

---

This has a nice implicit result for reusable components as well, which
can still be defined as:

```python
component_op = ComponentOp.from_registry(
    name="reusable_component",
)
```

Which `fondant` now resolves to:
```python
component_op = ComponentOp(
    component_dir="{fondant_install_dir}/components/custom_generic_component",
)
```

I added a change to this PR which no longer packages the reusable
component code with the `fondant` package, but only the component
specifications, as those are all that is needed since they contain a
reference to the reusable image on the registry. This means that the
`component_dir` above doesn't contain a `Dockerfile` when `fondant` is
installed from PyPI, but does when you locally install `fondant` using
`poetry install`. So the local runner doesn't build reusable components
when users install `fondant` from PyPI, but it does when you're working
on the `fondant` repo, which is useful for us `fondant` developers.

---

The only thing we still need is an option on the runner to provide
`build_args`, so we can pass in the `fondant` version to build into the
image. But I'll open a separate PR for that.
@github-project-automation github-project-automation bot moved this from Validation to Done in Fondant development Jul 18, 2023
satishjasthi pushed a commit to satishjasthi/fondant that referenced this issue Jul 21, 2023
Fixes ml6team#251, ml6team#252, ml6team#253

Before this PR, the operation for a generic component (a reusable
component that dynamically takes a component specification), needed to
be created as follows:

```python
component_op = ComponentOp.from_registry(
    name="generic_component",
    component_spec_path="components/custom_generic_component/fondant_component.yaml",
)
```

But the provided name wasn't actually used, since the component
specification already contains a reference to the reusable image that it
should use. Now we can define both custom and generic components as
follows:

```python
component_op = ComponentOp(
    component_dir="components/custom_generic_component",
)
```

There is still a difference in how we want to handle custom and generic
components though. Custom components should be built by the local
runner, while generic components should not, since they use a reusable
image. To make this differentiation, we now simply check if a
`Dockerfile` is present in the provided `Component_dir`. This will be
the case for a custom component, but not for a generic component.

---

This has a nice implicit result for reusable components as well, which
can still be defined as:

```python
component_op = ComponentOp.from_registry(
    name="reusable_component",
)
```

Which `fondant` now resolves to:
```python
component_op = ComponentOp(
    component_dir="{fondant_install_dir}/components/custom_generic_component",
)
```

I added a change to this PR which no longer packages the reusable
component code with the `fondant` package, but only the component
specifications, as those are all that is needed since they contain a
reference to the reusable image on the registry. This means that the
`component_dir` above doesn't contain a `Dockerfile` when `fondant` is
installed from PyPI, but does when you locally install `fondant` using
`poetry install`. So the local runner doesn't build reusable components
when users install `fondant` from PyPI, but it does when you're working
on the `fondant` repo, which is useful for us `fondant` developers.

---

The only thing we still need is an option on the runner to provide
`build_args`, so we can pass in the `fondant` version to build into the
image. But I'll open a separate PR for that.
Hakimovich99 pushed a commit that referenced this issue Oct 16, 2023
Fixes #251, #252, #253

Before this PR, the operation for a generic component (a reusable
component that dynamically takes a component specification), needed to
be created as follows:

```python
component_op = ComponentOp.from_registry(
    name="generic_component",
    component_spec_path="components/custom_generic_component/fondant_component.yaml",
)
```

But the provided name wasn't actually used, since the component
specification already contains a reference to the reusable image that it
should use. Now we can define both custom and generic components as
follows:

```python
component_op = ComponentOp(
    component_dir="components/custom_generic_component",
)
```

There is still a difference in how we want to handle custom and generic
components though. Custom components should be built by the local
runner, while generic components should not, since they use a reusable
image. To make this differentiation, we now simply check if a
`Dockerfile` is present in the provided `Component_dir`. This will be
the case for a custom component, but not for a generic component.

---

This has a nice implicit result for reusable components as well, which
can still be defined as:

```python
component_op = ComponentOp.from_registry(
    name="reusable_component",
)
```

Which `fondant` now resolves to:
```python
component_op = ComponentOp(
    component_dir="{fondant_install_dir}/components/custom_generic_component",
)
```

I added a change to this PR which no longer packages the reusable
component code with the `fondant` package, but only the component
specifications, as those are all that is needed since they contain a
reference to the reusable image on the registry. This means that the
`component_dir` above doesn't contain a `Dockerfile` when `fondant` is
installed from PyPI, but does when you locally install `fondant` using
`poetry install`. So the local runner doesn't build reusable components
when users install `fondant` from PyPI, but it does when you're working
on the `fondant` repo, which is useful for us `fondant` developers.

---

The only thing we still need is an option on the runner to provide
`build_args`, so we can pass in the `fondant` version to build into the
image. But I'll open a separate PR for that.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Core Core framework
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

1 participant