Use new data format #667

@mrchtr mrchtr commented Nov 23, 2023

This PR applies the new data format:

  • fixes all tests
  • updates component specifications and component code
  • removes subset field usage in `pipeline.py`

@RobbeSneyders RobbeSneyders left a comment

Thanks @mrchtr!

All the test files in `tests/pipeline/examples` are shown as newly added. I think this is because you didn't remove the old ones (or at least didn't include the removal in git).

I would be in favor of choosing better column names, e.g.:

  • images_data -> image
  • captions_data -> caption

But for me it's fine to include it like this for now. We can change them later.
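
For illustration only, the rename suggested above could be sketched as a plain mapping applied to a record. `COLUMN_RENAMES` and `rename_columns` are hypothetical names used for this sketch and are not part of Fondant; the sample row is made up.

```python
# Hypothetical mapping from the current column names to the proposed ones.
COLUMN_RENAMES = {
    "images_data": "image",
    "captions_data": "caption",
}


def rename_columns(record: dict, renames: dict = COLUMN_RENAMES) -> dict:
    """Return a copy of `record` with keys renamed; unknown keys are kept as-is."""
    return {renames.get(key, key): value for key, value in record.items()}


# Made-up example row, only to show the effect of the mapping.
row = {"images_data": b"\x89PNG", "captions_data": "a cat", "id": 1}
renamed = rename_columns(row)
```

Keeping the mapping in one place would make a later rename a single-line change, which fits the suggestion to defer it.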

components/chunk_text/tests/chunk_text_test.py (outdated, resolved)
components/embed_text/fondant_component.yaml (outdated, resolved)
components/image_cropping/src/main.py (outdated, resolved)
@@ -238,7 +238,7 @@ def remove_field(self, name: str) -> None:
 
         del self._specification["fields"][name]
 
-    def evolve( # noqa : PLR0912 (too many branches)
+    def evolve( # : PLR0912 (too many branches)
Something happened to the comment here.

src/fondant/pipeline/pipeline.py (outdated, resolved)
@RobbeSneyders RobbeSneyders left a comment

Thanks @mrchtr!

@RobbeSneyders RobbeSneyders merged commit 9f057ad into feature/redesign-dataset-format-and-interface Nov 24, 2023
3 of 4 checks passed
@RobbeSneyders RobbeSneyders deleted the feature/use-new-data-format branch November 24, 2023 07:46
@mrchtr mrchtr linked an issue Nov 24, 2023 that may be closed by this pull request
RobbeSneyders added a commit that referenced this pull request Nov 24, 2023
This PR applies the usage of the new data format:

- fixes all tests
- update component specifications and component code
- remove subset field usage in `pipeline.py`

---------

Co-authored-by: Robbe Sneyders <[email protected]>
RobbeSneyders added a commit that referenced this pull request Nov 27, 2023
Successfully merging this pull request may close these issues.

Update components and manifests