Update core package #653

Contributor

@mrchtr mrchtr commented Nov 20, 2023

First PR related to the data structure redesign.

Implements the following:

  • New manifest structure (including validation and evolution)
  • New ComponentSpec structure (including validation)
  • Removes Subsets and Index

Not all tests pass yet, but this is already quite a few changes. I've therefore opened this PR against the feature branch feature/redesign-dataset-format-and-interface to get quicker feedback loops.

Member

@RobbeSneyders RobbeSneyders left a comment

Thanks @mrchtr! I did a first quick review.

I would focus on getting the core module ported first, i.e. everything related to the component spec and manifest, and make sure its tests are passing.

Could you remove all the other changes from this branch and open those as PRs to this one? That would make it a lot easier to review :)

src/fondant/core/component_spec.py (outdated, resolved)
src/fondant/core/component_spec.py (outdated, resolved)
src/fondant/core/manifest.py (outdated, resolved)
src/fondant/core/manifest.py (outdated, resolved)
src/fondant/core/manifest.py (outdated, resolved)
src/fondant/core/schema.py (outdated, resolved)
src/fondant/core/schema.py (outdated, resolved)
src/fondant/core/schemas/common.json (outdated, resolved)
src/fondant/core/schemas/manifest.json (resolved)
src/fondant/core/component_spec.py (outdated, resolved)
@mrchtr mrchtr force-pushed the feature/implement-new-dataset-format branch from 6bb9aa0 to d8ecd01 Compare November 21, 2023 08:26
Member

@RobbeSneyders RobbeSneyders left a comment

Thanks @mrchtr, clean work!

I still need to review the tests. I would split the test examples per module as well though.

src/fondant/core/component_spec.py (outdated, resolved)
src/fondant/core/component_spec.py (outdated, resolved)
src/fondant/core/manifest.py (outdated, resolved)
src/fondant/core/manifest.py (outdated, resolved)
src/fondant/core/manifest.py (outdated, resolved)
src/fondant/core/manifest.py (resolved)
src/fondant/core/schema.py (outdated, resolved)
src/fondant/core/schemas/component_spec.json (outdated, resolved)
src/fondant/core/schemas/component_spec.json (outdated, resolved)
src/fondant/core/schemas/component_spec.json (outdated, resolved)
@mrchtr mrchtr changed the title from "Implement new data structure, ComponentSpec and Manifest" to "Update component package" Nov 22, 2023
@mrchtr mrchtr linked an issue Nov 22, 2023 that may be closed by this pull request
@mrchtr mrchtr changed the title from "Update component package" to "Update core package" Nov 22, 2023
Contributor Author

mrchtr commented Nov 22, 2023

> I still need to review the tests. I would split the test examples per module as well though.

I've cleaned up the tests a bit more and moved the relevant sample files into tests/core/examples.
I removed some of the test cases for manifest evolution, mainly the ones related to additionalFields. We have to keep this in mind when implementing #656: we should add new test cases (or re-add the relevant ones) after that implementation.

@mrchtr mrchtr mentioned this pull request Nov 22, 2023
else:
    self._specification["fields"][field.name] = {
        "location": f"/{self.component_id}",
        "type": field.type.name,
Contributor

See my comment below. We should store them as json mainly to match the format of the component spec

Suggested change
- "type": field.type.name,
+ "type": field.type.to_json(),

Contributor Author

Thanks! I wasn't aware of this, but it totally makes sense!

Member

This broke some of the tests though.

>>> type.to_json()

{
  "type": type
}

So now we get:

{
  "location": ...,
  "type": {
    "type": type
  }
}

I think we can solve it like this:

self._specification["fields"][field.name] = {
  "location": f"/{self.component_id}",
  **field.type.to_json(),
}
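
To make the nesting issue concrete, here is a minimal, self-contained sketch. It assumes a hypothetical `Type` class whose `to_json()` returns `{"type": <name>}`, as quoted above; the class and the two helper functions are illustrative stand-ins, not Fondant's actual API.

```python
from dataclasses import dataclass


@dataclass
class Type:
    """Hypothetical stand-in for the field type class discussed in this thread."""

    name: str

    def to_json(self) -> dict:
        # Mirrors the behavior quoted above: the result is already
        # wrapped in a "type" key.
        return {"type": self.name}


def nested_entry(component_id: str, field_type: Type) -> dict:
    # Assigning the to_json() result to "type" nests the key twice.
    return {"location": f"/{component_id}", "type": field_type.to_json()}


def flat_entry(component_id: str, field_type: Type) -> dict:
    # Unpacking the to_json() result merges its "type" key into the entry.
    return {"location": f"/{component_id}", **field_type.to_json()}


print(nested_entry("my_component", Type("string")))
print(flat_entry("my_component", Type("string")))
```

Unpacking only works on a mapping, so it has to be applied to the dictionary returned by `to_json()` rather than to a plain string attribute.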

src/fondant/core/component_spec.py (outdated, resolved)
src/fondant/core/component_spec.py (outdated, resolved)
src/fondant/core/manifest.py (outdated, resolved)
src/fondant/core/manifest.py (outdated, resolved)
Member

@RobbeSneyders RobbeSneyders left a comment

Thanks @mrchtr, I think these are the last ones. I think the tests should pass with these changes.

src/fondant/core/manifest.py (outdated, resolved)
tests/core/test_component_specs.py (outdated, resolved)
tests/core/test_manifest_evolution.py (outdated, resolved)
type: array
items:
  type: float32
additionalFields: false
Member

This was removed for now, right?

Contributor Author

I would leave it here for now. The file is moved to a different folder in #655 anyway.

@RobbeSneyders
Member

FYI, as a merge strategy, let's squash-merge your separate PRs into your feature branch, and then rebase and merge the feature branch when it's ready. That way we keep the PRs as separate commits on main.

Member

@RobbeSneyders RobbeSneyders left a comment

Great work @mrchtr!

@RobbeSneyders RobbeSneyders merged commit 06186c1 into feature/redesign-dataset-format-and-interface Nov 23, 2023
1 of 5 checks passed
@RobbeSneyders RobbeSneyders deleted the feature/implement-new-dataset-format branch November 23, 2023 09:37
RobbeSneyders added a commit that referenced this pull request Nov 24, 2023
First PR related to the data structure redesign. 

Implements the following: 
- New manifest structure (including validation, and evolution)
- New ComponentSpec structure (including validation)
- Removes `Subsets` and `Index`

Not all tests pass yet, but this is already quite a few changes.
I've therefore opened this PR against the feature branch
`feature/redesign-dataset-format-and-interface` to get quicker
feedback loops.

---------

Co-authored-by: Robbe Sneyders <[email protected]>
Co-authored-by: Philippe Moussalli <[email protected]>
RobbeSneyders added a commit that referenced this pull request Nov 27, 2023
Development

Successfully merging this pull request may close these issues.

Update core package
3 participants