Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Spike] Clarifying Overlap: Pydantic Models vs. Data Classes in Backend #1538

Closed
1 task done
rashidakanchwala opened this issue Sep 25, 2023 · 3 comments · Fixed by #1565
Closed
1 task done

[Spike] Clarifying Overlap: Pydantic Models vs. Data Classes in Backend #1538

rashidakanchwala opened this issue Sep 25, 2023 · 3 comments · Fixed by #1565
Assignees

Comments

@rashidakanchwala
Copy link
Contributor

rashidakanchwala commented Sep 25, 2023

Description

We recently had a look at our backend structure while reviewing the Shareable Viz PR, we realised that using both Pydantic Base Models and data classes in our backend might be unnecessary. This ticket is to investigate this issue and refactor the code accordingly.

Context:

FastAPI's documentation recommends the use of Pydantic for input validation and data serialization. It mentions data classes offering a more advanced approach. Using both Pydantic models and data classes for similar purposes has lead to redundancy and has caused issues with mypy 1.0.

In this issue; we want to understand the below:-

  • Are there performance implications between the two? If one is more efficient than the other for specific tasks, we should clarify their roles. Was the decision to use both related to our data access manager and repositories?
  • Given that Pydantic models are classes as well, can't we achieve similar functionalities with them as with data classes?

Checklist

  • Include labels so that we can categorise your feature request
@rashidakanchwala rashidakanchwala added Issue: Feature Request Python Pull requests that update Python code labels Sep 25, 2023
@rashidakanchwala rashidakanchwala moved this to Inbox in Kedro-Viz Sep 25, 2023
@tynandebold tynandebold added this to the Refactoring and tech debt milestone Sep 25, 2023
@tynandebold tynandebold moved this from Inbox to Backlog in Kedro-Viz Sep 25, 2023
@tynandebold tynandebold moved this from Backlog to Todo in Kedro-Viz Oct 2, 2023
@tynandebold tynandebold moved this from Todo to In Progress in Kedro-Viz Oct 3, 2023
@ravi-kumar-pilla ravi-kumar-pilla moved this from In Progress to In Review in Kedro-Viz Oct 12, 2023
@rashidakanchwala
Copy link
Contributor Author

rashidakanchwala commented Oct 17, 2023

Refactoring this ticket has proven to be challenging due to discrepancies between the Pydantic base classes in response.py and the dataclasses in models/flowchart.py. Our original plan of exclusively using Pydantic classes, populating data repositories based on these classes, and sending them as responses to the front-end may not be as feasible. Here are some alternative approaches to consider:

  • Separate Request and Response Models: We're considering creating two Pydantic BaseModels for each of the models, such as TaskNodeRequest and TaskNodeResponse. The rationale behind this is that we load data from the Kedro Project and populate our repository using the flowchart models. In this context, we're essentially creating the dataclass objects. On the other hand, when we send data to the front end, it becomes the response. The process of creating and storing in the repository and the data we send to the front end can have slight differences. So now we convert these dataclasses in flowchart to pydantic request models.
  • Use Pydantic Dataclasses: Another option is to use pydantic.dataclasses for the models and keep the response models as they are. When Ravi initially started doing the refactor and changing the dataclasses to Pydantic, he also realised that there were a lot of issues in tests that were not captured due to lack of pydantic validation. So maybe having pydantic classes helps here.
  • Combine Models: We could explore combining the two models into one. Although there are a few properties that don't match, most of them do. We might create these properties but not include them in the data sent to the front end. This was the original plan but because some properties dont' match, we are unsure now if we should combine for the sake of simplicity or have separation of concerns here.
  • Reconsider the Refactor: Finally, we're also open to the possibility that this refactor might not be the best approach. When we attempted to merge the two i.e dataclasses and pydantic models, we encountered issues and began to understand why the original separation made sense.

@ravi-kumar-pilla @noklam @merelcht @antonymilne @tynandebold

@ravi-kumar-pilla
Copy link
Contributor

Thank you @antonymilne for the time. Based on the discussion today, I would like to follow what Antony mentioned -

  1. Focus on bumping FAST API and add pydantic support this sprint
  2. Work on refactoring request models based on the suggestions of not using __init__, using pydantic base models and any PR comments. Keep the separation between request and response models.

@rashidakanchwala @noklam please add on anything if I missed. Thank you !

@ravi-kumar-pilla ravi-kumar-pilla moved this from In Review to Done in Kedro-Viz Oct 24, 2023
@antonymilne
Copy link
Contributor

Some more context and notes - sorry for the delay in writing this up!

When it comes to handling the responses, this is where things stand now:

  • FastAPI responses live in api/rest/responses.py and uses pydantic BaseModel
  • underlying data model lives in models/flowchart.py and uses dataclass

I definitely think it's a good move to convert the data model to use BaseModel as well.

Why should these things be separate in the first place? This gives some good principles, although in our case much of it is not relevant. In practice, for us, the main distinction between the data model and the API responses will be that the responses must be JSON serialisable while the data model can have arbitrary objects in it (e.g. an AbstractDataSet).

But the responses and data model will be very similar in kedro-viz, so while they should remain distinct there should be a good way to write them without just copying and pasting code. You should definitely read this page to understand the different options. Basically for us I think there's 3 options:

  1. subclass. This is a conventional way to handle things and generally a good plan. In our case though note that you need to be careful about getting the hierarchy correct since we want the API to depend on the data model and not vice versa. This is possible but potentially a bit awkward because inheritance makes it easy to add fields but not to remove them.
  2. if all that's different between data model and response is removing fields then response_model_exclude/include would be a very easy way to handle it. In general this is not a recommended approach but for us actually I think it would be fine and maybe the clearest way
  3. make the fields that should be excluded from the response private: How to hide certain fields in Pydantic models? fastapi/fastapi#1378

Overall I would start with one of the models (e.g. GraphNodeMetadata and its subclasses) and briefly prototype all three approaches to get a feel for which one will be best, and then roll out that to all the other models in the file.

Just so it doesn't get forgotten: TrackingDatasetModel in experiment_tracking.py should also be converted to pydantic.

@ravi-kumar-pilla ravi-kumar-pilla moved this from Done to In Progress in Kedro-Viz Oct 25, 2023
@ravi-kumar-pilla ravi-kumar-pilla moved this from In Progress to In Review in Kedro-Viz Nov 7, 2023
@rashidakanchwala rashidakanchwala added Type: Technical Design and removed Python Pull requests that update Python code labels Nov 9, 2023
@github-project-automation github-project-automation bot moved this from In Review to Done in Kedro-Viz Nov 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

4 participants