[Spike] Clarifying Overlap: Pydantic Models vs. Data Classes in Backend #1538

rashidakanchwala · 2023-09-25T12:30:53Z

Description

We recently had a look at our backend structure while reviewing the Shareable Viz PR, we realised that using both Pydantic Base Models and data classes in our backend might be unnecessary. This ticket is to investigate this issue and refactor the code accordingly.

Context:

FastAPI's documentation recommends the use of Pydantic for input validation and data serialization. It mentions data classes offering a more advanced approach. Using both Pydantic models and data classes for similar purposes has lead to redundancy and has caused issues with mypy 1.0.

In this issue; we want to understand the below:-

Are there performance implications between the two? If one is more efficient than the other for specific tasks, we should clarify their roles. Was the decision to use both related to our data access manager and repositories?
Given that Pydantic models are classes as well, can't we achieve similar functionalities with them as with data classes?

Checklist

Include labels so that we can categorise your feature request

rashidakanchwala · 2023-10-17T15:28:13Z

Refactoring this ticket has proven to be challenging due to discrepancies between the Pydantic base classes in response.py and the dataclasses in models/flowchart.py. Our original plan of exclusively using Pydantic classes, populating data repositories based on these classes, and sending them as responses to the front-end may not be as feasible. Here are some alternative approaches to consider:

Separate Request and Response Models: We're considering creating two Pydantic BaseModels for each of the models, such as TaskNodeRequest and TaskNodeResponse. The rationale behind this is that we load data from the Kedro Project and populate our repository using the flowchart models. In this context, we're essentially creating the dataclass objects. On the other hand, when we send data to the front end, it becomes the response. The process of creating and storing in the repository and the data we send to the front end can have slight differences. So now we convert these dataclasses in flowchart to pydantic request models.
Use Pydantic Dataclasses: Another option is to use pydantic.dataclasses for the models and keep the response models as they are. When Ravi initially started doing the refactor and changing the dataclasses to Pydantic, he also realised that there were a lot of issues in tests that were not captured due to lack of pydantic validation. So maybe having pydantic classes helps here.
Combine Models: We could explore combining the two models into one. Although there are a few properties that don't match, most of them do. We might create these properties but not include them in the data sent to the front end. This was the original plan but because some properties dont' match, we are unsure now if we should combine for the sake of simplicity or have separation of concerns here.
Reconsider the Refactor: Finally, we're also open to the possibility that this refactor might not be the best approach. When we attempted to merge the two i.e dataclasses and pydantic models, we encountered issues and began to understand why the original separation made sense.

@ravi-kumar-pilla @noklam @merelcht @antonymilne @tynandebold

ravi-kumar-pilla · 2023-10-19T18:17:45Z

Thank you @antonymilne for the time. Based on the discussion today, I would like to follow what Antony mentioned -

Focus on bumping FAST API and add pydantic support this sprint
Work on refactoring request models based on the suggestions of not using __init__, using pydantic base models and any PR comments. Keep the separation between request and response models.

@rashidakanchwala @noklam please add on anything if I missed. Thank you !

antonymilne · 2023-10-25T11:08:34Z

Some more context and notes - sorry for the delay in writing this up!

When it comes to handling the responses, this is where things stand now:

FastAPI responses live in api/rest/responses.py and uses pydantic BaseModel
underlying data model lives in models/flowchart.py and uses dataclass

I definitely think it's a good move to convert the data model to use BaseModel as well.

Why should these things be separate in the first place? This gives some good principles, although in our case much of it is not relevant. In practice, for us, the main distinction between the data model and the API responses will be that the responses must be JSON serialisable while the data model can have arbitrary objects in it (e.g. an AbstractDataSet).

But the responses and data model will be very similar in kedro-viz, so while they should remain distinct there should be a good way to write them without just copying and pasting code. You should definitely read this page to understand the different options. Basically for us I think there's 3 options:

subclass. This is a conventional way to handle things and generally a good plan. In our case though note that you need to be careful about getting the hierarchy correct since we want the API to depend on the data model and not vice versa. This is possible but potentially a bit awkward because inheritance makes it easy to add fields but not to remove them.
if all that's different between data model and response is removing fields then response_model_exclude/include would be a very easy way to handle it. In general this is not a recommended approach but for us actually I think it would be fine and maybe the clearest way
make the fields that should be excluded from the response private: How to hide certain fields in Pydantic models? fastapi/fastapi#1378

Overall I would start with one of the models (e.g. GraphNodeMetadata and its subclasses) and briefly prototype all three approaches to get a feel for which one will be best, and then roll out that to all the other models in the file.

Just so it doesn't get forgotten: TrackingDatasetModel in experiment_tracking.py should also be converted to pydantic.

rashidakanchwala added Issue: Feature Request Python Pull requests that update Python code labels Sep 25, 2023

rashidakanchwala added this to Kedro-Viz Sep 25, 2023

rashidakanchwala moved this to Inbox in Kedro-Viz Sep 25, 2023

tynandebold removed the Issue: Feature Request label Sep 25, 2023

tynandebold added this to the Refactoring and tech debt milestone Sep 25, 2023

tynandebold moved this from Inbox to Backlog in Kedro-Viz Sep 25, 2023

tynandebold moved this from Backlog to Todo in Kedro-Viz Oct 2, 2023

ravi-kumar-pilla assigned rashidakanchwala and ravi-kumar-pilla Oct 2, 2023

tynandebold moved this from Todo to In Progress in Kedro-Viz Oct 3, 2023

ravi-kumar-pilla mentioned this issue Oct 9, 2023

Refactor Flowchart models from dataclass to pydantic base models #1565

Merged

5 tasks

tynandebold mentioned this issue Oct 12, 2023

bump fastapi and add pydantic compatibility #1529

Closed

5 tasks

ravi-kumar-pilla moved this from In Progress to In Review in Kedro-Viz Oct 12, 2023

ravi-kumar-pilla moved this from In Review to Done in Kedro-Viz Oct 24, 2023

ravi-kumar-pilla moved this from Done to In Progress in Kedro-Viz Oct 25, 2023

ravi-kumar-pilla moved this from In Progress to In Review in Kedro-Viz Nov 7, 2023

rashidakanchwala added Type: Technical Design and removed Python Pull requests that update Python code labels Nov 9, 2023

ravi-kumar-pilla closed this as completed in #1565 Nov 14, 2023

github-project-automation bot moved this from In Review to Done in Kedro-Viz Nov 14, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Spike] Clarifying Overlap: Pydantic Models vs. Data Classes in Backend #1538

[Spike] Clarifying Overlap: Pydantic Models vs. Data Classes in Backend #1538

rashidakanchwala commented Sep 25, 2023 •

edited by tynandebold

Loading

rashidakanchwala commented Oct 17, 2023 •

edited

Loading

ravi-kumar-pilla commented Oct 19, 2023

antonymilne commented Oct 25, 2023

[Spike] Clarifying Overlap: Pydantic Models vs. Data Classes in Backend #1538

[Spike] Clarifying Overlap: Pydantic Models vs. Data Classes in Backend #1538

Comments

rashidakanchwala commented Sep 25, 2023 • edited by tynandebold Loading

Description

Context:

Checklist

rashidakanchwala commented Oct 17, 2023 • edited Loading

ravi-kumar-pilla commented Oct 19, 2023

antonymilne commented Oct 25, 2023

rashidakanchwala commented Sep 25, 2023 •

edited by tynandebold

Loading

rashidakanchwala commented Oct 17, 2023 •

edited

Loading