[Issue #1745] Setup transformation process structure & transform an opportunity #1794
Conversation
logger = logging.getLogger(__name__)

class TransformOracleData(Task):
@jamesbursa - FYI I'm imagining that the script we'll have for the copy + transform will just look like:
def whatever_entrypoint_func(db_session):
ExtractLoadOracleData(db_session).run()
TransformOracleData(db_session).run()
Looks good, thanks
raise ValueError("Cannot delete opportunity as it does not exist")

self.increment(self.Metrics.TOTAL_RECORDS_DELETED)
self.db_session.delete(target_opportunity)
We might need to review this once we have foreign keys pointing to it, and change it to setting is_deleted or similar.
We have the cascades set up on the relationships, so it's safe to delete with the foreign keys.
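The cascade behavior being discussed can be sketched in SQLAlchemy roughly like this. This is an illustrative sketch only: the model and column names below are hypothetical stand-ins, not the project's actual schema.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class Opportunity(Base):
    __tablename__ = "opportunity"
    opportunity_id = Column(Integer, primary_key=True)
    # "all, delete-orphan" makes db_session.delete(opportunity) also delete
    # every child row, so no dangling foreign keys are left behind.
    assistance_listings = relationship(
        "AssistanceListing", back_populates="opportunity", cascade="all, delete-orphan"
    )

class AssistanceListing(Base):
    __tablename__ = "assistance_listing"
    assistance_listing_id = Column(Integer, primary_key=True)
    opportunity_id = Column(Integer, ForeignKey("opportunity.opportunity_id"))
    opportunity = relationship("Opportunity", back_populates="assistance_listings")
```

With a cascade configured this way, a hard delete of the parent cleans up child rows through the ORM rather than leaving it to the database.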
updated_timestamp = convert_est_timestamp_to_utc(source.last_upd_date)  # type: ignore[attr-defined]

if created_timestamp is not None:
    target.created_at = created_timestamp
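For context, a helper like convert_est_timestamp_to_utc can be approximated with the standard library. This is a sketch of the conversion only; the project's actual implementation may differ:

```python
from datetime import datetime, timezone
from typing import Optional
from zoneinfo import ZoneInfo

US_EASTERN = ZoneInfo("America/New_York")

def convert_est_timestamp_to_utc(timestamp: Optional[datetime]) -> Optional[datetime]:
    """Interpret a naive legacy timestamp as US/Eastern and convert it to UTC."""
    if timestamp is None:
        return None
    # Legacy Oracle values are naive, so attach the Eastern zone first,
    # then convert to UTC (DST is handled by the zone database).
    return timestamp.replace(tzinfo=US_EASTERN).astimezone(timezone.utc)
```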
I'd suggest we have two pairs of columns. Keep the ones in TimestampMixin managed by SQLAlchemy with that meaning, and add two new columns for these upstream datetimes.
How would you suggest we set this up and use it? I set them this way so that when someone fetches these records from our API, they have the same values as the legacy system.
How would we want to map these for the API if we add two more columns? A user shouldn't need to be aware that we technically updated everything when we imported it (that's irrelevant to them).
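One way to read the two-pairs-of-columns suggestion is sketched below. Everything here is hypothetical (the legacy_* column names and the api_created_at helper are made up for illustration, not from the codebase):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class OpportunityTimestamps:
    # Pair managed by the TimestampMixin: when *our* system
    # inserted/updated the row.
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None
    # Hypothetical second pair mirroring the upstream legacy values.
    legacy_created_at: Optional[datetime] = None
    legacy_updated_at: Optional[datetime] = None

def api_created_at(record: OpportunityTimestamps) -> Optional[datetime]:
    # The API layer could prefer the legacy value when one exists, so a user
    # still sees the same timestamps as the legacy system and never learns
    # that we technically rewrote everything during the import.
    return record.legacy_created_at or record.created_at
```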
This is a follow-up to #1794, which it builds upon.
Summary
Fixes #1746
Time to review: 10 mins
Changes proposed
Adds transformation logic for the assistance listing (formerly CFDA) tables.
Context for reviewers
The transformations are pretty uneventful; the only complexity is that the legacy Oracle database doesn't have a foreign key between the `TopportunityCfda` table and the `Topportunity` table, and there are ~2300 orphaned CFDA records that we wouldn't be able to import. So we additionally need to validate that the opportunity exists when we try to transform the data; if it doesn't, we just mark the record as "transformed" and do nothing with it.
There is some basic work on relationships between the staging tables + more factories for setting up the data, which will be ongoing / @jamesbursa is also looking into.
Co-authored-by: nava-platform-bot <[email protected]>
Summary
Fixes #1745
Time to review: 10 mins
Changes proposed
Setup the transformation process script structure
Implement some shared utilities that subsequent PRs will use
Implement the transformation logic for opportunities
Context for reviewers
A lot of setup in this PR, much of which can be reused in the subsequent PRs that add transformations for the other sets of tables. Tried to make sure those won't require refactoring or pulling out implementation details, by setting up utils like the timestamp conversions + the initial query to the DB to fetch the records to transform.
As far as the implementation goes, determining what needs to be transformed is pretty simple: the transformed_at column is null. There is then a second column that says whether the record should be deleted. When we query the staging tables, we also join with the relevant table we'll transform to (opportunity in this case); that way we already have the "source" and "destination" records and just need to modify the destination record (or create it if it's an insert).
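A toy, in-memory version of that fetch logic is below. The real task runs a SQL join against the staging tables; the function and field names here are illustrative only:

```python
from typing import Any, Dict, List, Optional, Tuple

def records_to_transform(
    staging_rows: List[Dict[str, Any]],
    destination_by_id: Dict[int, Any],
) -> List[Tuple[Dict[str, Any], Optional[Any]]]:
    # A staging row needs processing when transformed_at is still null.
    # Pair it with its destination record; None means this is an insert
    # rather than an update.
    return [
        (row, destination_by_id.get(row["opportunity_id"]))
        for row in staging_rows
        if row["transformed_at"] is None
    ]
```

A row that was already transformed is skipped entirely, and a row whose destination lookup comes back empty falls through to the insert path.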