Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Issue #1745] Setup transformation process structure & transform an opportunity #1794

Merged
merged 21 commits into from
May 3, 2024

Conversation

chouinar
Copy link
Collaborator

@chouinar chouinar commented Apr 22, 2024

Summary

Fixes #1745

Time to review: 10 mins

Changes proposed

Setup the transformation process script structure

Implement some shared utilities that subsequent PRs will use

Implement the transformation logic for opportunities

Context for reviewers

A lot of setup in this PR, a lot that can be reused in the subsequent PRs to add transformations for the other sets of tables. Tried to make sure those would require refactoring or pulling out implementation details by setting up utils like the timestamp conversions + initial query to the DB to fetch the transforming records.

As far as the implementation goes, determining what needs to be transformed is pretty simple - the transformed_at column is null. There is then a second column that says whether the record should be deleted or not. When we query the staging tables, we also join with the relevant table we'll transform to (opportunity in this case), that way we already have the "source" and "destination" records and just need to modify the destination record (or create it if its an insert).

logger = logging.getLogger(__name__)


class TransformOracleData(Task):
Copy link
Collaborator Author

@chouinar chouinar Apr 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamesbursa - FYI I'm imagining that the script we'll have for the copy + transform will just look like:

def whatever_entrypoint_func(db_session):
     ExtractLoadOracleData(db_session).run()
     TransformOracleData(db_session).run()

jamesbursa
jamesbursa previously approved these changes May 2, 2024
Copy link
Collaborator

@jamesbursa jamesbursa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks

raise ValueError("Cannot delete opportunity as it does not exist")

self.increment(self.Metrics.TOTAL_RECORDS_DELETED)
self.db_session.delete(target_opportunity)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to review this once we have foreign keys pointing to it, and change it to setting is_deleted or similar.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the cascades setup on the relationships, so its safe to delete with the foreign keys.

updated_timestamp = convert_est_timestamp_to_utc(source.last_upd_date) # type: ignore[attr-defined]

if created_timestamp is not None:
target.created_at = created_timestamp
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest we have two pairs of columns. Keep the ones in TimestampMixin managed by SQLAlchemy with that meaning, and add two new columns for these upstream datetimes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you suggest we set this up and use it? I set them this way so when someone fetches them from our API, they have the same values as the legacy system.

How would we want to map these for the API if we add two more columns? A user shouldn't need to be aware that we technically updated everything when we imported it (that's irrelevant to them).

api/src/util/datetime_util.py Outdated Show resolved Hide resolved
@chouinar chouinar merged commit 480f395 into main May 3, 2024
10 checks passed
@chouinar chouinar deleted the chouinar/1745-setup-transformations branch May 3, 2024 14:00
chouinar added a commit that referenced this pull request May 3, 2024
This is a follow-up to
#1794 - which it builds
upon.

## Summary
Fixes #1746

### Time to review: __10 mins__

## Changes proposed
Adds transformation logic for the assistance listing (formerly CFDA)
tables.

## Context for reviewers
The transformations are pretty uneventful, the only complexity is that
the legacy Oracle database doesn't have a foreign key between the
`TopportunityCfda` table and the `Topportunity` table and there are
~2300 orphaned cfda records that we wouldn't be able to import, so we
additionally need to validate that the opportunity exists when we try to
transform the data, and if not, we just mark it as "transformed" and do
nothing with it.

There is some basic work on relationships between the staging tables +
more factories for setting up the data which will be ongoing /
@jamesbursa is also looking into.

---------

Co-authored-by: nava-platform-bot <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Task]: Setup the transformation process script
2 participants