OL facets - PR3 - migrate data to facet tables #2359

@pawel-big-lebowski pawel-big-lebowski commented Jan 13, 2023

Signed-off-by: Pawel Leszczynski [email protected]

Problem

This is the last PR in the OpenLineage facets series. It contains a migration that backfills the newly created facet tables (job_facets, dataset_facets and run_facets) with data from the lineage_events table. Users with more than 100K lineage_events rows stored in Marquez have to run the migration manually.

Closes: #ISSUE-NUMBER

Solution

  • Prepare a migration script in Java that backfills the facet tables.
  • Store a migration_lock in the database so that the migration can be paused and continued, and so that it is known whether it has finished.
  • At the end of a successful migration, replace the existing job_facets_view, dataset_facets_view and run_facets_view so that they point at the newly created tables. The application logic then reads from the new tables.
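
The pause/continue behaviour described in the list above can be sketched roughly as follows. This is a minimal in-memory illustration and not the code from this PR: the class `FacetBackfillSketch`, its `Event` and `MigrationLock` types, the chunk size, and the use of an increasing sequence number as the resume cursor are all assumptions made for the example. The real migration works against the database and may order and batch rows differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical in-memory sketch of a pausable, chunked backfill. Not the code from this PR. */
final class FacetBackfillSketch {

  /** Stand-in for one lineage_events row; seq is an assumed, strictly increasing cursor. */
  static final class Event {
    final long seq;
    final String facetJson;

    Event(long seq, String facetJson) {
      this.seq = seq;
      this.facetJson = facetJson;
    }
  }

  /** Stand-in for the migration_lock record: where to resume, and whether we are done. */
  static final class MigrationLock {
    long lastProcessedSeq = Long.MIN_VALUE; // nothing processed yet
    boolean finished = false;
  }

  private final List<Event> lineageEvents;                  // plays the role of lineage_events
  private final List<String> facetRows = new ArrayList<>(); // plays the role of a facet table
  private final MigrationLock lock = new MigrationLock();
  private final int chunkSize;

  FacetBackfillSketch(List<Event> lineageEvents, int chunkSize) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    this.lineageEvents = new ArrayList<>(lineageEvents);
    this.chunkSize = chunkSize;
  }

  /** Processes at most maxChunks chunks and returns true once the backfill has finished. */
  boolean run(int maxChunks) {
    for (int i = 0; i < maxChunks && !lock.finished; i++) {
      final long resumeAfter = lock.lastProcessedSeq;
      List<Event> chunk = lineageEvents.stream()
          .filter(e -> e.seq > resumeAfter)
          .sorted(Comparator.comparingLong((Event e) -> e.seq))
          .limit(chunkSize)
          .collect(Collectors.toList());
      if (chunk.isEmpty()) {
        lock.finished = true; // nothing left: record completion so later runs are no-ops
        break;
      }
      for (Event e : chunk) {
        facetRows.add(e.facetJson);
      }
      // Advance the cursor only after the whole chunk has been written.
      lock.lastProcessedSeq = chunk.get(chunk.size() - 1).seq;
    }
    return lock.finished;
  }

  int migratedRows() {
    return facetRows.size();
  }

  boolean isFinished() {
    return lock.finished;
  }

  public static void main(String[] args) {
    List<Event> events = new ArrayList<>();
    for (long seq = 1; seq <= 5; seq++) {
      events.add(new Event(seq, "{\"facet\": " + seq + "}"));
    }
    FacetBackfillSketch backfill = new FacetBackfillSketch(events, 2);
    backfill.run(1); // migrate one chunk, then "pause"
    System.out.println("rows after first run: " + backfill.migratedRows());
    backfill.run(Integer.MAX_VALUE); // "continue" until done
    System.out.println("finished: " + backfill.isFinished() + ", rows: " + backfill.migratedRows());
  }
}
```

In a database-backed version, writing a chunk and updating the lock would presumably belong in the same transaction, so that an interruption cannot leave the cursor ahead of or behind the rows actually written. Swapping the views over would then happen only once the lock reports completion.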

Note: All database schema changes require discussion. Please link the issue for context.

Checklist

  • You've signed-off your work
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've updated the CHANGELOG.md with details about your change under the "Unreleased" section (if relevant, depending on the change, this may not be necessary)
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

@boring-cyborg boring-cyborg bot added api API layer changes docs labels Jan 13, 2023
@pawel-big-lebowski pawel-big-lebowski force-pushed the ol-facets/PR2-read-data-from-views branch 4 times, most recently from 0d322ef to bde93bd Compare January 13, 2023 13:29
@pawel-big-lebowski pawel-big-lebowski force-pushed the ol-facets/PR3-data-migration-to-facet-tables branch from 117fdba to 846437a Compare January 13, 2023 13:59
@codecov

codecov bot commented Jan 13, 2023

Codecov Report

Merging #2359 (98435c5) into ol-facets/PR2-read-data-from-views (45ecf15) will increase coverage by 0.26%.
The diff coverage is 87.01%.

@@                           Coverage Diff                            @@
##             ol-facets/PR2-read-data-from-views    #2359      +/-   ##
========================================================================
+ Coverage                                 76.85%   77.11%   +0.26%     
- Complexity                                 1208     1234      +26     
========================================================================
  Files                                       227      228       +1     
  Lines                                      5495     5572      +77     
  Branches                                    444      447       +3     
========================================================================
+ Hits                                       4223     4297      +74     
- Misses                                      774      775       +1     
- Partials                                    498      500       +2     
Impacted Files                                           Coverage Δ
api/src/main/java/marquez/MarquezApp.java                63.75% <0.00%> (-2.49%) ⬇️
...a/marquez/db/migrations/V57_1__BackfillFacets.java    90.54% <90.54%> (ø)
api/src/main/java/marquez/db/RunFacetsDao.java           80.95% <0.00%> (+4.76%) ⬆️
...main/java/marquez/service/models/LineageEvent.java    94.52% <0.00%> (+8.21%) ⬆️

@pawel-big-lebowski pawel-big-lebowski force-pushed the ol-facets/PR3-data-migration-to-facet-tables branch from 846437a to 951d60a Compare January 16, 2023 13:59
@pawel-big-lebowski pawel-big-lebowski marked this pull request as ready for review January 16, 2023 14:00
@pawel-big-lebowski pawel-big-lebowski force-pushed the ol-facets/PR3-data-migration-to-facet-tables branch from 951d60a to b84b4b0 Compare January 17, 2023 07:22
@pawel-big-lebowski pawel-big-lebowski mentioned this pull request Jan 17, 2023
7 tasks
@pawel-big-lebowski pawel-big-lebowski self-assigned this Jan 17, 2023
@pawel-big-lebowski pawel-big-lebowski force-pushed the ol-facets/PR2-read-data-from-views branch 6 times, most recently from 80116ab to 1d1fec4 Compare January 19, 2023 08:50
@pawel-big-lebowski pawel-big-lebowski force-pushed the ol-facets/PR3-data-migration-to-facet-tables branch from b84b4b0 to 83929cd Compare January 19, 2023 09:10
@wslulciuc wslulciuc left a comment

Minor comments, otherwise 🚀 💯

@pawel-big-lebowski pawel-big-lebowski force-pushed the ol-facets/PR3-data-migration-to-facet-tables branch 2 times, most recently from d77e6f4 to 8c69e78 Compare January 26, 2023 09:32
@pawel-big-lebowski pawel-big-lebowski force-pushed the ol-facets/PR2-read-data-from-views branch 2 times, most recently from 298b4e1 to b4b1903 Compare January 27, 2023 07:07
@pawel-big-lebowski pawel-big-lebowski force-pushed the ol-facets/PR3-data-migration-to-facet-tables branch from 8c69e78 to 98435c5 Compare January 27, 2023 13:30
@wslulciuc wslulciuc merged commit 88b8c7d into ol-facets/PR2-read-data-from-views Jan 31, 2023
@wslulciuc wslulciuc deleted the ol-facets/PR3-data-migration-to-facet-tables branch January 31, 2023 08:59
wslulciuc added a commit that referenced this pull request Jan 31, 2023
#2355)

* OL facets - PR2 - read facets from views pointing to lineage_events table

Signed-off-by: Pawel Leszczynski <[email protected]>

* OL facets - PR3 - migrate data to facet tables (#2359)

Signed-off-by: Pawel Leszczynski <[email protected]>

---------

Signed-off-by: Pawel Leszczynski <[email protected]>
Co-authored-by: Willy Lulciuc <[email protected]>