Improve efficiency of repository and integrity meta analysis #846

Draft: wants to merge 1 commit into main


nscuro
Member

@nscuro nscuro commented Aug 11, 2024

Description

Improves efficiency of repository and integrity meta analysis.

  • Removes "preparation" logic of IntegrityMetaComponent records in BomUploadProcessingTask. Preparing records one-by-one is too resource-intensive, while preparing them in batches carries too high a risk of deadlocks, since many threads write to the table in parallel.
  • Removes the differentiation between fetching latest version, fetching integrity metadata, and fetching both. The amount of bookkeeping necessary to enable this sadly makes the entire undertaking inefficient.
  • Refactors RepositoryMetaResultProcessor to consume records in batches. Since incoming records are keyed by PURL coordinates, we can safely perform batch operations in the database without the risk of running into deadlocks.
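
The batching idea in the last bullet can be illustrated with a minimal sketch. It is not code from this PR: `BatchSketch`, `MetaResult`, and `collapse` are hypothetical names, and the sketch assumes Java 16+ for records. It shows one way to prepare a consumed batch before a single batched write: keep only the most recent result per PURL coordinate and order the remainder by key, so each row is touched at most once per batch and rows are always written in the same order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BatchSketch {

    // Hypothetical stand-in for an incoming analysis result; not a type from this PR.
    public record MetaResult(String purlCoordinates, String latestVersion) {}

    // Collapses a batch so that each PURL coordinate appears exactly once.
    // Later results overwrite earlier ones, and the TreeMap yields the
    // survivors sorted by key, giving a stable write order across batches.
    public static List<MetaResult> collapse(List<MetaResult> batch) {
        Map<String, MetaResult> latestByKey = new TreeMap<>();
        for (MetaResult result : batch) {
            latestByKey.put(result.purlCoordinates(), result);
        }
        return new ArrayList<>(latestByKey.values());
    }

    public static void main(String[] args) {
        List<MetaResult> batch = List.of(
                new MetaResult("pkg:maven/foo/bar", "1.0"),
                new MetaResult("pkg:maven/acme/lib", "2.0"),
                new MetaResult("pkg:maven/foo/bar", "1.1"));

        // Two entries remain; the duplicate key keeps its latest result.
        for (MetaResult result : collapse(batch)) {
            System.out.println(result.purlCoordinates() + " -> " + result.latestVersion());
        }
    }
}
```

The collapsed list could then be handed to one batched upsert. The deadlock argument itself rests on the keying described above: if all records for a given PURL coordinate reach the same consumer, no two threads write the same row concurrently.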

Warning

This is a breaking change:

  • /api/v1/component/integritymetadata is removed
  • /api/v1/component/integritycheckstatus is removed

The removed endpoints are not used by the frontend. Integrity data is already part of the /api/v1/component response, so dedicated endpoints are not required at this time.

Addressed Issue

Closes DependencyTrack/hyades#1306

Additional Details

Hyades PR: DependencyTrack/hyades#1446

Checklist

  • I have read and understand the contributing guidelines
  • This PR fixes a defect, and I have provided tests to verify that the fix is effective
  • This PR implements an enhancement, and I have provided tests to verify that it works as intended
  • This PR introduces changes to the database model, and I have updated the migration changelog accordingly
  • This PR introduces new or alters existing behavior, and I have updated the documentation accordingly

@nscuro nscuro added the enhancement label Aug 11, 2024
@nscuro nscuro added this to the 5.6.0 milestone Aug 11, 2024
@nscuro nscuro force-pushed the issue-1306-fix branch 5 times, most recently from 0e6a977 to c210703 Compare August 12, 2024 15:00

codacy-production bot commented Aug 12, 2024

Coverage summary from Codacy


| Coverage variation | Diff coverage |
|---|---|
| -0.22% (target: -1.00%) | 71.15% (target: 70.00%) |

Coverage variation details

| | Coverable lines | Covered lines | Coverage |
|---|---|---|---|
| Common ancestor commit (c5dcf4d) | 22242 | 18392 | 82.69% |
| Head commit (acc711d) | 21974 (-268) | 18122 (-270) | 82.47% (-0.22%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

| | Coverable lines | Covered lines | Diff coverage |
|---|---|---|---|
| Pull request (#846) | 260 | 185 | 71.15% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%


Codacy stopped sending the deprecated coverage status on June 5th, 2024.

@nscuro
Member Author

nscuro commented Aug 12, 2024

Trying to figure out how we can still populate the REPOSITORY_META_COMPONENT.LAST_CHECK and INTEGRITY_META_COMPONENT.LAST_FETCH columns, without spamming the database with pointless UPDATE queries.

Ideally we wouldn't change any rows if their content didn't meaningfully change, but if we skip those writes, we can't keep LAST_CHECK and LAST_FETCH accurate.
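
To make the trade-off concrete, here is a rough sketch of one possible compromise. It is not something this PR implements: `LastCheckSketch`, `Row`, `shouldWrite`, and the `maxStaleness` parameter are all hypothetical, and records require Java 16+. The idea is to write a row when its content changed, and otherwise only when the stored timestamp has grown older than a tolerated staleness window.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Objects;

public class LastCheckSketch {

    // Hypothetical view of an existing row; not a type from this PR.
    public record Row(String latestVersion, Instant lastCheck) {}

    // Decides whether an incoming result justifies a database write.
    // Assumption: a timestamp that lags by up to maxStaleness is acceptable.
    public static boolean shouldWrite(Row existing, String incomingVersion,
                                      Instant now, Duration maxStaleness) {
        if (existing == null) {
            return true; // No row yet, so it must be inserted.
        }
        if (!Objects.equals(existing.latestVersion(), incomingVersion)) {
            return true; // Content changed in a meaningful way.
        }
        // Content is unchanged: refresh the timestamp only once it is stale enough.
        Duration age = Duration.between(existing.lastCheck(), now);
        return age.compareTo(maxStaleness) >= 0;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-08-12T15:00:00Z");
        Row row = new Row("1.0", now.minus(Duration.ofMinutes(10)));

        System.out.println(shouldWrite(row, "1.0", now, Duration.ofHours(1)));
        System.out.println(shouldWrite(row, "1.1", now, Duration.ofHours(1)));
    }
}
```

Under this approach LAST_CHECK and LAST_FETCH would only be accurate to within the chosen window, which may or may not be acceptable for whatever logic relies on those columns.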

@nscuro nscuro force-pushed the issue-1306-fix branch 4 times, most recently from bf70534 to ad50308 Compare August 14, 2024 15:26
@nscuro nscuro modified the milestones: 5.6.0, 5.7.0 Aug 22, 2024
@nscuro nscuro force-pushed the issue-1306-fix branch 2 times, most recently from a3631ca to 2224e95 Compare September 24, 2024 14:46
@nscuro nscuro force-pushed the issue-1306-fix branch 2 times, most recently from 671ca7b to 5835e57 Compare October 19, 2024 14:33
* Removes "preparation" logic of `IntegrityMetaComponent` records in `BomUploadProcessingTask`. Preparing records one-by-one is too resource-intensive, while preparing them in batches carries too high a risk of deadlocks, since many threads write to the table in parallel.
* Refactors `RepositoryMetaResultProcessor` to consume records in batches. Since incoming records are keyed by PURL coordinates, we can safely perform batch operations in the database without the risk of running into deadlocks.

Signed-off-by: nscuro <[email protected]>

# Conflicts:
#	src/main/resources/migration/changelog-v5.6.0.xml
Labels: breaking change, enhancement
Development

Successfully merging this pull request may close these issues.

Performance regression: Preparing IntegrityMetaComponents after BOM processing takes too long
1 participant