Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create "client" matching pipeline #1042

Merged
merged 20 commits into from
Jan 29, 2024
Merged

Create "client" matching pipeline #1042

merged 20 commits into from
Jan 29, 2024

Conversation

JonoYang
Copy link
Member

@JonoYang JonoYang commented Jan 10, 2024

This PR introduces a new pipeline matching, which uses the new matching API endpoint from purldb/matchcode for Package matching on a codebase. This pipeline takes in an archive input, creates CodebaseResources, sends a JSON scan to purldb/matchcode, then processes the matched package results when complete.

  • Tests are passing

@pombredanne pombredanne changed the title Matching pipeline Create "client" matching pipeline Jan 10, 2024
@JonoYang JonoYang force-pushed the matching-update branch 2 times, most recently from 78a04e8 to 9cc1a6e Compare January 10, 2024 23:53
@JonoYang
Copy link
Member Author

The pipeline has been renamed from matching to match_codebase_against_purldb. I have also modified the pipeline to be an addon pipeline.

@JonoYang JonoYang requested a review from tdruez January 11, 2024 00:04
scanpipe/pipelines/match_codebase_to_purldb.py Outdated Show resolved Hide resolved
scanpipe/pipelines/match_codebase_to_purldb.py Outdated Show resolved Hide resolved
scanpipe/pipes/purldb.py Outdated Show resolved Hide resolved
scanpipe/pipes/purldb.py Outdated Show resolved Hide resolved
scanpipe/pipes/purldb.py Outdated Show resolved Hide resolved
scanpipe/pipes/purldb.py Outdated Show resolved Hide resolved
@tdruez
Copy link
Contributor

tdruez commented Jan 11, 2024

@JonoYang @pombredanne I'm wondering if we should split the logic into pipeline steps or have a single step:

Single step:

  • match_to_purldb

OR

Multiple step:

  • send_project_json_to_matchcode
  • poll_matching_results
  • create_packages_from_results

@tdruez
Copy link
Contributor

tdruez commented Jan 11, 2024

Also, missing CHANGELOG entry and adding the new pipeline in the docs.

@JonoYang
Copy link
Member Author

@JonoYang @pombredanne I'm wondering if we should split the logic into pipeline steps or have a single step:

Single step:

  • match_to_purldb

OR

Multiple step:

  • send_project_json_to_matchcode
  • poll_matching_results
  • create_packages_from_results

I ended up splitting match_to_purldb into multiple steps for easier testing and being able to see how long we spend on each step.

Copy link
Contributor

@tdruez tdruez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JonoYang the new steps layout looks good. See my suggestions for improvements.

scanpipe/pipelines/match_to_purldb.py Show resolved Hide resolved

def poll_matching_results(self):
"""Wait until the match results are ready by polling the match run status."""
self.match_results = purldb.poll_until_success(self.run_url)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised this works as the poll_until_success expects 2 arguments run_url and results_url.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have a unit test for the whole pipeline to catch this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have an existing test that tests an entire pipeline? I just see tests where we test the individual pipeline steps.

scanpipe/pipes/purldb.py Outdated Show resolved Hide resolved
scanpipe/pipes/purldb.py Outdated Show resolved Hide resolved
setup.cfg Outdated Show resolved Hide resolved
response = request_get(run_url)
if response:
status = response["status"]
if status == "success":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case there's a failure (status != "success") on the PurlDB side, how do we handle it on the pipeline side?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdruez My first thought would be to raise an Exception and stop the pipeline there and print the log message from the run response. Do you have a suggestion on what should be done?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should separate the results_url = project_url + "results/" return request_get(results_url) logic from the polling function to simplify this. We can better handle report potential polling issues this way.

Copy link
Contributor

@tdruez tdruez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JonoYang I think we're almost ready to merge this, see my comments for a few improvements.
Also, could you fix the merge conflicts?

scanpipe/pipelines/match_to_purldb.py Show resolved Hide resolved
scanpipe/pipes/purldb.py Outdated Show resolved Hide resolved
scanpipe/pipes/purldb.py Outdated Show resolved Hide resolved
scanpipe/pipes/purldb.py Show resolved Hide resolved
@JonoYang JonoYang force-pushed the matching-update branch 2 times, most recently from 2d9bcd7 to a577df2 Compare January 27, 2024 01:04
    * Update docstrings

Signed-off-by: Jono Yang <[email protected]>
    * Update match_to_purldb description

Signed-off-by: Jono Yang <[email protected]>
    * Update poll_until_success to return True when a run is successful, instead of returning the match results

Signed-off-by: Jono Yang <[email protected]>
    * Create new test cases for poll_until_success

Signed-off-by: Jono Yang <[email protected]>
@JonoYang JonoYang force-pushed the matching-update branch 2 times, most recently from febbc6b to 8d23184 Compare January 27, 2024 02:18
    * Fix indent in send_project_json_to_matchcode
    * Update docstring

Signed-off-by: Jono Yang <[email protected]>
@tdruez tdruez merged commit bb98875 into main Jan 29, 2024
7 checks passed
@tdruez tdruez deleted the matching-update branch January 29, 2024 17:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants