[Question][DORA] Commits before first recorded deployments misattributed to random deployment #8249

Open
qpawelc opened this issue Dec 18, 2024 · 2 comments
Labels
devops Something about CI/CD (devops) type/question This issue is a question

Comments


qpawelc commented Dec 18, 2024

Question

Hey all!

I am using DevLake v1.0.1. I ingest the commits for a project through the GitLab integration and ingest that project's deployments via webhook (from Spinnaker). I do not have all-time deployment data for my project, only the last couple of months. After running the collect data job, I noticed that in the project_pr_metrics table, all of the project's commits that occurred before I onboarded to Spinnaker get associated with a "seemingly random" deployment.

It may be easier to visualize this. Take a look at the screenshot below. The bottom blue dots represent commits. The top dots represent deployments. The lines represent the commit-to-deployment relationship in the project_pr_metrics table. As you can see, all of the commits that occurred before the start of my deployment data become associated with a single deployment, which heavily skews certain metrics.

Does anyone have any advice on how I can:

  • Debug why DevLake chooses to associate these commits with this deployment? For example, why wouldn't it be the first deployment? Or, preferably, how can I avoid creating associations between deployments and commits that occurred before the first deployment?
  • Work around this issue? One idea I have is to set the blueprint for each project to ingest only data from after the second successful production deployment.
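
A minimal sketch of how the cutoff for the second idea could be computed, assuming the deployment records are available as plain dictionaries. The field names `result`, `environment`, and `finished_at`, the values `"SUCCESS"` and `"PRODUCTION"`, and the function `ingestion_cutoff` are all hypothetical and are not taken from the DevLake schema or API.

```python
from datetime import datetime
from typing import Optional


def ingestion_cutoff(deployments: list, nth: int = 2) -> Optional[datetime]:
    """Return the finish time of the nth successful production deployment.

    Each deployment is a dict with the hypothetical keys "result",
    "environment" and "finished_at" (a datetime). Returns None when fewer
    than `nth` matching deployments exist.
    """
    finished = sorted(
        d["finished_at"]
        for d in deployments
        if d["result"] == "SUCCESS" and d["environment"] == "PRODUCTION"
    )
    return finished[nth - 1] if len(finished) >= nth else None
```

The returned timestamp could then be used as the start of the ingestion window for that project, so that no commit older than the deployment data is collected.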

Thanks for your time and support!

Screenshots

image

Additional context

It's worth noting that I have found a similar issue. In this comment, @nicolavolpini mentioned that it "still shows several PRs associated to the same deployment webhook", but no additional context was posted on how the issue could be resolved.

@qpawelc qpawelc added the type/question This issue is a question label Dec 18, 2024
@qpawelc qpawelc changed the title [Question][Module Name] Question title [Question][Module Name] Commits before deployments misattributed to random deployment Dec 18, 2024
@qpawelc qpawelc changed the title [Question][Module Name] Commits before deployments misattributed to random deployment [Question][DORA] Commits before deployments misattributed to random deployment Dec 18, 2024
@qpawelc qpawelc changed the title [Question][DORA] Commits before deployments misattributed to random deployment [Question][DORA] Commits before first recorded deployments misattributed to random deployment Dec 18, 2024
@dosubot dosubot bot added the devops Something about CI/CD (devops) label Dec 18, 2024

dosubot bot commented Dec 18, 2024

Hey there, @qpawelc! 👋 I'm here to help you out with any bugs, questions, or even becoming a contributor. Let's tackle this together!

To address the issue of commits being associated with a "seemingly random" deployment in DevLake, you can consider the following steps:

  1. Debugging the Association Logic: DevLake associates commits with deployments by computing the commit diff between two consecutive deployments and linking PRs/MRs to that diff through merge_commit or pr_commit [1]. If historical deployment data is missing, DevLake might default to associating older commits with the earliest available deployment. Unfortunately, specific fallback mechanisms for missing deployment data are not documented, so you might need to explore the codebase or logs to understand the exact logic being applied.

  2. Workaround with Blueprint Configuration: You can configure the data ingestion to start after a specific event, such as a deployment, by using the SyncPolicy component. This allows you to set a timeAfter property to limit data collection to a specific time range [2]. This approach can help you avoid associating older commits with recent deployments by only ingesting data after a certain point in time.

  3. Review Related Issues: Since you've found a similar issue in the DevLake repository, it might be helpful to follow up on that issue or any related discussions to see if there have been updates or solutions provided by the community.
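
To illustrate the diff-based association described in step 1, here is a simplified sketch. It is not DevLake's implementation: it assumes a strictly linear commit history, and the function `associate` and its `skip_first` option are hypothetical. It only shows why a deployment with no recorded predecessor can absorb every older commit, and what excluding that deployment would change. It does not explain which deployment DevLake actually picks.

```python
def associate(commits, deployments, skip_first=False):
    """Map each commit SHA to the deployment that first shipped it.

    commits:     SHAs of a linear history, oldest first.
    deployments: (deployment_id, deployed_sha) pairs, oldest first.
    skip_first:  hypothetical guard; leave the first deployment's diff
                 unassigned because it has no recorded predecessor.
    """
    position = {sha: i for i, sha in enumerate(commits)}
    result = {}
    prev = -1  # no previous deployment: the diff reaches back to the first commit
    for n, (dep_id, sha) in enumerate(deployments):
        cur = position[sha]
        if not (skip_first and n == 0):
            # Commits after the previous deployment, up to and including this one.
            for commit in commits[prev + 1 : cur + 1]:
                result[commit] = dep_id
        # A rollback to an older commit must not move the boundary backwards.
        prev = max(prev, cur)
    return result
```

With `skip_first=False`, every commit older than the first recorded deployment lands on that deployment, which is the kind of skew described in the question. With `skip_first=True`, those commits stay unassociated.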

By implementing these steps, you should be able to better manage the commit-to-deployment associations and mitigate the skewed metrics issue.


@petkostas

Looks similar to #8188
