Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build: disable webhooks after N failed builds in a row #9690

Open
humitos opened this issue Oct 24, 2022 · 7 comments
Open

Build: disable webhooks after N failed builds in a row #9690

humitos opened this issue Oct 24, 2022 · 7 comments
Labels
Feature New feature Needed: design decision A core team decision is required

Comments

@humitos
Copy link
Member

humitos commented Oct 24, 2022

Analyzing the data we have, I've found that we have a bunch of projects that have been triggering builds where all of them have failed. During months. This consumes resources in our side, but also degrade the UX for valid projects since they have to wait for a build that we already know it's gonna fail, before their project's build is taken.

So, to reduce projects in this scenario, we are thinking about "auto-disable webhooks on projects that have N failed builds in a row" and communicate their maintainers. To re-enable the webhook, they would have to go to the project's admin page and re-activate/re-configure/re-add the webhook.

This is the metabase question for future reference: https://ethicalads.metabaseapp.com/question/309-project-with-all-failed-builds-in-the-last-3-months. It shows the amount of builds and minutes consumed per project.

@humitos humitos added Feature New feature Needed: design decision A core team decision is required labels Oct 24, 2022
@humitos humitos moved this to Planned in 📍Roadmap Oct 24, 2022
@agjohnson
Copy link
Contributor

Also, it would be great to have a maintainers site notification specific to this that goes out, so we can re-use this pattern when soft-disabling a project. A hard disable (setting Project.skip = True) can be reserved for abusive/spam projects, but that "notification" pattern is quite broken for normal users.

Or perhaps I'm describing two potential notifications -- a specific notification and a generic notification?

  • "Your project hasn't built successfully for 50 consecutive builds, we've disabled your project from automatically building"
  • "Your project has a problem, we've disabled your project from automatically building"

@humitos
Copy link
Member Author

humitos commented Oct 25, 2022

Yeah, I'm fine with better notifications. Maybe we want to make that part of the work on #9279 and #3399

We need to probably set two different thresholds here:

  1. N consecutive builds on branch/tags (lower, maybe 25) and disable the webhook completely
  2. N consecutive builds on PR builds (upper, maybe 50) and only disable PR building

I'm trying to say that "builds failing on PRs are more acceptable than failing on branch/tags and should be handled differently"

@agjohnson
Copy link
Contributor

Yeah, great point on PR builds. Maybe we are strictly only concerned with failed active versions?

I agree we can rethink notifications in a larger PR. If we add something here it will be using the new notification pattern anyways. Some of the old patterns can eventually be culled, and just adding a new mechanism for disabling projects here would be enough to start that process.

@humitos
Copy link
Member Author

humitos commented Oct 26, 2022

@agjohnson

Maybe we are strictly only concerned with failed active versions?

It makes sense to me, yeah.

@humitos
Copy link
Member Author

humitos commented Jun 8, 2023

I'd like to think about prioritizing this feature if possible because this will help us a lot to know "how many active/valid non-spam projects do we have in our platform" in an easier way. This data will be useful when migrating/deprecating things as we are doing with "building without a config file" since we will know "how many of the active/valid projects have already migrated".

Many non-spam projects will not migrate to the new configuration file just because they don't need it or have to. Mainly projects that are archived or where created just for testing/development/educational purposes, for example. I assume we have a lot of them.

The work required that I see here is:

@agjohnson
Copy link
Contributor

Because taking automated actions on projects has been hard to get right -- spam banning, build retry -- maybe we should start fairly conservatively here. I think I'm comfortable saying that any project who has a default version that has 25 consecutive failed builds is probably not monitoring their project.

We also don't have to take any automated action yet, we could start with just a notification or drip campaign.

At least disabling the webhook is non-destructive though, I think that is probably the key for us. Deleting the webhook is more destructive and I'd want to make sure we're not making more work for us here.

In the end, this is mostly a cost saving measure for instance cost, but that is not a strong concern right now. I'd happily increase our instance costs and pay for more instances if it frees up our time or makes for less headaches for us.

@humitos
Copy link
Member Author

humitos commented Mar 12, 2024

We also don't have to take any automated action yet, we could start with just a notification or drip campaign.

We can trigger a notification and attach it to the Project using the new system now 💯

At least disabling the webhook is non-destructive though, I think that is probably the key for us. Deleting the webhook is more destructive and I'd want to make sure we're not making more work for us here.

Sounds good to me. I think adding Integration.enabled(bool, default=True) field here and expose it in the form would be enough to implement this feature. This allows us to mark it as enabled=False via code and let users to re-enable it via the WebUI using the form.

In the end, this is mostly a cost saving measure for instance cost, but that is not a strong concern right now. I'd happily increase our instance costs and pay for more instances if it frees up our time or makes for less headaches for us.

This is true. However, "disabling projects their authors are not monitoring/using" seems like a good way to remove them when getting metrics from our database --which is useful to make decisions based on data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature New feature Needed: design decision A core team decision is required
Projects
Status: Planned
Development

No branches or pull requests

2 participants