Only evaluate spambug on bugs filed by people without "editbugs" permissions, then check if it's better to train on all or just non-editbugs #2787

Open
Tracked by #4323
marco-c opened this issue Mar 3, 2022 · 10 comments · May be fixed by #3376

Comments

@marco-c
Collaborator

marco-c commented Mar 3, 2022

The spambug model is only applied to bugs filed by people without "editbugs" permissions, so it makes sense to only evaluate it on these kinds of bugs and not all bugs.

For training, we can keep using all bugs, but we should check if the performance improves or worsens in case we only use bugs filed by non-editbugs people.

@jpangas
Collaborator

jpangas commented Mar 9, 2023

I'm trying to wrap my head around this issue, so correct me if I'm wrong:
Is the issue meant to compare model performance when spambug is trained on all bugs versus only on bugs filed by non-editbugs people?
Currently, we train on all bugs. Is there a field that records the "editbugs" permissions? @suhaibmujahid

@marco-c
Collaborator Author

marco-c commented Mar 9, 2023

The goal of this issue is twofold:

  1. Only evaluate the model on bugs filed by people without editbugs permissions;
  2. Compare the performance of the model when we train on all bugs and when we train only on bugs filed by people with editbugs permissions.

@jpangas unfortunately this issue might be problematic for you to fix, as you'd need special permissions to see which users have editbugs permissions.

@suhaibmujahid
Member

A workaround could be checking if the user's email belongs to a Mozilla employee or not (e.g., ends with @mozilla.com).

This will not catch all cases, but it could perform better for the training dataset (item 2), since it will also catch bugs filed by users who had editbugs permissions at the time but no longer do.

For item 1, relying on the editbugs permissions will give more realistic results.

@marco-c wdyt?

@jpangas
Collaborator

jpangas commented Mar 10, 2023

2. Compare the performance of the model when we train on all bugs and when we train only on bugs filed by people with editbugs permissions.

In 2), did you mean comparing training on all bugs with training on bugs filed by people with editbugs permissions, or did you actually mean training only on bugs filed by people without editbugs permissions (which we do currently)?

# Skip bugs filed by Mozillians, since we are sure they are not spam.
if "@mozilla" in bug_data["creator"]:
    continue

Currently we train only on bugs filed by non-Mozillians (I assume these people do not have editbugs permissions). Toggling this check would be one way to test performance when we include bugs filed by Mozillians (in line with what @suhaibmujahid has suggested).

@marco-c
Collaborator Author

marco-c commented Mar 10, 2023

Yes, sorry, I meant train on bugs filed by people without editbugs permissions.
Currently we skip @mozilla.com only; the goal of this issue is to check what changes if, for training, we also skip people with editbugs permissions (since we are sure they are not filing spam bugs).

For evaluation we should always skip them, since that is what we do in production and we want to measure exactly what happens there.
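The two training configurations, with the evaluation set held fixed, could be sketched roughly as below. This is only an illustration and not bugbug's actual training code: `is_mozillian`, `select_bugs`, the `skip_editbugs_in_training` flag, and the shape of the bug records are all hypothetical, and the set of editbugs users is assumed to be available already.

```python
def is_mozillian(creator):
    """Substring check matching the existing skip rule quoted in this thread."""
    return "@mozilla" in creator


def select_bugs(train_bugs, eval_bugs, users_with_editbugs, skip_editbugs_in_training):
    """Build the training and evaluation sets for one configuration.

    Hypothetical helper: Mozillians are always skipped, as today. Reporters
    with editbugs are always skipped for evaluation (mirroring production)
    and skipped for training only when skip_editbugs_in_training is True.
    """
    def has_editbugs(creator):
        return creator in users_with_editbugs

    train = [
        bug
        for bug in train_bugs
        if not is_mozillian(bug["creator"])
        and not (skip_editbugs_in_training and has_editbugs(bug["creator"]))
    ]
    evaluation = [
        bug
        for bug in eval_bugs
        if not is_mozillian(bug["creator"]) and not has_editbugs(bug["creator"])
    ]
    return train, evaluation
```

Running the same evaluation set against models trained with the flag off and on would give the comparison asked for in this issue.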

@marco-c
Collaborator Author

marco-c commented Mar 10, 2023

You can retrieve the list of users with editbugs by calling bugzilla.get_groups_users(["editbugs", "editbugs-team"]). The problem is that you can't test it yourself; you would need us to test it.

In the model, we could do something like (pseudocode):

try:
    userswitheditbugs = ...
except PermissionDenied:
    userswitheditbugs = set()

...

if "@mozilla" in creator or creator in userswitheditbugs:
    skip

P.S.: as part of this, we should also skip people with "@Softvision" in their email address.
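A runnable variant of the pseudocode above might look like the following sketch. Several things here are assumptions rather than bugbug's real API: the `PermissionDenied` class is a stand-in for whatever error the lookup raises, `fetch_group_users` is a caller-supplied function (for example a wrapper around `bugzilla.get_groups_users`) assumed to return an iterable of user emails, and the case-insensitive match on the Softvision domain is a guess.

```python
class PermissionDenied(Exception):
    """Hypothetical error raised when the caller may not list group members."""


def load_users_with_editbugs(fetch_group_users):
    """Return the set of users with editbugs, or an empty set without access.

    fetch_group_users is a hypothetical callable that takes a list of group
    names and returns an iterable of user emails.
    """
    try:
        return set(fetch_group_users(["editbugs", "editbugs-team"]))
    except PermissionDenied:
        # Without special permissions, fall back to the email checks only.
        return set()


def should_skip(creator, users_with_editbugs):
    """Return True for reporters we are sure are not filing spam."""
    # Assumption: the Softvision check ignores case; the "@mozilla" check
    # stays case-sensitive, as in the existing skip rule.
    trusted_domain = "@mozilla" in creator or "@softvision" in creator.lower()
    return trusted_domain or creator in users_with_editbugs
```

Passing the lookup in as a function means contributors without the special permissions can still exercise the fallback path locally.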

@jpangas
Collaborator

jpangas commented Mar 10, 2023

Great. Thanks, I'm on it and I will open a PR once everything is ready.

@suhaibmujahid
Member

suhaibmujahid commented Mar 10, 2023

@jpangas we already have a feature to check if the user is a mozillian, but it is not used in the spambug model:

class is_mozillian(single_bug_feature):
    name = "Reporter has a @mozilla email"

    def __call__(self, bug, **kwargs):
        return any(
            bug["creator_detail"]["email"].endswith(domain)
            for domain in ["@mozilla.com", "@mozilla.org"]
        )
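Note that this feature matches on the email suffix, while the skip rule quoted earlier in the thread uses a substring test ("@mozilla" in the creator). The standalone sketch below contrasts the two; the function names and sample addresses are made up for illustration and are not part of bugbug.

```python
MOZILLA_DOMAINS = ("@mozilla.com", "@mozilla.org")


def has_mozilla_suffix(email):
    """Suffix match, as in the is_mozillian feature."""
    # str.endswith accepts a tuple of candidate suffixes.
    return email.endswith(MOZILLA_DOMAINS)


def has_mozilla_substring(email):
    """Substring match, as in the existing skip rule."""
    return "@mozilla" in email


for address in ["dev@mozilla.com", "dev@mozilla.example.net"]:
    print(address, has_mozilla_suffix(address), has_mozilla_substring(address))
```

The substring test is more permissive: it also accepts addresses on other domains that merely start with "mozilla" after the @ sign, which may or may not be what is wanted when deciding which reporters to skip.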

@jpangas
Collaborator

jpangas commented Mar 10, 2023

Thanks @suhaibmujahid

@marco-c
Collaborator Author

marco-c commented Aug 22, 2024

Depends on #4407.
