Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: ListCommits() has empty committer name for "verified" commits #1543

Closed
laurentsimon opened this issue Jan 27, 2022 · 22 comments · Fixed by #1558 or #1695
Closed

BUG: ListCommits() has empty committer name for "verified" commits #1543

laurentsimon opened this issue Jan 27, 2022 · 22 comments · Fixed by #1558 or #1695
Assignees
Labels
kind/bug Something isn't working

Comments

@laurentsimon
Copy link
Contributor

The repo client's code https://github.com/ossf/scorecard/blob/main/clients/githubrepo/graphql.go#L218 contains empty string for the committer name for certain commits.

Here are example commits that have an empty committer name:
d50788f
5f9fff3
7a6eb28
16c0d37

and more.

Looking at https://github.com/ossf/scorecard/commits/main, all these PRs have one thing in common: they are marked "verified" This commit was created on GitHub.com. Not sure why this changes the behavior though.

This blocks #1524

@azeemsgoogle assigning to you since you've played with graphql and you may know an easy fix.

@laurentsimon laurentsimon added the kind/bug Something isn't working label Jan 27, 2022
@azeemshaikh38
Copy link
Contributor

When I looked at the commits which returned empty committer - committer.name is GitHub and committer.email is [email protected]. Most likely these are commits which had auto-merge enabled. So technically, there was no committer.

What we can do is, pull-in either committer.name and committer.email and check if it was auto-merged by GitHub. Can send out a PR for this soon.

@laurentsimon
Copy link
Contributor Author

When I looked at the commits which returned empty committer - committer.name is GitHub and committer.email is [email protected]. Most likely these are commits which had auto-merge enabled. So technically, there was no committer.

You're saying Login is ``, but name is GitHub? If that's the case, just knowing it's empty means it's GitHub and I can add this as part of the check itself. I'm worried that `GitHub` and `[email protected]` may be spoof-able by committers via their git config. I suspect GitHub checks for it but I don't know. I'd say empty string means merged by GitHub and I can implement this logic in the check by simply setting the results of the committer's login to `GitHub`. Nobody can spoof this field since logins are unique. wdut?

What we can do is, pull-in either committer.name and committer.email and check if it was auto-merged by GitHub. Can send out a PR for this soon.

If you agree on the above, no need for a follow-up PR :-)

@azeemshaikh38
Copy link
Contributor

My concern here is we'll bake in the assumption that nil means auto submitted. This might not be true for non-GitHub clients.

So, we can explicitly return [email protected] as the User.Login and let Scorecard make the call here?

@laurentsimon
Copy link
Contributor Author

My concern here is we'll bake in the assumption that nil means auto submitted. This might not be true for non-GitHub clients.

good point.

So, we can explicitly return [email protected] as the User.Login and let Scorecard make the call here?

+1 let's do that! Let's set it to github since GitHub logins don't contain @?

@azeemshaikh38
Copy link
Contributor

+1 let's do that! Let's set it to github since GitHub logins don't contain @?

Wfm. My thinking in choosing noreply@ was - since it's not a valid username, anyone can reason that it must be some automated bot. Whereas github is a valid GitHub username - https://github.com/github. I'll leave it upto you.

@laurentsimon
Copy link
Contributor Author

+1 let's do that! Let's set it to github since GitHub logins don't contain @?

Wfm. My thinking in choosing noreply@ was - since it's not a valid username, anyone can reason that it must be some automated bot. Whereas github is a valid GitHub username - https://github.com/github. I'll leave it upto you.

it's the platform's name though, the one that's used to run as GitHub user. For example, github-actions are run under user github. Does not really matter in the end, both are fine I think

@laurentsimon
Copy link
Contributor Author

When I looked at the commits which returned empty committer - committer.name is GitHub and committer.email is [email protected]. Most likely these are commits which had auto-merge enabled. So technically, there was no committer.

I'm curious about auto-merge. I use auto-merge for all my PRs (by clicking the button in the UX) and all my merge commits say laurentsimon as merge committer. There may be something else special about these commits.

@azeemshaikh38
Copy link
Contributor

I was quite wrong here. Looks like you can fake committer.name and committer.email after all like you suspected @laurentsimon. Here are some example commits where committer.name and committer.email is non-empty but not github:

criticalmanufacturing/dev-tasks@a781f66
ahmedalatawi/ngx-treant-js@cb5d84d

Basically, if GitHub is unable to assign a GH userID to a commit, it takes the name and email from the Git commit. This is very easily spoof-able and someone can also commit using a [email protected] email or github name as well. So, question is how do we proceed here? Should we go back to the way things were?

@laurentsimon
Copy link
Contributor Author

laurentsimon commented Jan 31, 2022

I was quite wrong here. Looks like you can fake committer.name and committer.email after all like you suspected @laurentsimon. Here are some example commits where committer.name and committer.email is non-empty but not github:

criticalmanufacturing/dev-tasks@a781f66 AhmedAlatawi/ngx-treant-js@cb5d84d

Basically, if GitHub is unable to assign a GH userID to a commit, it takes the name and email from the Git commit. This is very easily spoof-able and someone can also commit using a [email protected] email or github name as well. So, question is how do we proceed here? Should we go back to the way things were?

I've created a support ticket for it. @azeemshaikh38 do you want to file an issue on GitHub for them to update their doc? This behavior should be explained. In which context does GH not know who authored a merge request? Only auto-merge. or are there other corner cases?

@josepalafox appreciate any input from your team.
Context: users want to add policies on top of scorecard results, e.g. "code was reviewed and merged by users in list [X,Y,Z, ...]". For this we need a reliable way retrieve the username/login of a user. We found a corner case where, if we use auto-merge in the GitHub UX, the login field is null. The name and email fields are available but those can be spoofed via git settings. Is there a way to retrieve this information reliably?

@azeemshaikh38
Copy link
Contributor

Sent out #1576 to revert this change for now since it'll be erroring out for wrong reasons.

@laurentsimon
Copy link
Contributor Author

laurentsimon commented Feb 2, 2022

I was quite wrong here. Looks like you can fake committer.name and committer.email after all like you suspected @laurentsimon. Here are some example commits where committer.name and committer.email is non-empty but not github:
criticalmanufacturing/dev-tasks@a781f66 AhmedAlatawi/ngx-treant-js@cb5d84d
Basically, if GitHub is unable to assign a GH userID to a commit, it takes the name and email from the Git commit. This is very easily spoof-able and someone can also commit using a [email protected] email or github name as well. So, question is how do we proceed here? Should we go back to the way things were?

I've created a support ticket for it. @azeemshaikh38 do you want to file an issue on GitHub for them to update their doc? This behavior should be explained. In which context does GH not know who authored a merge request? Only auto-merge. or are there other corner cases?

the commits we get from graphql when querying only the commits don't have the login information (https://github.com/ossf/scorecard/blob/main/clients/githubrepo/graphql.go#L247). But it looks like the CommitMerge.Committer.Login is correct when querying the AssociatedPullRequests (https://github.com/ossf/scorecard/blob/main/clients/githubrepo/graphql.go#L212). By looking at the commit hash, we can match one against the other and recover the login: this may be a workaround to the problem.

@azeemshaikh38 please tell me if you can reproduce my results.

@azeemshaikh38
Copy link
Contributor

Hmm, no. The committer and author info in commit vs. associated PR is the same. Maybe you are comparing commitAuthor with committer?

image

@laurentsimon
Copy link
Contributor Author

laurentsimon commented Feb 2, 2022

ah yes. But for a commit to the main branch, the committer is the author of the merge commit (if there is one). Right?
What is the author of a commit to main supposed to be, if not the merge commit's committer? ^^

Let's try for me to create a PR and for you to auto-merge it. I'd like to see what the results will be. That should validate whether the node's author (in your image) is the committer's author or not - sorry, this is hard to follow :-)

Maybe you have a link to the doc, which may be easier.

@laurentsimon
Copy link
Contributor Author

laurentsimon commented Feb 2, 2022

according to https://docs.github.com/en/graphql/reference/objects#commit, there's both an Author and a Committer. The committer should be the same as the mergeCommit's committer... and so long as one is populated, we'd be able to get the committer. That's my hypothesis

@laurentsimon
Copy link
Contributor Author

laurentsimon commented Feb 9, 2022

The commits we don't have author for today are all authored by GH and have the "verified" keyword next to it, because they were signed. So one way to ensure it's a commit authored by GH is to query the signature of a commit and verify it's signed by GH's public key. Need to find out where the GH's public keys are accessible. There's an API for it I think. Let's see how to account for key rotation/compromises after that :-)

Fyi, there's also a autoMergeRequest.enabledBy.login which may be an interested field. Not sure it's necessary, just fyi.

@laurentsimon
Copy link
Contributor Author

follow-up. We don't need to verify the signature to start with, we can simply use
isValid and wasSignedByGitHub field to verify it's GitHub.

@laurentsimon
Copy link
Contributor Author

additional info:

REST API's Meta endpoint which provides GitHub's SSH key fingerprints: https://api.github.com/meta. Doc is https://docs.github.com/en/rest/reference/meta

Those are/should be the same fingerprints included in our documentation:

https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/githubs-ssh-key-fingerprints

More about commit signature verification here:

https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification

It is currently not possible to retrieve revoked/older keys :/

@azeemshaikh38
Copy link
Contributor

azeemshaikh38 commented Feb 11, 2022

Is looking for wasSignedByGitHub not enough? Seems like that boolean is only set to true if committer is GitHub and is null otherwise.

Maybe we could trust GitHub to have done the signature verification for us?

@laurentsimon
Copy link
Contributor Author

Yes wasSignedByGitHub should be enough for us. I added the other links as fyi in case we want to more in the future.

@azeemshaikh38
Copy link
Contributor

Makes sense, thanks for finding this! Do you want to send a PR for this?

@laurentsimon
Copy link
Contributor Author

unless someone else wants to do it, happy to.

@azeemshaikh38
Copy link
Contributor

Thanks, assigning this to you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working
Projects
None yet
2 participants