GitHub Project LLVM Failing and not indexing #1093
From the crawler log, it seems like there is an unexpected error in the GraphQL response, and the process is likely stuck retrying, which would explain why there is no data stored. The error is:
The crawler needs a fix to handle that error (e.g. by updating the query/schema, or by reporting this issue to the GitHub API bug tracker). Though we could also skip such errors (using …).
That could also be a transient error from the GitHub API ("SERVICE_UNAVAILABLE"). Does it still happen? As we request changes in bulk, using …
Seem to be getting fewer of those errors today, but still nothing is being stored. Although, doing a manual hit to the GraphQL API, I get an error when trying to fetch 3 or more PRs. Would it be possible to have that number configurable in the config? If I knew Haskell I would give it a go (I would normally try and learn to do it myself, but I'm in the middle of 3 months of training and my head becomes a shed).
I just did a test with the llvm-project repository on GitHub, and I don't have any issue fetching in bulks of 25 PRs. The crawler reduces by itself the number of PRs it attempts to fetch when it encounters a server-side timeout. With GitHub this can happen a lot when PRs have plenty of comments and data, but for llvm that does not seem to be the case. Regarding the other error, I see it, but I don't have a solution for now :( I need some time to experiment with solutions for that issue.
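The back-off behaviour described above (shrinking the bulk size when the server times out) can be sketched as follows. This is an illustrative sketch, not monocle's actual code; `nextBulkSize` is a hypothetical name:

```haskell
-- Sketch: halve the bulk size after a server-side timeout,
-- but never request fewer than one item per query.
-- Illustrative only; not monocle's actual implementation.
nextBulkSize :: Bool -> Int -> Int
nextBulkSize timedOut current
  | timedOut  = max 1 (current `div` 2)
  | otherwise = current

-- Starting at 25 PRs, three consecutive timeouts give:
-- scanl (flip nextBulkSize) 25 [True, True, True] == [25, 12, 6, 3]
```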
I've run the query in GitHub Explorer with failures as below, so I've raised a question in the community for help: https://github.com/orgs/community/discussions/79021 (query attached: GetProjectPullRequests_graphql.txt). I'm now getting some data compared to the weekend, but not everything :(

```json
"errors": [
  {
    "type": "SERVICE_UNAVAILABLE",
    "path": [
      "repository",
      "pullRequests",
      "nodes",
      0,
      "commits",
      "nodes",
      0,
      "commit",
      "additions"
    ],
    "locations": [
      {
        "line": 84,
        "column": 15
      }
    ],
    "message": "The additions count for this commit is unavailable."
  }
]
```
Still playing with the query; taking the lines out of commits/commit means there are no errors now.
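Based on the error path reported above, the failing portion of the query presumably has roughly this shape (a reconstruction, not the exact contents of GetProjectPullRequests_graphql.txt); the page-size arguments are illustrative:

```graphql
pullRequests(first: 25) {
  nodes {
    commits(first: 10) {
      nodes {
        commit {
          additions   # removing this field ...
          deletions   # ... and this one avoids the SERVICE_UNAVAILABLE error
        }
      }
    }
  }
}
```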
Thank you very much for the feedback. It looks like working around this issue from the monocle side is not going to be easy, as we would need to enable an extra query (one without the additions/deletions request).
I've found an offending pull request: llvm/llvm-project#74806. It must have worked at some point for the review to have happened.
The pull request query is defined in this module, and the parameters are documented here (search for …).

As discussed with @morucci, monocle may be improved by reducing the query size when such an unexpected error happens, and when the size is down to one, we could perhaps skip the offending items by using the endCursor provided in the error body.

The crawler indexes items in chunks of 500, so you should be getting some data by increasing the …

Thanks again for investigating that issue, it's great feedback.
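The reduce-then-skip strategy described above could be sketched like this. All names (`FetchResult`, `fetchAll`, `mock`, the cursor value) are hypothetical; monocle does not currently implement this:

```haskell
-- Sketch of the reduce-then-skip strategy: shrink the request size on an
-- unexpected error, and once a single item still fails, skip past it using
-- the endCursor returned in the error body. Illustrative names throughout.
type Cursor = String

data FetchResult a
  = Fetched [a] (Maybe Cursor) -- items fetched, plus the cursor of the next page
  | Failed (Maybe Cursor)      -- unexpected error, with the endCursor if the
                               -- error body provided one

-- Retry with a halved size on error; once the size is down to one,
-- skip the offending item by resuming from the error's endCursor.
fetchAll :: (Int -> Maybe Cursor -> FetchResult a) -> Int -> [a]
fetchAll fetch size0 = go size0 Nothing
  where
    go size cursor = case fetch size cursor of
      Fetched items Nothing -> items
      Fetched items next    -> items ++ go size0 next
      Failed errCursor
        | size > 1  -> go (max 1 (size `div` 2)) cursor
        | otherwise -> maybe [] (go size0 . Just) errCursor

-- A mock server whose first page always fails: the crawler shrinks down to
-- size 1, then skips past the bad item using the provided endCursor.
mock :: Int -> Maybe Cursor -> FetchResult Int
mock size Nothing
  | size > 1  = Failed Nothing
  | otherwise = Failed (Just "after-bad-item")
mock _ (Just "after-bad-item") = Fetched [1, 2] Nothing
mock _ _ = Fetched [] Nothing
```

With this mock, `fetchAll mock 4` recovers the two items after the bad one instead of retrying forever.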
Good catch, this deserves an ADR. You can learn more about this choice in: https://changemetrics.io/posts/2021-06-01-haskell-use-cases.html . The main reason being that the language is statically typed with an advanced type system.
The schema that pulls additions/deletions is shared by two queries and is defined here. If you remove these attributes, the build will fail in the PullRequests and UserPullRequests modules, and you can replace the missing term with …
I would recommend https://learn-haskell.blog/
@bigmadkev, the related PR is merged. We believe the indexing issue is fixed/mitigated, so please let us know if we can close this issue.
Will clear my cache and let it run, and see if it's able to get back to its current state (just missing 1 pull request out of 9k+). Cheers
My checking GitHub query had the wrong date for "updated since", so actually all the data is there, whoop whoop! Thank you so much for the fix!
With the following config I'm getting a lot of errors, and I left it to run overnight with no data stored.
Using a fine-grained token with the settings for private repos, as the README suggests.
I've attached the logs from the docker containers:
api.log
crawler.log
elastic.log
Any help / pointers really appreciated.