Fix rate limit errors with Github API #431 #470

Merged Aug 11, 2017 (2 commits)

@MichaelViveros MichaelViveros commented Jul 26, 2017

@heathermiller @MasseGuillaume

This fixes #431 by updating the code that downloads repo info from GitHub so that it no longer fails with errors about hitting the GitHub API rate limit and abuse rate limit (see #431 for more info about those errors).

I created a personal Scaladex GitHub account and a machine Scaladex account, and generated one GitHub API token for each. This adheres to GitHub's terms of service, which allow at most 2 tokens when hitting their API (1 token from a personal account and 1 token from a machine account). I will commit these tokens to the scaladex-credentials repo once this PR is deployed.

This PR is similar to the solution described in #431:

  • Instead of splitting the step that downloads the GitHub info into 2 steps (as mentioned in my previous solution), the code uses one step like it did before, and if it hits the rate limit, it pauses until the rate limit resets. The progress bar keeps printing the same thing while the thread is paused, which is annoying, but I couldn't find a way around this (there's an open issue with ctongfei/progressbar about being able to pause it).
  • The download step hits the GitHub API serially.

The whole data/reStart github step takes around 1.5 hours now (~50 mins downloading, ~40 mins pausing)
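
The "pause until the rate limit resets" behaviour described above can be sketched as follows. This is a minimal illustration, not the code from this PR: it assumes the pause is derived from the `X-RateLimit-Remaining` and `X-RateLimit-Reset` (epoch seconds) response headers that the GitHub API returns, and the names `RateLimit`, `RateLimitPause`, `pauseFor`, and the 5 second safety margin are all hypothetical.

```scala
import scala.concurrent.duration._

// Hypothetical holder for the two rate limit headers GitHub returns:
// X-RateLimit-Remaining and X-RateLimit-Reset (epoch seconds).
final case class RateLimit(remaining: Int, resetEpochSeconds: Long)

object RateLimitPause {
  // How long to wait before the next request: zero while calls remain,
  // otherwise the time until the reset plus a safety margin (assumed value).
  def pauseFor(
      limit: RateLimit,
      nowEpochSeconds: Long,
      margin: FiniteDuration = 5.seconds
  ): FiniteDuration =
    if (limit.remaining > 0) Duration.Zero
    else {
      val untilReset = math.max(0L, limit.resetEpochSeconds - nowEpochSeconds)
      untilReset.seconds + margin
    }

  // Blocks the calling thread for the computed pause (no-op when it is zero).
  def pauseIfNeeded(limit: RateLimit): Unit = {
    val pause = pauseFor(limit, System.currentTimeMillis() / 1000)
    if (pause > Duration.Zero) Thread.sleep(pause.toMillis)
  }
}
```

Keeping the duration calculation separate from the blocking `Thread.sleep` makes the pause logic easy to check without actually waiting.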

@MasseGuillaume

// use minimal concurrency to avoid abuse rate limit error which is triggered
// by making too many calls in a short period of time, see https://github.com/scalacenter/scaladex/issues/431
val parallelism = 1
Source(toDownload).mapAsyncUnordered(parallelism) { item =>

@MasseGuillaume MasseGuillaume Jul 28, 2017

you can use named parameters here: mapAsyncUnordered(parallelism = 1)

Can we set it to 4? We made all the effort to have parallel code but then set it to sequential :S. There is probably a sweet spot. What kind of rate limit stats does GitHub give us?


@MichaelViveros MichaelViveros Jul 29, 2017

Done

GitHub doesn't give any stats about how the rate limit and abuse rate limit are triggered, since people could use that info to work around them. I tested with higher concurrency and got abuse rate limit errors, so I think parallelism = 4 is as high as we should go.
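
To illustrate what capping concurrency at 4 means, here is a minimal sketch that uses only the Scala standard library rather than Akka Streams. It is a stand-in for `mapAsyncUnordered(parallelism = 4)`, not the PR's code: a fixed pool of 4 threads bounds how many calls run at once. `FakeApi`, `fetch`, and `downloadAll` are hypothetical names, and unlike `mapAsyncUnordered` this version returns results in input order.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BoundedDownloads {
  // Hypothetical stand-in for one GitHub API call. It only records the
  // highest number of calls that were in flight at the same time.
  final class FakeApi {
    private var inFlight = 0
    private var peak = 0

    def fetch(repo: String): String = {
      synchronized { inFlight += 1; peak = math.max(peak, inFlight) }
      Thread.sleep(20) // simulate network latency
      synchronized { inFlight -= 1 }
      s"info:$repo"
    }

    def maxInFlight: Int = synchronized(peak)
  }

  // Runs fetch for every repo with at most `parallelism` calls at once,
  // using a dedicated thread pool instead of the global execution context.
  def downloadAll(repos: List[String], api: FakeApi, parallelism: Int): List[String] = {
    val pool = Executors.newFixedThreadPool(parallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try Await.result(Future.traverse(repos)(r => Future(api.fetch(r))), 1.minute)
    finally pool.shutdown()
  }
}
```

Calling `BoundedDownloads.downloadAll(repos, api, parallelism = 4)` and then reading `api.maxInFlight` shows that no more than 4 calls overlapped.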

@MasseGuillaume

Can you create a PR against https://github.com/scalacenter/scaladex-credentials to add the tokens and passwords?

import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

import com.typesafe.config.ConfigFactory
import org.slf4j.LoggerFactory

import scala.concurrent.ExecutionContext.Implicits.global

You should not import the global execution context.
Use import system.dispatcher at line 38 instead.

more info here: http://doc.akka.io/docs/akka/current/scala/dispatchers.html

@MasseGuillaume

It looks like this failed Travis test is unrelated to this PR: https://travis-ci.org/scalacenter/scaladex/builds/258947737#L2094

@MasseGuillaume MasseGuillaume merged commit 652b797 into master Aug 11, 2017
@MichaelViveros MichaelViveros deleted the github-api-errors branch August 11, 2017 13:42
@MasseGuillaume MasseGuillaume restored the github-api-errors branch February 4, 2018 11:47
@adpi2 adpi2 deleted the github-api-errors branch March 26, 2020 13:35
Successfully merging this pull request may close these issues.

Github API Errors