Fix rate limit errors with Github API #431 #470

MichaelViveros · 2017-07-26T19:14:18Z

This fixes #431 by updating the code that downloads repo info from Github so that it no longer fails with Github API rate limit and abuse rate limit errors (see #431 for more info about those errors).

I created a personal Scaladex Github account and a machine Scaladex account, and generated 1 Github API token for each. This follows Github's terms of service, which allow at most 2 tokens when hitting their API (1 token from a personal account and 1 token from a machine account). I will commit these tokens to the scaladex-credentials repo once this PR is deployed.

This PR is similar to the solution described in #431:

- Instead of splitting the step that downloads the Github info into 2 steps (as proposed in my previous solution), the code uses one step as before, and if it hits the rate limit, it pauses until the rate limit resets. While the thread is paused, the progress bar keeps printing the same output, which is annoying, but I couldn't find a way around it (there's an open issue on ctongfei/progressbar about supporting pausing).
- The download step hits the Github API serially.
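One way to implement "pause until the rate limit resets" is to read the rate-limit headers Github sends with each API response. The sketch below is not the code in this PR: RateLimitPause, pauseFor and pauseIfNeeded are hypothetical names, and the 1-second safety margin is an assumption. It relies only on Github's X-RateLimit-Remaining and X-RateLimit-Reset headers, where the reset time is given in UTC epoch seconds.

```scala
import scala.concurrent.duration._

// Hypothetical helper, not the Scaladex implementation: decides how long to
// pause based on Github's rate-limit response headers.
object RateLimitPause {
  // HTTP header names are case-insensitive, so look them up ignoring case.
  private def header(headers: Map[String, String], name: String): Option[String] =
    headers.collectFirst { case (k, v) if k.equalsIgnoreCase(name) => v.trim }

  // Returns Some(duration) when the quota is exhausted, None otherwise.
  // X-RateLimit-Reset holds the reset time in UTC epoch seconds.
  def pauseFor(headers: Map[String, String], nowEpochSeconds: Long): Option[FiniteDuration] = {
    val remaining = header(headers, "X-RateLimit-Remaining").flatMap(_.toLongOption)
    val resetAt = header(headers, "X-RateLimit-Reset").flatMap(_.toLongOption)
    (remaining, resetAt) match {
      case (Some(0L), Some(reset)) =>
        // Wait until the window resets, plus an assumed 1-second safety margin.
        Some((math.max(reset - nowEpochSeconds, 0L) + 1L).seconds)
      case _ =>
        None // quota left, or headers missing: no pause needed
    }
  }

  // Blocks the calling thread until the rate limit window has reset.
  def pauseIfNeeded(headers: Map[String, String]): Unit =
    pauseFor(headers, System.currentTimeMillis() / 1000L).foreach(d => Thread.sleep(d.toMillis))
}
```

Keeping pauseFor pure (the current time is passed in) makes the timing logic testable without actually sleeping; only pauseIfNeeded blocks.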

The whole data/reStart github step now takes around 1.5 hours (~50 mins downloading, ~40 mins pausing).

MasseGuillaume · 2017-07-28T09:43:42Z

Can you run scalafmt?
https://travis-ci.org/scalacenter/scaladex/builds/257859657#L517

MasseGuillaume · 2017-07-28T09:46:32Z

data/src/main/scala/ch.epfl.scala.index.data/download/PlayWsDownloader.scala

-
+      // use minimal concurrency to avoid abuse rate limit error which is triggered
+      // by making too many calls in a short period of time, see https://github.com/scalacenter/scaladex/issues/431
+      val parallelism = 1
      Source(toDownload).mapAsyncUnordered(parallelism) { item =>


You can use a named parameter here: mapAsyncUnordered(parallelism = 1)

Can we set it to 4? We go to all the effort of writing parallel code and then run it sequentially :S. There is probably a sweet spot. What kind of rate limit stats does Github give us?

MichaelViveros · 2017-07-29

Done

As for stats, Github doesn't share how the rate limit and abuse rate limit are triggered, since people could use that info to work around them. I tested with higher concurrency and got abuse rate limit errors, so I think parallelism = 4 is as high as we should go.
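To illustrate what capping concurrency at 4 means, here is a stdlib-only analogue of mapAsyncUnordered(parallelism = 4); it is not how Akka Streams implements it. BoundedDownload and downloadAll are hypothetical names, and the bound comes from a fixed pool of 4 threads, so it only holds when fetch blocks (as a synchronous HTTP call would).

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Stdlib-only analogue of mapAsyncUnordered(parallelism = 4), not Akka's
// implementation. A fixed pool of `parallelism` threads caps how many
// blocking `fetch` calls run at the same time.
object BoundedDownload {
  def downloadAll[A, B](items: Seq[A], parallelism: Int)(fetch: A => B): Seq[B] = {
    val pool = Executors.newFixedThreadPool(parallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // Every item is submitted up front; the pool runs at most
      // `parallelism` of them concurrently and queues the rest.
      val futures = items.map(item => Future(fetch(item)))
      Await.result(Future.sequence(futures), 1.hour)
    } finally pool.shutdown() // release the pool's threads
  }
}
```

Unlike mapAsyncUnordered, Future.sequence returns results in input order, and in Akka Streams the in-flight limit is enforced by the stage itself rather than by the number of threads.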

MasseGuillaume · 2017-07-28T10:13:25Z

Can you create a PR against https://github.com/scalacenter/scaladex-credentials to add the tokens and passwords?

MasseGuillaume · 2017-07-28T10:16:41Z

data/src/main/scala/ch.epfl.scala.index.data/github/GithubDownload.scala

 import java.nio.charset.StandardCharsets
 import java.nio.file.{Files, Path}

 import com.typesafe.config.ConfigFactory
 import org.slf4j.LoggerFactory

+import scala.concurrent.ExecutionContext.Implicits.global


You should not import the global execution context. Import system.dispatcher at line 38 instead.

More info here: http://doc.akka.io/docs/akka/current/scala/dispatchers.html
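A minimal sketch of the pattern behind this suggestion, assuming a hypothetical RepoInfoDownloader class: the downloader takes its ExecutionContext as an implicit parameter instead of importing ExecutionContext.Implicits.global, so the Akka-based caller can bring system.dispatcher into scope with import system.dispatcher.

```scala
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical downloader: the ExecutionContext is supplied by the caller
// (e.g. an Akka actor system's dispatcher via `import system.dispatcher`)
// instead of being hard-wired with ExecutionContext.Implicits.global.
class RepoInfoDownloader()(implicit ec: ExecutionContext) {
  // Runs `fetch` for each repo as a Future on the injected ExecutionContext.
  def fetchAll(repos: Seq[String])(fetch: String => String): Future[Seq[String]] =
    Future.traverse(repos)(repo => Future(fetch(repo)))
}
```

This leaves the choice of thread pool with the caller; the Akka dispatchers page linked above covers how dispatchers are configured.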

MasseGuillaume · 2017-07-30T12:18:24Z

It looks like this failed Travis test is unrelated to this PR: https://travis-ci.org/scalacenter/scaladex/builds/258947737#L2094

MasseGuillaume added the in progress label Jul 26, 2017

MasseGuillaume reviewed Jul 28, 2017

MichaelViveros and others added 2 commits August 11, 2017 10:51

Fix rate limit errors with Github API #431

430a30e

scalafix is also a fix point library

936cc22

MasseGuillaume force-pushed the github-api-errors branch from cd9db41 to 936cc22 on August 11, 2017 at 08:59

MasseGuillaume merged commit 652b797 into master Aug 11, 2017

MichaelViveros deleted the github-api-errors branch August 11, 2017 13:42

MichaelViveros mentioned this pull request Aug 30, 2017

Add parallelism when hitting github api #493

Merged

MasseGuillaume restored the github-api-errors branch February 4, 2018 11:47

adpi2 deleted the github-api-errors branch March 26, 2020 13:35
