-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix rate limit errors with Github API #431 #470
Conversation
Can you run scalafmt ? |
|
||
// use minimal concurrency to avoid abuse rate limit error which is triggered | ||
// by making too many calls in a short period of time, see https://github.com/scalacenter/scaladex/issues/431 | ||
val parallelism = 1 | ||
Source(toDownload).mapAsyncUnordered(parallelism) { item => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use named parameters here: mapAsyncUnordered(parallelism = 1)
Can we set it to 4? We make all the effort to have a parallel code but set it sequential :S. There is probably a sweet spot, What kind of rate limit stats Github give us?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And Github didn't give any stats about how the rate limit and abuse rate limits are triggered since if they shared that info people could use it to work around them. I tested with higher concurrency and got the abuse rate limit errors so I think parallelism = 4
is as high as we should go
Can you create a PR against https://github.com/scalacenter/scaladex-credentials to add the tokens and passwords? |
import java.nio.charset.StandardCharsets | ||
import java.nio.file.{Files, Path} | ||
|
||
import com.typesafe.config.ConfigFactory | ||
import org.slf4j.LoggerFactory | ||
|
||
import scala.concurrent.ExecutionContext.Implicits.global |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should no import the global execution context.
import system.dispatcher at line 38.
more info here: http://doc.akka.io/docs/akka/current/scala/dispatchers.html
It looks like this travis failed test is unrelated with this PR: https://travis-ci.org/scalacenter/scaladex/builds/258947737#L2094 |
cd9db41
to
936cc22
Compare
@heathermiller @MasseGuillaume
This fixes #431 by updating the code that downloads repo info from Github to avoid getting errors about hitting the Github API rate limit and abuse rate limit (see #431 for more info about those errors).
I created a personal Scaladex Github account and a machine Scaladex account and generated 1 Github API token for each account which adheres to Github's terms of service of using at most 2 tokens when hitting their API (1 token from a personal account and 1 token from a machine account). I will commit these tokens to the scaladex-credentials repo once this PR is deployed.
This PR is similar to the solution described in #431:
ctongfei/progressbar
to be able to pause it)The whole
data/reStart github
step takes around 1.5 hours now (~50 mins downloading, ~40 mins pausing)