Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry count for validate links task #490

Merged
merged 3 commits into from
Jun 10, 2021
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 15 additions & 5 deletions core/src/main/scala/com/lightbend/paradox/ParadoxProcessor.scala
Original file line number Diff line number Diff line change
Expand Up @@ -19,10 +19,10 @@ package com.lightbend.paradox
import com.lightbend.paradox.template.PageTemplate
import com.lightbend.paradox.markdown._
import com.lightbend.paradox.tree.Tree.{ Forest, Location }

import java.io.{ File, FileOutputStream, OutputStreamWriter }
import java.nio.charset.StandardCharsets

import org.jsoup.Jsoup
import org.jsoup.{ Connection, Jsoup }
import org.jsoup.nodes.Document
import org.pegdown.ast._

Expand Down Expand Up @@ -108,6 +108,7 @@ class ParadoxProcessor(reader: Reader = new Reader, writer: Writer = new Writer,
groups: Map[String, Seq[String]],
properties: Map[String, String],
ignorePaths: List[Regex],
retryCount: Int,
validateAbsolute: Boolean,
logger: ParadoxLogger): Int = {

Expand Down Expand Up @@ -151,7 +152,7 @@ class ParadoxProcessor(reader: Reader = new Reader, writer: Writer = new Writer,
reportErrorOnSources(errorCollector, c.allSources)(s"Could not find path [${uri.getPath}] in site")
}
case absolute if validateAbsolute =>
validateExternalLink(absolute, errorCollector, logger)
validateExternalLink(absolute, retryCount, errorCollector, logger)
case _ =>
// Ignore
}
Expand All @@ -160,20 +161,29 @@ class ParadoxProcessor(reader: Reader = new Reader, writer: Writer = new Writer,
errorCollector.errorCount
}

private def validateExternalLink(capturedLink: CapturedLink, errorContext: ErrorContext, logger: ParadoxLogger) = {
private def validateExternalLink(capturedLink: CapturedLink, retryCount: Int, errorContext: ErrorContext, logger: ParadoxLogger) = {
logger.info(s"Validating external link: ${capturedLink.link}")

def reportError = reportErrorOnSources(errorContext, capturedLink.allSources)(_)
val url = capturedLink.link.toString

try {
val response = Jsoup.connect(url)
def tryConnect = Jsoup.connect(url)
.userAgent("Paradox Link Validator <https://github.com/lightbend/paradox>")
.followRedirects(false)
.ignoreHttpErrors(true)
.ignoreContentType(true)
.execute()

@tailrec
def validateWithRetries(retryCount: Int): Connection.Response = {
val res = tryConnect
if (retryCount == 0 || res.statusCode() == 200) res
else validateWithRetries(retryCount - 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it likely be the status codes that signals temporary unavailable rather than not being able to connect for example (so perhaps throws and is caught further down rather than retried right now)?

Also - should it be more selective with what error codes it retries for (at least some are permanent or likely permanent 400-407 for example), and perhaps back off a little since immediate retry if the server is overloaded will likely fail again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That make sense. I was specifically targeting use case where site is slow and jsoup times out. Was hoping at least one of the retries will succeed. Was planning to keep it simple unless people file issues/enhancements 😉

I ll look into status codes.

Do you think fist class backoff is necessary? I can definitely try simplest backoff solution.
Have not looked at code completely, but does link check happen in parallel? If not, backoff will introduce lot of delay.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if it is mostly about timeouts, would jsoup still return a response with a status code rather than throw an exception?

I don't think backoff is necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right, it should throw exception if endpoint does not return any response.
In my case, some sites were returning 503 for very short period of time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about explicitly checking for 503 (and perhaps 502, 503 as well, maybe 500) + connection timeout throw and retry only for those rather than across all other (less likely to be temporary) status codes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am planning to submit that over the weekend. (Maybe earlier If i get the opportunity)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am retrying following status codes: 500, 502, 503, 504 (included additional 504 - Gateway Timeout) and SocketTimeoutException

}

val response = validateWithRetries(retryCount)

// jsoup doesn't offer any simple way to clean up, the only way to close is to get the body stream and close it,
// but if you've already read the response body, that will throw an exception, and there's no way to check if
// you've already tried to read the response body, so we can't do that in a finally block, we have to do it
Expand Down
8 changes: 8 additions & 0 deletions docs/src/main/paradox/validation.md
Original file line number Diff line number Diff line change
Expand Up @@ -53,3 +53,11 @@ paradoxValidationIgnorePaths ++= Seq(
"/docs/version/(?!latest).*"
)
```

## Retrying links check

`paradoxValidateLinksRetryCount` setting allows retrying link check for the provided number of times in case link returns non success response code i.e., other than 200 status code.
johanandren marked this conversation as resolved.
Show resolved Hide resolved

```scala
paradoxValidateLinksRetryCount := 3 // retries link check 3 times in case of non 200 response code
```
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,7 @@ trait ParadoxKeys {
val paradoxBrowse = taskKey[Unit]("Open the docs in the default browser")
val paradoxValidateInternalLinks = taskKey[Unit]("Validate internal, non ref paradox links.")
val paradoxValidateLinks = taskKey[Unit]("Validate all non ref paradox links.")
val paradoxValidateLinksRetryCount = taskKey[Int]("Number of retries for validate links task.")
val paradoxValidationIgnorePaths = settingKey[List[Regex]]("List of regular expressions to apply to paths to determine if they should be ignored.")
val paradoxValidationSiteBasePath = settingKey[Option[String]]("The base path that the documentation is deployed to, allows validating links on the docs site that are outside of the documentation root tree")
val paradoxSingle = taskKey[File]("Build the single page HTML Paradox site")
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,7 @@ object ParadoxPlugin extends AutoPlugin {
paradoxLeadingBreadcrumbs := Nil,
paradoxGroups := Map.empty,
libraryDependencies ++= paradoxTheme.value.toSeq map (_ % ParadoxTheme),
paradoxValidateLinksRetryCount := 0,
paradoxValidationIgnorePaths := List("http://localhost.*".r),
paradoxValidationSiteBasePath := None
)
Expand Down Expand Up @@ -378,6 +379,7 @@ object ParadoxPlugin extends AutoPlugin {
paradoxGroups.value,
paradoxProperties.value,
paradoxValidationIgnorePaths.value,
paradoxValidateLinksRetryCount.value,
validateAbsolute,
new SbtParadoxLogger(strms.log)
)
Expand Down