Handle HTTP redirects #754

Merged 13 commits on Mar 13, 2023
72 changes: 23 additions & 49 deletions R/http-libcurl.R
@@ -31,8 +31,9 @@ createCurlHandle <- function(timeout, certificate) {
if (httpVerbose())
curl::handle_setopt(handle, verbose = TRUE)

# turn off CURLOPT_FOLLOWLOCATION. the curl package makes this the default for new handles, but it
# causes a hang when attempting to follow some responses from shinyapps.io.
# suppress curl's automatic handling of redirects, since we have to
# handle them ourselves in httpRequest()/httpRequestWithBody() due to our
# specialised auth needs
curl::handle_setopt(handle, followlocation = FALSE)

# return the newly created handle
@@ -78,40 +79,22 @@ httpLibCurl <- function(protocol,
con <- file(contentFile, "rb")
on.exit(if (!is.null(con)) close(con), add = TRUE)

if (identical(method, "POST")) {
Member Author:

This is possibly overly aggressive, but the comment doesn't make sense to me as the 3xx redirects only apply to the result of the upload, not the upload itself. This is possibly a misdiagnosis of an older problem, because I don't think there should be a general difference between PUTs and POSTs. It's possible that this might have also resolved #544.

Member Author:

Looks like this code existed from day one of the libcurl implementation: #348

# for POST operations, send all the file's content up at once. this is necessary because some
# POST endpoints return 3xx status codes, which require a seekfunction in order to replay the
# payload (the curl package does not currently allow specifying seekfunctions from R)
curl::handle_setopt(handle,
post = TRUE,
postfields = readBin(con,
what = "raw",
n = fileLength),
postfieldsize_large = fileLength)
} else if (identical(method, "PUT")) {
# for PUT operations, which are often used for larger content (bundle uploads), stream the
# file from disk instead of reading it from memory
curl::handle_setopt(handle,
upload = TRUE,
infilesize_large = fileLength)

curl::handle_setopt(handle,
readfunction = function(nbytes, ...) {
if (is.null(con)) {
return(raw())
}
bin <- readBin(con, "raw", nbytes)
if (length(bin) < nbytes) {
close(con)
con <<- NULL
}
bin
})
} else {
# why was a file specified for this endpoint?
warning("Content file specified, but not used because the '", method, "' request ",
"type does not accept a body.")
}
curl::handle_setopt(
handle,
upload = TRUE,
Contributor:

Have we lost post = TRUE, which uses a POST request? Is that redundant with the curl::handle_setopt(handle, customrequest = method)?

Member Author:

Hmmm, good catch and good question. The post option is pretty complicated and changes a bunch of stuff behind the scenes (e.g. it adds Content-Type: application/x-www-form-urlencoded and Expect: 100-continue headers). I have a vague recollection that it also changes the redirect handling (because it'll follow the spec for 307/308), but that doesn't matter here since we've suppressed that.

I'm neutral on whether to preserve it or drop it.

  • In favour of dropping it: the code in this PR works and it's very clear exactly how we're modifying the request.
  • In favour of keeping: I know curl applies more special POST logic if it's set, and that may save us from grief down the road.
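
For reference, a minimal sketch of the kind of extra behaviour post = TRUE triggers (this assumes the httpbin.org echo service is reachable; it is not part of the PR):

library(curl)

# post = TRUE makes curl behave like an HTML form submission: it switches the
# verb to POST and supplies form-style headers such as
# Content-Type: application/x-www-form-urlencoded on our behalf.
h <- new_handle(post = TRUE, postfields = "x=1")
resp <- curl_fetch_memory("https://httpbin.org/post", handle = h)

# httpbin echoes the request back, so the implied headers are visible here:
cat(rawToChar(resp$content))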

Contributor:

OK dropping it.

infilesize_large = fileLength,
readfunction = function(nbytes, ...) {
if (is.null(con)) {
return(raw())
}
bin <- readBin(con, "raw", nbytes)
if (length(bin) < nbytes) {
close(con)
con <<- NULL
}
bin
}
)
}

# ensure we're using the requested method
@@ -122,19 +105,10 @@

# make the request
response <- NULL
time <- system.time(gcFirst = FALSE, tryCatch({
# fetch the response into a raw buffer in memory
response <- curl::curl_fetch_memory(url, handle = handle)
},
error = function(e, ...) {
# ignore errors resulting from timeout or user abort
if (identical(e$message, "Callback aborted") ||
identical(e$message, "transfer closed with outstanding read data remaining"))
return(NULL)
# bubble remaining errors through
else
stop(e)
}))
time <- system.time(
Member Author (@hadley, Mar 10, 2023):

I also simplified this: the user-interrupt error message has changed, so that check had no effect, and I'm pretty sure that "transfer closed" is a legitimate error we do want to bubble up.

Contributor:

I'm pretty sure that some of this logic was trying to trap auth errors during uploads. When there's an auth failure, the client may not see the entire body as sent. This code was trying to handle that situation, but never quite handled most of the cases.

In poking around a little, we see terminated uploads for rpubs -- things that are larger than the bundle upload limit. See #450 (comment) for quite a bit of discussion about it, but I never really figured out the cleanest way to handle those errors.

We wanted to try to ignore the "upload" problem and extract the real HTTP error in the response, if possible.

I don't know the best way to react to this type of error, nor if we really need to maintain the old code -- mostly, this helps inform a testing strategy.

Member Author:

OK, I'll have a go at uploading some really large files and see if I can recreate it.

Member Author:

I uploaded a 1 GB app to shinyapps.io and Posit Connect, and didn't see any problems. I'll follow up on the rpubs issue later since it's already individually tracked.

Contributor:

Connect users generally see these errors when there is an intermediate proxy which limits the upload payload size. For example, nginx may use a restricted client_max_body_size.

https://docs.posit.co/connect/admin/proxy/#simple-configuration

Focus on the rpubs case, since I know that's reliable.

Member Author:

I uploaded a 33 MB HTML file with no problems. You can see it in all its glory at https://rpubs.com/hadley/large-file.

Member Author:

Also tested updating, since that uses a PUT instead of POST.

Contributor:

Was that a large Rmd or a large bundle? Because we gzip, it's harder to get a large bundle. I ended up using a specific document when last playing with that issue, but it's likely changed since I tried. I think I have a copy locally somewhere. See #450 (comment).

Member Author:

@aronatkins From the progress bar 😄 Uploaded 23796573 bytes... so ~22 Mb.

response <- curl::curl_fetch_memory(url, handle = handle),
gcFirst = FALSE
)
httpTrace(method, path, time)

# get list of HTTP response headers
45 changes: 41 additions & 4 deletions R/http.R
@@ -33,6 +33,21 @@ httpRequest <- function(service,
timeout = timeout,
certificate = certificate
)

while (httpResponse$status >= 300 && httpResponse$status < 400) {
service <- parseHttpUrl(httpResponse$location)
httpResponse <- http(
protocol = service$protocol,
host = service$host,
port = service$port,
method = method,
path = service$path,
headers = headers,
timeout = timeout,
certificate = certificate
)
}

handleResponse(httpResponse, error_call = error_call)
}

@@ -60,7 +75,7 @@ httpRequestWithBody <- function(service,
}

path <- buildPath(service$path, path, query)
headers <- c(headers, authHeaders(authInfo, method, path, file))
authed_headers <- c(headers, authHeaders(authInfo, method, path, file))
certificate <- requestCertificate(service$protocol, authInfo$certificate)

# perform request
Expand All @@ -71,18 +86,35 @@ httpRequestWithBody <- function(service,
port = service$port,
method = method,
path = path,
headers = headers,
headers = authed_headers,
contentType = contentType,
contentFile = file,
certificate = certificate
)
while (httpResponse$status >= 300 && httpResponse$status < 400) {
# This is a simplification of the spec, since we should preserve
# the method for 307 and 308
# https://www.rfc-editor.org/rfc/rfc9110.html#name-redirection-3xx
service <- parseHttpUrl(httpResponse$location)
authed_headers <- c(headers, authHeaders(authInfo, "GET", service$path))
httpResponse <- http(
Contributor:

This redirect handling assumes that the original request (and its payload) does not need to be re-sent. Is that the situation you're seeing?

It feels like the redirection that's set up for colorado might demand a replay.

curl -v http://colorado.rstudio.com/rsc/__api__/server_settings
#> HTTP/1.1 301 Moved Permanently
#> Location: https://colorado.rstudio.com/rsc/__api__/server_settings
curl -v https://colorado.rstudio.com/rsc/__api__/server_settings
#> HTTP/1.1 302 Found
#> Location: http://colorado.posit.co/rsc/__api__/server_settings

The original request never landed.

Some systems support 100-continue request/response to avoid sending the payload before receiving confirmation; I'm not sure that any of these servers support that workflow.

Most resources will indicate that you should not automatically replay the request in cases like colorado - a shift from one hostname to another - without explicit confirmation.

We probably want at least:

  1. Be transparent about the redirect.
  2. Offer an option to err rather than walk through the redirect.
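
A rough sketch of what such an opt-out guard could look like (the rsconnect.http.followRedirects option and the followRedirect() helper are hypothetical, not part of this PR):

# Hypothetical helper, called from the redirect loops in place of calling
# parseHttpUrl(httpResponse$location) directly. The option name is illustrative.
followRedirect <- function(httpResponse) {
  if (!getOption("rsconnect.http.followRedirects", TRUE)) {
    cli::cli_abort(c(
      "Request was redirected to <{httpResponse$location}>.",
      i = "Set options(rsconnect.http.followRedirects = TRUE) to follow redirects automatically."
    ))
  }
  # be transparent about the redirect
  cli::cli_inform("Following redirect to <{httpResponse$location}>.")
  parseHttpUrl(httpResponse$location)
}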

Member Author:

To precisely follow the spec we should re-POST with body only for 307 and 308. I'm happy to do that, but I suspect it won't appreciably change behaviour because most folks won't be using those status codes.

To be clear, I don't think this code is a panacea for all redirect-related issues; it's just an important first step. If we really want to handle a server changing host names, I think we'll need code that detects the problem and then updates the record on disk. But I'm not sure how important that problem is, and I'm not sure whether we need to do it proactively or should wait for someone to file an issue.
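
For reference, a sketch of what a spec-following loop in httpRequestWithBody() could look like, reusing parseHttpUrl(), authHeaders(), and http() from this file (a sketch only, not what this PR implements):

while (httpResponse$status >= 300 && httpResponse$status < 400) {
  service <- parseHttpUrl(httpResponse$location)
  if (httpResponse$status %in% c(307, 308)) {
    # 307/308: preserve the original method and re-send the body
    redirectMethod <- method
    redirectFile <- file
  } else {
    # other 3xx: switch to GET and drop the body
    redirectMethod <- "GET"
    redirectFile <- NULL
  }
  authed_headers <- c(headers, authHeaders(authInfo, redirectMethod, service$path, redirectFile))
  httpResponse <- http(
    protocol = service$protocol,
    host = service$host,
    port = service$port,
    method = redirectMethod,
    path = service$path,
    headers = authed_headers,
    contentType = contentType,
    contentFile = redirectFile,  # NULL for the GET case; argument handling simplified
    certificate = certificate
  )
}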

Contributor:

The colorado examples give us two motivating examples:

  1. HTTP => HTTPS
  2. Host rename

I'm not sure that there's a good answer, especially since the colorado example shows us that 301/302 are much more common than 307/308.

Thinking out loud:

  • Automatic replay of all redirects requires no user intervention, which is a smooth experience.
  • Server URLs are left stale, meaning every subsequent request hits the redirect penalty.
  • Server names are left stale, meaning the UI / metadata will always reflect the old name.
  • Account records reflect the stale server name.

If we ignore updating the server and account records for a moment, we could run a pre-flight check against the client URL to make sure it is still "correct" and update the URL in-memory. That would correct the following requests without hitting multiple redirects and would give us a single place to ask for confirmation, log, or reject the redirect.

In other words, run something like validateServerUrl before using the client to do meaningful work.

Member Author:

Notes to self: need something like validateConnectServer() that we run in deployApp() before hitting any endpoints. If the url has redirected, we need to prompt the user to update the record on disk (if interactive), or otherwise warn or continue. Might want to consider automatically updating if the hostname stays the same.
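
A hypothetical sketch of that pre-flight check (validateConnectServer() does not exist yet; the use of serverInfo() and the cli messages here are illustrative only):

validateConnectServer <- function(server) {
  info <- serverInfo(server)
  check <- validateConnectUrl(info$url, info$certificate)
  if (!check$valid) {
    cli::cli_abort("Server {.val {server}} is unreachable: {check$message}")
  }
  if (!identical(check$url, info$url)) {
    if (interactive()) {
      # prompt the user to update the on-disk record; deferred to a later PR
      cli::cli_inform("Server {.val {server}} now redirects to <{check$url}>.")
    } else {
      cli::cli_warn("Server {.val {server}} redirects to <{check$url}>; consider updating the server record.")
    }
  }
  invisible(check)
}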

Member Author:

While making coffee this morning, I suddenly realised that we can't just update an existing server record on disk, because that will break existing deployments. We need some way to also "redirect" existing server records to a new server record. So I won't tackle that here, but consider it in a future PR.

Contributor:

Yeah; the association between the server records and the accounts and deployments records came up during our chat. The pre-flight can ask if folks should proceed through the redirect, but it's not straightforward to rewrite the records.

Member Author:

Tracking this in #760.

protocol = service$protocol,
host = service$host,
port = service$port,
method = "GET",
path = service$path,
headers = authed_headers,
certificate = certificate
)
httpResponse
}

handleResponse(httpResponse, error_call = error_call)
}

handleResponse <- function(response, error_call = caller_env()) {
reportError <- function(msg) {
req <- response$req
url <- paste0(req$protocol, "://", req$host, req$port, req$path)
url <- buildHttpUrl(response$req)

cli::cli_abort(
c("<{url}> failed with HTTP status {response$status}", msg),
@@ -295,6 +327,10 @@ parseHttpUrl <- function(urlText) {
url
}

buildHttpUrl <- function(x) {
paste0(x$protocol, "://", x$host, x$port, x$path)
}

urlDecode <- function(x) {
curl::curl_unescape(x)
}
@@ -355,6 +391,7 @@ authHeaders <- function(authInfo, method, path, file = NULL) {
}
}

# https://github.com/rstudio/connect/wiki/token-authentication#request-signing-rsconnect
Contributor:

fyi: this is an internal link.

signatureHeaders <- function(authInfo, method, path, file = NULL) {
# headers to return
headers <- list()
91 changes: 26 additions & 65 deletions R/ide.R
@@ -17,84 +17,45 @@ validateServerUrl <- function(url, certificate = NULL) {
# The URL may be specified with or without the protocol and port; this function
# will try both http and https and follow any redirects given by the server.
validateConnectUrl <- function(url, certificate = NULL) {
tryAllProtocols <- TRUE

if (!grepl("://", url, fixed = TRUE))
{
# Add protocol if missing, assuming https except for local installs
if (!grepl("://", url, fixed = TRUE)) {
if (grepl(":3939", url, fixed = TRUE)) {
# assume http for default (local) connect installations
url <- paste0("http://", url)
} else {
# assume https elsewhere
url <- paste0("https://", url)
}
}

# if the URL ends with a port number, don't try http/https on the same port
if (grepl(":\\d+/?$", url)) {
tryAllProtocols <- FALSE
}

settingsEndpoint <- "/server_settings"
url <- ensureConnectServerUrl(url)
is_http <- grepl("^http://", url)

GET_server_settings <- function(url) {
auth_info <- list(certificate = inferCertificateContents(certificate))
GET(
parseHttpUrl(url),
auth_info,
"/server_settings",
timeout = getOption("rsconnect.http.timeout", 10)
)
}

# populate certificate if supplied
certificate <- inferCertificateContents(certificate)

# begin trying URLs to discover the correct one
response <- NULL
errMessage <- ""
retry <- TRUE
while (retry) {
tryCatch({
response <- GET(parseHttpUrl(url),
list(certificate = certificate),
settingsEndpoint,
timeout = getOption("rsconnect.http.timeout", 10))

httpResponse <- attr(response, "httpResponse")
# check for redirect
if (httpResponse$status == 307 &&
!is.null(httpResponse$location)) {

# we were served a redirect; try again with the new URL
url <- httpResponse$location
if (substring(url, (nchar(url) - nchar(settingsEndpoint)) + 1) ==
settingsEndpoint) {
# chop /server_settings from the redirect path to get the raw API path
url <- substring(url, 1, nchar(url) - nchar(settingsEndpoint))
}
next
}
if (!isContentType(httpResponse, "application/json")) {
response <- NULL
errMessage <- "Endpoint did not return JSON"
}
cnd <- catch_cnd(response <- GET_server_settings(url), "error")
if (is_http && cnd_inherits(cnd, "OPERATION_TIMEDOUT")) {
url <- gsub("^http://", "https://", url)
cnd <- catch_cnd(response <- GET_server_settings(url), "error")
}

# got a real response; stop trying now
retry <- FALSE
}, error = function(e) {
if (inherits(e, "OPERATION_TIMEDOUT") && tryAllProtocols) {
# if the operation timed out on one protocol, try the other one (note
# that we don't do this if a port is specified)
if (substring(url, 1, 7) == "http://") {
url <<- paste0("https://", substring(url, 8))
} else if (substring(url, 1, 8) == "https://") {
url <<- paste0("http://", substring(url, 9))
}
tryAllProtocols <<- FALSE
return()
}
errMessage <<- e$message
retry <<- FALSE
})
if (!is.null(cnd)) {
return(list(valid = FALSE, message = conditionMessage(cnd)))
}

if (is.null(response)) {
list(valid = FALSE, message = errMessage)
} else {
list(valid = TRUE, url = url, response = response)
httpResponse <- attr(response, "httpResponse")
if (!isContentType(httpResponse, "application/json")) {
return(list(valid = FALSE, message = "Endpoint did not return JSON"))
}

url <- gsub("/server_settings$", "", buildHttpUrl(httpResponse$req))
list(valid = TRUE, url = url, response = response)
}

# given a server URL, returns that server's short name. if the server is not
19 changes: 0 additions & 19 deletions tests/testthat/test-http-libcurl.R
@@ -7,22 +7,3 @@ test_that("basic HTTP methods work", {
test_http_POST_file()
test_http_headers()
})

test_that("can stream PUT content from disk", {
service <- httpbin_service()

path <- withr::local_tempfile()
con <- file(path, "wb")
writeLines(c("1", "2", "3"), con = con)
close(con)

resp <- PUT(
service,
authInfo = NULL,
path = "put",
contentType = "text/plain",
file = path
)
expect_equal(attr(resp, "httpResponse")$status, 200)
expect_equal(resp$data, "1\n2\n3\n")
})
9 changes: 7 additions & 2 deletions tests/testthat/test-ide.R
@@ -1,7 +1,7 @@
test_that("validateConnectUrl() returns expected return for some known endpoints", {

expect_false(validateConnectUrl("http://posit.cloud")$valid)
expect_false(validateConnectUrl("http://shinyapps.io")$valid)
expect_false(validateConnectUrl("https://posit.cloud")$valid)
expect_false(validateConnectUrl("https://shinyapps.io")$valid)
expect_true(validateConnectUrl("https://connect.rstudioservices.com/")$valid)
expect_true(validateConnectUrl("https://colorado.posit.co/rsc")$valid)
})
@@ -12,3 +12,8 @@ test_that("validateConnectUrl() normalises urls", {
expect_equal(validateConnectUrl("connect.rstudioservices.com")$url, api_url)
expect_equal(validateConnectUrl("https://connect.rstudioservices.com/")$url, api_url)
})

test_that("validateConnectUrl() follows redirects", {
api_url <- "https://connect.rstudioservices.com:443/__api__"
expect_equal(validateConnectUrl("http://connect.rstudioservices.com")$url, api_url)
})