-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle HTTP redirects #754
Changes from 3 commits
1435998
a1dbd1d
2626fbd
b1f29af
40adec3
07ae847
bb78ca3
bc96f64
04795c3
98fed09
7a8e908
4302c69
9dd2437
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -31,8 +31,9 @@ createCurlHandle <- function(timeout, certificate) { | |
if (httpVerbose()) | ||
curl::handle_setopt(handle, verbose = TRUE) | ||
|
||
# turn off CURLOPT_FOLLOWLOCATION. the curl package makes this the default for new handles, but it | ||
# causes a hang when attempting to follow some responses from shinyapps.io. | ||
# suppress curl's automatically handling of redirects, since we have to | ||
# handle ourselves in httpRequest()/httpRequestWithBody() due to our | ||
# specialised auth needs | ||
curl::handle_setopt(handle, followlocation = FALSE) | ||
|
||
# return the newly created handle | ||
|
@@ -78,40 +79,22 @@ httpLibCurl <- function(protocol, | |
con <- file(contentFile, "rb") | ||
on.exit(if (!is.null(con)) close(con), add = TRUE) | ||
|
||
if (identical(method, "POST")) { | ||
# for POST operations, send all the file's content up at once. this is necessary because some | ||
# POST endpoints return 3xx status codes, which require a seekfunction in order to replay the | ||
# payload (the curl package does not currently allow specifying seekfunctions from R) | ||
curl::handle_setopt(handle, | ||
post = TRUE, | ||
postfields = readBin(con, | ||
what = "raw", | ||
n = fileLength), | ||
postfieldsize_large = fileLength) | ||
} else if (identical(method, "PUT")) { | ||
# for PUT operations, which are often used for larger content (bundle uploads), stream the | ||
# file from disk instead of reading it from memory | ||
curl::handle_setopt(handle, | ||
upload = TRUE, | ||
infilesize_large = fileLength) | ||
|
||
curl::handle_setopt(handle, | ||
readfunction = function(nbytes, ...) { | ||
if (is.null(con)) { | ||
return(raw()) | ||
} | ||
bin <- readBin(con, "raw", nbytes) | ||
if (length(bin) < nbytes) { | ||
close(con) | ||
con <<- NULL | ||
} | ||
bin | ||
}) | ||
} else { | ||
# why was a file specified for this endpoint? | ||
warning("Content file specified, but not used because the '", method, "' request ", | ||
"type does not accept a body.") | ||
} | ||
curl::handle_setopt( | ||
handle, | ||
upload = TRUE, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Have we lost There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmmm, good catch and good question. The I'm neutral on whether to preserve it or drop it.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK dropping it. |
||
infilesize_large = fileLength, | ||
readfunction = function(nbytes, ...) { | ||
if (is.null(con)) { | ||
return(raw()) | ||
} | ||
bin <- readBin(con, "raw", nbytes) | ||
if (length(bin) < nbytes) { | ||
close(con) | ||
con <<- NULL | ||
} | ||
bin | ||
} | ||
) | ||
} | ||
|
||
# ensure we're using the requested method | ||
|
@@ -122,19 +105,10 @@ httpLibCurl <- function(protocol, | |
|
||
# make the request | ||
response <- NULL | ||
time <- system.time(gcFirst = FALSE, tryCatch({ | ||
# fetch the response into a raw buffer in memory | ||
response <- curl::curl_fetch_memory(url, handle = handle) | ||
}, | ||
error = function(e, ...) { | ||
# ignore errors resulting from timeout or user abort | ||
if (identical(e$message, "Callback aborted") || | ||
identical(e$message, "transfer closed with outstanding read data remaining")) | ||
return(NULL) | ||
# bubble remaining errors through | ||
else | ||
stop(e) | ||
})) | ||
time <- system.time( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I also simplified this since the user interrupt error message has changed so that had no effect and I'm pretty sure that "transfer closed" is a legit error we do want to bubble up. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm pretty sure that some of this logic was trying to trap auth errors during uploads. When there's an auth failure, the client may not see the entire body as sent. This code was trying to handle that situation, but never quite handled most of the cases. In poking around a little, we see terminated uploads for rpubs -- things that are larger than the bundle upload limit. #450 (comment) for quite a bit of discussion about it, but I never really figured out the cleanest way to handle those errors. We wanted to try to ignore the "upload" problem and extract the real HTTP error in the response, if possible. I don't know the best way to react to this type of error, nor if we really need to maintain the old code -- mostly, this helps inform a testing strategy. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, I'll have a go with some uploading some really large files and see if I can recreate it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I uploaded a 1 Gb app to shinyapps and Posit connect, and didn't see any problems. I'll follow up on the rpubs issue later since it's already individually tracked. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Connect users generally see these errors when there is an intermediate proxy which limits the upload payload size. For example, nginx may use a restricted https://docs.posit.co/connect/admin/proxy/#simple-configuration Focus on the rpubs case, since I know that's reliable. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I uploaded a 33 Mb html with no problems. You can see it in all its glory at https://rpubs.com/hadley/large-file. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also tested updating, since that uses a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Was that a large Rmd or a large bundle? Because we gzip, it's harder to get a large bundle. I ended up using a specific document when last playing with that issue, but it's likely changed since I tried.. I think I have a copy locally somewhere. #450 (comment) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @aronatkins From the progress bar 😄 |
||
response <- curl::curl_fetch_memory(url, handle = handle), | ||
gcFirst = FALSE | ||
) | ||
httpTrace(method, path, time) | ||
|
||
# get list of HTTP response headers | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -33,6 +33,21 @@ httpRequest <- function(service, | |
timeout = timeout, | ||
certificate = certificate | ||
) | ||
|
||
while (httpResponse$status >= 300 && httpResponse$status < 400) { | ||
hadley marked this conversation as resolved.
Show resolved
Hide resolved
|
||
service <- parseHttpUrl(httpResponse$location) | ||
httpResponse <- http( | ||
protocol = service$protocol, | ||
host = service$host, | ||
port = service$port, | ||
method = method, | ||
path = service$path, | ||
headers = headers, | ||
timeout = timeout, | ||
certificate = certificate | ||
) | ||
} | ||
|
||
handleResponse(httpResponse, error_call = error_call) | ||
} | ||
|
||
|
@@ -60,7 +75,7 @@ httpRequestWithBody <- function(service, | |
} | ||
|
||
path <- buildPath(service$path, path, query) | ||
headers <- c(headers, authHeaders(authInfo, method, path, file)) | ||
authed_headers <- c(headers, authHeaders(authInfo, method, path, file)) | ||
certificate <- requestCertificate(service$protocol, authInfo$certificate) | ||
|
||
# perform request | ||
|
@@ -71,18 +86,35 @@ httpRequestWithBody <- function(service, | |
port = service$port, | ||
method = method, | ||
path = path, | ||
headers = headers, | ||
headers = authed_headers, | ||
contentType = contentType, | ||
contentFile = file, | ||
certificate = certificate | ||
) | ||
while (httpResponse$status >= 300 && httpResponse$status < 400) { | ||
# This is a simplification of the spec, since we should preserve | ||
# the method for 307 and 308 | ||
# https://www.rfc-editor.org/rfc/rfc9110.html#name-redirection-3xx | ||
service <- parseHttpUrl(httpResponse$location) | ||
authed_headers <- c(headers, authHeaders(authInfo, "GET", service$path)) | ||
httpResponse <- http( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This redirect handling is assuming that the original request (and its payload) do not need to be re-sent. Is that the situation you're seeing? It feels like the redirection that's set-up for colorado might demand a replay.
The original request never landed. Some systems support 100-continue request/response to avoid sending the payload before receiving confirmation; I'm not sure that any of these servers support that workflow. Most resources will indicate that you should not automatically replay the request in cases like colorado - a shift from one hostname to another - without explicit confirmation. We probably want at least:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To precisely follow the spec we should re-POST with body only for 307 and 308. I'm happy to do that, but I suspect it won't appreciably change behaviour because most folks won't be using those status codes. To be clear, I don't think code is a panacea for all redirect related issues, it's just an important first step. If we really want to handle a server changing host names, I think we'll need code that detects the problem and then updates the record on disk. But I'm not sure how important that problem is and I'm not sure we need to do it proactively, or should wait on someone to file an issue. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The colorado examples give us two motivating examples:
I'm not sure that there's a good answer, especially since the colorado example shows us that 301/302 are much more common than 307/308. Thinking out loud:
If we ignore updating the server and account records for a moment, we could run a pre-flight check against the client URL to make sure it is still "correct" and update the URL in-memory. That would correct the following requests without hitting multiple redirects and would give us a single place to ask for confirmation, log, or reject the redirect. In other words, run something like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Notes to self: need something like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. While making coffee this morning, I suddenly realised that we can't just update an existing server record on disk, because that will break existing deployments. We need some way to also "redirect" existing server records to a new server record. So I won't tackle that here, but consider it in a future PR. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah; the association between the server records and the accounts and deployments records came up during our chat. The pre-flight can ask if folks should proceed through the redirect, but it's not straightforward to rewrite the records. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Tracking this in #760. |
||
protocol = service$protocol, | ||
host = service$host, | ||
port = service$port, | ||
method = "GET", | ||
path = service$path, | ||
headers = authed_headers, | ||
certificate = certificate | ||
) | ||
httpResponse | ||
} | ||
|
||
handleResponse(httpResponse, error_call = error_call) | ||
} | ||
|
||
handleResponse <- function(response, error_call = caller_env()) { | ||
reportError <- function(msg) { | ||
req <- response$req | ||
url <- paste0(req$protocol, "://", req$host, req$port, req$path) | ||
url <- buildHttpUrl(response$req) | ||
|
||
cli::cli_abort( | ||
c("<{url}> failed with HTTP status {response$status}", msg), | ||
|
@@ -295,6 +327,10 @@ parseHttpUrl <- function(urlText) { | |
url | ||
} | ||
|
||
buildHttpUrl <- function(x) { | ||
paste0(x$protocol, "://", x$host, x$port, x$path) | ||
} | ||
|
||
urlDecode <- function(x) { | ||
curl::curl_unescape(x) | ||
} | ||
|
@@ -355,6 +391,7 @@ authHeaders <- function(authInfo, method, path, file = NULL) { | |
} | ||
} | ||
|
||
# https://github.com/rstudio/connect/wiki/token-authentication#request-signing-rsconnect | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. fyi: this is an internal link. |
||
signatureHeaders <- function(authInfo, method, path, file = NULL) { | ||
# headers to return | ||
headers <- list() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is possibly overly aggressive, but the comment doesn't make sense to me as the 3xx redirects only apply to the result of the upload, not the upload itself. This is possibly a misdiagnosis of an older problem, because I don't think there should be a general difference between PUTs and POSTs. It's possible that this might have also resolved #544.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this code existed from day one of the libcurl implementation: #348