-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(gateway): _redirects file support #8816
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
Ah, I see now that draft PRs kick off welcome processes. 😅 When anyone from the IPFS project looks at this, please let me know if draft PRs are discouraged. I'm just getting started on this work and was assuming a draft PR would be a reasonable way to track my progress. Thanks. 🙏🏻 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for picking this up! ❤️
Quick feedback:
- do we need to look at every level? (is netlify / cloudflare looking for
_redirects
in subdirs, only check if it is present in the top one?) - this should be only applied to default unixfs response types, and ignored by other ones (see comment inline)
- this should be enabled only when we have Origin isolation
- (when URL.pathname
/
starts at the IPFS content root/ipfs/{cid}/
and cant go beyond it) - namely, when DNSLink website is loaded based on
Host
header or ifHost
is a subdomain gateway
- (when URL.pathname
eee1dd5
to
e2707f1
Compare
// For Unixfs, when a path can't be resolved we need to check for redirects and pretty 404 page files. | ||
if responseFormat == "" { | ||
logger.Debugw("dispatching to getOrHeadHandlerUnixfs") | ||
i.getOrHeadHandlerUnixfs(w, r, begin, logger) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had to dispatch to getOrHeadHandlerUnixfs
here, to avoid any early 404s, which would then short circuit the _redirects lookup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit invasive change, duplicates code paths which are mostly the same.
This makes it harder to maintain, reason about metrics etc.
Could we keep the old way where we had a single getOrHeadHandler
?
See my idea in https://github.com/ipfs/go-ipfs/pull/8816/files#r840944672
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This link just takes me to my comment "Moved pretty 404 logic over to gateway_handler_unixfs.go". Is that what you intended to link me?
switch err { | ||
case nil: | ||
case coreiface.ErrOffline: | ||
webError(w, "ipfs resolve -r "+debugStr(contentPath.String()), err, http.StatusServiceUnavailable) | ||
return | ||
default: | ||
// if Accept is text/html, see if ipfs-404.html is present |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved pretty 404 logic over to gateway_handler_unixfs.go
// TODO(JJ): During tests we get multibase.ErrUnsupportedEncoding | ||
// This comes from multibase and I assume is due to a fake or otherwise bad CID being in the test. | ||
// So for now any errors getting the redirect file are silently ignored. | ||
// internalWebError(w, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General question about how we should handle errors reading and parsing the redirects if present. Also note my comment about issues with test. I need to dive into this more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to return an error than to ignore it – website owner should get immediate feedback they made a typo or something.
return | ||
default: | ||
// if Accept is text/html, see if ipfs-404.html is present | ||
if i.servePretty404IfPresent(w, r, contentPath) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty 404 logic moved here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 Not the best, this introduced a lot of code duplication, even with extracted functions we duplicate ResolvePath, switch etc.
Can we move it back and have a single getOrHeadHandler
? 🙏
The refactor does not seem to be worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comments at #8816 (comment).
} | ||
} | ||
|
||
// TODO(JJ): I was thinking about changing this to just look at the root path as well, but the docs say it searches up |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, I was not involved in this feature and not sure what relies on it aside from docs.ipfs.io
To limit the blast radius of this PR I'd keep the existing logic.
Hopefully it will die eventually. We won't be documenting this, we will be documenting _redirects
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, makes sense. So I assume I should keep existing pretty 404 logic, but only fall back to it if there isn't any netlify style custom 404 page handling entry in the _redirects file. And only the new _redirects
file 404 logic will be documented. 👌
return | ||
} | ||
logger.Debugw("serving unixfs directory", "path", contentPath) | ||
i.serveDirectory(w, r, resolvedPath, contentPath, dir, begin, logger) | ||
} | ||
|
||
// redirect returns redirected, newPath (if rewrite), error | ||
func (i *gatewayHandler) redirect(w http.ResponseWriter, r *http.Request, path ipath.Resolved) (bool, string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't touched any of this redirect file parsing logic yet. Will need to settle on a format, ensure we handle optional bits like status code, etc. The example file I've been using is...
# redirects file
test / 301
^/hir$ /hi.html 302
^/hi /hi.html 200
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See if we could reuse https://github.com/tj/go-redirects (or other library, if you find a better one) and avoid the need for maintaining the parser for _redirects
?
You may find some prior art use in https://github.com/search?l=Go&q=tj%2Fgo-redirects&type=Code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I'm guessing at the very least we'll have to contribute to another library to get feature parity with what Netlify currently supports (and I'm assuming that's our target to start with), but I'd rather not completely reinvent the wheel.
return "", "", fmt.Errorf("there is no 404 file for the requested content types") | ||
} | ||
|
||
// TODO(JJ): Pretty sure this is incorrect. Validate the correct approach. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any guidance is appreciated. 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, may not be enough to work on DNSLink websites.
- Perhaps you could set
dnslink-hostname
incore/corehttp/hostname.go
(around line 222)? That way you could check it here, similar togw-hostname
. - We need sharness test for DNSLink case
- See
DNSLINK_FQDN
tests int0114-gateway-subdomains.sh
to see how you can inject fake DNSLink mapping for use in tests.
- See
e2707f1
to
51ee5b0
Compare
- _redirects: add support for 200 rewrite - add support for regex in redirects
…nixfs reponse format - Write functions for logic in getOrHeadHandler, to make it easier to read and to enable reuse in getOrHeadHandlerUnixfs - Move Unixfs specific functions to gateway_handler_unixfs.go - Check for _redirects file if we have origin isolation
51ee5b0
to
0395527
Compare
- Add test for no redirect due to no origin isolation
- Use http.* for status codes instead of hardcoded numbers - Test default of no status code in redirects
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for taking a stab at this! ❤️
Wrote comments inline, but a quick summary:
- let's try to keep the changes minimal – cleanup refactors are good )and ok to keep them, but in the future, such things could be a separate PR :)
- bit worried about impact of
servePretty404IfPresent
being bigger than the actual_redirects
– I am ok with us keeping it the way it was before just to limit the surface of this PR, seems that I overestimated how easy it will be to limit its scope, apologies! 🙈- namely, we should get back to a single
getOrHeadHandler
🙏
- namely, we should get back to a single
- needs DNSLink and catch-all tests
- try reusing existing parser lib, if possible. If anything is missing in existing one(s), fine to fork it for now.
return "", "", fmt.Errorf("there is no 404 file for the requested content types") | ||
} | ||
|
||
// TODO(JJ): Pretty sure this is incorrect. Validate the correct approach. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, may not be enough to work on DNSLink websites.
- Perhaps you could set
dnslink-hostname
incore/corehttp/hostname.go
(around line 222)? That way you could check it here, similar togw-hostname
. - We need sharness test for DNSLink case
- See
DNSLINK_FQDN
tests int0114-gateway-subdomains.sh
to see how you can inject fake DNSLink mapping for use in tests.
- See
return | ||
} | ||
logger.Debugw("serving unixfs directory", "path", contentPath) | ||
i.serveDirectory(w, r, resolvedPath, contentPath, dir, begin, logger) | ||
} | ||
|
||
// redirect returns redirected, newPath (if rewrite), error | ||
func (i *gatewayHandler) redirect(w http.ResponseWriter, r *http.Request, path ipath.Resolved) (bool, string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mind moving this to gateway_handler_unixfs__redirects.go
?
And renaming to something like
func (i *gatewayHandler) redirect(w http.ResponseWriter, r *http.Request, path ipath.Resolved) (bool, string, error) { | |
func (i *gatewayHandler) handledRedirectsFile(w http.ResponseWriter, r *http.Request, path ipath.Resolved) (bool, string, error) { |
return false, "", fmt.Errorf("redirect, could not convert node to file") | ||
} | ||
|
||
redirs := newRedirs(f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with https://github.com/tj/go-redirects
redirs := newRedirs(f) | |
redirs, err := redirects.ParseString("<_redirects>") |
return | ||
} | ||
logger.Debugw("serving unixfs directory", "path", contentPath) | ||
i.serveDirectory(w, r, resolvedPath, contentPath, dir, begin, logger) | ||
} | ||
|
||
// redirect returns redirected, newPath (if rewrite), error | ||
func (i *gatewayHandler) redirect(w http.ResponseWriter, r *http.Request, path ipath.Resolved) (bool, string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See if we could reuse https://github.com/tj/go-redirects (or other library, if you find a better one) and avoid the need for maintaining the parser for _redirects
?
You may find some prior art use in https://github.com/search?l=Go&q=tj%2Fgo-redirects&type=Code
@lidel thank you for your excellent feedback. 🙏 I'm addressing the individual suggestions and comments separately but I want to make sure we're on the same page about something.
From this comment and some others, it sounds like you might think the larger reorg and duplicate code paths were only introduced due to your request that I move pretty 404 logic into In other words, in this code at https://github.com/ipfs/go-ipfs/blob/master/core/corehttp/gateway_handler.go#L336-L350 (on master) we're not handling unixfs yet, but since when resolving the path for ETag purposes the resolution fails and we hit the switch's default case and need to return a 404, even if I keep the pretty 404 logic here, how can I be sure it is truly a 404 without first processing the Does that make sense? I'm totally open to your thoughts, but your comments made me think you might not see the issue I was trying to work around. Thanks again. |
Also, to be clear, assuming a larger refactoring does indeed end up being needed, I will gladly split the refactoring out into a separate PR first to make it easier to review, and then rebase this PR on top of that. |
See #8855 for an initial refactoring that doesn't do anything redirect related yet. |
@lidel When you have time, can you take a look at https://github.com/fission-suite/go-ipfs/pull/43/files and let me know if something like this would be acceptable? This PR is in fission's fork of go-ipfs currently and is built on top of the cleanup PR above. If something along these lines is more acceptable, I'll keep working through the remaining items and get them back on the branch tied to this PR for review. Thank you! |
Given the issues that PRs in the fission-suite org are causing, I've moved these changes over to #8890, which is tied to my personal account. 🙏 |
Rebased Cliff's changes with some upstream gateway refactorings. Will retest from here and then continue the work.