Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(gateway): _redirects file support #8816

Closed

Conversation

justindotpub
Copy link
Contributor

Rebased Cliff's changes with some upstream gateway refactorings. Will retest from here and then continue the work.

@welcome

This comment was marked as resolved.

@justindotpub
Copy link
Contributor Author

Ah, I see now that draft PRs kick off welcome processes. 😅 When anyone from the IPFS project looks at this, please let me know if draft PRs are discouraged. I'm just getting started on this work and was assuming a draft PR would be a reasonable way to track my progress. Thanks. 🙏🏻

Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for picking this up! ❤️

Quick feedback:

  • do we need to look at every level? (is netlify / cloudflare looking for _redirects in subdirs, only check if it is present in the top one?)
  • this should be only applied to default unixfs response types, and ignored by other ones (see comment inline)
  • this should be enabled only when we have Origin isolation
    • (when URL.pathname / starts at the IPFS content root /ipfs/{cid}/ and cant go beyond it)
    • namely, when DNSLink website is loaded based on Host header or if Host is a subdomain gateway

core/corehttp/gateway_handler.go Outdated Show resolved Hide resolved
core/corehttp/gateway_handler.go Outdated Show resolved Hide resolved
@BigLep BigLep added this to the Best Effort Track milestone Mar 24, 2022
@lidel lidel changed the title (WIP) HTTP Gateway redirects support feat(gateway): _redirects file support Mar 25, 2022
// For Unixfs, when a path can't be resolved we need to check for redirects and pretty 404 page files.
if responseFormat == "" {
logger.Debugw("dispatching to getOrHeadHandlerUnixfs")
i.getOrHeadHandlerUnixfs(w, r, begin, logger)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to dispatch to getOrHeadHandlerUnixfs here, to avoid any early 404s, which would then short circuit the _redirects lookup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit invasive change, duplicates code paths which are mostly the same.
This makes it harder to maintain, reason about metrics etc.

Could we keep the old way where we had a single getOrHeadHandler?
See my idea in https://github.com/ipfs/go-ipfs/pull/8816/files#r840944672

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This link just takes me to my comment "Moved pretty 404 logic over to gateway_handler_unixfs.go". Is that what you intended to link me?

switch err {
case nil:
case coreiface.ErrOffline:
webError(w, "ipfs resolve -r "+debugStr(contentPath.String()), err, http.StatusServiceUnavailable)
return
default:
// if Accept is text/html, see if ipfs-404.html is present
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved pretty 404 logic over to gateway_handler_unixfs.go

// TODO(JJ): During tests we get multibase.ErrUnsupportedEncoding
// This comes from multibase and I assume is due to a fake or otherwise bad CID being in the test.
// So for now any errors getting the redirect file are silently ignored.
// internalWebError(w, err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General question about how we should handle errors reading and parsing the redirects if present. Also note my comment about issues with test. I need to dive into this more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to return an error than to ignore it – website owner should get immediate feedback they made a typo or something.

return
default:
// if Accept is text/html, see if ipfs-404.html is present
if i.servePretty404IfPresent(w, r, contentPath) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty 404 logic moved here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 Not the best, this introduced a lot of code duplication, even with extracted functions we duplicate ResolvePath, switch etc.

Can we move it back and have a single getOrHeadHandler? 🙏
The refactor does not seem to be worth it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comments at #8816 (comment).

}
}

// TODO(JJ): I was thinking about changing this to just look at the root path as well, but the docs say it searches up
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you agree?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I was not involved in this feature and not sure what relies on it aside from docs.ipfs.io
To limit the blast radius of this PR I'd keep the existing logic.

Hopefully it will die eventually. We won't be documenting this, we will be documenting _redirects instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, makes sense. So I assume I should keep existing pretty 404 logic, but only fall back to it if there isn't any netlify style custom 404 page handling entry in the _redirects file. And only the new _redirects file 404 logic will be documented. 👌

return
}
logger.Debugw("serving unixfs directory", "path", contentPath)
i.serveDirectory(w, r, resolvedPath, contentPath, dir, begin, logger)
}

// redirect returns redirected, newPath (if rewrite), error
func (i *gatewayHandler) redirect(w http.ResponseWriter, r *http.Request, path ipath.Resolved) (bool, string, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't touched any of this redirect file parsing logic yet. Will need to settle on a format, ensure we handle optional bits like status code, etc. The example file I've been using is...

# redirects file
test    /         301
^/hir$  /hi.html  302
^/hi    /hi.html  200

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See if we could reuse https://github.com/tj/go-redirects (or other library, if you find a better one) and avoid the need for maintaining the parser for _redirects?

You may find some prior art use in https://github.com/search?l=Go&q=tj%2Fgo-redirects&type=Code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I'm guessing at the very least we'll have to contribute to another library to get feature parity with what Netlify currently supports (and I'm assuming that's our target to start with), but I'd rather not completely reinvent the wheel.

return "", "", fmt.Errorf("there is no 404 file for the requested content types")
}

// TODO(JJ): Pretty sure this is incorrect. Validate the correct approach.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any guidance is appreciated. 🙏

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, may not be enough to work on DNSLink websites.

  • Perhaps you could set dnslink-hostname in core/corehttp/hostname.go (around line 222)? That way you could check it here, similar to gw-hostname.
  • We need sharness test for DNSLink case
    • See DNSLINK_FQDN tests in t0114-gateway-subdomains.sh to see how you can inject fake DNSLink mapping for use in tests.

cbrake and others added 2 commits April 4, 2022 14:33
- _redirects: add support for 200 rewrite
- add support for regex in redirects
…nixfs reponse format

- Write functions for logic in getOrHeadHandler, to make it easier to read and to enable reuse in getOrHeadHandlerUnixfs
- Move Unixfs specific functions to gateway_handler_unixfs.go
- Check for _redirects file if we have origin isolation
Justin Johnson added 2 commits April 4, 2022 15:26
- Add test for no redirect due to no origin isolation
- Use http.* for status codes instead of hardcoded numbers
- Test default of no status code in redirects
@lidel lidel added the topic/gateway Topic gateway label Apr 5, 2022
@lidel lidel mentioned this pull request Apr 6, 2022
Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for taking a stab at this! ❤️

Wrote comments inline, but a quick summary:

  • let's try to keep the changes minimal – cleanup refactors are good )and ok to keep them, but in the future, such things could be a separate PR :)
  • bit worried about impact of servePretty404IfPresent being bigger than the actual _redirects – I am ok with us keeping it the way it was before just to limit the surface of this PR, seems that I overestimated how easy it will be to limit its scope, apologies! 🙈
    • namely, we should get back to a single getOrHeadHandler 🙏
  • needs DNSLink and catch-all tests
  • try reusing existing parser lib, if possible. If anything is missing in existing one(s), fine to fork it for now.

core/corehttp/gateway_handler.go Outdated Show resolved Hide resolved
core/corehttp/gateway_handler.go Outdated Show resolved Hide resolved
core/corehttp/gateway_handler.go Outdated Show resolved Hide resolved
core/corehttp/gateway_handler.go Outdated Show resolved Hide resolved
core/corehttp/gateway_handler.go Outdated Show resolved Hide resolved
return "", "", fmt.Errorf("there is no 404 file for the requested content types")
}

// TODO(JJ): Pretty sure this is incorrect. Validate the correct approach.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, may not be enough to work on DNSLink websites.

  • Perhaps you could set dnslink-hostname in core/corehttp/hostname.go (around line 222)? That way you could check it here, similar to gw-hostname.
  • We need sharness test for DNSLink case
    • See DNSLINK_FQDN tests in t0114-gateway-subdomains.sh to see how you can inject fake DNSLink mapping for use in tests.

return
}
logger.Debugw("serving unixfs directory", "path", contentPath)
i.serveDirectory(w, r, resolvedPath, contentPath, dir, begin, logger)
}

// redirect returns redirected, newPath (if rewrite), error
func (i *gatewayHandler) redirect(w http.ResponseWriter, r *http.Request, path ipath.Resolved) (bool, string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mind moving this to gateway_handler_unixfs__redirects.go?
And renaming to something like

Suggested change
func (i *gatewayHandler) redirect(w http.ResponseWriter, r *http.Request, path ipath.Resolved) (bool, string, error) {
func (i *gatewayHandler) handledRedirectsFile(w http.ResponseWriter, r *http.Request, path ipath.Resolved) (bool, string, error) {

return false, "", fmt.Errorf("redirect, could not convert node to file")
}

redirs := newRedirs(f)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with https://github.com/tj/go-redirects

Suggested change
redirs := newRedirs(f)
redirs, err := redirects.ParseString("<_redirects>")

return
}
logger.Debugw("serving unixfs directory", "path", contentPath)
i.serveDirectory(w, r, resolvedPath, contentPath, dir, begin, logger)
}

// redirect returns redirected, newPath (if rewrite), error
func (i *gatewayHandler) redirect(w http.ResponseWriter, r *http.Request, path ipath.Resolved) (bool, string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See if we could reuse https://github.com/tj/go-redirects (or other library, if you find a better one) and avoid the need for maintaining the parser for _redirects?

You may find some prior art use in https://github.com/search?l=Go&q=tj%2Fgo-redirects&type=Code

core/corehttp/gateway_handler_unixfs.go Outdated Show resolved Hide resolved
@justindotpub
Copy link
Contributor Author

justindotpub commented Apr 7, 2022

@lidel thank you for your excellent feedback. 🙏 I'm addressing the individual suggestions and comments separately but I want to make sure we're on the same page about something.

bit worried about impact of servePretty404IfPresent being bigger than the actual _redirects – I am ok with us keeping it the way it was before just to limit the surface of this PR, seems that I overestimated how easy it will be to limit its scope, apologies! 🙈

From this comment and some others, it sounds like you might think the larger reorg and duplicate code paths were only introduced due to your request that I move pretty 404 logic into gateway_handler_unixfs.go. Just in case you missed it, what I was trying to work around is the fact that any 404 here can't be handled without first looking for the _redirects file to ensure that the 404 isn't actually a redirect (or rewrite).

In other words, in this code at https://github.com/ipfs/go-ipfs/blob/master/core/corehttp/gateway_handler.go#L336-L350 (on master) we're not handling unixfs yet, but since when resolving the path for ETag purposes the resolution fails and we hit the switch's default case and need to return a 404, even if I keep the pretty 404 logic here, how can I be sure it is truly a 404 without first processing the _redirects file? But I can't process that file if not unixfs, which we're trying to move over to gateway_handler_unixfs.go. So it seems to force a fork in the execution, or some other reconsideration of the organization of this code.

Does that make sense? I'm totally open to your thoughts, but your comments made me think you might not see the issue I was trying to work around. Thanks again.

@justindotpub
Copy link
Contributor Author

Also, to be clear, assuming a larger refactoring does indeed end up being needed, I will gladly split the refactoring out into a separate PR first to make it easier to review, and then rebase this PR on top of that.

@justindotpub
Copy link
Contributor Author

See #8855 for an initial refactoring that doesn't do anything redirect related yet.

@justindotpub
Copy link
Contributor Author

@lidel When you have time, can you take a look at https://github.com/fission-suite/go-ipfs/pull/43/files and let me know if something like this would be acceptable? This PR is in fission's fork of go-ipfs currently and is built on top of the cleanup PR above. If something along these lines is more acceptable, I'll keep working through the remaining items and get them back on the branch tied to this PR for review. Thank you!

@justindotpub
Copy link
Contributor Author

Given the issues that PRs in the fission-suite org are causing, I've moved these changes over to #8890, which is tied to my personal account. 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic/gateway Topic gateway
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants