Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

404 and broken link handling #895

Merged
merged 20 commits into from
Oct 1, 2024
Merged

Conversation

pmelab
Copy link
Contributor

@pmelab pmelab commented Sep 19, 2024

The problem

Links that point to non-existent paths will put the router into a faliling state when trying to loading the RSC payload of the target page and throw an exception that is (by default) caught by the global error boundary.

This approach creates some problems for content-focused use cases:

  • The 404 page can not be a full react page that is managed like all the others.
  • It's not possible to hand of 404 handling to a different system (e.g. redirects stored in a CMS or CDN).

Solution

I changed the Link component to optionally wait for the prefetch to resolve before navigating and if unsuccessful trigger a native navigation to the same path to trigger the host system 404 behavior.

Based on a discussion with @dai-shi , I attempted to achieve the same with a custom error boundary, but the error boundary gets rendered and there is a flash of the error message before the actual navigation happens, which feels like the website is broken.

Note

When writing tests for this, i found that error handling was broken in itself, which is not related to this PR directly: 4ca39f7

Copy link

vercel bot commented Sep 19, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated (UTC)
waku ✅ Ready (Inspect) Visit Preview Oct 1, 2024 2:01pm

Copy link

codesandbox-ci bot commented Sep 19, 2024

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

@dai-shi
Copy link
Owner

dai-shi commented Sep 19, 2024

Hi, thanks for the investigation and the suggestion!

I would like to understand the problem better as I assume there should be more minimal solution.
One of my questions is is this issue specific to prefetching? As "prefetch" is an additional feature, and it should work without prefetching.

When writing tests for this, i found that error handling was broken in itself, which is not related to this PR directly

Yeah, it's not surprising because this feature isn't very stable yet. Thanks for the test and the fix!

@pmelab
Copy link
Contributor Author

pmelab commented Sep 20, 2024

I would like to understand the problem better as I assume there should be more minimal solution.
One of my questions is is this issue specific to prefetching? As "prefetch" is an additional feature, and it should work without prefetching.

I will try my best! Wall of text incoming ...

Aside from prefetching (which I agree is an additional feature), all the Waku Link component does is change the route. Then the router tries to render the route, and if fetching the RSC payload fails, it will trigger the error boundary.

In our concrete case, the content management system maintains a list of redirects for pages that changed their slug over time. Links to those pages are not necessarily immediately updated in all locations, but just redirect to the targets new slug.

That means, a missing RSC payload is an expected case that we need to handle before the actual route change happens. It should not emit an error state to the user, but fall back to the CMS' redirect handler. We are working in a purely static environment, so we can't use a middleware to catch that.

Prefetch is part of the solution because it can tell us in advance if a route exists or not, and the Link component can simply trigger a standard navigation and the host system can decide on a higher level what to do. That's one of the things that still work well in Gatsby right now, and I took the solution from there 😄

And there is the other nice side effect that the 404 page can just be a regular page that is handled by Vercel/Cloudflare/Netlify/Nginx/Apache ... 's built-in fallback mechanisms.

Using an error boundary would be more elegant, but it will always unmount the whole page before doing anything else. Maybe this could be handled on a router level, but that means it would apply to every link, which felt even more invasive.

@dai-shi
Copy link
Owner

dai-shi commented Sep 20, 2024

So, does your solution require prefetching always? It feels bad in two ways: a) prefetch should be optional, b) prefetch shouldn't be awaited. (otherwise, we don't get the benefit of prefetching.)

I think we need a solution without prefetching.

@pmelab
Copy link
Contributor Author

pmelab commented Sep 20, 2024

Yes, it always has to prefetch. I tried to trigger the reload right where the exception is thrown:


... but this is too late. The error boundary gets rendered already, since the new route is already being evaluated.

I think without prefetching, we would have to make the router remember its previous state and render that in case the RSC payload returns a 404. And then trigger the page reload.

@dai-shi
Copy link
Owner

dai-shi commented Sep 20, 2024

Yeah, I think that's the right solution. It's tedious though. Awaiting prefetch isn't a prefetching any longer, so it's not an option.

@pmelab
Copy link
Contributor Author

pmelab commented Sep 20, 2024

Do you have any pointers where or how to do that? I tried to track the application flow, and from what i can tell the promise returned from fetchRSC gets passed as elements here:

return createElement(
RefetchContext.Provider,
{ value: refetch },
createElement(ElementsContext.Provider, { value: elements }, children),
);

So the error will always be thrown during rendering, and we are back at the error boundary. Maybe we could store the previous result and and make the error boundary render the whole previous page, but this would still mean a full re-render and probably ui elements re-initializing and flashing.

@dai-shi
Copy link
Owner

dai-shi commented Sep 20, 2024

I need to sit down and think about it more.

As I understand, this is about client side routing, so location.href = ... shouldn't be used, basically.

Meanwhile, would you send a separate PR for 4ca39f7?

@pmelab
Copy link
Contributor Author

pmelab commented Sep 21, 2024

There you go: #896

@pmelab
Copy link
Contributor Author

pmelab commented Sep 21, 2024

Note

Brainstorming different angles to the problem

Frontend-redirects

Add all possible redirects to the js-bundle.

✅ Independent of hosting environment
✅ No visual break
🛑 List can get long and inflates js-payload

Redirect pages

Render an actual page for each redirect that does a client side redirect on load.

✅ Independent of hosting environment
✅ No changes to waku necessary
🛑 Causes visual break, but less than a full page unmount
🛑 Might produce a lot of files. Some environments (e.g. Cloudflare pages) have a hard limit.

Redirect RSC payloads

When there is a redirect from /a to /b, the host system also redirects /RSC/a.txt to /RSC/b.txt. Waku router could detect that case and adjust history accordingly.

🛑 Depends on the hosting environment
✅ No visual break
✅ No additional files or JS payload size

createRedirect

New createRedirect API that places a special RSC payload at that path that causes waku to load a different RSC file instead.

✅ Independent of hosting environment
✅ No visual break
🛑 More files, but at least less than "Redirect pages"

@pmelab
Copy link
Contributor Author

pmelab commented Sep 23, 2024

So i had a longer thought about this, and turns out that Option 3 of the brainstorming solves everything:

Redirect RSC payloads
When there is a redirect from /a to /b, the host system also redirects /RSC/a.txt to /RSC/b.txt. Waku router could detect that case and adjust history accordingly.
🛑 Depends on the hosting environment
✅ No visual break
✅ No additional files or JS payload size

Nothing in Waku has to change. As an example, I use these Netlify redirects:

/a /b 301
/RSC/a.txt /RSC/b.txt 301

When a Link component now points to /a and the rendering process tries to load /RSC/a.txt, the server responds with the contents of /RSC/b.txt and the correct page is displayed. In a Gatsby environment that would have failed, due to potentially missing client code. But since the RSC payload also knows which js-bundles are required, it just works 🤯

Adding this rewrite...

/RSC/* /RSC/404.txt

... even (almost) solves the 404 problem. When clicking a broken link, the server will respond with the RSC-Payload of the 404 page. It is also correctly displayed, but Waku also replaces the current browser location with /404. Thats great for any other pages, but here we actually don't want that.
@dai-shi : do you know where this happens and if we can add special handling for 404 somehow?

@dai-shi
Copy link
Owner

dai-shi commented Sep 24, 2024

Thanks for your time thinking about this.
I really like that Waku is becoming a collaborative project.

Before going into the solution idea, can I make sure I'm on the same page? I'm still not confident if I understand the problem. (like, is it only a hosting issue?)
From my perspective, I can provide some requirements:

  • We have SSR, but we can opt-out. So, the solution should work without SSR.
  • We support static sites (with/without SSR). It's a primary use case.

Can the problem happen with static sites and/or without SSR?

@pmelab
Copy link
Contributor Author

pmelab commented Sep 24, 2024

Before going into the solution idea, can I make sure I'm on the same page? I'm still not confident if I understand the problem.

The problem has shifted through the course of this PR. I'll try to summarize.

Initial Situation:

Requirements:

  • We work in a purely static environment, without a JS server runtime. No SSR and no middleware either. Simplest case would be dist/public served by NGINX for example.
  • 404 error pages are regular pages, maintained in a CMS.
  • There is an externally maintained list of path redirects (e.g. path /a redirects visitors to page /b) that is already applied by the HTTP server.

Problems:

  • When navigating with Waku's Link component, the HTTP server has no chance to apply it's built-in redirects or 404 fallbacks.
    • Waku Link's that point to non-existing paths don't show the 404 page, but the error boundary message.
    • Waku Links that point to redirected pages don't redirect but show the error boundary message as well.
  • This can be reproduced both in SSR and SSG, but in SSR it's easier to solve, since redirect and fallback logic can be applied at time of the request.

After the findings from my "brainstorming"

  • ✅ Adding the same redirects for the RSC directory (e.g. /RSC/a.txt to /RSC/b.txt) to the HTTP server makes Link correctly follow redirects.
  • ✅ Making the HTTP server fall back to /RSC/404.txt for all requests that ask for something in /RSC/* that does not exist, cause Link with broken paths to display the correct custom 404-page.
    • 🛑 .... but it also changes the window.location to /404. It should remain the non-existent path that the link pointed to instead.

Note

The Code in this PR does not reflect its state of discussion any more. I was able to solve everything except that last point outside of Waku.

@dai-shi
Copy link
Owner

dai-shi commented Sep 24, 2024

Thanks for the summary. That helps my understanding.
So, my mental model is that server routing and client routing are different things.

  • Server routing is done by the HTTP server (NGINX, Waku server, or something). It fetches HTML (and RSC).
  • Client routing is basically <Link> with JS enabled. It fetches RSC only.

404.html is basically only for server routing.
404.tsx with waku/router is also basically for server routing (but can it be used for client routing? that would be nice.)

e.g. /RSC/a.txt to /RSC/b.txt

That actually depends on an internal spec that /a corresponds to /RSC/a.txt, but it may not hold in the future.


So, I feel like there should be something that waku/router could do.

@pmelab
Copy link
Contributor Author

pmelab commented Sep 24, 2024

That actually depends on an internal spec that /a corresponds to /RSC/a.txt, but it may not hold in the future.

Yes, thats a flaw here.

So, I feel like there should be something that waku/router could do.

It could handle 404 responses when trying to fetch RSC:

  • Test the corresponding "real" path to see if it emits a redirect (from NGINX) and attempt to load the RSC for the redirect target path instead. That would mitigate the dependency on internals above.
  • Fall back to loading the RSC for 404.tsx if available.

@pmelab pmelab force-pushed the broken-link-handling branch 2 times, most recently from 5b1ae68 to 99b5428 Compare September 25, 2024 07:22
@pmelab
Copy link
Contributor Author

pmelab commented Sep 25, 2024

Re-started this PR with a detailed test case how I would expect routing to work.

dai-shi
dai-shi previously approved these changes Sep 30, 2024
Copy link
Owner

@dai-shi dai-shi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think supporting redirect with client navigation is feasible, conceptually.
At least it's not the work of the minimal API. As for waku/router, it might be technically possible like adding { REDIRECT: '...' } in the response. But, the user-facing API is uncertain. I think we should learn more about the use cases and community standard with other frameworks.

@pmelab
Copy link
Contributor Author

pmelab commented Sep 30, 2024

This solution does not work in a fully static environment ☹️ Only with a server side router.

@dai-shi
Copy link
Owner

dai-shi commented Sep 30, 2024

Good point...

For static sites, we can't add JSON body to 404 response anyway. Should we go back to double-fetching solution (but in router/client.ts), or always embed 404?

Actually, we can send 404 content on the initial load and cache it on the client. Let me see. (It increases the size with static sites though.)

@pmelab
Copy link
Contributor Author

pmelab commented Sep 30, 2024

For static sites, we can't add JSON body to 404 response anyway.

It would work, but then the RSC path would have to become public API, so an HTTP server can reliably fall back to /RSC/404.txt.

Actually, we can send 404 content on the initial load and cache it on the client. Let me see. (It increases the size with static sites though.)

Does that mean embedding the 404 content into each static html page? With CDNs charging by transfer volume, thats not great.

@dai-shi
Copy link
Owner

dai-shi commented Sep 30, 2024

Does that mean embedding the 404 content into each static html page?

Yeah, kind of. Okay, I'll try another solution.

@pmelab
Copy link
Contributor Author

pmelab commented Sep 30, 2024

It would work, but then the RSC path would have to become public API, so an HTTP server can reliably fall back to /RSC/404.txt.

The more i think about this, the more i like it. It would also allow to handle redirects without Waku's knowledge. But I have no clue how likely this would break. Is there a scenario where "real"- and RSC-path don't match?

@dai-shi
Copy link
Owner

dai-shi commented Sep 30, 2024

Is there a scenario where "real"- and RSC-path don't match?

Not now, but I feel like this convention isn't very flexible. I would definitely like to explore other solutions that don't match.

More than that, 404.txt isn't Waku minimal API's spec. It's waku/router's spec. So, it shouldn't be a public API.

I don't feel comfortable with depending on HTTP server's config. Let me work on something and you might come up with an idea to improve.

@pmelab
Copy link
Contributor Author

pmelab commented Sep 30, 2024

Should i make the test cases "minimal"? (not use server side at all)

@dai-shi
Copy link
Owner

dai-shi commented Sep 30, 2024

Let me work on something

I thought about getting 404.txt on the initial load and caching it, but as it can be dynamic, it can't be cached. So, I'd try something close to your approach.

Should i make the test cases "minimal"? (not use server side at all)

what do you mean? test static site case?

Copy link
Owner

@dai-shi dai-shi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

request for review

@dai-shi dai-shi dismissed their stale review September 30, 2024 13:59

new commits

@pmelab
Copy link
Contributor Author

pmelab commented Sep 30, 2024

Should i make the test cases "minimal"? (not use server side at all)

what do you mean? test static site case?

Run the client side test cases with serve instead of waku start. To make sure it always works without a node server. Like in the "render type" test case:

cp = exec(`pnpm serve -l ${port} dist/public`, { cwd });

@dai-shi
Copy link
Owner

dai-shi commented Sep 30, 2024

Run the client side test cases with serve instead of waku start.

That's nice!

@pmelab
Copy link
Contributor Author

pmelab commented Oct 1, 2024

Run the client side test cases with serve instead of waku start.

That's nice!

There we go!

@dai-shi
Copy link
Owner

dai-shi commented Oct 1, 2024

Thanks! Would you like to give a review too?

Copy link
Contributor Author

@pmelab pmelab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont seem to have the permission to "approve". Just this comment.

@dai-shi
Copy link
Owner

dai-shi commented Oct 1, 2024

Yes, I know. That works. Thanks!

packages/waku/src/router/client.ts Show resolved Hide resolved
@dai-shi dai-shi merged commit caf1a18 into dai-shi:main Oct 1, 2024
28 checks passed
@pmelab
Copy link
Contributor Author

pmelab commented Oct 2, 2024

Created a new PR for the redirect case: #928

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants