404 and broken link handling #895

pmelab · 2024-09-19T14:49:20Z

The problem

Links that point to non-existent paths will put the router into a faliling state when trying to loading the RSC payload of the target page and throw an exception that is (by default) caught by the global error boundary.

This approach creates some problems for content-focused use cases:

The 404 page can not be a full react page that is managed like all the others.
It's not possible to hand of 404 handling to a different system (e.g. redirects stored in a CMS or CDN).

Solution

I changed the Link component to optionally wait for the prefetch to resolve before navigating and if unsuccessful trigger a native navigation to the same path to trigger the host system 404 behavior.

Based on a discussion with @dai-shi , I attempted to achieve the same with a custom error boundary, but the error boundary gets rendered and there is a flash of the error message before the actual navigation happens, which feels like the website is broken.

Note

When writing tests for this, i found that error handling was broken in itself, which is not related to this PR directly: 4ca39f7

vercel · 2024-09-19T14:49:24Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Updated (UTC)
waku	✅ Ready (Inspect)	Visit Preview	Oct 1, 2024 2:01pm

codesandbox-ci · 2024-09-19T14:49:43Z

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

dai-shi · 2024-09-19T23:55:13Z

Hi, thanks for the investigation and the suggestion!

I would like to understand the problem better as I assume there should be more minimal solution.
One of my questions is is this issue specific to prefetching? As "prefetch" is an additional feature, and it should work without prefetching.

When writing tests for this, i found that error handling was broken in itself, which is not related to this PR directly

Yeah, it's not surprising because this feature isn't very stable yet. Thanks for the test and the fix!

pmelab · 2024-09-20T02:50:45Z

I would like to understand the problem better as I assume there should be more minimal solution.
One of my questions is is this issue specific to prefetching? As "prefetch" is an additional feature, and it should work without prefetching.

I will try my best! Wall of text incoming ...

Aside from prefetching (which I agree is an additional feature), all the Waku Link component does is change the route. Then the router tries to render the route, and if fetching the RSC payload fails, it will trigger the error boundary.

In our concrete case, the content management system maintains a list of redirects for pages that changed their slug over time. Links to those pages are not necessarily immediately updated in all locations, but just redirect to the targets new slug.

That means, a missing RSC payload is an expected case that we need to handle before the actual route change happens. It should not emit an error state to the user, but fall back to the CMS' redirect handler. We are working in a purely static environment, so we can't use a middleware to catch that.

Prefetch is part of the solution because it can tell us in advance if a route exists or not, and the Link component can simply trigger a standard navigation and the host system can decide on a higher level what to do. That's one of the things that still work well in Gatsby right now, and I took the solution from there 😄

And there is the other nice side effect that the 404 page can just be a regular page that is handled by Vercel/Cloudflare/Netlify/Nginx/Apache ... 's built-in fallback mechanisms.

Using an error boundary would be more elegant, but it will always unmount the whole page before doing anything else. Maybe this could be handled on a router level, but that means it would apply to every link, which felt even more invasive.

dai-shi · 2024-09-20T04:26:26Z

So, does your solution require prefetching always? It feels bad in two ways: a) prefetch should be optional, b) prefetch shouldn't be awaited. (otherwise, we don't get the benefit of prefetching.)

I think we need a solution without prefetching.

pmelab · 2024-09-20T06:02:24Z

Yes, it always has to prefetch. I tried to trigger the reload right where the exception is thrown:

waku/packages/waku/src/client.ts

Line 38 in 58d957c

throw err;

... but this is too late. The error boundary gets rendered already, since the new route is already being evaluated.

I think without prefetching, we would have to make the router remember its previous state and render that in case the RSC payload returns a 404. And then trigger the page reload.

dai-shi · 2024-09-20T06:07:51Z

Yeah, I think that's the right solution. It's tedious though. Awaiting prefetch isn't a prefetching any longer, so it's not an option.

pmelab · 2024-09-20T07:43:19Z

Do you have any pointers where or how to do that? I tried to track the application flow, and from what i can tell the promise returned from fetchRSC gets passed as elements here:

waku/packages/waku/src/client.ts

Lines 205 to 209 in 58d957c

    
           return createElement( 
        
             RefetchContext.Provider, 
        
             { value: refetch }, 
        
             createElement(ElementsContext.Provider, { value: elements }, children), 
        
           );

So the error will always be thrown during rendering, and we are back at the error boundary. Maybe we could store the previous result and and make the error boundary render the whole previous page, but this would still mean a full re-render and probably ui elements re-initializing and flashing.

dai-shi · 2024-09-20T14:15:20Z

I need to sit down and think about it more.

As I understand, this is about client side routing, so location.href = ... shouldn't be used, basically.

Meanwhile, would you send a separate PR for 4ca39f7?

pmelab · 2024-09-21T05:27:52Z

There you go: #896

pmelab · 2024-09-21T06:11:49Z

Note

Brainstorming different angles to the problem

Frontend-redirects

Add all possible redirects to the js-bundle.

✅ Independent of hosting environment
✅ No visual break
🛑 List can get long and inflates js-payload

Redirect pages

Render an actual page for each redirect that does a client side redirect on load.

✅ Independent of hosting environment
✅ No changes to waku necessary
🛑 Causes visual break, but less than a full page unmount
🛑 Might produce a lot of files. Some environments (e.g. Cloudflare pages) have a hard limit.

Redirect RSC payloads

When there is a redirect from /a to /b, the host system also redirects /RSC/a.txt to /RSC/b.txt. Waku router could detect that case and adjust history accordingly.

🛑 Depends on the hosting environment
✅ No visual break
✅ No additional files or JS payload size

`createRedirect`

New createRedirect API that places a special RSC payload at that path that causes waku to load a different RSC file instead.

✅ Independent of hosting environment
✅ No visual break
🛑 More files, but at least less than "Redirect pages"

pmelab · 2024-09-23T08:33:58Z

So i had a longer thought about this, and turns out that Option 3 of the brainstorming solves everything:

Redirect RSC payloads
When there is a redirect from /a to /b, the host system also redirects /RSC/a.txt to /RSC/b.txt. Waku router could detect that case and adjust history accordingly.
🛑 Depends on the hosting environment
✅ No visual break
✅ No additional files or JS payload size

Nothing in Waku has to change. As an example, I use these Netlify redirects:

/a /b 301
/RSC/a.txt /RSC/b.txt 301

When a Link component now points to /a and the rendering process tries to load /RSC/a.txt, the server responds with the contents of /RSC/b.txt and the correct page is displayed. In a Gatsby environment that would have failed, due to potentially missing client code. But since the RSC payload also knows which js-bundles are required, it just works 🤯

Adding this rewrite...

/RSC/* /RSC/404.txt

... even (almost) solves the 404 problem. When clicking a broken link, the server will respond with the RSC-Payload of the 404 page. It is also correctly displayed, but Waku also replaces the current browser location with /404. Thats great for any other pages, but here we actually don't want that.
@dai-shi : do you know where this happens and if we can add special handling for 404 somehow?

dai-shi · 2024-09-24T12:24:33Z

Thanks for your time thinking about this.
I really like that Waku is becoming a collaborative project.

Before going into the solution idea, can I make sure I'm on the same page? I'm still not confident if I understand the problem. (like, is it only a hosting issue?)
From my perspective, I can provide some requirements:

We have SSR, but we can opt-out. So, the solution should work without SSR.
We support static sites (with/without SSR). It's a primary use case.

Can the problem happen with static sites and/or without SSR?

pmelab · 2024-09-24T13:08:54Z

Before going into the solution idea, can I make sure I'm on the same page? I'm still not confident if I understand the problem.

The problem has shifted through the course of this PR. I'll try to summarize.

Initial Situation:

Requirements:

We work in a purely static environment, without a JS server runtime. No SSR and no middleware either. Simplest case would be dist/public served by NGINX for example.
404 error pages are regular pages, maintained in a CMS.
There is an externally maintained list of path redirects (e.g. path /a redirects visitors to page /b) that is already applied by the HTTP server.

Problems:

When navigating with Waku's Link component, the HTTP server has no chance to apply it's built-in redirects or 404 fallbacks.
- Waku Link's that point to non-existing paths don't show the 404 page, but the error boundary message.
- Waku Links that point to redirected pages don't redirect but show the error boundary message as well.
This can be reproduced both in SSR and SSG, but in SSR it's easier to solve, since redirect and fallback logic can be applied at time of the request.

After the findings from my "brainstorming"

✅ Adding the same redirects for the RSC directory (e.g. /RSC/a.txt to /RSC/b.txt) to the HTTP server makes Link correctly follow redirects.
✅ Making the HTTP server fall back to /RSC/404.txt for all requests that ask for something in /RSC/* that does not exist, cause Link with broken paths to display the correct custom 404-page.
- 🛑 .... but it also changes the window.location to /404. It should remain the non-existent path that the link pointed to instead.

Note

The Code in this PR does not reflect its state of discussion any more. I was able to solve everything except that last point outside of Waku.

dai-shi · 2024-09-24T13:36:10Z

Thanks for the summary. That helps my understanding.
So, my mental model is that server routing and client routing are different things.

Server routing is done by the HTTP server (NGINX, Waku server, or something). It fetches HTML (and RSC).
Client routing is basically <Link> with JS enabled. It fetches RSC only.

404.html is basically only for server routing.
404.tsx with waku/router is also basically for server routing (but can it be used for client routing? that would be nice.)

e.g. /RSC/a.txt to /RSC/b.txt

That actually depends on an internal spec that /a corresponds to /RSC/a.txt, but it may not hold in the future.

So, I feel like there should be something that waku/router could do.

pmelab · 2024-09-24T14:51:27Z

That actually depends on an internal spec that /a corresponds to /RSC/a.txt, but it may not hold in the future.

Yes, thats a flaw here.

So, I feel like there should be something that waku/router could do.

It could handle 404 responses when trying to fetch RSC:

Test the corresponding "real" path to see if it emits a redirect (from NGINX) and attempt to load the RSC for the redirect target path instead. That would mitigate the dependency on internals above.
Fall back to loading the RSC for 404.tsx if available.

pmelab · 2024-09-25T07:24:23Z

Re-started this PR with a detailed test case how I would expect routing to work.

On the 404 page, `parseRoute` __always__ returned the 404 page itself. Now this is changed for occasions where it is used to actually navigate.

dai-shi

I don't think supporting redirect with client navigation is feasible, conceptually.
At least it's not the work of the minimal API. As for waku/router, it might be technically possible like adding { REDIRECT: '...' } in the response. But, the user-facing API is uncertain. I think we should learn more about the use cases and community standard with other frameworks.

pmelab · 2024-09-30T03:38:46Z

This solution does not work in a fully static environment ☹️ Only with a server side router.

dai-shi · 2024-09-30T03:41:54Z

Good point...

For static sites, we can't add JSON body to 404 response anyway. Should we go back to double-fetching solution (but in router/client.ts), or always embed 404?

Actually, we can send 404 content on the initial load and cache it on the client. Let me see. (It increases the size with static sites though.)

pmelab · 2024-09-30T03:55:59Z

For static sites, we can't add JSON body to 404 response anyway.

It would work, but then the RSC path would have to become public API, so an HTTP server can reliably fall back to /RSC/404.txt.

Actually, we can send 404 content on the initial load and cache it on the client. Let me see. (It increases the size with static sites though.)

Does that mean embedding the 404 content into each static html page? With CDNs charging by transfer volume, thats not great.

dai-shi · 2024-09-30T03:59:57Z

Does that mean embedding the 404 content into each static html page?

Yeah, kind of. Okay, I'll try another solution.

pmelab · 2024-09-30T04:01:36Z

It would work, but then the RSC path would have to become public API, so an HTTP server can reliably fall back to /RSC/404.txt.

The more i think about this, the more i like it. It would also allow to handle redirects without Waku's knowledge. But I have no clue how likely this would break. Is there a scenario where "real"- and RSC-path don't match?

dai-shi · 2024-09-30T04:11:17Z

Is there a scenario where "real"- and RSC-path don't match?

Not now, but I feel like this convention isn't very flexible. I would definitely like to explore other solutions that don't match.

More than that, 404.txt isn't Waku minimal API's spec. It's waku/router's spec. So, it shouldn't be a public API.

I don't feel comfortable with depending on HTTP server's config. Let me work on something and you might come up with an idea to improve.

pmelab · 2024-09-30T04:42:00Z

Should i make the test cases "minimal"? (not use server side at all)

dai-shi · 2024-09-30T12:14:09Z

Let me work on something

I thought about getting 404.txt on the initial load and caching it, but as it can be dynamic, it can't be cached. So, I'd try something close to your approach.

Should i make the test cases "minimal"? (not use server side at all)

what do you mean? test static site case?

dai-shi

request for review

new commits

pmelab · 2024-09-30T14:38:22Z

Should i make the test cases "minimal"? (not use server side at all)

what do you mean? test static site case?

Run the client side test cases with serve instead of waku start. To make sure it always works without a node server. Like in the "render type" test case:

waku/e2e/render-type.spec.ts

Line 35 in 8b38ceb

cp = exec(`pnpm serve -l ${port} dist/public`, { cwd });

dai-shi · 2024-09-30T22:27:19Z

Run the client side test cases with serve instead of waku start.

That's nice!

pmelab · 2024-10-01T08:24:02Z

Run the client side test cases with serve instead of waku start.

That's nice!

There we go!

dai-shi · 2024-10-01T08:36:15Z

Thanks! Would you like to give a review too?

pmelab

I dont seem to have the permission to "approve". Just this comment.

dai-shi · 2024-10-01T09:11:09Z

Yes, I know. That works. Thanks!

packages/waku/src/router/client.ts

pmelab · 2024-10-02T09:00:28Z

Created a new PR for the redirect case: #928

vercel bot deployed to Preview September 19, 2024 14:50 View deployment

vercel bot deployed to Preview September 19, 2024 15:10 View deployment

pmelab force-pushed the broken-link-handling branch 2 times, most recently from 5b1ae68 to 99b5428 Compare September 25, 2024 07:22

pmelab force-pushed the broken-link-handling branch from 99b5428 to 782caad Compare September 25, 2024 07:25

pmelab added 7 commits September 25, 2024 12:23

test: detailed test cases for server- and client navigation cases

458861d

test: make 404 page static

b95ee57

test: disambiguate redirect link name

e6c350f

test: also test navigation back to the index page

6ecb59c

fix: allow 404 pages to client side navigate

614a3ff

On the 404 page, `parseRoute` __always__ returned the 404 page itself. Now this is changed for occasions where it is used to actually navigate.

fix: handle 404 and redirect responses when fetching RSC

f094430

fix: don't push "/404" to history

4671527

pmelab force-pushed the broken-link-handling branch from 782caad to 4671527 Compare September 25, 2024 10:26

vercel bot deployed to Preview September 30, 2024 02:59 View deployment

dai-shi previously approved these changes Sep 30, 2024

View reviewed changes

feat: unstable_enhanceCreateData

8c9c408

vercel bot deployed to Preview September 30, 2024 13:44 View deployment

dai-shi reviewed Sep 30, 2024

View reviewed changes

test: run client side test cases without "waku start"

f8fae52

pmelab commented Oct 1, 2024

View reviewed changes

Merge branch 'main' into broken-link-handling

49e619f

dai-shi approved these changes Oct 1, 2024

View reviewed changes

packages/waku/src/router/client.ts Show resolved Hide resolved

Update client.ts

70fe57e

vercel bot deployed to Preview October 1, 2024 09:16 View deployment

run prettier

5cdcb8d

vercel bot deployed to Preview October 1, 2024 14:01 View deployment

dai-shi merged commit caf1a18 into dai-shi:main Oct 1, 2024
28 checks passed

pmelab mentioned this pull request Oct 2, 2024

feat: client side redirects #928

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

404 and broken link handling #895

404 and broken link handling #895

pmelab commented Sep 19, 2024 •

edited

Loading

vercel bot commented Sep 19, 2024 •

edited

Loading

codesandbox-ci bot commented Sep 19, 2024 •

edited

Loading

dai-shi commented Sep 19, 2024

pmelab commented Sep 20, 2024

dai-shi commented Sep 20, 2024

pmelab commented Sep 20, 2024

dai-shi commented Sep 20, 2024

pmelab commented Sep 20, 2024

dai-shi commented Sep 20, 2024

pmelab commented Sep 21, 2024

pmelab commented Sep 21, 2024

pmelab commented Sep 23, 2024

dai-shi commented Sep 24, 2024

pmelab commented Sep 24, 2024 •

edited

Loading

dai-shi commented Sep 24, 2024

pmelab commented Sep 24, 2024 •

edited

Loading

pmelab commented Sep 25, 2024

dai-shi left a comment

pmelab commented Sep 30, 2024

dai-shi commented Sep 30, 2024 •

edited

Loading

pmelab commented Sep 30, 2024

dai-shi commented Sep 30, 2024

pmelab commented Sep 30, 2024 •

edited

Loading

dai-shi commented Sep 30, 2024

pmelab commented Sep 30, 2024

dai-shi commented Sep 30, 2024 •

edited

Loading

dai-shi left a comment

pmelab commented Sep 30, 2024 •

edited

Loading

dai-shi commented Sep 30, 2024

pmelab commented Oct 1, 2024

dai-shi commented Oct 1, 2024

pmelab left a comment

dai-shi commented Oct 1, 2024

pmelab commented Oct 2, 2024

404 and broken link handling #895

404 and broken link handling #895

Conversation

pmelab commented Sep 19, 2024 • edited Loading

The problem

Solution

Note

vercel bot commented Sep 19, 2024 • edited Loading

codesandbox-ci bot commented Sep 19, 2024 • edited Loading

dai-shi commented Sep 19, 2024

pmelab commented Sep 20, 2024

dai-shi commented Sep 20, 2024

pmelab commented Sep 20, 2024

dai-shi commented Sep 20, 2024

pmelab commented Sep 20, 2024

dai-shi commented Sep 20, 2024

pmelab commented Sep 21, 2024

pmelab commented Sep 21, 2024

Frontend-redirects

Redirect pages

Redirect RSC payloads

createRedirect

pmelab commented Sep 23, 2024

dai-shi commented Sep 24, 2024

pmelab commented Sep 24, 2024 • edited Loading

Initial Situation:

After the findings from my "brainstorming"

dai-shi commented Sep 24, 2024

pmelab commented Sep 24, 2024 • edited Loading

pmelab commented Sep 25, 2024

dai-shi left a comment

Choose a reason for hiding this comment

pmelab commented Sep 30, 2024

dai-shi commented Sep 30, 2024 • edited Loading

pmelab commented Sep 30, 2024

dai-shi commented Sep 30, 2024

pmelab commented Sep 30, 2024 • edited Loading

dai-shi commented Sep 30, 2024

pmelab commented Sep 30, 2024

dai-shi commented Sep 30, 2024 • edited Loading

dai-shi left a comment

Choose a reason for hiding this comment

pmelab commented Sep 30, 2024 • edited Loading

dai-shi commented Sep 30, 2024

pmelab commented Oct 1, 2024

dai-shi commented Oct 1, 2024

pmelab left a comment

Choose a reason for hiding this comment

dai-shi commented Oct 1, 2024

pmelab commented Oct 2, 2024

pmelab commented Sep 19, 2024 •

edited

Loading

vercel bot commented Sep 19, 2024 •

edited

Loading

codesandbox-ci bot commented Sep 19, 2024 •

edited

Loading

`createRedirect`

pmelab commented Sep 24, 2024 •

edited

Loading

pmelab commented Sep 24, 2024 •

edited

Loading

dai-shi commented Sep 30, 2024 •

edited

Loading

pmelab commented Sep 30, 2024 •

edited

Loading

dai-shi commented Sep 30, 2024 •

edited

Loading

pmelab commented Sep 30, 2024 •

edited

Loading