Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent handling of welcome files between Jetty 10 and 12 #9910

Closed
grgrzybek opened this issue Jun 15, 2023 · 29 comments · Fixed by #9934
Closed

Inconsistent handling of welcome files between Jetty 10 and 12 #9910

grgrzybek opened this issue Jun 15, 2023 · 29 comments · Fixed by #9934
Assignees
Labels
Bug For general bugs on Jetty side

Comments

@grgrzybek
Copy link
Contributor

grgrzybek commented Jun 15, 2023

Jetty version(s)
12.0.0.beta1

Java version/vendor (use: java -version)

$ java -version
java version "11.0.19" 2023-04-18 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.19+9-LTS-224)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.19+9-LTS-224, mixed mode)

OS type/version

$ uname -srv
Linux 6.3.6-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Jun  5 15:45:04 UTC 2023

Description
When working on Pax Web welcome file handling, I found some differences between Jetty 10 and Jetty 12.

How to reproduce?

I have a quite complicated setup related to welcome file handling. For Pax Web 9 / Jetty 10 it is:
https://github.com/ops4j/org.ops4j.pax.web/blob/web-9.0.9/pax-web-jetty/src/test/java/org/ops4j/pax/web/service/jetty/internal/UnifiedJettyTest.java#L659

Pax Web provides a copy of Jetty's DefaultServlet just to unprivate some methods and uses its extension called org.ops4j.pax.web.service.jetty.internal.web.JettyResourceServlet which is more aware of OSGi resource handling.

In this setup (paxWebWelcomePagesWithDifferentContext() test) I have:

  • one org.eclipse.jetty.ee10.servlet.ServletContextHandler with /c context path and { "index.y", "index.x" } welcome files
  • 5 servlets:
    • org.eclipse.jetty.ee10.servlet.DefaultServlet mapped to / (redirectWelcome=false, welcomeServlets=true, pathInfoOnly=false, baseResource="target/b1")
    • org.eclipse.jetty.ee10.servlet.DefaultServlet mapped to /r/* (redirectWelcome=false, welcomeServlets=true, pathInfoOnly=true, baseResource="target/b2")
    • org.eclipse.jetty.ee10.servlet.DefaultServlet mapped to /s/* (redirectWelcome=true, welcomeServlets=true, pathInfoOnly=true baseResource="target/b3")
    • servlet mapped to { "*.x", "*.y" } to print out values of attributes like jakarta.servlet.forward.request_uri
    • servlet mapped to /gateway/* which can be controlled to do redirect, forward or include

While almost everything works like it worked with Jetty 10, there's a difference when handling /c/r/ URI.

In Jetty 10:

  • servlet mapped to /r/* is invoked first
  • in doGet(), servletPath is set to / because pathInfoOnly is true
  • javax.servlet.http.HttpServletRequest#getPathInfo() returns /
  • pathInContext = URIUtil.addPaths(servletPath, pathInfo); is / because both servletPath and pathInfo are /
  • org.eclipse.jetty.http.HttpContent.ContentFactory#getContent() returns a resource which is directory, so org.eclipse.jetty.server.ResourceService#sendWelcome() is called
  • org.eclipse.jetty.server.ResourceService.WelcomeFactory#getWelcomeFile() is called with /
  • _welcomes == ["index.y", "index.x"], but there are no such resources, so org.eclipse.jetty.servlet.ServletHandler#getMatchedServlet() is called with /index.y because URIUtil.addPaths(pathInContext, s) returned /index.y for / and index.y
  • proper servlet mapped to { "*.x", "*.y" } is returned.

In Jetty 12:

  • servlet mapped to /r/* is invoked first
  • in org.eclipse.jetty.ee10.servlet.DefaultServlet#doGet():
    • isPathInfoOnly() returns true (as configured)
    • jakarta.servlet.http.HttpServletRequest#getPathInfo() returns / (as in Jetty 10)
    • URIUtil.encodePath(req.getPathInfo()) returns / (correct)
    • org.eclipse.jetty.server.ResourceService#getContent() returns a directory resource (correct)
    • org.eclipse.jetty.server.ResourceService#doGet() continues and because the content is directory, org.eclipse.jetty.server.ResourceService#sendWelcome() is called
  • org.eclipse.jetty.server.ResourceService#welcome() is called
  • org.eclipse.jetty.server.ResourceService#processWelcome() is called
  • org.eclipse.jetty.server.ResourceService.WelcomeFactory#getWelcomeTarget() is called
    • request.getPathInfo() is / (ok)
    • org.ops4j.pax.web.service.jetty.internal.web.DefaultServlet#isPathInfoOnly() returns true (as configured)
    • requestTarget is set to /
    • however pathInContext is set to the result of org.eclipse.jetty.server.Request#getPathInContext() and it's /r/
    • URIUtil.addPaths(pathInContext, welcome) returns /r/index.y for the first welcome file instead of expected /index.y

So while Jetty 10 consistently uses "path info" (full URI minus context path minus servlet path), in Jetty 12, the path in context is different in various places:

  • in org.eclipse.jetty.ee10.servlet.DefaultServlet#doGet(), encodedPathInContext is set to the result of URIUtil.encodePath(req.getPathInfo()) which is / (because pathInfoOnly is set to true and "path info" is /)
  • however further down the thread, in org.ops4j.pax.web.service.jetty.internal.web.DefaultServlet.ServletResourceService#getWelcomeTarget(), pathInContext varilable is set to the result of Request.getPathInContext(coreRequest) which doesn't bother with the setting of pathInfoOnly.
  • requestTarget variable is correct: requestTarget = isPathInfoOnly() ? request.getPathInfo() : pathInContext, but the welcome file is calculated using String welcomeInContext = URIUtil.addPaths(pathInContext, welcome) - IMO, requestTarget should be used here instead of pathInContext.
@grgrzybek grgrzybek added the Bug For general bugs on Jetty side label Jun 15, 2023
@grgrzybek
Copy link
Contributor Author

https://github.com/eclipse/jetty.project/blob/jetty-12.0.0.beta1/jetty-ee10/jetty-ee10-servlet/src/main/java/org/eclipse/jetty/ee10/servlet/DefaultServlet.java#L1053

This is the place where either

  • requestTarget should be passed to URIUtil.addPaths(), or
  • pathInContext should be calculated depending on the value of isPathInfoOnly()

@janbartel janbartel self-assigned this Jun 15, 2023
@gregw
Copy link
Contributor

gregw commented Jun 15, 2023

@janbartel and I have had a look at this one. I think the problem is that currently in ee10 DefaultServlet, we work out the encodedPathInContext correctly using the isPathInfoOnly parameter, then use that to select a content. However, when we wrap the servlet request as a core request (to forward to the resource service), we don't use that encodedPathInContext and instead let that request recreate a pathInContext from the servletPath and pathInfo. There is then some attempted fix in the welcome handling to try to strip the servletPath back out again if isPathOnly is set.... but I think it is all too late.

The fix is that once we have worked out the content that we wish to serve, then the request wrapper we make should have the correct pathInContext to represent that resource. So we need to pass that info into the constructor of ServletCoreRequest from doGet. We can then probably get rid of any special handling elsewhere.

I think @janbartel has a test harness that reproduces this problem and will soon make a branch.

@lorban as we'd like you to take the lead with resources, can you look into this one..... however, I think @lachlan-roberts also did some work on this area, so you might like to delegate and/or collaborate with him on this?

@gregw gregw assigned lorban and unassigned janbartel Jun 15, 2023
@janbartel
Copy link
Contributor

@lorban I've checked in a branch jetty-12.0.x-9910-pathInfoOnly with a repro test case in it: ServletContextHandlerTest.testPathInfoOnly. Now I've got a better handle on the issue, I think you should move the test case to the DefaultServletTest.

@grgrzybek
Copy link
Contributor Author

I've added a review comment under #9934, because I believe org.eclipse.jetty.ee10.servlet.DefaultServletTest#testPathInfoOnly() should assume 404 returned for /c1/r/ request, not the result from index-servlet servlet.

@janbartel
Copy link
Contributor

@grgrzybek what's your reasoning there?

@grgrzybek
Copy link
Contributor Author

My reasoning is that Jetty 10 (and I think 11 too) is handling welcome files + "path info only" by delegating to request dispatcher for /r/index.y, while in Jetty 12 it's for /index.y only. org.eclipse.jetty.ee10.servlet.DefaultServletTest#testPathInfoOnly() verifies output of index-servlet, while it should get 404.

I added more details under #9934.

@janbartel
Copy link
Contributor

@grgrzybek I can't see anything extra under #9934?

@grgrzybek
Copy link
Contributor Author

Hmm, maybe it's not starting the review for closed issue? Here it is:

@lorban @gregw @janbartel I think #9910 is not properly handled here - the behavior is different than in Jetty 10.

I've just checked with Jetty 12 SNAPSHOT.

In Jetty 10 (using servlet names from this DefaultServletTest.testPathInfoOnly() case):

  • I send /c1/r/ request
  • it is being correctly mapped to rdefault servlet (/r/* mapping)
  • org.eclipse.jetty.server.ResourceService#doGet() checks org.eclipse.jetty.server.ResourceService#_pathInfoOnly (which is true), so servletPath is / and pathInfo is /.
  • pathInContext = URIUtil.addPaths(servletPath, pathInfo) results in /
  • org.eclipse.jetty.servlet.DefaultServlet#getWelcomeFile() is being passed / as pathInContext
  • org.eclipse.jetty.servlet.ServletHandler#getMatchedServlet() is being passed /index.y as parameter and the mapped servlet is index-servlet
  • /index.y is returned from org.eclipse.jetty.server.ResourceService.WelcomeFactory#getWelcomeFile()
  • org.eclipse.jetty.server.ResourceService#sendWelcome() continues and this code turns welcome variable into /r/index.y
if (welcome != null)
{
    String servletPath = included ? (String)request.getAttribute(RequestDispatcher.INCLUDE_SERVLET_PATH)
            : request.getServletPath();
    if (_pathInfoOnly)
        welcome = URIUtil.addPaths(servletPath, welcome);
  • context.getRequestDispatcher(URIUtil.encodePath(welcome)) is called with /r/index.y, which maps to rdefault servlet which eventually leads to 404

In Jetty 12 with this #9934 fix:

  • I send /c1/r/ request
  • it is being correctly mapped to rdefault servlet (/r/* mapping)
  • org.eclipse.jetty.server.ResourceService#doGet() doesn't check "path info only" and pathInContext = Request.getPathInContext(request) is /r/ which is being passed to org.eclipse.jetty.server.ResourceService#sendWelcome(), but NOT further down to org.eclipse.jetty.server.ResourceService#welcome()
  • org.eclipse.jetty.ee10.servlet.DefaultServlet.ServletResourceService#getWelcomeTarget() AGAIN calls pathInContext = Request.getPathInContext(coreRequest) and gets /r/
  • welcomeInContext = URIUtil.addPaths(pathInContext, welcome) returns /r/index.y for the first welcome
  • so while Jetty 10 gets mapping for index-servlet, in Jetty 12 we're getting mapping for rdefault servlet but entry.getServletHolder().getServletInstance() != DefaultServlet.this fails this time.
  • nevertheless, /index.y is returned as in Jetty 10
  • however org.eclipse.jetty.server.ResourceService#processWelcome() simply returns new WelcomeAction("/index.y", SERVE)
  • org.eclipse.jetty.ee10.servlet.DefaultServlet.ServletResourceService#serveWelcome() calls jakarta.servlet.ServletContext#getRequestDispatcher() with /index.y parameter and it's mapped to index-servlet

@janbartel
Copy link
Contributor

I'm currently on a train, so will look properly a bit later on. However, the jetty-12 behaviour now seems correct to me. I don't think that jetty-10 should be returning a 404 when there is a servlet mapping for *.y and *.x.

@grgrzybek
Copy link
Contributor Author

grgrzybek commented Jun 26, 2023

I agree that this is not an obvious case. It's not about *.y mapping, but about:

  1. index.y welcome page (welcome pages are defined at context level, not at servlet level)
  2. default servlet mapped to /r/* with path info only set to true
  3. a request for /r/

Chapter 10.10. Welcome Files of Servlet spec doesn't tell much about this scenario (it doesn't show examples when there's default servlet (/), prefix servlet (/r/*) and extension servlet (*.y).

There indeed is a mapping for *.y, but the way welcome pages are handled, each welcome pages (here the first one - index.y) should be appended to directory-like path (which is /c1/r/), giving the absolute path /c1/r/index.y (or context-relative /r/index.y path).

The point is that in Jetty 10, /r/index.y is produced by welcome pages resolver (dispatched using /r/* mapping) and in Jetty 12, /index.y is produced by welcome pages resolve (dispatched using *.y mapping).
In Jetty 10, /r/index.y is handled by /r/* servlet resulting in 404 and in Jetty 12 it's handled by *.y servlet resulting in custom doGet() showing some info.

The path info only is something Jetty specific and isn't covered by the spec. So I could only consider it comparing to what was in Jetty 9 and 10 (and probably 11 too).

@janbartel
Copy link
Contributor

I think in jetty-10 we weren't really handling the pathInfoOnly fully, we just forced the path in a couple of places. However, since you've brought this up :) we've had a good think about it. I think the behaviour in jetty-12 is correct: as you've told the DefaultServlet it should consider pathInfoOnly, then it will calculate everything on the server side without the /r.

@grgrzybek
Copy link
Contributor Author

Tomorrow I'll check how I test this in Pax Web for Tomcat and Undertow ;) Thanks for your patience!

@grgrzybek
Copy link
Contributor Author

I have 3 matching tests called UnifiedXXXTest.paxWebWelcomePagesWithDifferentContext():

These tests:

  • use different embedded container preparation (which is obvious)
  • use Pax Web overrides of these default servlets (but with very minor changes):
    • org.eclipse.jetty.ee10.servlet.DefaultServlet
    • org.apache.catalina.servlets.DefaultServlet
    • io.undertow.servlet.handlers.DefaultServlet

In Jetty 9, Jetty 10, Tomcat 10.1 and Undertow 2.3 the scenario with welcome files works in the same way - index.y is appended to /r/ and the calculated welcome page (using servlet mapping, not resource finding) is /r/index.y, so the final servlet that handles the /r/ after welcome page resolution is NOT the *.y servlet, but /r/* servlet.

Only Jetty 12 works differently...

If you have time and are interested in this issue, here are the steps to check:

(use JDK 17)

$ git clone [email protected]:ops4j/org.ops4j.pax.web.git
$ cd org.ops4j.pax.web
$ mvn clean install -Dcheckstyle.skip=true -Denforcer.skip=true -DskipTests -pl pax-web-jetty -am
$ mvn test -Dcheckstyle.skip=true -Denforcer.skip=true -f pax-web-jetty -Dtest=UnifiedJettyTest#paxWebWelcomePagesWithDifferentContext

The test should fail at this assertion:

// "/r" - no "/index.x" or "/index.y" physical resource, but existing mapping for *.y to indexx servlet
// forward is performed implicitly by Jetty's DefaultServlet (even if mapped to /r/*), forward URI is
// "/r/index.y" (first welcome), but this time, "/r/*" is a mapping with higher priority than "*.y"
// (with "/" servlet, "*.y" had higher priority than "/"), so "resource" servlet is called, this time
// with full URI (no welcome files are checked). Such resource is not found, so we have 404
response = send(port, "/c/r/");
// TODO: https://github.com/eclipse/jetty.project/issues/9910
assertTrue(response.startsWith("HTTP/1.1 404"));

Similar pax-web-tomcat and pax-web-undertow tests work fine.

@janbartel
Copy link
Contributor

@grgrzybek I've had a look at your test cases and code. It seems that Tomcat doesn't support the pathInfoOnly setting, so you've had to implement it yourself, and you've based it on the implementation of DefaultServlet in jetty-10. Looking at the code, I can see that we started pre-pending the servletPath with this commit here: c843b80

I think this was to fix the path to which the welcome would be forwarded. Bearing in mind that the DefaultServlet had forced the servletPath to be / (because pathInfoOnly is true) , when we matched a static welcome resource correctly (eg baseResource + /index.html), we would try to serve it by forwarding it to pathInfo + welcome. In the use-case of the OP, the static resource was mapped to the same servlet, and thus needed to be forwarded back to itself at the original servletPath. So the fix to prepend the servletPath also applied to the case where we wanted to forward to a servlet. Hence the behaviour you see in jetty-10 today.

I don't think jetty-10 was correct to re-add the servletPath regardless of which servlet we are forwarding to: if we are forwarding back to the same servlet to handle the welcome (although this scenario wouldn't make any sense), then we should indeed prepend the servletPath. However, if we're forwarding to some other servlet then it can't be correct to use the same servletPath.

In jetty-12, if we have matched a welcome, then we use the original pathInContext + welcome.

In the case of a static resource, we will forward to pathInContext + welcome. So if you have /r/* mapped to DefaultServlet with pathInfoOnly=true and you ask for /r/ and the file index.html exists within the resourceBase then we will forward you to /r/index.html.

In the case where we matched a servlet as the welcome, and it is not this servlet, we will forward you to the reduced pathInfoInContext + welcome. Eg for the above scenario but with a welcome file of index.x mapped to a servlet at *.x , a request to /r/ would result in a forward to /index.x. It would definitely not be correct to try to forward you to the same servletPath as this request, as by definition it would return to you the same DefaultServlet!

As pathInfoOnly is indicating that we only want to consider pathInfo for resolving a resource to serve, it makes sense that we use it both for static welcome resources and for welcome servlets.

I think forwarding to /r/index.x is definitely wrong, and forwarding to /index.x at least has a chance of hitting a servlet mapped with wildcard *.x.

So in short, I think jetty-10 was wrong, and we've got it right in jetty-12 :)

If we proceed with jetty-12 ee10 the way it is, what is the impact on your project? In other words, how heavily do you rely on the jetty-10 (incorrect IMHO) behaviour for correct functioning? Is there any other way you could achieve the same function without relying on that behaviour?

@grgrzybek
Copy link
Contributor Author

grgrzybek commented Jun 29, 2023

Thanks @janbartel for detailed answer. I'll check everything again.

One more think I didn't mention - in Pax Web I have to set pathInfoOnly=true only for DefaultServlets mapped to something different than /. I'll elaborate on this tomorrow.

@gregw
Copy link
Contributor

gregw commented Jun 29, 2023

@grgrzybek There surely should be a more standard way of achieving what you want rather than extending the default servlet of every container.

If I'm understanding correctly you have static content that is at /context/info/content that you want served by a Servlet at /context/servlet/* when hit by a request with URI /context/servlet/info/content.

Surely a better way to do this would be for the servlet at /servlet/* to get a RequestDispatcher for /info/content and to forward to that?

@grgrzybek
Copy link
Contributor Author

Sure sure @gregw ;) I'm aware that my Pax Web Test case is very artificial, Servlet spec doesn't say very much about welcome pages (except basic info) and OSGi CMPN specifications say zero.

And I extend Jetty/Tomcat/Undertow "default servlets" definitely not to change their behavior but to cover some corner cases (like handling "dir" entries for bundle: protocol URIs). I may adjust Pax Web as well because these UnifiedXXXTest cases are non-osgi specific, while there are much more tests in other modules and these work consistently.

So I'll check again tomorrow. I simply didn't feel comfortable seeing differences between Jetty 10/11 and 12. I had to react - I hope you don't mind ;)

@grgrzybek
Copy link
Contributor Author

grgrzybek commented Jul 3, 2023

According to 3.6. Request Path Elements:

requestURI = contextPath + servletPath + pathInfo

But each of the 3 components has some variants:

  • context path - when it's / (root), it becomes "" in the equation
  • servlet path - when the mapping is /* or "" (yes - empty string mapping is valid), servlet path is "". For extension mapping, the servlet path is full path less the context path
  • path info - the remainder. Can be null

According to 12.2. Specification of Mappings, we have one exception:

The empty string ("") is a special URL pattern that exactly maps to the application's context root,
i.e., requests of the form http://host:port/<context-root>/. In this case the path info is / and the
servlet path and context path is empty string ("").

So with this rule, if we have non root context and "" mapped servlet, context path is empty string, which I feel is some flaw in the specification.

But back to Jetty 12 vs Jetty <12, Tomcat and Undertow case.

  • Pax Web borrows pathInfoOnly init parameter from Jetty and implements it for Tomcat and Undertow (which don't use it originally). This was one of my methods of behavior unification in Pax Web.
  • In Jetty before 12, for default servlet mapped to / with pathInfoOnly=true and welcome file configuration I'm getting endless redirect
  • In Jetty 12, for default servlet mapped to / with pathInfoOnly=true and welcome file configuration I'm getting HTTP 500, because jakarta.servlet.ServletContext#getRequestDispatcher() is invoked with a path not starting with /
  • That's why I explicitly set pathInfoOnly=true only for default servlets mapped to something different than just /. For / mapped default servlet, I set pathInfoOnly=true.

If we analyze from the start, let's check javadoc for org.eclipse.jetty.ee10.servlet.DefaultServlet:

pathInfoOnly
Use true to use only the request pathInfo to look for static resources. Defaults to false.

If we analyze only this statement, I'd say:

  • there's nothing about welcome files for servlets (only for physical resources)
  • it doesn't much matter for default servlet mapped to /, because in the equation requestURI = contextPath + servletPath + pathInfo, servletPath is empty string, so pathInfo should be the same whether pathInfoOnly is true or false (but as I wrote, it causes endless redirect or HTTP 500 for true in / mapped default servlet). Also because of the mapping rules, it doesn't matter for default servlet mapped to /* or the exotic ""
  • so it matters only for default servlet mapped to /xxx/* (non-empty prefix mapping) or *.ext (extension mapping)

OK, using only the last point and root context (because it's not relevant here), we have simple resource-related scenario:

  • for /r/* mapped default servlet, /r/readme.txt URI and pathInfoOnly=true, default servlet should look for readme.txt context resource (in WAR deployment it's just root of the WAR)
  • for /r/* mapped default servlet, /r/readme.txt URI and pathInfoOnly=false, default servlet should look for r/readme.txt context resource (in WAR deployment it's readme.txt under r directory of the WAR)

Finally let's add "welcome file servlets" to the analysis.

Mind that Servlets specification doesn't include pathInfoOnly concept and it acts as if it was false.

Dissecting 10.10. Welcome Files:

valid partial request == a URI that corresponds to a directory entry in the WAR not mapped to a web component

  • I think I can safely clarify that "not mapped" means "mapped to default/implicit/resource servlet" - even if mapped to something different than /.
  • Even if Jetty's default servlet mapped to e.g., /r/* seems to be real "web component", it is implementing welcome files, so the rules apply

The use for this facility is made clear by the following common example: A welcome file of index.html
can be defined so that a request to a URL like host:port/webapp/directory/, where directory is an entry
in the WAR that is not mapped to a servlet or JSP page, is returned to the client as
host:port/webapp/directory/index.html.

This quote confirms the expected behavior when default servlet is mapped to / - because it explicitly mentions directory WAR entry.

The welcome file list is an ordered list of partial URLs with no trailing or leading "/"

It doesn't mention / within the welcome file.

The web server must append each welcome file in the order specified in the
deployment descriptor to the partial request and check whether a static resource in the WAR is
mapped to that request URI.

Again, default servlet is assumed here. Also "partial request" is full servletPath + pathInfo.

Finally:

If no match is found, the web server MUST again append each welcome
file in the order specified in the deployment descriptor to the partial request and check if a servlet is
mapped to that request URI.

And after this very long (and hopefully interesting when reading in the future) analysis, I think you're right @janbartel:

  • for /r/* mapped default servlet with index.y welcome file, /r/ request URI and pathInfoOnly=true, the servlet should perform forward/redirect to /index.y instead of /r/index.y. So Jetty 12 is correct and Jetty <=11 is not
  • for /r/* mapped default servlet with index.y welcome file, /r/ request URI and pathInfoOnly=false, the servlet should perform forward/redirect to /r/index.y

Thanks for your patience ;)

@gregw
Copy link
Contributor

gregw commented Jul 3, 2023

The empty string ("") is a special URL pattern that exactly maps to the application's context root,
i.e., requests of the form http://host:port/<context-root>/. In this case the path info is / and the
servlet path and context path is empty string ("").

So with this rule, if we have non root context and "" mapped servlet, context path is empty string, which I feel is some flaw in the specification.

Yeah the servlet specification matching algorithms show you that evolution is a weird thing, specially when you start with something non-optimal!

But back to Jetty 12 vs Jetty <12, Tomcat and Undertow case.
...

Thanks for the extra detail. But I still need a bit more to fully understand what we should do...
@grgrzybek Can you follow through and say exactly where the logic is not ok for your use-case.
@lorban Can you confirm the following is correct, as I think there are some inconsistencies

Assuming context of "/ctx" and that you have a DefaultServlet mapped to "/s/*", plus an "index.xyz" welcome file. Then the issue is what to do with a request to something like "/ctx/s/dir/"?

  • The request arrives at the default sevlet with contentPath=/ctx, servletPath=/s and pathInfo=/dir/
  • Because it is pathInfoOnly, it looks for a resource at "/dir/ " effectively by calling getServletContext().getResource("/dir/") (or the equivalent of that)
  • A resource is found and it exists and is a directory, let's say that it is at "file://var/web/docroot/dir/"
  • We apply the welcome file logic:
    • we look for files matching "file://var/web/docroot/dir/index.xyz"
      • if found our welcomeTarget is the full pathInContext, i.e. "/s/dir/index.xyz"
    • we also look for welcome servlets, currently by looking for a servlet match within the context for "/dir/index.xyz"
      • if found (and no file match is found), then welcomeTarget is pathInfoOnly i.e. "/dir/index.xyz"
  • Then depending on the welcome mode:
    • If REDIRECT or REHANDLE, the context path is added back, so we will get:
      • "/ctx/s/dir/index.xyz" if a file exists, which is fine for redirection
      • "/ctx/dir/index.xyz" if it does not, which is not OK for a redirection
    • SERVE we just
      • leave the target at is:
        • "/s/dir/index.xyz" if a file exists, which wont be found in the getResource to
        • "/dir/index.xyz" if it does not, which will be found in the getResource

So in my reading of the code, we are inconsistent in what we return from "getWelcomeTarget": sometimes it has the servletPath, sometimes it does not. It is also hard to see how this string can be used independently of the welcome file mode if the mode is SERVE, then we have already added the welcome to the Content resource and found the welcome resource... we then throw that resource away, returning a string that will be passed to getResource(String) to get it again... but the same string will be used to do a redirect in different mode... so it can be in two different spaces and can't be right for both.

Can we even support servlet welcome files if the mode is SERVE? Should we force the mode to REHANDLE if the found welcome is a servlet? I don't see how we can just return a single string without the mode and expect it to work? Perhaps we should return a record that contains:

  1. The found welcome resource which can be served directly if the mode is SERVE without looking it up again
  2. The pathInContext which can be used for a REDIRECT or REHANDLE, which must end back on the /s/* servlet, so in this case it should be "/s/dir/index.xyz"
  3. A forced mode. If the welcome "file" is a servlet, then the mode cannot be SERVE. Or maybe this can be worked out because the resource above will be null... or will it? If it is a JSP the file will still exist, but we can't just server it!

Anyway, I think we still have work to do here to clean this up.... unfortunately it will not make beta3, so we might need a beta4 or sneak something in for 12.0.0

@gregw
Copy link
Contributor

gregw commented Jul 3, 2023

@lorban TL;DR; summary of above: The getWelcomeTarget(HttpContent, Request) method needs to return a record containing:

  • Resource of the welcome file (or maybe the HttpContent itself?) if any is found
  • The pathInContext of the welcome file, which always includes the servletPath. This can be used for a redirect or rehandle without further manipulation (other than perhaps contextPath if needed)
  • A boolean indicating if there was a servlet match (or maybe the servlet match itself) different from the defaultServlet. If the match is different, then SERVE mode cannot be used, even if the welcome resource is non-null

@grgrzybek
Copy link
Contributor Author

Thanks for the extra detail. But I still need a bit more to fully understand what we should do... @grgrzybek Can you follow through and say exactly where the logic is not ok for your use-case. @lorban Can you confirm the following is correct, as I think there are some inconsistencies

Assuming context of "/ctx" and that you have a DefaultServlet mapped to "/s/*", plus an "index.xyz" welcome file. Then the issue is what to do with a request to something like "/ctx/s/dir/"?

* The request arrives at the default sevlet with contentPath=/ctx, servletPath=/s and pathInfo=/dir/

* Because it is pathInfoOnly, it looks for a resource at "/dir/ " effectively by calling getServletContext().getResource("/dir/") (or the equivalent of that)

* A resource is found and it exists and is a directory, let's say that it is at "file://var/web/docroot/dir/"

* We apply the welcome file logic:
  
  * we look for files matching "file://var/web/docroot/dir/index.xyz"
    
    * if found our `welcomeTarget` is the full pathInContext, i.e. "/s/dir/index.xyz"
  * we also look for welcome servlets, currently by looking for a servlet match within the context for "/dir/index.xyz"
    
    * if found (and no file match is found), then `welcomeTarget` is pathInfoOnly i.e. "/dir/index.xyz"

* Then depending on the welcome mode:
  
  * If REDIRECT or REHANDLE, the context path is added back, so we will get:
    
    * "/ctx/s/dir/index.xyz" if a file exists, which is fine for redirection
    * "/ctx/dir/index.xyz" if it does not, which is not OK for a redirection
  * SERVE we just
    
    * leave the target at is:
      
      * "/s/dir/index.xyz" if a file exists, which wont be found in the getResource to
      * "/dir/index.xyz" if it does not, which will be found in the getResource

I don't exactly understand found in the getResource... My understanding of the logic is:

resource existence phase of welcome file resolution:

  • if pathInfoOnly=true, "file://var/web/docroot/dir/index.xyz" resource will be checked
  • if pathInfoOnly=false, "file://var/web/docroot/s/dir/index.xyz" resource will be checked

servlet mapping phase of welcome file resolution:

  • if pathInfoOnly=true, /dir/index.xyz mapping will be checked
  • if pathInfoOnly=false, /s/dir/index.xyz mapping will be checked, which will get us back to /s/* mapping, but there's explicit check entry.getServletHolder().getServletInstance() != DefaultServlet.this, so we should get 404

and actually I agree that Jetty 12 is fine - with:

  • default servlet mapped to /s/*
  • pathInfoOnly=true
  • another servlet mapped to *.xyz
  • welcome files { index.xyz }
  • incoming /s/ request

Jetty 12 does this:

  • path info is '/' (directory=true)
  • welcome file is appended (/index.xyz)
  • resource is not found
  • *.xyz servlet is found (different than this resource servlet)
  • servlet is forwarded (or redirected) to with /index.xyz

While in Jetty <=11, the last point is:

  • servlet is forwarded (or redirected) to with /s/index.xyz

This is the biggest difference and my initial concern. In Pax Web I assumed:

  • "servlet mapping phase of welcome file resolution" is taking pathInfoOnly=true into account, so servlet mapped to *.xyz is found (in all Jetty versions)
  • forward/redirect was ignoring pathInfoOnly (which is not the case in Jetty 12)

So in my reading of the code, we are inconsistent in what we return from "getWelcomeTarget": sometimes it has the servletPath, sometimes it does not. It is also hard to see how this string can be used independently of the welcome file mode if the mode is SERVE, then we have already added the welcome to the Content resource and found the welcome resource... we then throw that resource away, returning a string that will be passed to getResource(String) to get it again... but the same string will be used to do a redirect in different mode... so it can be in two different spaces and can't be right for both.

Can we even support servlet welcome files if the mode is SERVE? Should we force the mode to REHANDLE if the found welcome is a servlet? I don't see how we can just return a single string without the mode and expect it to work? Perhaps we should return a record that contains:

1. The found welcome resource which can be served directly if the mode is SERVE without looking it up again

2. The pathInContext which can be used for a REDIRECT or REHANDLE, which must end back on the /s/* servlet, so in this case it should be "/s/dir/index.xyz"

3. A forced mode.  If the welcome "file" is a servlet, then the mode cannot be SERVE.   Or maybe this can be worked out because the resource above will be null... or will it?  If it is a JSP the file will still exist, but we can't just server it!

Anyway, I think we still have work to do here to clean this up.... unfortunately it will not make beta3, so we might need a beta4 or sneak something in for 12.0.0

From my (user) perspective, I think pathInfoOnly concept has to be strengthened and made more consistent. While current Javadoc for org.eclipse.jetty.ee10.servlet.DefaultServlet says:

pathInfoOnly
Use true to use only the request pathInfo to look for static resources. Defaults to false.

It has to effectively behave like this:

pathInfoOnly
Skip using servlet path for resource lookup and welcome file resolution (resource or servlet based)

(that's working description of course)

And somehow it has to be emphasized that in Jetty <=11, it wasn't consistent.

In yet another words, there are two stages of welcome file resolution:

  1. finding physical resource or mapped servlet
  2. serving the found resource/servlet (whether by forward or redirect)

In Jetty 12, pathInfoOnly=true is used in both stages.
In Jetty <=11, pathInfoOnly=true is used only in the first stage.

@lorban
Copy link
Contributor

lorban commented Jul 3, 2023

@grgrzybek is that a good summary of what you are saying?

  • 12 is correct and should stay as is currently is
  • 11 is wrong and should be changed
  • pathInfoOnly javadoc should be improved

@grgrzybek
Copy link
Contributor Author

@lorban that's what seems to be the best summary ;)

I don't want to decide about Jetty 11 - pathInfoOnly is proprietary and I can't estimate the impact of the change for older Jetty versions.
Non-/ mapped default servlets are also not very common (they are in Pax Web to implement Whiteboard resource mappings though).

@gregw
Copy link
Contributor

gregw commented Jul 3, 2023

And somehow it has to be emphasized that in Jetty <=11, it wasn't consistent.

In yet another words, there are two stages of welcome file resolution:

  1. finding physical resource or mapped servlet
  2. serving the found resource/servlet (whether by forward or redirect)

In Jetty 12, pathInfoOnly=true is used in both stages. In Jetty <=11, pathInfoOnly=true is used only in the first stage.

I don't think 12 is consistent either, as per my previous comment. We definitely need to clean this feature up and make it consistent.

I propose that we proceed by making a very exhaustive unit test that checks:

  • pathInfo true/false
  • All modes of welcome: SERVE, REHANDLE, REDISPATCH
  • All combinations of welcome file: no file and no servlet, file exists no servlet match, no file but servlet match, file exists and servlet match.
  • Maybe we need to also do the variation of a servlet *.xyz match vs a /foo/specialWelcome.xyz match?

We can then review the expected results and agree what they should be. Only then do we update the implementation to make the test pass.

@gregw
Copy link
Contributor

gregw commented Jul 3, 2023

  • 12 is correct and should stay as is currently is

@lorban I think 12 is correct only in some combinations. I'm pretty sure it is inconsistent if we give it other combinations. See my proposed unit test above, as I think we need that to get some clarity. Can you write that?

@grgrzybek
Copy link
Contributor Author

grgrzybek commented Jul 3, 2023

I personally declare that every morning, when I don't do other tasks, I will fetch jetty-12.0.x branch, build it and check my tests from Pax Web ;)
I don't assume Pax Web can't be changed - I'll simply provide my feedback, ok?

@lorban
Copy link
Contributor

lorban commented Jul 3, 2023

@gregw I'm on it.

grgrzybek added a commit to ops4j/org.ops4j.pax.web that referenced this issue Jul 3, 2023
@grgrzybek
Copy link
Contributor Author

Here's my test: https://github.com/ops4j/org.ops4j.pax.web/blob/100c567d6/pax-web-jetty/src/test/java/org/ops4j/pax/web/service/jetty/internal/UnifiedJettyTest.java#L374

I've just pushed the changes after adjustments with Jetty 12 changes (see the comments in the class).

Currently I marked the failing assert with:

// >>> current failure (HTTP 500 is only when the resource doesn't exist) <<<

While I adjusted the scenario where the forward is for /index.y instead of /r/index.y, I see more failures related to include method.

As you can see the key part of my test is the "gateway" servlet which performs forward/include based on query params.

If you like, please take this test simply replacing:

JettyResourceServlet jrs1 = new JettyResourceServlet(p1, null);

with (fqcn is org.eclipse.jetty.ee10.servlet.DefaultServlet).

DefaultServlet jrs1 = new DefaultServlet(p1, null);

@grgrzybek
Copy link
Contributor Author

For example, when fetching /r/, DefaultServlet.ServletResourceService#getWelcomeTarget() this works:

if (isPathInfoOnly() && !isIncluded(getServletRequest(coreRequest)))
    welcomeTarget = URIUtil.addPaths(getServletRequest(coreRequest).getPathInfo(), welcome);

and /index.y is returned.

but when I use /gateway/x?what=include&where=/r/, welcomeTarget is not set with the above branch ("include" is used) and org.eclipse.jetty.ee10.servlet.ServletHandler#getMappedServlet() is being passed welcomeInContext=/r/index.y which maps back to the same servlet and null is returned.

lorban added a commit that referenced this issue Jul 4, 2023
Signed-off-by: Ludovic Orban <[email protected]>
lorban added a commit that referenced this issue Jul 7, 2023
Signed-off-by: Ludovic Orban <[email protected]>
lorban added a commit that referenced this issue Jul 7, 2023
Signed-off-by: Ludovic Orban <[email protected]>
lorban added a commit that referenced this issue Jul 12, 2023
Signed-off-by: Ludovic Orban <[email protected]>
lorban added a commit that referenced this issue Jul 12, 2023
Signed-off-by: Ludovic Orban <[email protected]>
lorban added a commit that referenced this issue Jul 12, 2023
Signed-off-by: Ludovic Orban <[email protected]>
lorban added a commit that referenced this issue Jul 13, 2023
Signed-off-by: Ludovic Orban <[email protected]>
lorban added a commit that referenced this issue Jul 13, 2023
Signed-off-by: Ludovic Orban <[email protected]>
sbordet pushed a commit that referenced this issue Jul 14, 2023
* Added DefaultServlet combinations tests.
* Removed pathInfoOnly handling present in Jetty 11, because it is always true for mapping to "/", and always false for other mappings.

Signed-off-by: Ludovic Orban <[email protected]>
grgrzybek added a commit to ops4j/org.ops4j.pax.web that referenced this issue Nov 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For general bugs on Jetty side
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants