Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

404 on filenames with accents #4329

Closed
bmichotte opened this issue Sep 3, 2021 · 12 comments · Fixed by #4332
Closed

404 on filenames with accents #4329

bmichotte opened this issue Sep 3, 2021 · 12 comments · Fixed by #4332
Labels
bug 🐞 Something isn't working
Milestone

Comments

@bmichotte
Copy link

Hi there,

Caddy gives me an error 404 for files with accents like cropped-Capture-d’écran-2019-12-02-à-16.22.37.png (yes, I know, this is awfull), which are called cropped-Capture-d%E2%80%99%C3%A9cran-2019-12-02-%C3%A0-16.22.37.png by the browser.

Obviously the file does exists and the files without accents gives a 200 OK.

Log file is

{
    "level": "error",
    "ts": 1630683589.3759747,
    "logger": "http.log.access.log16",
    "msg": "handled request",
    "request": {
        "remote_addr": "91.180.123.81:51607",
        "proto": "HTTP/2.0",
        "method": "GET",
        "host": "[REDACTED]",
        "uri": "/wp-content/uploads/2019/12/cropped-Capture-d%E2%80%99%C3%A9cran-2019-12-02-%C3%A0-16.22.37.png",
        "headers": {
            "Upgrade-Insecure-Requests": [
                "1"
            ],
            "Sec-Fetch-Dest": [
                "document"
            ],
            "User-Agent": [
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36"
            ],
            "Accept": [
                "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9"
            ],
            "Sec-Fetch-User": [
                "?1"
            ],
            "Accept-Encoding": [
                "gzip, deflate, br"
            ],
            "Accept-Language": [
                "fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7"
            ],
            "Sec-Ch-Ua": [
                "\"Google Chrome\";v=\"93\", \" Not;A Brand\";v=\"99\", \"Chromium\";v=\"93\""
            ],
            "Sec-Ch-Ua-Mobile": [
                "?0"
            ],
            "Sec-Fetch-Site": [
                "none"
            ],
            "Sec-Fetch-Mode": [
                "navigate"
            ],
            "Cookie": [
                "wp-settings-time-2=1617279544; wordpress_test_cookie=WP+Cookie+check"
            ],
            "Sec-Ch-Ua-Platform": [
                "\"macOS\""
            ],
            "Dnt": [
                "1"
            ]
        },
        "tls": {
            "resumed": true,
            "version": 772,
            "cipher_suite": 4865,
            "proto": "h2",
            "proto_mutual": true,
            "server_name": "[REDACTED]",
        }
    },
    "common_log": "91.180.123.81 - - [03/Sep/2021:17:39:49 +0200] \"GET /wp-content/uploads/2019/12/cropped-Capture-d%E2%80%99%C3%A9cran-2019-12-02-%C3%A0-16.22.37.png HTTP/2.0\" 404 12617",
    "user_id": "",
    "duration": 0.884501508,
    "size": 12617,
    "status": 404,
    "resp_headers": {
        "Expires": [
            "Wed, 11 Jan 1984 05:00:00 GMT"
        ],
        "Cache-Control": [
            "no-cache, must-revalidate, max-age=0"
        ],
        "Link": [
            "<https://[REDACTED]/wp-json/>; rel=\"https://api.w.org/\""
        ],
        "Status": [
            "404 Not Found"
        ],
        "Content-Encoding": [
            "gzip"
        ],
        "Vary": [
            "Accept-Encoding"
        ],
        "Server": [
            "Caddy"
        ],
        "Content-Type": [
            "text/html; charset=UTF-8"
        ]
    }
}

The config file is the following

www.[REDACTED].be {
    redir https://[REDACTED]{uri} permanent
}

[REDACTED].be {
    root * /var/www/redacted
    php_fastcgi unix//run/php/php7.3-fpm.sock
    file_server
    encode gzip
    @disallowed {
        path /xmlrpc.php
        path *.sql
        path /wp-content/uploads/*.php
    }
    rewrite @disallowed '/index.php'

    log {
        output file /var/log/caddy/[REDACTED].be.log
    }

    tls backend@[REDACTED] {
        protocols tls1.2 tls1.3
    }
}
@mohammed90 mohammed90 added the bug 🐞 Something isn't working label Sep 4, 2021
@mohammed90 mohammed90 added this to the v2.4.6 milestone Sep 4, 2021
@mholt
Copy link
Member

mholt commented Sep 5, 2021

@bmichotte Would you be able to try the proposed fix in #4332?

@mohammed90
Copy link
Member

You can grab the CI build artifacts from here: https://github.com/caddyserver/caddy/actions/runs/1201538520

@bmichotte
Copy link
Author

@mholt I can try but as this is a production server I need to duplicate it, so I’ll try to do that tomorrow

@bmichotte
Copy link
Author

@mholt @mohammed90 I can confirm that the patch provided fixes the issue, the images are loading !

@ueffel
Copy link
Contributor

ueffel commented Nov 23, 2021

Sorry to weigh in here, but I think with #4332 and #4409 the path handling is now wrong and paths end up double encoded and markdown rendering does not work anymore for me.

Consider the following example:
Caddyfile:

http://:80 {
	templates
	rewrite /*.md /template.html
	file_server browse
}

template.html:

{{$markdownFilePath := .OriginalReq.URL.Path}}
{{$markdownFile := (include $markdownFilePath | splitFrontMatter)}}
<html>
    <head>
        <title>Awesome Markdown</title>
    </head>
    <body>
        {{markdown $markdownFile.Body}}
    </body>
</html>

Now I have a file test test.md. It gets listed in the file_server browse page like this:

<td>
	<a href="test%2520test.md">
		<svg width="1.5em" height="1em" version="1.1" viewBox="0 0 265 323"><use xlink:href="#file"></use></svg>
		<span class="name">test test.md</span>
	</a>
</td>

The URL is test%2520test.md but it is supposed to be test%20test.md. The % is encoded as %25, because of the double encoding.
In the above setup the extraction of the path does not work anymore, it results in a 500 error:

http.log.error  template: /template.html:2:20: executing "/template.html" at <include $markdownFilePath>: error calling include: open test%20test.md: Das System kann die angegebene Datei nicht finden.        {"request": {..., "err_trace": "templates.(*Templates).executeTemplate (templates.go:339)"}

(German windows over here, "Das System kann die angegebene Datei nicht finden." means "File not found.")


I think the OPs problem is the wrong encoding of the path cropped-Capture-d’écran-2019-12-02-à-16.22.37.png
cropped-Capture-d%E2%80%99%C3%A9cran-2019-12-02-%C3%A0-16.22.37.png is wrong, it should be
cropped-Capture-d%E2%80%99e%CC%81cran-2019-12-02-a%CC%80-16.22.37.png
It's the difference between (Codepoint \u0065\u0301) and é (Codepoint \u00E9), same with (\u0061\u0300) and à (\u00E0). It might look the same, but on the string or byte level it is different and therefore it is essentially a different file name. The file system even lets me have 2 the (seemingly) same file names in one directory.

image

Btw: Both file name cropped-Capture-d’écran-2019-12-02-à-16.22.37.png and cropped-Capture-d’écran-2019-12-02-à-16.22.37.png work fine in the file_server browse page and generate working URLs in v.2.4.5.

I don't know what the OPs real problem was, but it was not caddy's path encoding. Please consider reverting the changes from #4332 and #4409.

@mohammed90
Copy link
Member

@ueffel, yeah, this is now fixed in #4410.

@ueffel
Copy link
Contributor

ueffel commented Nov 23, 2021

@mohammed90 It is not fixed in 1e10f6f.

Edit: Sorry to be so stubborn on this. Like I said: I think is was fine before #4332 and #4410.

@mohammed90
Copy link
Member

@mohammed90 It is not fixed in 1e10f6f.

Edit: Sorry to be so stubborn on this. Like I said: I think is was fine before #4332 and #4410.

No worries. I also tend to be quite assertive, stern, and stubborn in my observations and communication 😄 I didn't read your comment in any negative manner.

I can't look at this now. It'll take me perhaps until the weekend.

@mohammed90
Copy link
Member

@mohammed90 It is not fixed in 1e10f6f.
Edit: Sorry to be so stubborn on this. Like I said: I think is was fine before #4332 and #4410.

No worries. I also tend to be quite assertive, stern, and stubborn in my observations and communication 😄 I didn't read your comment in any negative manner.

I can't look at this now. It'll take me perhaps until the weekend.

You're right, @ueffel. The tests still pass without the changes (says something about writing tests first 🙂, which I didn't), yet we still need the patch from 1e10f6f. I will send a PR that reverts the changes of #4332 but retains the tests.

Ironically, this issue report wasn't valid, but the supposed fix made it tangentially valid.

@ueffel
Copy link
Contributor

ueffel commented Dec 11, 2021

78b5356 looks good to me. Thanks.
@bmichotte can you check if your issue is fixed as well?

@bmichotte
Copy link
Author

Looks good to me as well !

@mholt
Copy link
Member

mholt commented Dec 12, 2021

Awesome. Thanks so much for patching, @mohammed90, and for your help everyone.

@mholt mholt closed this as completed Dec 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🐞 Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants