Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Corrupt filename in Upload component when server's file encoding is not UTF-8 #20417

Closed
mas4ivv opened this issue Nov 6, 2024 · 8 comments · Fixed by #20503
Closed

Corrupt filename in Upload component when server's file encoding is not UTF-8 #20417

mas4ivv opened this issue Nov 6, 2024 · 8 comments · Fixed by #20503

Comments

@mas4ivv
Copy link

mas4ivv commented Nov 6, 2024

Description of the bug

When using the Upload component with a filename that has non-ASCII characters, the server receives a corrupt filename if it's file encoding is not UTF-8.
The filename is part of the upload request and seems to be UTF-8 encoded:
grafik

The filename is then read via Apache Commons' ServletFileUpload, created by StreamReceiverHandler.getItemIterator(VaadinRequest). As there is no encoding set on creation, it falls back to the server's default.

Example running the example below from Windows command line:
grafik

Expected behavior

It seems to me that the client is always sending the filename UTF-8 encoded. If this is correct, the ServletFileUpload created in StreamReceiverHandler should be configured to use UTF-8 by calling setHeaderEncoding("UTF-8").

Minimal reproducible example

import java.io.ByteArrayOutputStream;

import com.vaadin.flow.component.orderedlayout.VerticalLayout;
import com.vaadin.flow.component.textfield.TextField;
import com.vaadin.flow.component.upload.Upload;
import com.vaadin.flow.router.Route;

@Route("upload")
public class UploadPage extends VerticalLayout {

    public UploadPage() {
        Upload upload = new Upload();
        TextField filename = new TextField();
        upload.setReceiver((name, type) -> {
            filename.setValue(name);
            return new ByteArrayOutputStream();
        });
        this.add(upload, filename);
    }
}

Versions

  • Vaadin / Flow version: 23.4.1
  • Java version: 11
  • OS version:
  • Browser version (if applicable):
  • Application Server (if applicable):
  • IDE (if applicable):
@mcollovati mcollovati transferred this issue from vaadin/flow Nov 6, 2024
@mcollovati mcollovati transferred this issue from vaadin/flow-components Nov 6, 2024
@tepi tepi added bug BFP Bugfix priority, also known as Warranty labels Nov 7, 2024
@tepi tepi moved this to 🔖 High Priority (P1) in Vaadin Flow bugs & maintenance (Vaadin 10+) Nov 7, 2024
@tepi tepi moved this to 🟢Ready to Go in Vaadin Flow ongoing work (Vaadin 10+) Nov 7, 2024
@tepi tepi self-assigned this Nov 8, 2024
@tepi tepi moved this from 🔖 High Priority (P1) to 🏗 WIP in Vaadin Flow bugs & maintenance (Vaadin 10+) Nov 8, 2024
@tepi tepi moved this from 🟢Ready to Go to ⚒️ In progress in Vaadin Flow ongoing work (Vaadin 10+) Nov 8, 2024
@tepi tepi removed their assignment Nov 11, 2024
@tepi
Copy link
Contributor

tepi commented Nov 11, 2024

@mas4ivv I was unfortunately unable to reproduce this on macOS (uses UTF-8 by default, but even changing the server JVM encoding to soemthing else did not break the special characters). Are you by any chance on Windows?

@tepi tepi moved this from 🏗 WIP to 🔖 High Priority (P1) in Vaadin Flow bugs & maintenance (Vaadin 10+) Nov 11, 2024
@tepi tepi moved this from ⚒️ In progress to 🟢Ready to Go in Vaadin Flow ongoing work (Vaadin 10+) Nov 11, 2024
@mas4ivv
Copy link
Author

mas4ivv commented Nov 11, 2024

Yes, the example was run on Jetty on Windows. Our production server is running on Linux and has the same problem.

@tepi
Copy link
Contributor

tepi commented Nov 11, 2024

Alright, in that case it should be reproducable on mac as well since I think most Linux default to utf-8 anyway. I'll retry.

@tltv tltv self-assigned this Nov 12, 2024
@tltv
Copy link
Member

tltv commented Nov 14, 2024

@mas4ivv Could you please clarify these questions to help us reproducing the issue and find good fix for it. Also if possible, attaching full example application would also answer some of these:

  • which browser you use? or is it maybe same with all you try?
  • which jetty version you use?
  • is there any customization in the app for jetty configurations or request handlers/filters that could change request's character encoding (with ServletRequest.setCharacterEncoding(String))?

Example code generates a request with content type of multipart/form-data with one part in the request body. This looks correct in the screenshot. With parts in the body, StreamReceiverHandler.getItemIterator(VaadinRequest) method should not be called at all. There's a hasParts method that decides how to process the body, and both are using different APIs. If possible, could you confirm by putting a brake point in StreamReceiverHandler.getItemIterator(VaadinRequest) and confirm if it's actually called? I can't see how it would with such a request. But if it does, then that seems a bug.

Updating Flow to always call ServletFileUpload.setHeaderEncoding("UTF-8") would probably not help with parts in the body. But forcing it to UTF-8 in other place could potentially fix the issue.

@mas4ivv
Copy link
Author

mas4ivv commented Nov 15, 2024

Hi,
I can reproduce the error with both Firefox and Chrome, using Jetty 10.0.24. Here is the complete demo app:
upload_sample.zip

I'm adding -Dfile.encoding=ISO8859-15 on startup, if I specify UTF-8 everything works fine.

I added breakpoints at hasParts and getItemIterator - the first is throwing an exception:

java.lang.IllegalStateException: No multipart config for servlet
No multipart config for servlet
java.lang.IllegalStateException: No multipart config for servlet
	at org.eclipse.jetty.server.Request.getParts(Request.java:2337)
	at org.eclipse.jetty.server.Request.getParts(Request.java:2328)
	at javax.servlet.http.HttpServletRequestWrapper.getParts(HttpServletRequestWrapper.java:317)
	at com.vaadin.flow.server.communication.StreamReceiverHandler.getParts(StreamReceiverHandler.java:637)
	at com.vaadin.flow.server.communication.StreamReceiverHandler.hasParts(StreamReceiverHandler.java:225)
	at com.vaadin.flow.server.communication.StreamReceiverHandler.doHandleMultipartFileUpload(StreamReceiverHandler.java:198)
	at com.vaadin.flow.server.communication.StreamReceiverHandler.handleRequest(StreamReceiverHandler.java:159)
	at com.vaadin.flow.server.communication.StreamRequestHandler.handleRequest(StreamRequestHandler.java:116)
	at com.vaadin.flow.server.VaadinService.handleRequest(VaadinService.java:1563)
	at com.vaadin.flow.server.VaadinServlet.service(VaadinServlet.java:369)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:590)

so it seems that getItemIterator gets called as a fallback (this is the same for all encodings).

I also tried setting ServletFileUpload.setHeaderEncoding("UTF-8") while debugging - this solves the problem for me.

@tltv
Copy link
Member

tltv commented Nov 15, 2024

Thank you. Key for reproducing was to run without multipart config so Flow will fallback to non-multipart upload.

Setting header encoding always to UTF-8 in case when request's character encoding is not explicitly set seems now correct fix. This because ServletFileUpload uses now primarily request's character encoding or if it's null then system's default. And in this test app request's encoding seems to be null.

tltv added a commit that referenced this issue Nov 15, 2024
This change will set ServletFileUpload's header encoding always to UTF-8 in StreamReceiverHandler when request character encoding is null. This ensures that system's default character encoding is not applied when parsing filename of the uploaded file, unless request's character encoding is set otherwise.
Only for setups without multipart config for servlet.

Fixes: #20417
tepi pushed a commit that referenced this issue Nov 18, 2024
This change will set ServletFileUpload's header encoding always to UTF-8 in StreamReceiverHandler when request character encoding is null. This ensures that system's default character encoding is not applied when parsing filename of the uploaded file, unless request's character encoding is set otherwise.
Only for setups without multipart config for servlet.

Fixes: #20417
tltv added a commit that referenced this issue Nov 18, 2024
This change will set ServletFileUpload's header encoding always to UTF-8 in StreamReceiverHandler when request character encoding is null. This ensures that system's default character encoding is not applied when parsing filename of the uploaded file, unless request's character encoding is set otherwise.
Only for setups without multipart config for servlet.

Fixes: #20417
tepi pushed a commit that referenced this issue Nov 19, 2024
* fix: fix ServletFileUpload header encoding (#20480)

This change will set ServletFileUpload's header encoding always to UTF-8 in StreamReceiverHandler when request character encoding is null. This ensures that system's default character encoding is not applied when parsing filename of the uploaded file, unless request's character encoding is set otherwise.
Only for setups without multipart config for servlet.

Fixes: #20417

* chore: formatting

* test: fixed test
@github-project-automation github-project-automation bot moved this from ⚒️ In progress to Done in Vaadin Flow ongoing work (Vaadin 10+) Nov 19, 2024
@github-project-automation github-project-automation bot moved this from 🔖 High Priority (P1) to ✅ Closed in Vaadin Flow bugs & maintenance (Vaadin 10+) Nov 19, 2024
tltv added a commit that referenced this issue Nov 19, 2024
This change will set ServletFileUpload's header encoding always to UTF-8 in StreamReceiverHandler when request character encoding is null. This ensures that system's default character encoding is not applied when parsing filename of the uploaded file, unless request's character encoding is set otherwise.
Only for setups without multipart config for servlet.

Fixes: #20417
tltv added a commit that referenced this issue Nov 19, 2024
This change will set ServletFileUpload's header encoding always to UTF-8 in StreamReceiverHandler when request character encoding is null. This ensures that system's default character encoding is not applied when parsing filename of the uploaded file, unless request's character encoding is set otherwise.
Only for setups without multipart config for servlet.

Fixes: #20417
tepi pushed a commit that referenced this issue Nov 19, 2024
* fix: fix ServletFileUpload header encoding (#20480)

This change will set ServletFileUpload's header encoding always to UTF-8 in StreamReceiverHandler when request character encoding is null. This ensures that system's default character encoding is not applied when parsing filename of the uploaded file, unless request's character encoding is set otherwise.
Only for setups without multipart config for servlet.

Fixes: #20417

* test: fixed test

* chore: formatting
tepi pushed a commit that referenced this issue Nov 19, 2024
* fix: fix ServletFileUpload header encoding (#20480)

This change will set ServletFileUpload's header encoding always to UTF-8 in StreamReceiverHandler when request character encoding is null. This ensures that system's default character encoding is not applied when parsing filename of the uploaded file, unless request's character encoding is set otherwise.
Only for setups without multipart config for servlet.

Fixes: #20417

* chore: fixed test and formatting
@vaadin-bot
Copy link
Collaborator

This ticket/PR has been released with Vaadin 23.5.12.

@vaadin-bot
Copy link
Collaborator

This ticket/PR has been released with Vaadin 24.4.18.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

5 participants