Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encoding problem with proxied binary data #80

Closed
KPesendorfer opened this issue Jul 15, 2013 · 22 comments
Closed

Encoding problem with proxied binary data #80

KPesendorfer opened this issue Jul 15, 2013 · 22 comments
Assignees
Labels

Comments

@KPesendorfer
Copy link

I've just updated from jscoverage to JSCover (using as proxy instrumentation server) and now have some troubles with POST-requests containing binary data ... the requests get stuck at the proxy and run into a socket timeout of 1 minute.

Here are some details:

  • the POST-requests that cause the problem are multipart-posts containing binary data (but small ones handled as 1 request ... < 1KB)
  • the binary data inside that POST-request is from an XML-file with UTF-8 encoding with byte-order-mark (BOM) / without the BOM the problem does not occur
  • I'm using the java system config -Dfile.encoding=UTF-8
  • these requests are not instrumented but handled by the proxy
  • I think the problem is that the POST-data is handled as text (toStringNoClose) which leads into encoding problems that the binary data length is different from the encoded character-text length and therefor the stream read operation can't read any more data -> timeout

Here is a screenshot of some debug output logs for more details ...
jscover-encoding-bug

It would be much better to forward (especially not instrumented requests) exactly with the same content as they are received (copy stream data as binary data - not as encoded characters). POST-data requests could also contain real binary data - maybe they should be by-passed even at a higher level.

Kind regards,
Klaus.

@tntim96
Copy link
Owner

tntim96 commented Jul 15, 2013

Thanks for the report, I'll take a look.

@ghost ghost assigned tntim96 Jul 15, 2013
@tntim96
Copy link
Owner

tntim96 commented Jul 16, 2013

This make take a bit of time (I'm a bit busy ATM). I need to be able to reproduce the error (preferably in a failing unit test). Are you able to assist? Perhaps you can provide a sample file and use debug to find the browser's Content-Length header, which can then be fed into the test.

I don't suppose replacing InputStream with UnicodeBOMInputStream.java fixes your issue?

It would be much better to...copy stream data as binary data - not as encoded characters

I agree. The problem is the InputStream is converted to a BufferedReader to interpret the request - I don't think the underlying InputStream is useable after that - I'll have to look into the options...

@KPesendorfer
Copy link
Author

Hello!
I've just finished a short test page and some test-input files that can be used to easily reproduce the problem.
As github does not allow to post file attachments I've put it temporarly to following cloud share:
http://temp-share.com/show/dPf3Uj35W
... the ZIP-file contains a HTML-page (using jquery.js from same directory) ... put it to your webserver, access it with your web browser (I've used Firefox) via the JSCover proxy and upload the test files ... the result should by as described below:

jscover-test-result

Note: to your question (using UnicodeBOMInputStream) ... no, I don't think that would help me, because same problems may occur with real binary data which can't be encoded as unicode and read as string (the content length won't match)

I hope this helps reproducing and fixing the bug.
Kind regards,
Klaus.

@tntim96
Copy link
Owner

tntim96 commented Jul 16, 2013

Thanks, I've reproduced the error, and will compare with your test.

It's a bit tricky, because the BOM can appear anywhere in the POST data (my test is on a multi-part file upload). I'll look at reading the lines from the InputStream instead of the BufferedReader, so that I can copy the streams as you suggested above.

@tntim96
Copy link
Owner

tntim96 commented Jul 16, 2013

I've committed a failing test HtmlUnitServerTest.shouldWorkWithFileBOMUpload (which is also run as part of HtmlUnitProxyTest.shouldWorkWithFileBOMUpload)...You need to remove the @Ignore to run it.

@tntim96
Copy link
Owner

tntim96 commented Jul 17, 2013

There's a fix in trunk - can you rebuild and re-test?

Thanks to https://github.com/0mc and Apache Commons IO.

@KPesendorfer
Copy link
Author

Thank you for the fix - it did solve the issue of POST-requests with some special XML-files (BOM handling).

BUT the JSCover proxy still can't handle POST-requests with binary data!
I've tried upload a ZIP-file to my webserver (via the JSCover proxy) and the resulting file was corrupt, because it was handled as String (in ... handlePost(httpRequest, ioUtils.toStringNoClose(br, length));)
Not instrumented POST-requests have to be forwarded without modification (String-handling) directly.

Temporarily I've worked around the problem with a Apache proxy in between only sending JS-file requests to JSCover and forwarding the POST-requests directly to my webserver.

@tntim96
Copy link
Owner

tntim96 commented Jul 17, 2013

K - will have to start with a failing test for this scenario.

@tntim96
Copy link
Owner

tntim96 commented Jul 18, 2013

I think I've fixed this. Can you try this patch.

@KPesendorfer
Copy link
Author

cool, looks good. some first tests worked without any troubles.
Thank you very much!

@tntim96
Copy link
Owner

tntim96 commented Jul 18, 2013

Great. Let me know if the rest pass and I'll check in.

I need to write a few more tests and look at handling other encodings.

@KPesendorfer
Copy link
Author

also my nightly integration tests worked with the new patched version without any problems.
so from my point of view this patch can be applied to the trunk.
Thank's a lot!

@tntim96
Copy link
Owner

tntim96 commented Jul 19, 2013

Thanks, but I'm looking at a more elegant solution. I'll let you know when I check in so you can re-test before I release (I should have enough scenarios to cover this though).

@tntim96
Copy link
Owner

tntim96 commented Jul 21, 2013

OK - I think it's all fixed in trunk. Can you rebuild and let me know how it goes?

@KPesendorfer
Copy link
Author

Sorry for the late response. Last night I've started a new test run with the latest version from trunk (yesterday afternoon), but the test run does not look very good - the test got stuck several times on some POST-requests. After restarting the proxy it worked for a while until it got into the same problem (I've extended the code with some log messages logging every served request - and when the proxy run into such problems no request logs are written any more ... )
I don't exactly know what's the problem with the current version, but I will revert back to the older patched version that seems to work.
When I have time in the next days I will try to reproduce the problem with the trunk version.

@tntim96
Copy link
Owner

tntim96 commented Jul 23, 2013

Thanks for taking the time to look at this. I've re-implemented the stream handling using PushbackInputStream, so I can read the HTTP header data, push the header data back onto the stream and forward to the proxy if necessary while avoiding any low-level encoding issues. In theory it's a better implementation if I can get it working. All my local tests pass, so I'd like to write a test that exposes the error you're getting.

I've extended the code with some log messages logging

Maybe you can add your changes to a fork, and I'll look at incorporating it with something like a --debug switch.

When I have time in the next days I will try to reproduce the problem with the trunk version.

That'd be great - thanks. I'll see if I can find the issue in the meantime.

@tntim96
Copy link
Owner

tntim96 commented Jul 24, 2013

I haven't had any luck trying to reproduce...Can you reproduce this manually with a browser? If you describe the steps I can probably solve this a bit quicker.

@KPesendorfer
Copy link
Author

I've just reproduced the problem with the upload-testpage I sent you last time ... uploading several binary data files (ZIP-files) of different size (also some parallel when the last was not finished) ... and after some time the JSCover proxy got stuck (inside the Eclipse debugger) ... see screenshot:
jscover-not-responding
... inside the CopyNoClose function the is.read(buf) call reads the available full-blocks until it gets to the last one which is smaller than the full buffer and the last read-operation stops the thread.

I think I also have found a solution. The function tries to read from the stream always a full block, which may not be available at the end of the file/stream. So I changed the code to only read up to the given max length:
jscover-stream-reading-fix
I will test the fix in my CI-environment and let you know if it works.

@tntim96
Copy link
Owner

tntim96 commented Jul 25, 2013

Excellent. Once confirmed, if you want to fork, commit and create a pull request, you'll get the commit to your name. Otherwise I'll credit you in the commit comments and contributions page.

I'll try to write a test that proves it in the mean time.

@tntim96
Copy link
Owner

tntim96 commented Jul 26, 2013

Did it work for you? I've noticed if I do a huge upload, only the first ~500 bytes come through - that's the header fields and no POST data. Try a break point on line 383 of HttpServer.java. The read value is ~500 bytes - also evaluate new String(headerBytes) to see the data.

@KPesendorfer
Copy link
Author

My nightly tests worked without any problems. I didn't see any problems concerning big post-requests (also a 25MB zip-file upload worked as expected).
So if you would be so kind to include the fix into the trunk (mentioning my fix is enough and ok for me).

To the other logging improvements I've made ... I didn't have time to create a fork yet and don't think that will be very soon (on holiday next week, ...) but shortly explained what I've done:

  • using java.util.logging.Logger to log important things (where the end-user can configure and decide via Java native logging.properties what and on which level he wants the things to be logged) as:
  • whether a file is instrumented or skipped (InstrumenterService)
  • HttpServer/Proxy: log every handled request with method and URL (and for POST-requests additionally the content size) - finer log level

last but not least ... thank's for your effort on fixing this bug.

@tntim96
Copy link
Owner

tntim96 commented Jul 26, 2013

I've committed the fix - can you review? Thanks for your help. I've credited you in the commit message and added you to the contributor list (will upload to web-site with 1.0.2 release in a few days).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants