Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with cyrillic symbols in location headers #1693

Closed
LavrovArtem opened this issue May 13, 2015 · 28 comments
Closed

Issue with cyrillic symbols in location headers #1693

LavrovArtem opened this issue May 13, 2015 · 28 comments
Labels
http Issues or PRs related to the http subsystem.

Comments

@LavrovArtem
Copy link

To reproduce this bug, execute the code below and open http://localhost:777/ in the browser. This page redirects to a page from the location header, and browser should open http://localhost:777/?text=что-то%20русское .

In node.js version 0.10.15 and earlier, it works fine, but io.js opens the http://localhost:777/?text=GB>-B>%20@CAA:>5 url.

I think that the issue is linked to the writeHead function that incorrectly encodes headers and sends them to the browser.

var http = require('http');

http.createServer(function(req, res) {
    if (req.url === '/')
        res.writeHead(302, {location:'?text=что-то русское'});

    res.end();
}).listen(777);
@vkurchatkin
Copy link
Contributor

I'm not sure if this is a bug, but it is always better to encode url manually:

var http = require('http');

http.createServer(function(req, res) {
    if (req.url === '/')
        res.writeHead(302, {location:'?text=' + encodeURI('что-то русское')});

    res.end();
}).listen(7777);

@bnoordhuis
Copy link
Member

As general advise, you're better off sticking to ASCII in HTTP headers. From RFC 7230:

Historically, HTTP has allowed field content with text in the ISO-8859-1 charset, supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset.

Neither MIME (RFC 2047) nor ISO-8859-1 are widely supported. ASCII is the safest bet.

@bnoordhuis bnoordhuis added the http Issues or PRs related to the http subsystem. label May 13, 2015
@vkurchatkin
Copy link
Contributor

Something like this happens: new Buffer('что-то русское', 'binary').toString()

@LavrovArtem
Copy link
Author

I understand that it would be reasonable to encode cyrillic or go without using these symbols, but we are creating a proxy server, whereas even quite well-known sites are sending headers with cyrillic.

@rlidwka
Copy link
Contributor

rlidwka commented May 13, 2015

whereas even quite well-known sites are sending headers with cyrillic

Do you have an example of a such website? It might be useful to mention a real-life example in the respective test should that be written.

@vkurchatkin
Copy link
Contributor

Adding res._send(''); forces headers to be flushed with correct encoding. Ironically, calling res.flushHeaders() doesn't really flush headers

@LavrovArtem
Copy link
Author

@rlidwka
Take https://market.yandex.ru/
If you type cyrillic letters into the search box (say, "телефон"), the header will be like http://screencast.com/t/cKzbSW0a

@vkurchatkin
Copy link
Contributor

Another workaround:

var http = require('http');

http.createServer(function(req, res) {
    if (req.url === '/') {
      res.statusCode = 302;
      res.setHeader('location', '?text=что-то русское');
      res.flushHeaders();
    }

    res.end();
}).listen(7777);

@ChALkeR
Copy link
Member

ChALkeR commented May 14, 2015

@vkurchatkin Does that mean that response.writeHead behaves differently from response.setHeader? How that could be expected?

@ChALkeR
Copy link
Member

ChALkeR commented May 14, 2015

@LavrovArtem Yandex.Market gives properly encoded url in the Location header for me.
What tool are you using on the screenshot? Maybe it decodes the Location header? Have you tried pressing the «[Raw]» button?

@ChALkeR
Copy link
Member

ChALkeR commented May 14, 2015

Guess this link goes here: nodejs/node-v0.x-archive#25293

@LavrovArtem
Copy link
Author

@ChALkeR I used Fiddler2 to view requests. Also, when I debug our proxy-server, I see the response coming from the original site with unencoded location header.

@LavrovArtem
Copy link
Author

@vkurchatkin thank you for workarounds

@vkurchatkin
Copy link
Contributor

@ChALkeR totally unexpected, at least for me. see #1695

@jasnell
Copy link
Member

jasnell commented May 14, 2015

Good to have those workarounds but this definitely feels like something that should be fixed. Granted, best practice would be to escape prior to setting but writeHead shouldn't be acting this way

@vkurchatkin
Copy link
Contributor

@jasnell the problem with writeHead is that it doesn't actually write anything to the socket and prevents flushHeaders from doing so. Can you take a look at #1695 ?

@vkurchatkin
Copy link
Contributor

There is a similar problem with parsing:

var http = require('http');

http.createServer(function(req, res) {
  console.log(req.url);
  console.log(req.headers)
  res.end('');
}).listen(3000);

curl -H "X-Test: Тест" http://localhost:3000/?тест

On node 0.10:

/?тест
Тест

On io.js 3.2:

/?��
ТеÑ�Ñ

This is a seems like a major regression and I can't think of a proper workaround.

@Fishrock123 Fishrock123 added the confirmed-bug Issues with confirmed bugs. label Aug 31, 2015
@trevnorris
Copy link
Contributor

@bnoordhuis Quick note. ISO-8859-1 is what's returned by OneByteString(). This started back when v8 switched from String::WriteAscii() to String::WriteOneByte(). And since it would be extra overhead to convert the headers to US-ASCII, we simply leave them latin-1 encoded. Demonstrative example (check X-Test header):

var http = require('http');

http.createServer(function(req, res) {
  console.log(req.headers);
  res.end('bye\n');
}).listen(3000, makeRequest);

var options = {
  port: 3000,
  headers: { 'X-Test': 'Ä' },
};

function makeRequest() {
  http.request(options, function (res) { }).end();
}

Output:

{ 'x-test': 'Ä', host: 'localhost:3000', connection: 'close' }

@Fishrock123
Copy link
Contributor

Closing as per #2629 (comment).

This is working as intended. The spec only allows ascii for security reasons. :S

@ChALkeR
Copy link
Member

ChALkeR commented Sep 16, 2015

@Fishrock123 What about #1693 (comment) then?

@Fishrock123 Fishrock123 reopened this Sep 16, 2015
@Fishrock123
Copy link
Contributor

Hmm, I suppose that is an actual bug, but from my knowledge of internals quite difficult to reasonably (performantly) fix.

The entire http headers API (internals) needs a rewrite/cleanup.

@trevnorris
Copy link
Contributor

Discrepancies I'm aware of:

  • Writing headers, then writing a string will default headers to utf8.
  • Writing headers, then writing a buffer will default headers to binary.
  • Writing and flushing headers defaults to utf8, even if a buffer is written afterwards.

The entire thing needs to be looked at.

@Fishrock123
Copy link
Contributor

Related: #962

@misterdai
Copy link

Just encountered a problem with this. Wanted to apply some headers containing non-ascii characters, to a response that is sitting behind IIS (which is proxying requests). IIS threw a fit over the header values, eventually leading me to use the module mimelib to Mime word encode the header values before using response.setHeader().

@ChALkeR
Copy link
Member

ChALkeR commented Dec 15, 2015

Wanted to apply some headers containing non-ascii characters

This is against the HTTP spec, I guess.
Edit: no, I am incorrect. The restrictions are a bit different:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2
http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.2

@jasnell
Copy link
Member

jasnell commented Dec 15, 2015

Just to be clear... RFC 7230 and it's additional docs replace RFC 2616:

https://tools.ietf.org/html/rfc7230#section-3.2
https://tools.ietf.org/html/rfc7230#section-1.2

The specific ABNF for Header field values is:

VCHAR (any visible [USASCII] character)

header-field   = field-name ":" OWS field-value OWS

field-name     = token
field-value    = *( field-content / obs-fold )
field-content  = field-vchar [ 1*( SP / HTAB ) field-vchar ]
field-vchar    = VCHAR / obs-text

obs-text = %x80-FF
obs-fold       = CRLF 1*( SP / HTAB )

Also note the comment in Section 3.2.4:

Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets. A recipient SHOULD treat other octets in field content (obs-text) as opaque data.

Also, from Appendix A.2:

Non-US-ASCII content in header fields ... has been obsoleted and made opaque

For backwards compatibility, header fields are generally always supposed to be treated as ISO-8859-1 but anything outside of the VCHAR (visible US-ASCII) range is obsolete and treated as "opaque". Unfortunately the spec doesn't actually define what "opaque" means in this case. I assume that means the bytes are intended to be uninterpreted (meaning we're not supposed to decode them as UTF-8).

In other words, a header value must contain only bytes in the x20-xFF range, and should ONLY decode bytes x20-x7E while leaving bytes x80-xFF as "raw bytes". This range, of course, does happen to allow for valid UTF8 byte sequences. Interpretation of those, however, is considered "unspecified".

@ChALkeR
Copy link
Member

ChALkeR commented Feb 15, 2016

The original testcase should be fixed in recent versions on all supported Node.js branches, it just throws errors now, as it should.

Any reason for keeping this open?

@ChALkeR
Copy link
Member

ChALkeR commented Feb 15, 2016

I'm going to close this.
Everyone, leave a comment here if you disagree and want this reopened for some reason.

The behaviour is consistent now.
#1693 (comment) — this «workaround» no longer works, because it shouldn't.
Proper usage looks like this: #1693 (comment) (or just use querystring).

#1693 (comment) is also fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
http Issues or PRs related to the http subsystem.
Projects
None yet
Development

No branches or pull requests

9 participants