Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not enforce URLs to be UTF-8 #2208

Closed
ckcr4lyf opened this issue Jan 25, 2023 · 2 comments
Closed

Do not enforce URLs to be UTF-8 #2208

ckcr4lyf opened this issue Jan 25, 2023 · 2 comments

Comments

@ckcr4lyf
Copy link

What would you like to discuss?

While trying to make an announce as per the bittorrent spec, I encountered an error. In the bittorrent announce (https://www.bittorrent.org/beps/bep_0003.html), the info_hash param is the raw 20 byte hash, which needs to be escaped (for non URL safe characters / arbitrary octets), and is currently being blocked by Got, due to a call to "decodeURI()":

decodeURI(urlString);

Error Trace
RequestError: URI malformed
    at Request._destroy (file:///C:/Users/user/Documents/gotpoc/node_modules/got/dist/source/core/index.js:481:21)
    at _destroy (node:internal/streams/destroy:102:25)
    at Request.destroy (node:internal/streams/destroy:64:5)
    at Request.flush (file:///C:/Users/user/Documents/gotpoc/node_modules/got/dist/source/core/index.js:240:22)
    at lastHandler (file:///C:/Users/user/Documents/gotpoc/node_modules/got/dist/source/create.js:37:26)
    at iterateHandlers (file:///C:/Users/user/Documents/gotpoc/node_modules/got/dist/source/create.js:49:28)
    at got (file:///C:/Users/user/Documents/gotpoc/node_modules/got/dist/source/create.js:69:16)
    at Function.got.<computed> [as get] (file:///C:/Users/user/Documents/gotpoc/node_modules/got/dist/source/create.js:172:42)
    at file:///C:/Users/user/Documents/gotpoc/index.mjs:3:5
    at ModuleJob.run (node:internal/modules/esm/module_job:195:25)
    at decodeURI (<anonymous>)
    at Options.set url [as url] (file:///C:/Users/user/Documents/gotpoc/node_modules/got/dist/source/core/options.js:694:9)   
    at new Options (file:///C:/Users/user/Documents/gotpoc/node_modules/got/dist/source/core/options.js:367:34)
    at new Request (file:///C:/Users/user/Documents/gotpoc/node_modules/got/dist/source/core/index.js:224:28)
    at got (file:///C:/Users/user/Documents/gotpoc/node_modules/got/dist/source/create.js:31:25)
    at Function.got.<computed> [as get] (file:///C:/Users/user/Documents/gotpoc/node_modules/got/dist/source/create.js:172:42)
    at file:///C:/Users/user/Documents/gotpoc/index.mjs:3:5
    at ModuleJob.run (node:internal/modules/esm/module_job:195:25)
    at async Promise.all (index 0)
    at async ESMLoader.import (node:internal/modules/esm/loader:337:24) {

Related Issue

I've found a related issue, which seems to be incorrectly closed: #420 (comment) , which says

According to RFC 3986, UTF-8 encoding of URLs is spec and a requirement.

However, from my understanding, the requirement applies to new URI schemes created after 2005 (last paragraph of https://www.rfc-editor.org/rfc/rfc3986#section-2.5) , so for HTTP it is not be a requirement.

For clarification I contacted the IETF via their mailing list, and Julian R. (co-author of HTTP specifications) confirmed this (ref: https://mailarchive.ietf.org/arch/msg/ietf/0OydNWKPkHCsDE-z1N5rILhQBYo/) . In fact he even commented in the original issue

Based on this, would you be open to removing this restriction, which goes against the spec?

Seems this MR is related: #2200

@szmarczak
Copy link
Collaborator

szmarczak commented Jan 27, 2023

It is correct that URI schemes created before 2005 are not affected; however this would require the decodeURI function to also accept the scheme or some sort of a flag in order to figure out what RFC to follow. This is very unlikely to happen as it adds complexity and would make a one fits all solution impossible.

It is worth distinguishing that URIs are not URLs but URLs are URIs, so calling decodeURI may be incorrect here. We are already using WHATWG URLs and that enforces some URI rules already.

However, in most cases URLs represent letters and not raw bytes, so the push for UTF-8 is very understandable.

On a side note, I (and surely tens of others) have spent days over the course of being here, and still got http: URI encoding wrong.

I acknowledge the mistake.

@szmarczak
Copy link
Collaborator

Thank you @reschke and @ckcr4lyf for bringing this up 🙌

@szmarczak szmarczak changed the title Got does not allow non UTF-8 sequences to be percent-encoded in the URL (GET request) Do not enforce URLs to be UTF-8 Jan 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants