ReUrl is a library for parsing and manipulating URLs. It supports relative and non-normalized URLs and a number of operations on them. It can be used to parse, resolve, normalize and serialize URLs in separate phases, in such a way that it conforms to the WhatWG URL Standard.
I wrote this library because I needed one that supported non-normalized and relative URLs, but I also wanted to be certain that it followed the specification completely.
The WhatWG URL Standard defines URLs in terms of a parser algorithm that resolves URLs, normalizes URLs and serializes URL components in one pass. Thus, to implement a library that follows the standard but also supports a versatile set of operations on relative and non-normalized URLs, I had to disentangle these phases from the specification and, to some extent, rephrase the specification in more elementary terms.
Eventually I came up with a small 'theory' of URLs that I found very helpful and I based the library on that. Over time, this theory has become thoroughly documented in this new URL Specification.
npm install reurl
git clone https://github.com/alwinb/reurl.git
cd reurl
make all
cp dist/reurl.min.js /my/project/js/
The ReUrl library exposes an Url class and a RawUrl class with an identical API. Their only difference is in their handling of percent escape sequences.
In a Node.js project, you can use these classes as follows:
import { Url, RawUrl } from 'reurl'
Note: ReUrl is an ESM-only module, so it cannot be imported with require.
Url
For Url objects the URL parser decodes percent escape sequences, getters report percent-decoded values and the set method assumes that its input is percent-decoded unless explicitly specified otherwise.
var url = new Url ('//host/%61bc')
url.file // => 'abc'
url = url.set ({ query:'%def' })
url.query // => '%def'
url.toString () // => '//host/abc?%25def'
RawUrl
For RawUrl objects, the parser preserves percent escape sequences, getters report values with percent escape sequences preserved, and set expects values in which % signs start a percent escape sequence.
var url = new RawUrl ('//host/%61bc')
url.file // => '%61bc'
url = url.set ({ query:'%25%64ef' })
url.query // => '%25%64ef'
url.toString () // => '//host/%61bc?%25%64ef'
Url and RawUrl objects are immutable. Modifying URLs is accomplished through methods that return new Url and/or RawUrl objects, such as the url.set (patch) method described below.
new Url (string [, conf])
Construct a new Url object from an URL-string. The optional conf argument, if present, must be a configuration object as described below.
var url = new Url ('sc:/foo/bar')
console.log (url)
// => Url { scheme: 'sc', root: '/', dirs: [ 'foo' ], file: 'bar' }
new Url (object [, conf])
Construct a new Url object from any object, possibly an Url object itself. The optional conf argument, if present, must be a configuration object as described below. Throws an error if the object cannot be coerced into a valid URL.
var url = new Url ({ scheme:'file', dirs:['foo', 'buzz'], file:'abc' })
console.log (url.toString ())
// => 'file:foo/buzz/abc'
conf.parser
You can pass a configuration object with a parser property to the Url constructor to trigger scheme-specific parsing behaviour for relative, scheme-less URL-strings.
The scheme determines support for Windows drive letters and backslash separators. Drive letters are only supported in file URL-strings, and backslash separators are limited to file, http, https, ws, wss and ftp URL-strings.
var url = new Url ('/c:/foo\\bar', { parser:'file' })
console.log (url)
// => Url { drive: 'c:', root: '/', dirs: [ 'foo' ], file: 'bar' }
var url = new Url ('/c:/foo\\bar', { parser:'http' })
console.log (url)
// => Url { root: '/', dirs: [ 'c:', 'foo' ], file: 'bar' }
var url = new Url ('/c:/foo\\bar')
console.log (url)
// => Url { root: '/', dirs: [ 'c:', 'foo' ], file: 'bar' }
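To make these rules concrete, here is a rough sketch of how a parser mode could influence the splitting of a path-only URL string. The function parsePathSketch is hypothetical and is not ReUrl's parser: it only handles strings that consist of a path, recognises a drive letter only directly after a leading separator, and does not model the default that applies when no parser is given.

```javascript
// Hypothetical sketch, not ReUrl's parser: splits a path-only URL string
// into drive/root/dirs/file, depending on a parser mode (a scheme name).
const BACKSLASH_SCHEMES = new Set(['file', 'http', 'https', 'ws', 'wss', 'ftp'])

function parsePathSketch (path, parser) {
  if (path === '') return {}
  // Backslashes count as separators only for the schemes listed above
  const segments = path.split(BACKSLASH_SCHEMES.has(parser) ? /[\/\\]/ : '/')
  const result = {}
  // A leading separator produces an empty first segment
  let absolute = segments[0] === ''
  if (absolute) segments.shift()
  // Drive letters such as 'c:' or 'c|' are only recognised for file
  if (parser === 'file' && absolute && /^[A-Za-z][:|]$/.test(segments[0])) {
    result.drive = segments.shift()
    // A separator after the drive (if any) acts as the root
    absolute = segments.length > 0
  }
  if (absolute) result.root = '/'
  // The last segment is the file; '' (a trailing separator) means no file
  const file = segments.pop()
  if (segments.length > 0) result.dirs = segments
  if (file) result.file = file
  return result
}

console.log(parsePathSketch('/c:/foo\\bar', 'file'))
// => { drive: 'c:', root: '/', dirs: [ 'foo' ], file: 'bar' }
console.log(parsePathSketch('/c:/foo\\bar', 'http'))
// => { root: '/', dirs: [ 'c:', 'foo' ], file: 'bar' }
```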
Url and RawUrl objects have the following optional properties.
url.scheme
The scheme of an URL as a string. This property is absent if no scheme part is present, e.g. in scheme-relative URLs.
new Url ('http://foo?search#baz') .scheme
// => 'http'
new Url ('/abc/?') .scheme
// => undefined
url.user
The username of an URL as a string. This property is absent if the URL does not have an authority or does not have credentials.
new Url ('http://joe@localhost') .user
// => 'joe'
new Url ('//host/abc') .user
// => undefined
url.pass
A property for the password of an URL as a string. This property is absent if the URL does not have an authority, credentials or password.
new Url ('http://joe@localhost') .pass
// => undefined
new Url ('http://host') .pass
// => undefined
new Url ('http://joe:pass@localhost') .pass
// => 'pass'
new Url ('http://joe:@localhost') .pass
// => ''
url.host
A property for the hostname of an URL as a string. This property is absent if the URL does not have an authority.
new Url ('http://localhost') .host
// => 'localhost'
new Url ('http:foo') .host
// => undefined
new Url ('/foo') .host
// => undefined
url.port
The port of (the authority part of) an URL: either a number, or the empty string if the port is empty. The property is absent if the URL does not have an authority or a port.
new Url ('http://localhost:8080') .port
// => 8080
new Url ('foo://host:/foo') .port
// => ''
new Url ('foo://host/foo') .port
// => undefined
url.root
A property for the path-root of an URL. Its value is '/' if the URL has an absolute path. The property is absent otherwise.
new Url ('foo://localhost?q') .root
// => undefined
new Url ('foo://localhost/') .root
// => '/'
new Url ('foo/bar')
// => Url { dirs: [ 'foo' ], file: 'bar' }
new Url ('/foo/bar')
// => Url { root: '/', dirs: [ 'foo' ], file: 'bar' }
It is possible for file URLs to have a drive, but not a root.
new Url ('file:/c:')
// => Url { scheme: 'file', drive: 'c:' }
new Url ('file:/c:/')
// => Url { scheme: 'file', drive: 'c:', root: '/' }
url.drive
A property for the drive of an URL as a string, if present. Note that whether a drive is detected depends on the parser settings and/or the URL scheme.
new Url ('file://c:') .drive
// => 'c:'
new Url ('http://c:') .drive
// => undefined
new Url ('/c:/foo/bar', { parser:'file' }) .drive
// => 'c:'
new Url ('/c:/foo/bar') .drive
// => undefined
url.dirs
If present, a nonempty array of strings. Note that a trailing slash determines whether the last path segment is part of dirs or is set as the file property.
new Url ('/foo/bar/baz/').dirs
// => [ 'foo', 'bar', 'baz' ]
new Url ('/foo/bar/baz').dirs
// => [ 'foo', 'bar' ]
url.file
If present, a non-empty string.
new Url ('/foo/bar/baz') .file
// => 'baz'
new Url ('/foo/bar/baz/') .file
// => undefined
url.query
A property for the query part of url as a string, if present.
new Url ('http://foo?search#baz') .query
// => 'search'
new Url ('/abc/?') .query
// => ''
new Url ('/abc/') .query
// => undefined
url.hash
A property for the hash part of url as a string, if present.
new Url ('http://foo#baz') .hash
// => 'baz'
new Url ('/abc/#') .hash
// => ''
new Url ('/abc/') .hash
// => undefined
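To see how the properties above fit together, here is a hypothetical serializeSketch that rebuilds a string from a plain object with these property names. It is a simplification under stated assumptions, not ReUrl's toString: it performs no percent-encoding, leaves out drive (whose serialization is not described here), and assumes the object is already a valid combination, for example that a URL with a host and a non-empty path also has a root.

```javascript
// Hypothetical sketch, not ReUrl's toString: joins plain URL components.
// No percent-encoding; drive is not handled; input is assumed valid.
function serializeSketch (u) {
  let s = ''
  if (u.scheme != null) s += u.scheme + ':'
  if (u.host != null) {
    // The authority: //user:pass@host:port
    s += '//'
    if (u.user != null) {
      s += u.user
      if (u.pass != null) s += ':' + u.pass
      s += '@'
    }
    s += u.host
    if (u.port != null) s += ':' + u.port  // port may be a number or ''
  }
  if (u.root) s += '/'
  if (u.dirs) s += u.dirs.join('/') + '/'  // every dir is followed by a slash
  if (u.file != null) s += u.file
  if (u.query != null) s += '?' + u.query
  if (u.hash != null) s += '#' + u.hash
  return s
}

console.log(serializeSketch({ scheme: 'file', dirs: ['foo', 'buzz'], file: 'abc' }))
// => file:foo/buzz/abc
console.log(serializeSketch({ scheme: 'foo', host: 'host', port: '', root: '/', file: 'foo' }))
// => foo://host:/foo
```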
Url and RawUrl objects are immutable, therefore setting and removing components is achieved via a set method that takes a patch object.
url.set (patch)
The patch object may contain one or more keys being scheme, user, pass, host, port, drive, root, dirs, file, query and/or hash. To remove a component, set its value in the patch to null.
If present:
- port must be null, a string, or a number;
- dirs must be an array of strings;
- root may be anything; it is converted to '/' if truthy and interpreted as null otherwise;
- all others must be null or a string.
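The rules above can be expressed as a small validation step. The function checkPatch below is a hypothetical sketch, not ReUrl's code. One assumption to note: it also accepts null for dirs, following the general rule that null removes a component, although the list only mentions arrays of strings. Unknown keys such as percentCoded are simply ignored.

```javascript
// Hypothetical sketch of the patch rules above; not ReUrl's implementation.
const PATCH_KEYS = ['scheme', 'user', 'pass', 'host', 'port', 'drive', 'root', 'dirs', 'file', 'query', 'hash']

function checkPatch (patch) {
  const out = {}
  for (const key of PATCH_KEYS) {
    if (!(key in patch)) continue  // absent keys leave the component as-is
    const value = patch[key]
    if (key === 'root') {
      out.root = value ? '/' : null  // truthy becomes '/', anything else null
    } else if (key === 'port') {
      if (value !== null && typeof value !== 'string' && typeof value !== 'number')
        throw new TypeError('port must be null, a string or a number')
      out.port = value
    } else if (key === 'dirs') {
      // Assumption: null is accepted here to remove the dirs
      if (value !== null && !(Array.isArray(value) && value.every(d => typeof d === 'string')))
        throw new TypeError('dirs must be an array of strings')
      out.dirs = value
    } else {
      if (value !== null && typeof value !== 'string')
        throw new TypeError(key + ' must be null or a string')
      out[key] = value
    }
  }
  return out
}

console.log(checkPatch({ host: null, root: 1, port: 8080 }))
// => { host: null, port: 8080, root: '/' }
```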
new Url ('//host/dir/file')
.set ({ host:null, query:'q', hash:'h' })
.toString ()
// => '/dir/file?q#h'
For security reasons, setting the user will remove pass, unless a value is supplied for it as well. Setting the host will remove user, pass and port, unless values are supplied for them as well.
new Url ('http://joe:[email protected]')
.set ({ user:'jane' })
.toString ()
// => 'http://[email protected]'
new Url ('http://joe:secret@localhost:8080')
.set ({ host:'example.com' })
.toString ()
// => 'http://example.com'
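The credential rule above can be sketched on plain component objects. The function setSketch is hypothetical and is not ReUrl's set method: it skips validation and percent-coding and only models which components are dropped. Whether setting host to null also clears the credentials is not stated above; this sketch assumes it does.

```javascript
// Hypothetical sketch of the credential-clearing rule; not ReUrl's set.
function setSketch (components, patch) {
  const result = { ...components }
  // Setting user drops pass, unless the patch supplies pass as well
  if ('user' in patch && !('pass' in patch)) delete result.pass
  // Setting host drops user, pass and port, unless the patch supplies them
  if ('host' in patch) {
    for (const key of ['user', 'pass', 'port'])
      if (!(key in patch)) delete result[key]
  }
  // Apply the patch itself; null removes a component
  for (const [key, value] of Object.entries(patch)) {
    if (value === null) delete result[key]
    else result[key] = value
  }
  return result
}

console.log(setSketch(
  { scheme: 'http', user: 'joe', pass: 'secret', host: 'localhost', port: 8080 },
  { host: 'example.com' }))
// => { scheme: 'http', host: 'example.com' }
```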
patch.percentCoded
The patch may have an additional key percentCoded with a boolean value to indicate whether the strings in the patch contain percent escape sequences.
This means that you can pass percent-encoded values to Url.set by explicitly setting percentCoded to true. The values will then be decoded.
var url = new Url ('//host/')
url = url.set ({ file:'%61bc-%25-sign', percentCoded:true })
url.file // => 'abc-%-sign'
url.toString () // => '//host/abc-%25-sign'
You can pass percent-decoded values to RawUrl.set by explicitly setting percentCoded to false. Percent characters in values will then be encoded; specifically, they will be replaced with %25.
var rawUrl = new RawUrl ('//host/')
rawUrl = rawUrl.set ({ file:'abc-%-sign', percentCoded:false })
rawUrl.file // => 'abc-%25-sign'
rawUrl.toString () // => '//host/abc-%25-sign'
Note that if no percentCoded value is specified, then Url.set assumes percentCoded to be false whilst RawUrl.set assumes percentCoded to be true.
var url = new Url ('//host/') .set ({ file:'%61bc' })
url.file // => '%61bc'
url.toString () // => '//host/%2561bc'
var rawUrl = new RawUrl ('//host/') .set ({ file:'%61bc' })
rawUrl.file // => '%61bc'
rawUrl.toString () // => '//host/%61bc'
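The two conversions involved here are small. Below is a minimal sketch, assuming percent-decoding along the lines of the WHATWG URL Standard's percent-decode (only a % followed by two hex digits is decoded, and the resulting bytes are read as UTF-8) and the %-to-%25 escaping described above. Both function names are hypothetical; this is not ReUrl's implementation.

```javascript
// Hypothetical sketch; not ReUrl's implementation.
// Decodes %XX sequences in the UTF-8 bytes of str; a malformed % is kept.
function percentDecodeSketch (str) {
  const isHex = b => (b >= 0x30 && b <= 0x39) || (b >= 0x41 && b <= 0x46) || (b >= 0x61 && b <= 0x66)
  const bytes = new TextEncoder().encode(str)
  const out = []
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] === 0x25 && i + 2 < bytes.length && isHex(bytes[i + 1]) && isHex(bytes[i + 2])) {
      out.push(parseInt(String.fromCharCode(bytes[i + 1], bytes[i + 2]), 16))
      i += 2  // skip the two hex digits
    } else {
      out.push(bytes[i])
    }
  }
  // Invalid UTF-8 becomes U+FFFD; a leading BOM is kept rather than stripped
  return new TextDecoder('utf-8', { ignoreBOM: true }).decode(new Uint8Array(out))
}

// Escapes every % so that the value survives a later decode unchanged
function escapePercentSketch (str) {
  return str.replace(/%/g, '%25')
}

console.log(percentDecodeSketch('%61bc-%25-sign'))  // => abc-%-sign
console.log(escapePercentSketch('abc-%-sign'))      // => abc-%25-sign
```

Escaping and then decoding gives back the original string, which is why a percent-decoded value can be stored in a percent-coded form without loss.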
url.toString ()
Converts an Url object to a string. Percent-encodes only a minimal set of codepoints. The resulting string may contain non-ASCII codepoints.
var url = new Url ('http://🌿🌿🌿/{braces}/hʌɪ')
url.toString ()
// => 'http://🌿🌿🌿/%7Bbraces%7D/hʌɪ'
url.toASCII (), url.toJSON (), url.href
Converts an Url object to a string that contains only ASCII codepoints. Non-ASCII codepoints in components will be percent-encoded and/or punycoded.
var url = new Url ('http://🌿🌿🌿/{braces}/hʌɪ')
url.toASCII ()
// => 'http://xn--8h8haa/%7Bbraces%7D/h%CA%8C%C9%AA'
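For comparison only, the WHATWG URL class that is built into Node.js (and browsers) serializes href in ASCII form. This snippet uses that built-in class, not ReUrl; for this particular input, its href should agree with the url.toASCII () result above, with the host punycoded and the braces and non-ASCII path characters percent-encoded.

```javascript
// Node's built-in WHATWG URL class (not part of ReUrl), for comparison.
const builtin = new URL('http://🌿🌿🌿/{braces}/hʌɪ')
console.log(builtin.href)
// => http://xn--8h8haa/%7Bbraces%7D/h%CA%8C%C9%AA
```

Unlike ReUrl, the built-in class offers no serialization that keeps non-ASCII codepoints, and it has no representation for relative URLs.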
url.toURI ()
Uses url.toASCII () to convert url to an RFC3986 URI. Throws an error if url does not have a scheme, because URIs must always have a scheme.
url.normalize (), url.normalise ()
Returns a new Url object by normalizing url. Among other things, this interprets . and .. segments within the path and removes default ports and trivial usernames/passwords from the authority of url.
new Url ('http://foo/bar/baz/./../bee') .normalize () .toString ()
// => 'http://foo/bar/bee'
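The dot-segment part of normalization can be sketched on a list of path segments, in the spirit of the remove_dot_segments step of RFC 3986. The helpers removeDotSegmentsSketch and normalizePathSketch are hypothetical, not ReUrl's normalize. They target absolute paths (a .. at the root is dropped), and they do not cover default ports, trivial credentials, or percent-encoded dots such as %2e.

```javascript
// Hypothetical sketch; not ReUrl's normalize.
// Interprets '.' and '..' in an array of path segments.
function removeDotSegmentsSketch (segments) {
  const out = []
  for (const seg of segments) {
    if (seg === '.') continue         // '.' refers to the current directory
    else if (seg === '..') out.pop()  // '..' removes the previous segment, if any
    else out.push(seg)
  }
  // A path ending in '.' or '..' still denotes a directory
  const last = segments[segments.length - 1]
  if (last === '.' || last === '..') out.push('')
  return out
}

// Applies the segment step to an absolute or relative path string
function normalizePathSketch (path) {
  const absolute = path.startsWith('/')
  const segments = (absolute ? path.slice(1) : path).split('/')
  return (absolute ? '/' : '') + removeDotSegmentsSketch(segments).join('/')
}

console.log(normalizePathSketch('/bar/baz/./../bee'))  // => /bar/bee
```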
url.percentEncode ()
Returns a RawUrl object by percent-encoding the properties of url according to the Standard. Prevents double escaping of percent-encoded bytes in the case of RawUrl objects.
url.percentDecode ()
Returns an Url object by percent-decoding the properties of url if it is a RawUrl, and leaving them as-is otherwise.
url.goto (url2)
Returns a new Url object by 'extending' url with url2, where url2 may be a string, an Url or a RawUrl object.
new Url ('/foo/bar') .goto ('baz/index.html') .toString ()
// => '/foo/baz/index.html'
new Url ('/foo/bar') .goto ('//host/path') .toString ()
// => '//host/path'
new Url ('http://foo/bar/baz/') .goto ('./../bee') .toString ()
// => 'http://foo/bar/baz/./../bee'
If url2 is a string, it will be parsed with the scheme of url as a fallback scheme. TODO: if url has no scheme then …
new Url ('file://host/dir/') .goto ('c|/dir2/') .toString ()
// => 'file://host/c|/dir2/'
new Url ('http://host/dir/') .goto ('c|/dir2/') .toString ()
// => 'http://host/dir/c|/dir2/'
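The examples above are consistent with a simple ordering model, shown here as a hypothetical sketch rather than ReUrl's exact algorithm: order the components, keep the components of url that come before the first component present in url2, take the rest from url2, and append url2's dirs to those of url when url2 starts with dirs. The function gotoSketch works on plain objects with the property names used above. It ignores the scheme-dependent string parsing just described, and treating an empty url2 like a hash-only reference (as RFC 3986 does) is an assumption.

```javascript
// Hypothetical sketch, not ReUrl's goto: 'extends' url1 with url2, both
// plain component objects. The authority moves as one unit.
const ORDER = ['scheme', 'authority', 'drive', 'root', 'dirs', 'file', 'query', 'hash']
const AUTH_KEYS = ['user', 'pass', 'host', 'port']

function hasPart (url, part) {
  return part === 'authority' ? url.host != null : url[part] != null
}

function copyPart (target, source, part) {
  const keys = part === 'authority' ? AUTH_KEYS : [part]
  for (const key of keys) if (source[key] != null) target[key] = source[key]
}

function gotoSketch (url1, url2) {
  // Position of url2's first component; an empty url2 acts like a
  // hash-only reference (assumption)
  let stop = ORDER.findIndex(part => hasPart(url2, part))
  if (stop < 0) stop = ORDER.indexOf('hash')
  const result = {}
  // Keep url1's components that come before url2's first component
  for (const part of ORDER.slice(0, stop)) copyPart(result, url1, part)
  // Take the remaining components from url2
  for (const part of ORDER.slice(stop)) {
    if (part === 'dirs' && ORDER[stop] === 'dirs' && url1.dirs) {
      result.dirs = [...url1.dirs, ...url2.dirs]  // append relative directories
    } else {
      copyPart(result, url2, part)
    }
  }
  return result
}

const base = { root: '/', dirs: ['foo'], file: 'bar' }
console.log(gotoSketch(base, { dirs: ['baz'], file: 'index.html' }))
// => { root: '/', dirs: [ 'foo', 'baz' ], file: 'index.html' }
console.log(gotoSketch(base, { host: 'host', root: '/', file: 'path' }))
// => { host: 'host', root: '/', file: 'path' }
```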
url.isBase ()
Returns a boolean indicating whether url is a base-URL. What is and is not a base-URL depends on the scheme of an URL. For example, http- and file-URLs that do not have a host are not base-URLs.
url.force ()
Forcibly convert an Url to a base-URL according to this URL Specification, in accordance with the WHATWG Standard.
- In file URLs without a hostname, the hostname will be set to ''.
- For URLs whose scheme is one of http, https, ws, wss or ftp and that have an absent or empty authority, the authority will be 'stolen from the first nonempty path segment'.
- In the latter case, an error is thrown if url cannot be forced. This happens if it has no scheme, or if it has an empty host and no non-empty path segment.
new Url ('http:foo/bar') .force () .toString ()
// => 'http://foo/bar'
new Url ('http:/foo/bar') .force () .toString ()
// => 'http://foo/bar'
new Url ('http://foo/bar') .force () .toString ()
// => 'http://foo/bar'
new Url ('http:///foo/bar') .force () .toString ()
// => 'http://foo/bar'
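The forcing rules listed above can be sketched on plain component objects. The function forceSketch is hypothetical and simplified, not ReUrl's force. Two assumptions are labeled in the code: URLs with other schemes are returned unchanged, and a path that remains after stealing the host is given a root so that it serializes after the host.

```javascript
// Hypothetical sketch, not ReUrl's force: applies the forcing rules to a
// plain component object.
const WEB_SCHEMES = new Set(['http', 'https', 'ws', 'wss', 'ftp'])

function forceSketch (url) {
  if (url.scheme == null) throw new Error('cannot force a URL without a scheme')
  const result = { ...url }
  // file URLs without a host get the empty host
  if (url.scheme === 'file') {
    if (result.host == null) result.host = ''
    return result
  }
  // Assumption: other schemes, or URLs with a non-empty host, are unchanged
  if (!WEB_SCHEMES.has(url.scheme) || result.host) return result
  // Steal the authority from the first non-empty path segment
  const dirs = url.dirs || []
  const i = dirs.findIndex(dir => dir !== '')
  delete result.dirs
  if (i >= 0) {
    result.host = dirs[i]
    if (i + 1 < dirs.length) result.dirs = dirs.slice(i + 1)
  } else if (url.file) {
    result.host = url.file
    delete result.file
  } else {
    throw new Error('no non-empty path segment to use as host')
  }
  // Assumption: a remaining path gets a root so it follows the host
  if (result.dirs || result.file) result.root = '/'
  return result
}

const forced = forceSketch({ scheme: 'http', dirs: ['foo'], file: 'bar' })
console.log(forced.host, forced.root, forced.file)  // prints: foo / bar
```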
url.genericResolve (base): RFC 3986, strict
Resolve an Url object url against a base URL base according to the strict reference resolution algorithm as defined in RFC3986.
url.legacyResolve (base): RFC 3986, non-strict
Resolve an Url object url against a base URL base according to the non-strict reference resolution algorithm as defined in RFC3986.
url.WHATWGResolve (base), aka. url.resolve
Resolve an Url object url against a base URL base in a way that is compatible with the error-correcting, forcing reference resolution algorithm as defined in the WHATWG Standard.
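The strict and non-strict variants of RFC 3986 differ in a single step of its section 5.2.2: in non-strict mode, a reference scheme that equals the base scheme is ignored. The sketch below implements RFC 3986 reference resolution on strings, parsing with the regular expression from Appendix B, to show that step. All function names are hypothetical; this is not ReUrl's implementation, which works on Url objects, and it does not cover the WHATWG algorithm.

```javascript
// Hypothetical sketch of RFC 3986 section 5.2 resolution; not ReUrl's code.
const RFC3986 = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$/

function parseReference (ref) {
  const m = RFC3986.exec(ref)  // unmatched groups are undefined
  return { scheme: m[2], authority: m[4], path: m[5], query: m[7], fragment: m[9] }
}

function removeDotSegments (path) {  // section 5.2.4
  let input = path
  let output = ''
  const dropLast = () => { output = output.slice(0, Math.max(output.lastIndexOf('/'), 0)) }
  while (input.length > 0) {
    if (input.startsWith('../')) input = input.slice(3)
    else if (input.startsWith('./')) input = input.slice(2)
    else if (input.startsWith('/./')) input = input.slice(2)
    else if (input === '/.') input = '/'
    else if (input.startsWith('/../')) { input = input.slice(3); dropLast() }
    else if (input === '/..') { input = '/'; dropLast() }
    else if (input === '.' || input === '..') input = ''
    else {
      // Move the first segment (with its leading '/', if any) to the output
      const end = input.indexOf('/', 1)
      const seg = end === -1 ? input : input.slice(0, end)
      output += seg
      input = input.slice(seg.length)
    }
  }
  return output
}

function mergePaths (base, refPath) {  // section 5.2.3
  if (base.authority !== undefined && base.path === '') return '/' + refPath
  return base.path.slice(0, base.path.lastIndexOf('/') + 1) + refPath
}

function recompose (t) {  // section 5.3
  let s = ''
  if (t.scheme !== undefined) s += t.scheme + ':'
  if (t.authority !== undefined) s += '//' + t.authority
  s += t.path
  if (t.query !== undefined) s += '?' + t.query
  if (t.fragment !== undefined) s += '#' + t.fragment
  return s
}

function resolveSketch (ref, base, strict = true) {  // section 5.2.2
  const R = parseReference(ref)
  const B = parseReference(base)
  const T = {}
  if (!strict && R.scheme === B.scheme) R.scheme = undefined  // the non-strict step
  if (R.scheme !== undefined) {
    Object.assign(T, { scheme: R.scheme, authority: R.authority, path: removeDotSegments(R.path), query: R.query })
  } else {
    if (R.authority !== undefined) {
      Object.assign(T, { authority: R.authority, path: removeDotSegments(R.path), query: R.query })
    } else {
      if (R.path === '') {
        T.path = B.path
        T.query = R.query !== undefined ? R.query : B.query
      } else {
        T.path = removeDotSegments(R.path.startsWith('/') ? R.path : mergePaths(B, R.path))
        T.query = R.query
      }
      T.authority = B.authority
    }
    T.scheme = B.scheme
  }
  T.fragment = R.fragment
  return recompose(T)
}

console.log(resolveSketch('http:g', 'http://a/b/c/d;p?q'))         // => http:g
console.log(resolveSketch('http:g', 'http://a/b/c/d;p?q', false))  // => http://a/b/c/g
```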
- Converted the project from a CommonJS Module to an ES Module.
- Updated the core to use spec-url version 2.0.0-dev.1
- Changes to the API for reference resolution.
ReUrl now exposes three methods for reference resolution:
- url.genericResolve (base)
- url.legacyResolve (base)
- url.WHATWGResolve (base), also known as url.resolve (base)
MIT.
Enjoy!