Missing "/" in path when url contains a scheme other than "https" #773

slondr · 2022-06-09T00:52:41Z

Hi, I've traced a bug in my application to this behavior, which seems inconsistent with the docs. For example:

    #[test]
    fn url_parsing_library_is_sane() {
        let url = url::Url::parse("https://example.com").unwrap();
        assert_eq!(url.path(), "/"); // passes
    }

    #[test]
    fn url_parsing_library_is_sane_2() {
        let url = url::Url::parse("spartan://example.com").unwrap();
        assert_eq!(url.path(), "/"); // fails
    }

    #[test]
    fn url_parsing_library_is_sane_3() {
        let url = url::Url::parse("spartan://example.com").unwrap();
        assert!(!url.cannot_be_a_base()); // passes
    }

    #[test]
    fn url_parsing_library_is_sane_4() {
        let url = url::Url::parse("asdfuiop://example.com").unwrap();
        assert_eq!(url.path(), "/"); // fails
    }

The docs for path mention:

For cannot-be-a-base URLs, this is an arbitrary string that doesn’t start with ‘/’. For other URLs, this starts with a ‘/’ slash and continues with slash-separated path segments.

It seems that URLs with a scheme other than HTTPS are correctly identified as able to be a base, but the path does not start with / anyway. If I'm reading the docs correctly, all four of these tests should pass.


erickt · 2022-11-18T19:18:57Z

I bisected this change down to being introduced in #537 and released in v2.1.1, specifically the commit 9cd646.

@o0Ignition0o and @nox - is this expected behavior, or is this a bug?

o0Ignition0o · 2022-12-30T13:41:02Z

Hey, sorry for the late reply. Good question, I'm going to check.

o0Ignition0o · 2022-12-30T13:47:19Z

It looks like a bug to me:

Section 8 of scheme state mentions:

Otherwise, if [remaining](https://url.spec.whatwg.org/#remaining) starts with an U+002F (/), set state to [path or authority state](https://url.spec.whatwg.org/#path-or-authority-state) and increase pointer by 1.

Let's see if I can write a test case.

Edit: nvm, checking whether the path shall be / by default for custom schemes

o0Ignition0o · 2022-12-30T13:55:57Z

I understand this sentence:

A special URL’s path is always a list, i.e., it is never opaque.

As "Special schemes always have at least an empty list as path", but I don't see it applying to non-special schemes. I'm going to browse a bit more.
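
For reference, the WHATWG URL Standard lists six special schemes. A minimal sketch of a check for them follows; `is_special_scheme` is a hypothetical helper, not part of the `url` crate, and it assumes the scheme is already lowercased and has no trailing `:`, which is the form `Url::scheme()` returns.

```rust
/// Hypothetical helper (not part of the `url` crate): returns true for the
/// schemes the WHATWG URL Standard lists as "special".
/// Assumes `scheme` is lowercased and has no trailing ':'.
fn is_special_scheme(scheme: &str) -> bool {
    matches!(scheme, "ftp" | "file" | "http" | "https" | "ws" | "wss")
}
```

Under this check, `spartan` and `asdfuiop` from the tests above are non-special, while `https` is special.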

o0Ignition0o · 2022-12-30T14:05:20Z

This seems to be aligned with the comment @SimonSapin wrote in the path() getter:

    /// Return the path for this URL, as a percent-encoded ASCII string.
    /// For cannot-be-a-base URLs, this is an arbitrary string that doesn’t start with '/'.
    /// For other URLs, this starts with a '/' slash
    /// and continues with slash-separated path segments.

Cannot be a base URLs only happen to special URLs

annevk · 2023-02-06T10:21:22Z

Right, this is by design. I recommend closing this bug and perhaps adding documentation that if you want your custom scheme to have similar processing to a special scheme you need to implement that on top in a processor of sorts.
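
A minimal sketch of the kind of processing layer suggested here, assuming the only normalization wanted is to treat an empty path as "/". `path_or_root` is a hypothetical helper, not part of the `url` crate; it is meant to be given the string returned by `url.path()`.

```rust
/// Hypothetical helper (not part of the `url` crate): treat an empty path
/// as "/", the way special schemes such as https already behave.
/// Any non-empty path is returned unchanged.
fn path_or_root(path: &str) -> &str {
    if path.is_empty() {
        "/"
    } else {
        path
    }
}
```

With this, `path_or_root(url.path())` should give "/" for `spartan://example.com`. It is only appropriate when `url.cannot_be_a_base()` is false, since an opaque path must not gain a leading slash.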

riccardodivirgilio · 2023-02-24T15:00:05Z

Hi, I have been investigating a similar inconsistency. This bug seems similar enough, but let me know if you would prefer that I open a separate issue.

    use url::{Host, Url};

    /// Parse `s` and print its scheme, host, and path (or the parse error).
    fn url_parse(s: &str) {
        println!("parsing: {}", s);

        match Url::parse(s) {
            Ok(parsed) => {
                println!("scheme: {}", parsed.scheme());

                match parsed.host() {
                    Some(Host::Domain(d)) => println!("host: {}", d),
                    Some(Host::Ipv4(ip)) => println!("host: {}", ip),
                    Some(Host::Ipv6(ip)) => println!("host: {}", ip),
                    None => println!("host: -"),
                }

                println!("path: {}", parsed.path());
            }
            Err(error) => println!("error: {}", error),
        }

        println!();
    }

    fn main() {
        // Two slashes after the scheme, then a "//" later in the URL.
        url_parse("file://ciao//bye");
        url_parse("http://ciao//bye");

        // Three slashes after the scheme.
        url_parse("file:///ciao//bye");
        url_parse("http:///ciao//bye");
    }

The output of this program is:

    parsing: file://ciao//bye
    scheme: file
    host: ciao
    path: /bye

    parsing: http://ciao//bye
    scheme: http
    host: ciao
    path: //bye

    parsing: file:///ciao//bye
    scheme: file
    host: -
    path: /ciao//bye

    parsing: http:///ciao//bye
    scheme: http
    host: ciao
    path: //bye

An equivalent program written in Python:

    from urllib.parse import urlparse

    def url_parse(url):
        parsed = urlparse(url)
        print('parsing: ', url)
        print('scheme: ', parsed.scheme)
        print('host: ', parsed.netloc)
        print('path: ', parsed.path)
        print('')


    url_parse("file://ciao//bye")
    url_parse("http://ciao//bye")

    url_parse("file:///ciao//bye")
    url_parse("http:///ciao//bye")

It outputs the following:

    parsing:  file://ciao//bye
    scheme:  file
    host:  ciao
    path:  //bye

    parsing:  http://ciao//bye
    scheme:  http
    host:  ciao
    path:  //bye

    parsing:  file:///ciao//bye
    scheme:  file
    host:  
    path:  /ciao//bye

    parsing:  http:///ciao//bye
    scheme:  http
    host:  
    path:  /ciao//bye

In particular, Python seems to do a better job of preserving the slashes in the path, which Rust silently drops in some circumstances (file scheme vs. http), and it behaves more consistently regardless of the scheme. Also, the parsed result for "http:///ciao//bye" seems to be completely wrong in Rust...

valenting · 2023-02-27T10:39:48Z

also the parsed outcome of "http:///ciao//bye" seems to be completely wrong in rust...

It parses correctly as far as I can see
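
For context, this follows from how the WHATWG URL Standard handles special schemes other than `file`: after the scheme, the parser skips any run of `/` or `\` before it reads the authority, so the first non-slash text becomes the host. A rough sketch of that one step, under the assumption that the input is whatever follows `http:`; `skip_authority_slashes` is a hypothetical name, and this is neither the crate's implementation nor a full parser.

```rust
/// Hypothetical sketch of the slash-skipping step for special, non-file
/// schemes: drop every leading '/' or '\' before the authority starts.
/// `after_scheme` is the input following "scheme:".
fn skip_authority_slashes(after_scheme: &str) -> &str {
    after_scheme.trim_start_matches(|c: char| c == '/' || c == '\\')
}
```

Applied to `///ciao//bye`, this leaves `ciao//bye`, which is consistent with the host `ciao` and path `//bye` shown in the Rust output above.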

tmccombs mentioned this issue Feb 5, 2023

Allow specifying additional "special" schemes. whatwg/url#749

Open

valenting closed this as completed Feb 27, 2023
