Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing "/" in path when url contains a scheme other than "https" #773

Closed
slondr opened this issue Jun 9, 2022 · 8 comments
Closed

Missing "/" in path when url contains a scheme other than "https" #773

slondr opened this issue Jun 9, 2022 · 8 comments

Comments

@slondr
Copy link

slondr commented Jun 9, 2022

Hi, I've been chasing down a bug in my application to this behavior which seems inconsistent with the docs. For example

    #[test]
    fn url_parsing_library_is_sane() {
	let url = url::Url::parse("https://example.com").unwrap();
	assert_eq!(url.path(), "/"); // passes
    }

    #[test]
    fn url_parsing_library_is_sane_2() {
	let url = url::Url::parse("spartan://example.com").unwrap();
	assert_eq!(url.path(), "/") // fails
    }

    #[test]
    fn url_parsing_library_is_sane_3() {
	let url = url::Url::parse("spartan://example.com").unwrap();
	assert!(!url.cannot_be_a_base()); // passes
    }

    #[test]
    fn url_parsing_library_is_sane_4() {
	let url = url::Url::parse("asdfuiop://example.com").unwrap();
	assert_eq!(url.path(), "/") // fails
    }

The docs for path mention:

For cannot-be-a-base URLs, this is an arbitrary string that doesn’t start with ‘/’. For other URLs, this starts with a ‘/’ slash and continues with slash-separated path segments.

It seems like URLs with a scheme other than HTTPS are correctly identified as can be a base, but the path does not start with / anyway. If I'm reading the docs correctly, all four of these tests should pass.

@erickt
Copy link

erickt commented Nov 18, 2022

I bisected this change down to being introduced in #537 and released in v2.1.1, specifically the commit 9cd646.

@o0Ignition0o and @nox - is this expected behavior, or is this a bug?

@o0Ignition0o
Copy link
Contributor

Hey, sorry for the late reply. Good question, I m gonna check

@o0Ignition0o
Copy link
Contributor

o0Ignition0o commented Dec 30, 2022

It looks like a but to me:

Section 8 of scheme state mentions:

Otherwise, if [remaining](https://url.spec.whatwg.org/#remaining) starts with an U+002F (/), 
set state to [path or authority state](https://url.spec.whatwg.org/#path-or-authority-state)
 and increase pointer by 1.

Let's see if I can write a test case.

Edit: nvm, checking whether the path shall be / by default for custom schemes

@o0Ignition0o
Copy link
Contributor

I understand this sentence:

A special URL’s path is always a list, i.e., it is never opaque.

As "Special schemes always have at least an emty list as path", but I don't see it apply to non special schemes. I'm going to browse a bit more

@o0Ignition0o
Copy link
Contributor

o0Ignition0o commented Dec 30, 2022

This seems to be aligned with the comment @SimonSapin wrote in the path() getter:

    /// Return the path for this URL, as a percent-encoded ASCII string.
    /// For cannot-be-a-base URLs, this is an arbitrary string that doesn’t start with '/'.
    /// For other URLs, this starts with a '/' slash
    /// and continues with slash-separated path segments.

Cannot be a base URLs only happen to special URLs

@annevk
Copy link

annevk commented Feb 6, 2023

Right, this is by design. I recommend closing this bug and perhaps adding documentation that if you want your custom scheme to have similar processing to a special scheme you need to implement that on top in a processor of sorts.

@riccardodivirgilio
Copy link

riccardodivirgilio commented Feb 24, 2023

Hi I have been investigating a similar inconsistency as well, this bug seems to be similar enough, let me know if you prefer a different issue to be opened.

use url::{Host, Url};

fn url_parse(s: String) {

    let result = Url::parse(&s);


    println!("parsing: {}", s);

    match result {
        Ok(parsed) => {
            println!("scheme: {}", parsed.scheme());

            match parsed.host() {
                Some(Host::Domain(s)) => println!("host: {}", s),
                Some(Host::Ipv4(s)) => println!("host: {}", &s.to_string()),
                Some(Host::Ipv6(s)) => println!("host: {}", &s.to_string()),
                _  => println!("host: -"),
            }

            println!("path: {}", parsed.path());

        }
        Err(error) => {

            println!("error: {}", error);
        },
    }

    println!("");

}

fn main() {
    // Statements here are executed when the compiled binary is called

    url_parse("file://ciao//bye".to_string());
    url_parse("http://ciao//bye".to_string());

    // Print text to the console
    url_parse("file:///ciao//bye".to_string());
    url_parse("http:///ciao//bye".to_string());
}

The output of this program is:

parsing: file://ciao//bye
scheme: file
host: ciao
path: /bye

parsing: http://ciao//bye
scheme: http
host: ciao
path: //bye

parsing: file:///ciao//bye
scheme: file
host: -
path: /ciao//bye

parsing: http:///ciao//bye
scheme: http
host: ciao
path: //bye

an identical program written in python is this one:

from urllib.parse import urlparse

def url_parse(url):
    parsed = urlparse(url)
    print('parsing: ', url)
    print('scheme: ', parsed.scheme)
    print('host: ', parsed.netloc)
    print('path: ', parsed.path)
    print('')


url_parse("file://ciao//bye")
url_parse("http://ciao//bye")

url_parse("file:///ciao//bye")
url_parse("http:///ciao//bye")

and it would output the following:

parsing:  file://ciao//bye
scheme:  file
host:  ciao
path:  //bye

parsing:  http://ciao//bye
scheme:  http
host:  ciao
path:  //bye

parsing:  file:///ciao//bye
scheme:  file
host:  
path:  /ciao//bye

parsing:  http:///ciao//bye
scheme:  http
host:  
path:  /ciao//bye

in particular python behaviour seems to be doing a better job in keeping trailing slashes in path, that in rust are silently dropped in some specific circumstances (file scheme vs http), and it has a more coherent behaviour regardless of the scheme. also the parsed outcome of "http:///ciao//bye" seems to be completely wrong in rust...

@valenting
Copy link
Collaborator

also the parsed outcome of "http:///ciao//bye" seems to be completely wrong in rust...

It parses correctly as far as I can see

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants