Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

301 redirect : Wrong containment triples when fetching Container from NSS without trailing slash #617

Open
angelo-v opened this issue Apr 12, 2023 · 16 comments
Labels

Comments

@angelo-v
Copy link
Contributor

Steps to reproduce

  1. fetch a container from NSS without using a trailing slash in it's URI
  2. query the store for the same URI as subject and ldp:contains as predicate

Expected result

The canonical container URI is with a slash, so the query should not give a result

Actual result

A statement with a wrong relative URI as the object

Example

  1. Fetch https://bourgeoa.solidcommunity.net/chats
  2. Store contains the following triple:
<https://bourgeoa.solidcommunity.net/chats>
  <http://www.w3.org/ns/ldp#contains>
    <https://bourgeoa.solidcommunity.net/10b089b1-cec0-475f-9be3-7078d950c04c/> .

Correct containment statement would be

<https://bourgeoa.solidcommunity.net/chats/>
  <http://www.w3.org/ns/ldp#contains>
    <https://bourgeoa.solidcommunity.net/chats/10b089b1-cec0-475f-9be3-7078d950c04c/> .

Note the difference in the subject URI (trailing slash) and the object URI (relative to /chats/)

This statement is not in the store, besides following the redirect from NSS. Only when fetching directly with a trailing shlash everything works as expected.

Test to reproduce:

describe("container URI without trailing slashes", () => {
  it("should contain nothing", async () => {
    const store = graph();
    const fetcher = new Fetcher(store);
    await fetcher.load("https://bourgeoa.solidcommunity.net/chats");

    const contains = store.statementsMatching(
      sym("https://bourgeoa.solidcommunity.net/chats"),
      sym("http://www.w3.org/ns/ldp#contains")
    );
    expect(contains).toEqual([]);
  });
});
@bourgeoa
Copy link
Contributor

bourgeoa commented Apr 14, 2023

When a container URL is without a /, NSS is using a 301 redirect to a new URL ending with a /
The issue is with 301 return code. When using 200 the result is correct.

@timbl I am wondering if rdflib does check the header for location or only content-location ?

@bourgeoa bourgeoa changed the title Wrong containment triples when fetching Container from NSS without trailing slash 301 redirect : Wrong containment triples when fetching Container from NSS without trailing slash Apr 17, 2023
@csarven
Copy link
Member

csarven commented Apr 18, 2023

When a server receives a GET request to /chat, it should do the required:

  • If unauthorized to read /, respond with 403.
  • If authorized to read /, respond with 404.

It may do the optional:

  • If unauthorized to read /chats/, respond with 403.
  • If authorized to read /chats/, respond with 301 to /chats/, or:
  • If authorized to read /chats/, respond with 200 with Content-Location including the URL of the requested representation, e.g., /chats/.ttl, /chats/.html, /chats/.jsonld, where the payload refers to /chats/.

@TallTed

This comment was marked as resolved.

@timbl
Copy link
Member

timbl commented Apr 18, 2023

Here are all the headers:

$ curl -I https://bourgeoa.solidcommunity.net/chats
HTTP/1.1 301 Moved Permanently
X-Powered-By: solid-server/5.7.7
Vary: Accept, Authorization, Origin
Access-Control-Allow-Credentials: true
Access-Control-Expose-Headers: Authorization, User, Location, Link, Vary, Last-Modified, ETag, Accept-Patch, Accept-Post, Updates-Via, Allow, WAC-Allow, Content-Length, WWW-Authenticate, MS-Author-Via, X-Powered-By
Allow: OPTIONS, HEAD, GET, PATCH, POST, PUT, DELETE
Link: <chats.acl>; rel="acl", <chats.meta>; rel="describedBy", <http://www.w3.org/ns/ldp#Resource>; rel="type"
Location: /chats/
Content-Type: text/plain; charset=utf-8
Content-Length: 41
Date: Tue, 18 Apr 2023 21:39:09 GMT
Connection: keep-alive
Keep-Alive: timeout=5

$ curl -I https://bourgeoa.solidcommunity.net/chats/
HTTP/1.1 200 OK
X-Powered-By: solid-server/5.7.7
Vary: Accept, Authorization, Origin
Access-Control-Allow-Credentials: true
Access-Control-Expose-Headers: Authorization, User, Location, Link, Vary, Last-Modified, ETag, Accept-Patch, Accept-Post, Updates-Via, Allow, WAC-Allow, Content-Length, WWW-Authenticate, MS-Author-Via, X-Powered-By
Allow: OPTIONS, HEAD, GET, PATCH, POST, PUT, DELETE
Link: <.acl>; rel="acl", <.meta>; rel="describedBy", <http://www.w3.org/ns/ldp#Container>; rel="type", <http://www.w3.org/ns/ldp#BasicContainer>; rel="type"
WAC-Allow: user="read",public="read"
MS-Author-Via: SPARQL
Updates-Via: wss://bourgeoa.solidcommunity.net
Content-Type: application/octet-stream; charset=utf-8
Content-Length: 2
ETag: W/"2-nOO9QiTIwXgNtWtBJezz8kv3SLc"
Date: Tue, 18 Apr 2023 21:40:15 GMT
Connection: keep-alive
Keep-Alive: timeout=5

$ 

@timbl
Copy link
Member

timbl commented Apr 18, 2023

So you are saying it looks as though rdflib uses the location of the initial resource not the one after the redirect, as a base address for parsing the data.

@csarven

This comment was marked as resolved.

@bourgeoa
Copy link
Contributor

So you are saying it looks as though rdflib uses the location of the initial resource not the one after the redirect, as a base address for parsing the data.

Yes. When I looked at rdflib code I did not find any use of location in relation with 301, but only of content-location in fetcher

@bourgeoa
Copy link
Contributor

bourgeoa commented Apr 19, 2023

In the Handle fetch() response content-location is checked has if it was a single location
But location is nowhere checked
Is the rdflib webOperation meant to do :

  • use the resource location : not the case actually
  • and use content-location resources : not the case actually with multiple files to load

    rdflib.js/src/fetcher.ts

    Lines 1951 to 1965 in c5bcd95

    handleResponse (
    response: ExtendedResponse,
    docuri: string,
    options: AutoInitOptions
    ): Promise<FetchError | ExtendedResponse> | ExtendedResponse {
    const kb = this.store
    const headers = (response as Response).headers
    const reqNode = options.req
    const responseNode = this.saveResponseMetadata(response, options)
    const contentType = this.normalizedContentType(options, headers) || ''
    let contentLocation = headers.get('content-location')

@angelo-v
Copy link
Contributor Author

angelo-v commented May 3, 2023

Yes, I suppose in case of a 301 redirect rdflib should use the URI of the location header as the base URI, not the URI it requested initially and got redirected.

@timbl
Copy link
Member

timbl commented May 4, 2023

In https://github.com/linkeddata/rdflib.js/blob/main/src/fetcher.ts#L604 maybe original is the problem. I suspect it goes back to the original URI to avoid using the URI sent to a proxy when using a proxy. So we also want to avoid using the URI from the last fetch operation.

@NoelDeMartin
Copy link
Contributor

Well I just found out the same @timbl already said 😅. But yes, I can confirm that's the problem. I've tried to reproduce the issue, and changing the following in https://github.com/linkeddata/rdflib.js/blob/main/src/fetcher.ts#L1161 the test passes:

.then(response => {
    if (options.baseURI !== response.url) {
        options.baseURI = response.url;
    }

    if (options.original.value !== response.url) {
        options.original.value = response.url;
    }

    return this.handleResponse(response, response.url, options);
},

Not that this should be the solution, but it proves that the issue was indeed that the redirect was happening silently.

@bourgeoa
Copy link
Contributor

bourgeoa commented May 5, 2023

I found the same and checking for the Link Container in response of latest fetch does resolve the issue.

Replace https://github.com/linkeddata/rdflib.js/blob/main/src/fetcher.ts#L604
with

  let baseUrl = options.original.value
    const isContainer = kb.any(options.original, null, ns.ldp('Container'))
    console.log('@@ isContainer ' + isContainer)
    if (isContainer && !baseUrl.endsWith('/')) baseUrl = baseUrl + '/' 
    let p = N3Parser(kb, kb, options.original.value, baseUrl,
      null, null, '', null)

@bourgeoa
Copy link
Contributor

bourgeoa commented May 5, 2023

@angelo-v
#621 passes your test
here is the npm version [email protected] to test with PodOS

@angelo-v
Copy link
Contributor Author

It fixes the issue with wrong containment triples.

But still rdflib states that the resource without a trailing slash is of type ldp:Container which i think is wrong.

@bourgeoa
Copy link
Contributor

But still rdflib states that the resource without a trailing slash is of type ldp:Container which i think is wrong.

@angelo-v
I suppose this patch will resolve your point #621 (comment)
With this patch a slash is also added to the resource without a trailing slash of type ldp:Container.

@bourgeoa bourgeoa added the bug label May 17, 2023
@bourgeoa
Copy link
Contributor

@angelo-v
here is the npm version [email protected] to test with PodOS

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants