Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WebUI should (somehow) indicate a problematic directory entry #4292

Open
mib-kd743naq opened this issue Oct 7, 2017 · 11 comments
Open

WebUI should (somehow) indicate a problematic directory entry #4292

mib-kd743naq opened this issue Oct 7, 2017 · 11 comments
Labels
kind/bug A bug in existing code (including security flaws) status/deferred Conscious decision to pause or backlog topic/security Topic security
Milestone

Comments

@mib-kd743naq
Copy link
Contributor

Version information:

Any

Type:

Enhancement ( arguably Bug )

Severity:

Medium

Description:

When faced with a duplicate name in a directory, the UI does not signal in any way that something is amiss. There should be some sort of indication ( a red exclamation mark or something ) to visually alert the user "hey, watch out". I am not able to build FUSE locally, so I do not know how things behave there: likely needs to be investigated as well.

Here is an example DAG: https://ipfs.io/ipfs/zdjA8bKXBUttGJ7aXtG3DeuHYtAVXNj1jA5sKuscTJ1nk1xL9

This issue is tangentially related to #1710. There is also a somewhat explicit policy in the IPLD spec, but if the UI doesn't show anything about it - this is still a problem.

P.S. In case this is deemed a security issue that should not be public - feel free to delete this bugreport. I tried to use the security@ contact two times, but never received a response, so decided to file here before I move on.

@Stebalien
Copy link
Member

We should definitely be removing the duplicate entries.


As for the security implications, this definitely isn't a good bug to have but would probably be hard to exploit. The (really) worrying part is that you didn't get a timely response; I've opened up an internal discussion to figure out what happened there.


Note: That part of the IPLD spec is only tangentially related. IPFS directories have a very different structure.

@Stebalien Stebalien added kind/bug A bug in existing code (including security flaws) topic/security Topic security labels Oct 13, 2017
@mib-kd743naq
Copy link
Contributor Author

We should definitely be removing the duplicate entries.

Only if there are guarantees within all readers ( http, cat, fuse ) that the first encountered "named link" is the one that wins. Otherwise the problem will only get worse.

Stepping back a bit - I was really hoping for actual marking of all duplicates or semi-duplicates as "Hey! Something is fishy". The thing I constructed is a simple bit-for-bit duplication. However, when you consider all the nastiness within #1710, just hiding an offender is likely insufficient, as it will miss:

  • The x00 byte ( especially bad for FUSE )
  • The <CR> byte, that can be rigged to make ipfs ls output look... different
  • Various utf8 representations of the same codepoint ( which the browser will dutifully recombine )
  • Other things I couldn't come up with off the top of my head

The (really) worrying part is that you didn't get a timely response

It's growing pains within the team, it is understandable. If anything - this is a relatively low profile issue, so having the report-loop tested with that is way better than some sort of RCE ;)

@whyrusleeping whyrusleeping added P0 Critical: Tackled by core team ASAP status/ready Ready to be worked labels Oct 17, 2017
@kpcyrd
Copy link
Contributor

kpcyrd commented Oct 17, 2017

The byte, that can be rigged to make ipfs ls output look... different

This sounds like terminal control characters might go through as well. Maybe worth a try. A program that prints non-binary to stdout should never print special control characters, partly because of this, but also because it's sort-of an injection problem if stdout is piped to stdin of another program that expects a line separated "protocol". I usually never print user input without repr() or {:?} because of this. Did you test for newline?

Regarding duplicate entries: I think the \x00 byte might cause a truncation (depending on if there are null byte checks before go calls open(2)). Duplicate entries have been exploited on zip for android, I don't think I can recall all details, but the problem was that the first entry wins if you access the zip file with the library, but the last entry wins if you write it to disk, since the 2nd entry would just overwrite the first. For directories, it's probably going to merge them.

@mib-kd743naq
Copy link
Contributor Author

@kpcyrd I didn't even think to test this, but yes ( example outputs color codes only, is safe to ls ): https://ipfs.io/ipfs/QmZcYTLQNo1hJtGxQD8d9sHtmnBBamhDHydtqJ2zxzLSx7

~$ ipfs ls QmZcYTLQNo1hJtGxQD8d9sHtmnBBamhDHydtqJ2zxzLSx7 
QmeHXsYCtWa9KhC5ZME5pn97c1zTt7TgK37VgKJWWAdknn 9 
===========
RED ALERT
===========
~$ ipfs ls QmZcYTLQNo1hJtGxQD8d9sHtmnBBamhDHydtqJ2zxzLSx7 | hexdump -C
00000000  51 6d 65 48 58 73 59 43  74 57 61 39 4b 68 43 35  |QmeHXsYCtWa9KhC5|
00000010  5a 4d 45 35 70 6e 39 37  63 31 7a 54 74 37 54 67  |ZME5pn97c1zTt7Tg|
00000020  4b 33 37 56 67 4b 4a 57  57 41 64 6b 6e 6e 20 39  |K37VgKJWWAdknn 9|
00000030  20 0a 3d 3d 3d 3d 3d 3d  3d 3d 3d 3d 3d 0a 1b 5b  | .===========..[|
00000040  39 37 6d 1b 5b 35 6d 1b  5b 34 31 6d 1b 5b 35 6d  |97m.[5m.[41m.[5m|
00000050  52 45 44 20 41 4c 45 52  54 1b 5b 35 6d 1b 5b 30  |RED ALERT.[5m.[0|
00000060  6d 1b 5b 35 6d 0a 3d 3d  3d 3d 3d 3d 3d 3d 3d 3d  |m.[5m.==========|
00000070  3d 0a                                             |=.|
00000072

@mib-kd743naq
Copy link
Contributor Author

@Stebalien All the problems with non-printables aside - there is also a length-based DoS: http://ipfs.io/ipfs/QmVtFwE9CWfBk7WC2pCgPtMQsZ7zF6tgjbL5Vcpte7pLcy

Trying to follow the longer-named links results in either a 414 ( decent ) or 502 ( ugh ) on the gateway. On the CLI things are no better:

~$ ipfs ls QmVtFwE9CWfBk7WC2pCgPtMQsZ7zF6tgjbL5Vcpte7pLcy
Qme57pNu4VvdX5yZdVacxcr4VyxqtgoymeKGPzRmmm121Q 26 This string can be very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very long
QmcMeKrhJXNcyExPtAYKUfbJpn6WGweR7KbyV6CRMbD3Rm 29 This subdir is *really* large - do not list its contents/
~$ ipfs ls 'QmVtFwE9CWfBk7WC2pCgPtMQsZ7zF6tgjbL5Vcpte7pLcy/This subdir is *really* large - do not list its contents/I mean really - this will dump 1mb of nyancat on your screen...' | wc -c
1114145

@Stebalien
Copy link
Member

Only if there are guarantees within all readers ( http, cat, fuse ) that the first encountered "named link" is the one that wins. Otherwise the problem will only get worse.

We'd do this at the lowest level so yes, it would cover all cases. I'd like to just reject such directories outright but directory sharding makes that impossible.

Stepping back a bit - I was really hoping for actual marking of all duplicates or semi-duplicates as "Hey! Something is fishy". The thing I constructed is a simple bit-for-bit duplication.

I'd like to handle this stuff at a layer below the UI. Users should (almost) never be asked to deal with security related things because they'll generally just be confused.

The thing I constructed is a simple bit-for-bit duplication. However, when you consider all the nastiness within #1710, just hiding an offender is likely insufficient, as it will miss:

So, we initially wanted Unix compatibility but, as you can see in that thread, sentiments have changed. Limiting unixfs paths to printable, UTF-8 byte sequences is reasonable.

there is also a length-based DoS

For long paths, we've considered limiting the length of individual components in a path (not entire paths, just components). However, we haven't picked a good limit. Currently, there's a hard limit of slightly less than 1MiB (due to other constraints) but we almost definitely want something shorter (4096?). Gateways can choose to reject any paths they want and we should probably set a "paths longer than X may not work, avoid them" limit.

As for large directories, that's something we can't fix. We want to, e.g., support dumping Wikipedia into IPFS (and every article on Wikipedia lives in /wiki). However, we do need to stream directories when listing them instead of blocking (I can't find the issue but I'm pretty sure there is one about this).


If you're feeling adventurous, want to read through #1710 and take a stab at a reasonable path restriction proposal?

@mib-kd743naq
Copy link
Contributor Author

If you're feeling adventurous, want to read through #1710

I have read it, multiple times: I was the one who linked it into the thread ;)

take a stab at a reasonable pathrestriction proposal?

I currently can't commit to that, due to severe time shortage ( sorry :/ ). I will only make a rather off-the-cuff remark about something I think has not been explicitly explored:

It seems that the question of "what fundamentally is the UnixFS portion of IPFS, and what naming rules apply" remains unresolved ( that's really a question for @jbenet and friends )

  • It could be seen as a new POSIX-ish Filesystem: in this case it should shed as much backwards compatilibity load and define ( as mostly outlined in IPFS permits undesirable paths #1710 ):

    • a restrictive range of characters available for names ( handling of non-printables )
    • enshrine into a document a list of reserved names: ( ., .. come to mind )
    • provide strict rules for unicode handling ( Normalization ( cue APFS disaster ), handling of invalid byte sequences and invalid poitns/surrogates
    • perhaps even go further and require linknames to contain at least one character matching the unicode classes of [\p{Letter}\p{Number}\p{Symbol}]
  • or it can be seen as a "VFS layer"-like system, which in essence must be POSIX, and allow for everything that Libarchive has to deal with

A different way of asking the above question is:

  • "Can I expect to be able to grab a random tar file from the internets, and represent its un-tarred contents faithfully within UnixFS". Which in turn implies that all consumers ( FUSE, Web Gateway, others ) must protect themselves as is the current "standard" within POSIX
  • OR are all IPFS/UnixFS dag-reader implementations supposed to abide by a certain set of rules, and thus any consumer ( FUSE, WebGateway, etc ) can forgo any and all sanity checks and just trust what a dagreader returned

An explicit answer to this will inform where the rest of this ticket goes. Furthermore this answer must be provided and codified sooner rather than later, due to the accelerating growth of 3rd party implementations

@Kubuxu
Copy link
Member

Kubuxu commented Nov 1, 2017

@mib-kd743naq my question is: it it bad that this can happen? and how does for example ls command work with it. Unix tooling is a great resource as they had to figure it out long ago and had a lot of time to think about it, it is still a resource and we might want to make some changes.

@mib-kd743naq
Copy link
Contributor Author

my question is: it it bad that this can happen?

@Kubuxu the problem is that I do not know :/

Intuitively it is a rather nasty situation when "input arriving from the internet" is able to manipulate your cursor position and essentially redraw the entirety of the terminal. There is something about IPFS being directly connected to the net that makes the scenario less palatable than say a .tar.gz with the same content being tar -ztf-ed ( though my version of tar handles this perfectly, see below ) .

Not having a clear and well thought through answer to "is this a problem", is why I sidestepped this question entirely, and instead put together the "which one of the two" writeup above. It seems clearer from that standpoint that IPFS can't be both "fully POSIX compliant" and "an upgraded protocol with less cruft".

For me personally - "Fully POSIX compliant" is the answer, with the ( significant ) work needed to make the tooling behave better. But I really want to defer this to @jbenet in part because of his proven visionary track record, and in part because he has extensively thought through this already ( or so I hope ).

how does for example ls command work with it

Way better than I expected:

/dev/shm$ ls fooo/
?[97m?[5m?[41m?[5mRED ALERT?[5m?[0m?[5m?
/dev/shm$ tar -tf fooo.tar 
fooo/
fooo/\033[97m\033[5m\033[41m\033[5mRED ALERT\033[5m\033[0m\033[5m\n

@Kubuxu
Copy link
Member

Kubuxu commented Nov 1, 2017

As we are working on ipld/legacy-unixfs-v2#1, if the decision is to allow all link names then best part forward (I think) would be to also to spec out way of escaping them for the user.

It seems clearer from that standpoint that IPFS can't be both "fully POSIX compliant" and "an upgraded protocol with less cruft".

I disagree with this statement, what is stored (and can be accessed through the API) versus what is presented to the user are two things that can be different. As most of shells are not about to handle binary strings (as variables) it makes sense to have encoding in presenting them.

@mib-kd743naq
Copy link
Contributor Author

@Kubuxu in light of

Currently, IPLD specifies that all map/object key names must be Unicode text
( via ipld/legacy-unixfs-v2#3 (comment) )

does my statement about "IPFS/IPLD can't be both at the same time" make more sense?

@Stebalien Stebalien added status/deferred Conscious decision to pause or backlog and removed status/ready Ready to be worked labels Dec 18, 2018
@Stebalien Stebalien removed the P0 Critical: Tackled by core team ASAP label Apr 22, 2021
@BigLep BigLep moved this to 🥞 Todo in IPFS Shipyard Team Mar 3, 2022
@BigLep BigLep added this to the TBD milestone Mar 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug A bug in existing code (including security flaws) status/deferred Conscious decision to pause or backlog topic/security Topic security
Projects
No open projects
Status: 🥞 Todo
Development

No branches or pull requests

6 participants