Nip 94 - File Header #337

Merged: 20 commits merged on Apr 20, 2023

Conversation

frbitten
Contributor

@frbitten frbitten commented Mar 9, 2023

A first approach to organize the dissemination and management of files in the NOSTR protocol

@jonas-lundqvist I used the Hash suggestion so that whoever downloads it can confirm that the file is correct and is still the same as at the time of the event

@kinakuta-co I used your idea to allow encrypted files to avoid server censorship.

@fiatjaf This may conflict to some degree with NIP-78, but I preferred to write a separate NIP rather than propose a change to NIP-78.

@staab
Member

staab commented Mar 9, 2023

I'm a yes on both. This spec is well-written and very simple. It's nice that 94 and 95 are separate so relays can advertise support for one and not the other. As an aside, #259 would allow non-supporting relays to recommend a server they trust to host NIP 95 files.

The one reservation I have is making these replaceable. That is how links work on the internet, but it leads to a lot of possible rug-pull type scenarios. Maybe name could be a separate tag, and d could be an optional "slot" (uuid or semantic pointer of some kind).

@fiatjaf
Member

fiatjaf commented Mar 9, 2023

Looks good to me, but I missed the link between the two.

@frbitten
Contributor Author

frbitten commented Mar 9, 2023

> I'm a yes on both. This spec is well-written and very simple. It's nice that 94 and 95 are separate so relays can advertise support for one and not the other. As an aside, #259 would allow non-supporting relays to recommend a server they trust to host NIP 95 files.
>
> The one reservation I have is making these replaceable. That is how links work on the internet, but it leads to a lot of possible rug-pull type scenarios. Maybe name could be a separate tag, and d could be an optional "slot" (uuid or semantic pointer of some kind).

The d tag comes from NIP-33, which defines replaceable events. I suggested using the file name as the d value because it is more readable than an arbitrary ID. But you can put an ID there and use the "title" tag for the name.

I imagine that making them replaceable adds some implementation complications. But since this behavior is already defined for events, I thought it fit the scenario. I'll only be sure once I start trying to implement the proposal.
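
The replaceable behavior being debated can be sketched roughly as follows. This is only an illustration, assuming the NIP-33 rule that a relay keeps the newest event for each (pubkey, kind, d tag) combination. The `ReplaceableStore` class, its method names, and the kind number 30000 are hypothetical placeholders, not part of the proposal.

```python
from typing import Dict, Optional, Tuple


def d_tag(event: dict) -> str:
    """Return the value of the first "d" tag, or "" when there is none."""
    for tag in event.get("tags", []):
        if len(tag) >= 2 and tag[0] == "d":
            return tag[1]
    return ""


class ReplaceableStore:
    """Hypothetical in-memory store keeping one event per (pubkey, kind, d)."""

    def __init__(self) -> None:
        self._events: Dict[Tuple[str, int, str], dict] = {}

    def save(self, event: dict) -> bool:
        """Store the event unless an equally new or newer one already exists."""
        key = (event["pubkey"], event["kind"], d_tag(event))
        current = self._events.get(key)
        if current is not None and current["created_at"] >= event["created_at"]:
            return False  # the stored event wins, the incoming one is dropped
        self._events[key] = event  # the older event is discarded (no history)
        return True

    def get(self, pubkey: str, kind: int, d: str) -> Optional[dict]:
        return self._events.get((pubkey, kind, d))


# Assumed sample events: same author, same "d" value, different file URL.
store = ReplaceableStore()
old = {"pubkey": "ab" * 32, "kind": 30000, "created_at": 100,
       "tags": [["d", "cat.gif"], ["url", "https://example.com/cat.gif"]]}
new = {**old, "created_at": 200,
       "tags": [["d", "cat.gif"], ["url", "https://example.com/other.gif"]]}
store.save(old)
store.save(new)
print(store.get("ab" * 32, 30000, "cat.gif")["tags"][1][1])
# prints https://example.com/other.gif
```

Anyone who referenced the file by its d value now sees the new URL, which is exactly the rug-pull concern raised above.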

@frbitten
Contributor Author

frbitten commented Mar 9, 2023

> Looks good to me, but I missed the link between the two.

I first thought of NIP-95, but I was worried about how to propagate it without overloading the network with very large events. While reading the other NIPs and issues about files, I came up with the idea of NIP-94 as a file header, which solves the problem of announcing both NIP-95 events and files shared externally.

As I wrote in NIP-95, such events should not be returned in searches on the relay. So the only way to know that a NIP-95 event exists is via a NIP-94 event. This way, the NIP-95 event is only transmitted when someone really wants to access the complete file.

@fiatjaf
Member

fiatjaf commented Mar 9, 2023

OK, that sounds good.

The only question I have is: is anyone really going to want to store very large files like that? Does that really make sense in practice? It's ok to consider it as a possibility, but it is a bad idea to have a NIP that no one will ever implement.

@frbitten
Contributor Author

frbitten commented Mar 9, 2023

> OK, that sounds good.
>
> The only question I have is: is anyone really going to want to store very large files like that? Does that really make sense in practice? It's ok to consider it as a possibility, but it is a bad idea to have a NIP that no one will ever implement.

I do have some doubts about whether large files will ever be used. By large files I mean something in the tens or hundreds of MB.

Smaller files make more sense, I think: for example a photo social network (like Pinterest), memes, or even animated GIFs.

For something like the image below I think it could be very useful:
[image: gifs-telegran]

Of course, we will only be sure once someone starts using it. I imagine that 99% of relays would not implement NIP-95, only very specific cases.

A relay can also define the maximum data size it accepts.

@mikedilger
Contributor

minor: please put endquotes after the JSON, I can't read all that red text.

@v0l
Member

v0l commented Mar 10, 2023

> I'm a yes on both. This spec is well-written and very simple. It's nice that 94 and 95 are separate so relays can advertise support for one and not the other. As an aside, #259 would allow non-supporting relays to recommend a server they trust to host NIP 95 files.
>
> The one reservation I have is making these replaceable. That is how links work on the internet, but it leads to a lot of possible rug-pull type scenarios. Maybe name could be a separate tag, and d could be an optional "slot" (uuid or semantic pointer of some kind).

Is it possible to use the event id when referencing the file? Or will relays just delete the old event when it's replaced?

Would be kinda nice to have a history actually.

@Egge21M
Contributor

Egge21M commented Mar 10, 2023

I proposed something very similar a couple of months ago (#112). Back then the general consensus was that we should not be storing base64-encoded files on relays, due to the inefficiency of that approach. Therefore I came up with nostr-ing, which is very similar to this approach but moves NIP-95 to a subprotocol that allows the transmission and storage of raw binary data.

@frbitten
Contributor Author

frbitten commented Mar 10, 2023

> Is it possible to use the event id when referencing the file? Or will relays just delete the old event when its replaced?
> Would be kinda nice to have a history actually.

I don't know how NIP-33 is expected to behave here. If the event ID really changes when the event is overwritten, all references to it will be lost. If the ID is kept, relays that do not implement the NIP will hold two events with the same ID, which can cause problems.

I'll open an issue asking how this works and tag the authors of NIP-33.
Opened issue #346
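
On whether an overwritten event could keep its ID, here is a minimal sketch, assuming the NIP-01 definition of the event id as the sha256 of the serialized array `[0, pubkey, created_at, kind, tags, content]`. The `event_id` helper and the sample values are hypothetical, and `json.dumps` is only an approximation of the NIP-01 escaping rules for unusual characters.

```python
import hashlib
import json


def event_id(pubkey: str, created_at: int, kind: int, tags: list, content: str) -> str:
    """Compute an event id as sha256 over the compact JSON serialization."""
    serialized = json.dumps(
        [0, pubkey, created_at, kind, tags, content],
        separators=(",", ":"),   # no whitespace between elements
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    )
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


# Two versions of the "same" replaceable file header (same d tag).
first = event_id("ab" * 32, 100, 30000,
                 [["d", "cat.gif"], ["url", "https://example.com/a.gif"]], "")
second = event_id("ab" * 32, 200, 30000,
                  [["d", "cat.gif"], ["url", "https://example.com/b.gif"]], "")
print(first != second)  # True: any change to the event yields a new id
```

Under that assumption a replacement always has a different id, so references by event id point at one specific version, while references by d value follow the latest one.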

@frbitten
Contributor Author

> I proposed something very similar a couple of months ago (#112). Back then the general consensus was, that we should not be storing base64 encoded files on relay, due to the inefficiency of that approach. Therefore I came up with nostr-ing which is very similar to this approach, but moves NIP-95 to a subprotocol that allows the transmission and storage of raw binary data.

I will read your protocol suggestion later and comment on it.
On the question of performance, I don't think that is true. There are ways of storing data that cost a little more space but gain in availability.
The point is that NIP-95 is impractical to implement on relays that use a relational database. You would need a NoSQL database such as MongoDB, which has already been tested and used for storing large files. The main advantage of MongoDB is how easily data can be replicated across multiple servers.
You can read more details here:
https://www.mongodb.com/developer/products/mongodb/storing-large-objects-and-files/
https://www.mongodb.com/docs/manual/core/gridfs/

An alternative to base64 is BSON, which could also be used. I chose Base64 because it is better known, but I can include BSON as an option in NIP-95 as well.

But my idea for NIP-95 is sharing small files and data, such as icons and images, and for it to be just the communication protocol, not necessarily the storage. As I described, the relay can store the file on disk and give HTTP access to it.

@frbitten
Contributor Author

frbitten commented Mar 10, 2023

I think I should split this pull request in two: one for NIP-94 and one for NIP-95. I can see that NIP-95 will require much more discussion, and the approval of one does not depend on the approval of the other.

@frbitten frbitten changed the title from "Nip 94 and 95" to "Nip 94 - File Header" on Mar 10, 2023
@Egge21M
Contributor

Egge21M commented Mar 10, 2023

> I will read your protocol suggestion later and comment on it. On the question of performance it is not true at all. There are ways to store that lose a little in space but can gain in availability. The point is that relays that use a relational database are impractical to implement NIP-95. You will need a No-SQL database such as mongodb, which even already has tests and uses with storing large files. The main advantage of using mongodb is the ease of replicating data across multiple servers. You can read more details here: https://www.mongodb.com/developer/products/mongodb/storing-large-objects-and-files/ https://www.mongodb.com/docs/manual/core/gridfs/

I don't get your point. Storing a file in Base64 will increase the required storage space compared to storing the same file in binary. Of course, compression will reduce the impact.

The reason why MongoDB has no issues storing large files is precisely that it does not store base64-encoded data but BSON (as you mentioned yourself), which supports raw binary. Adding BSON to this NIP wouldn't make sense IMHO, because the nostr protocol does not support the transmission of BSON anyway. Relays would therefore have to convert BSON -> Base64, and clients would then have to decode Base64 -> binary to use the file.

@frbitten
Contributor Author

I moved the NIP-95 to a Pull Request of its own. We can continue the discussion about it in #345

@frbitten
Contributor Author

> I dont get your point. Storing a file in Base64 will increase required storage space compared to storing the same file in binary. Of course there is compression that will reduce the impact.

Base64 encoding adds 1 byte for every 3 bytes of original binary data, which means that storing data in base64 increases the final space used.
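
The overhead is easy to check with a small sketch. The 3000-byte payload is an arbitrary assumption chosen so the numbers come out round; `base64_size` is a hypothetical helper for padded standard base64.

```python
import base64
import math


def base64_size(n_bytes: int) -> int:
    """Length of padded base64 output for n_bytes of input (4 chars per 3 bytes)."""
    return 4 * math.ceil(n_bytes / 3)


raw = bytes(3000)                    # 3000 zero bytes standing in for a file
encoded = base64.b64encode(raw)

print(len(raw), len(encoded))        # 3000 4000
print(len(encoded) / len(raw) - 1)   # about 0.33, i.e. roughly 33% larger
```

The same ratio applies on the wire, since the encoded string is what travels inside the event.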

@vitorpamplona
Collaborator

vitorpamplona commented Mar 12, 2023

Isn't a replaceable event a bad design here?

Think about a File Header that is highly shared as a tag inside a TextNote (TextNote references the file). Clients can easily preview the file as part of the experience. But if the File Header author's keys leak and the attacker replaces the file (and hash) with porn, everyone will have a post citing content they didn't intend in the first place.

My suggestion is to make it a regular event. Once created, you cannot replace it.

Same issue on #345

@frbitten
Contributor Author

frbitten commented Apr 5, 2023

> Can we standardize the encryption mechanism to be always the same single one and use only that always until further notice?

I don't see any problem.

@jb55
Contributor

jb55 commented Apr 11, 2023

> Can we standardize the encryption mechanism to be always the same single one and use only that always until further notice?

👍 having one way of doing things (initially) is the nostr way.

@jb55
Contributor

jb55 commented Apr 11, 2023

I love the blur hash + encrypted file thing. this would be great for selling ahem personal photos.

@v0l
Member

v0l commented Apr 12, 2023

Merged NIP-81 tags into NIP-94

{
    "pubkey": "63fe6318dc58583cfe16810f86dd09e18bfd76aabc24a0081ce2856f330504ed",
    "kind": 1063,
    "created_at": 1681292347,
    "content": "2.jpg",
    "tags": [
        [
            "url",
            "https://void.cat/d/LjV3oNj3EES3YGBr4yaac2.webp"
        ],
        [
            "x",
            "109a4fb847fe5b8c9c0093affd6bc8b0069998285d8f560382f7025670d203ff"
        ],
        [
            "m",
            "image/webp"
        ],
        [
            "size",
            "24544"
        ],
        [
            "magnet",
            "magnet:?xt=urn:btih:5c9a5df8e67887bd8c9fef4242bfcdebed0d6578&dn=2.jpg&tr=wss%3A%2F%2Ftracker.btorrent.xyz&tr=wss%3A%2F%2Ftracker.openwebtorrent.com&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A6969%2Fannounce&tr=http%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce&xs=https%3A%2F%2Fvoid.cat%3A443%2Fd%2FLjV3oNj3EES3YGBr4yaac2.torrent"
        ],
        [
            "i",
            "5c9a5df8e67887bd8c9fef4242bfcdebed0d6578"
        ]
    ],
    "id": "bb61a769ccd90701b5b398c3d7910815d88824bcbf67e829b33047f6b7f689c1",
    "sig": "2ed85bf7ba117cab8e349540dd7bea33e3662ef1e07c4288f8c95df804e42466d9156519fc37c6c70deb3b1bc1270067d9c496598941882ed34327e0c0e6cb63"
}

@frbitten @lovvtide - Ok with this? Want to move forward with this

From NIP-81 you also have secret and blurhash; these work the same way in NIP-94, and I didn't include them in the above JSON.

secret = decrypt (discussion not resolved from NIP-94)
blurhash = hash

  • I don't think it makes sense to index this, so it would simply be included as ["blurhash", "<value>"]
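
As a rough illustration of how a client might use these tags, here is a sketch that checks downloaded bytes against the `x` (sha256) and `size` tags and extracts the infohash from a `magnet` link for comparison with `i`. All helper names are hypothetical, and the sample event is built from made-up bytes, not from the real file referenced above.

```python
import hashlib
from typing import Optional
from urllib.parse import parse_qs


def first_tag_values(event: dict) -> dict:
    """Map each tag name to the value of its first occurrence."""
    values = {}
    for tag in event["tags"]:
        if len(tag) >= 2 and tag[0] not in values:
            values[tag[0]] = tag[1]
    return values


def matches_header(event: dict, data: bytes) -> bool:
    """True when data agrees with the x (sha256) tag and, if present, size."""
    tags = first_tag_values(event)
    if "x" not in tags:
        return False
    if hashlib.sha256(data).hexdigest() != tags["x"].lower():
        return False
    if "size" in tags and int(tags["size"]) != len(data):
        return False
    return True


def infohash_from_magnet(magnet: str) -> Optional[str]:
    """Return the btih infohash from a magnet link, or None if absent."""
    query = magnet.split("?", 1)[1] if "?" in magnet else ""
    for xt in parse_qs(query).get("xt", []):
        if xt.startswith("urn:btih:"):
            return xt[len("urn:btih:"):].lower()
    return None


# Made-up file contents and a header event describing them.
data = b"not a real image, just sample bytes"
header = {
    "kind": 1063,
    "tags": [
        ["url", "https://example.com/sample.bin"],
        ["x", hashlib.sha256(data).hexdigest()],
        ["size", str(len(data))],
    ],
}
print(matches_header(header, data))         # True
print(matches_header(header, data + b"!"))  # False: content was altered
```

A client that fetched the file from the `url` or via the torrent could run the same check before showing it, which is the purpose of the hash discussed at the top of the thread.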

@fiatjaf
Member

fiatjaf commented Apr 12, 2023

Looks good to me.

@frbitten
Contributor Author

@fiatjaf @staab @v0l @arthurfranca @jb55 @vitorpamplona @mikedilger @lovvtide
Updated NIP to be in consensus with NIP-81. Please review in case you missed something.

I'm going to close all the suggestions above to avoid confusion with what has already been discussed. Please post new suggestions and corrections below, based on this latest version.

@lovvtide

lovvtide commented Apr 12, 2023

@frbitten @v0l @fiatjaf There's just one thing - we agreed in the other thread that the size tag should be required and not optional, right? See #417 (comment)

Edit: One other thing - wouldn't it be a good idea to standardize an (optional) tag for the file name? I suggested having an optional name tag, and @frbitten I remember you said that people could maybe use the NIP-14 subject tag. I think many apps are going to need files to have a name and it would be a good idea to not leave that ambiguous.

Otherwise looks good to me!

@arthurfranca
Contributor

r is already used elsewhere as "reference" with url value.
The reason to make it searchable is that I imagined a client may want to have a user media panel that the user would fill by using a context menu pop-up over images in notes (like "add to memes collection", for instance).
For that, it would detect the image url used in the kind 1 note content and create a corresponding kind 1063 event. Making the 1063 url tag searchable would allow the client to know whether an event with that url from the same pubkey already exists.
It would be an alternative to searching by hash.
Suggested change:
- * url the url to download the file
+ * r the url to download the file, without trailing slash

I had suggested using the r tag and @frbitten had accepted it, but with the NIP-81 merge it got reverted. I still think it would be useful for the use case I mentioned above. For images it may be OK to hash the file before searching, but for videos maybe not. What do you think?
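
The lookup described here could look roughly like the sketch below. It assumes the file URL is stored in an `r` tag as suggested (not in `url`), and that relays accept NIP-01 style `REQ` filters keyed by single-letter tags such as `#r`. The function names and the subscription id are hypothetical.

```python
import json


def normalize_url(url: str) -> str:
    """Drop trailing slashes so the same URL always produces the same tag value."""
    return url.rstrip("/")


def build_lookup_request(sub_id: str, pubkey: str, url: str) -> str:
    """Build a REQ message asking for this author's kind 1063 event for the URL."""
    filt = {
        "kinds": [1063],
        "authors": [pubkey],
        "#r": [normalize_url(url)],
        "limit": 1,
    }
    return json.dumps(["REQ", sub_id, filt])


# Assumed values: a placeholder pubkey and an example image URL.
message = build_lookup_request("gallery-check", "ab" * 32, "https://example.com/cat.gif/")
print(message)
```

If the relay returns no event before EOSE, the client can create the kind 1063 event without first downloading and hashing the file.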

@frbitten
Contributor Author

> @frbitten @v0l @fiatjaf There's just one thing - we agreed in the other thread that the size tag should be required and not optional, right? See #417 (comment)
>
> Edit: One other thing - wouldn't it be a good idea to standardize an (optional) tag for the file name? I suggested having an optional name tag, and @frbitten I remember you said that people could maybe use the NIP-14 subject tag. I think many apps are going to need files to have a name and it would be a good idea to not leave that ambiguous.
>
> Otherwise looks good to me!

The problem with making size mandatory is that the tag will tend to contain wrong values, or clients will be forced to download the file just to check the size.

In many cases the file will already be available at a URL, torrent or magnet, and whoever wants to publish it on NOSTR will not know the size.

I'll include the name tag to avoid confusion.

@frbitten
Contributor Author

> r is already used elsewhere as "reference" with url value.
> The reason to make it searchable is that I imagined a client may want to have an user media panel that the user would fill by using a context menu pop-up over images on notes (like "add to memes collection" for instance).
> For that, it would detect used image url on kind 1 note content and create a corresponding kind 1063 event. Making 1063 url tag searchable will allow the client to know if an event with that url from the same pubkey already exists.
> It would be an alternative to searching by hash.
> Suggested change
>
>   • url the url to download the file
>   • r the url to download the file, without trailing slash
>
> I've suggested using r tag and @frbitten had accepeted it but now with NIP-81 merge it got reverted. I still think it would be useful for the use case i mentioned above. For images it may be ok to hash it before searching, but for videos maybe not. What do you think?

It was @fiatjaf who suggested keeping it as url. I don't know if overusing indexable tags is a good idea for the long term of NOSTR. But to accommodate everyone I would change "url" to "r". Let's wait a few days to see if anyone objects to this.

@fiatjaf
Member

fiatjaf commented Apr 12, 2023

@arthurfranca I think the r URL tag was about annotating/commenting on webpages, right? In that context it makes sense to have an indexable URL. Here it doesn't.

@arthurfranca
Contributor

@fiatjaf I mentioned the use case for the indexable url above. It would be useful to search by url to know if the item already exists before trying to add an image/video to a personal "gallery" of NIP-94 events. It could instead be done by searching by hash but I wonder if it would be a problem for videos (would have to download it completely to hash it).

@v0l
Member

v0l commented Apr 13, 2023

LGTM

@cryptoquick

Would it be okay to indicate what type of hash is being used, perhaps with a prefix like sha256: for the hash field? This feature would be really useful for us to implement Nostr support for carbonado.io, but we use blake3 because it's much faster. Technically sha256 hw acceleration is almost as fast as blake3, but that's really only for datacenter server CPUs like EPYC; sha2 acceleration extensions are not widespread like AES is.

Then, if a client doesn't support that hash alg, hash verification can be skipped. If a reasonably complete set of hash algs are supported by the client, then anything using an unsupported alg can be ignored.

Does this all make sense?

@lovvtide

lovvtide commented Apr 20, 2023

@cryptoquick So this was my original idea as described in NIP-81: that there could be multiple x tags with different hashes and a marker of some kind to differentiate the kind of hash. But as I recall, we decided instead that it would be better to use a different tag for each new kind of hash, i.e. to use tags to differentiate usage instead of markers. As it stands now, we're using x for sha256 and i for the torrent infohash. For your use case, the simplest thing to do would be to just use another tag for blake3.

@fiatjaf
Member

fiatjaf commented Apr 20, 2023

> Would it be okay to indicate what type of hash is being used, perhaps with a prefix like sha256: for the hash field? This feature would be really useful for us to implement Nostr support for carbonado.io, but we use blake3 because it's much faster. Technically sha256 hw acceleration is almost as fast as blake3, but that's really only for datacenter server CPUs like EPYC; sha2 acceleration extensions are not widespread like AES is.
>
> Then, if a client doesn't support that hash alg, hash verification can be skipped. If a reasonably complete set of hash algs are supported by the client, then anything using an unsupported alg can be ignored.
>
> Does this all make sense?

No, this defeats the entire purpose of even having a hash field. The only two possible outcomes of having multiple hashes are

  1. No one implements anything and hashes are never checked
  2. Everybody has to implement all possible hashing algorithms, which increases the burden on everybody

If blake3 is so much better, then everybody should agree on using blake3 and not sha256. But since everybody is already doing sha256 all the time on all these events, sha256 is already available in all Nostr implementations, and so on, it will be hard to change this now.

@fiatjaf
Member

fiatjaf commented Apr 20, 2023

I'm looking at https://github.com/diba-io/carbonado now and it's a very interesting project, but I don't see how it could be fully integrated into Nostr at all ever, so I don't see the point in trying to achieve compatibility just with the hashing algorithm.

@fiatjaf fiatjaf merged commit 34af61d into nostr-protocol:master Apr 20, 2023
@s3x-jay

s3x-jay commented Apr 20, 2023

I like the general idea of this NIP. Unlike actually storing the files on relays, it doesn't make the problems with CSAM and pirated content any worse than they are now, since it's basically just relaying metadata about the file.

One suggestion - mention the NIP-36 content-warning tag. Technically NIP-36 content warnings can be added to any event, but they're particularly relevant here.

And the proposed "NIP-69" defines a "vocabulary" that can be used with NIP-36 (and NIP-56).

@frbitten frbitten deleted the NIP-94 branch April 20, 2023 14:07
@vitorpamplona
Collaborator

+1 for SHA256 instead of blake3.

If we want faster hashing, I suggest getting ready to migrate all of Nostr from SHA256 to Blake3.
