Supporting existing mime types. #65

mikeal · 2018-06-23T05:08:37Z

I wanted to start a conversation about the best way to support existing mime types.

Specifically, I want to talk about data that doesn't have links but is often linked to, like images and video. It would be great not to re-invent the entire mime/content-type system for data without links.

Something along the lines of mime[audio/aac].

We also may want to consider the same for addressing compression of the format mime[audio/aac][gzip].

I looked around for a previous discussion around this but couldn't find anything. If there's another thread please point me at it :)

vmx · 2018-06-25T10:49:40Z

Do you mean leveraging existing mime-types to describe the blocks (e.g. if you store a JPEG), so that the resolvers can correctly deal with the data?

mikeal · 2018-06-25T14:23:51Z

Exactly :)

vmx · 2018-06-25T16:49:58Z

I looked into it some time ago (I can't remember why, I guess there was some other issue triggering it). I was wondering if there was an easy way to get unique identifiers (as in "hex value") from the IANA Media Types. The only idea I had was scraping the Templates, getting a date from them, and then assigning an increasing value ourselves.

mitra42 · 2018-06-25T16:55:55Z

Why reinvent the wheel when there is an existing, extensible process for assigning them? It has higher-level types (image, video, etc.), subtypes that split off those, and, when needed, parameters to allow even more detail. It's not perfect for all situations, but it's unlikely that any replacement would be perfect either, and it has the huge advantage that it integrates with other things. For example, you can check which application your system wants to open the file in.

If you invented your own system everyone would just have to carry around a big conversion table in their apps and figure out how to continuously update it to match a new hex type to the table.

mikeal · 2018-06-25T17:21:19Z

I think the goal here *is* to avoid re-inventing the wheel. The problem is that we can’t use the raw strings for these mime types because it has to fit in a limited hex space. So the question is “how do we fit mime types into the space.” Perhaps the first thing we should do is reserve a hex range for mime types, enough to fit all current types and what we can expect for future types.
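As a rough illustration of the "reserve a hex range" idea, here is a minimal Python sketch. The base value, the range size, and the append-only list are all assumptions made for illustration; none of them is an actual multicodec allocation.

```python
# Hypothetical reserved range for media types (NOT a real multicodec allocation).
RESERVED_BASE = 0x300000
RESERVED_SIZE = 0x010000  # room for 65,536 media types

# Assumed to be append-only: an index must never be reassigned once published,
# otherwise existing identifiers would change meaning.
MEDIA_TYPES = ["audio/aac", "image/jpeg", "image/png"]


def code_for(media_type: str) -> int:
    """Return the code assigned to a registered media type."""
    index = MEDIA_TYPES.index(media_type)  # raises ValueError if unregistered
    if index >= RESERVED_SIZE:
        raise OverflowError("reserved range exhausted")
    return RESERVED_BASE + index


def media_type_for(code: int) -> str:
    """Return the media type for a code inside the reserved range."""
    index = code - RESERVED_BASE
    if not 0 <= index < len(MEDIA_TYPES):
        raise KeyError(hex(code))
    return MEDIA_TYPES[index]
```

Every implementation would need to ship the same table, which is the maintenance cost raised earlier in the thread.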

> If you invented your own system everyone would just have to carry around a big conversion table in their apps and figure out how to continuously update it to match a new hex type to the table.

People already do this for mapping file extensions to mime types. Yes, it’s annoying, but it’s a common practice. If you’re using `request` you’ve got one somewhere in your node_modules ;) The map is also much smaller than the code it takes to actually implement support for the types.

I’ve seen CIDs compared to URLs a lot, but that’s only half the story. In HTTP the URL only tells you where the resource is; the Content-Type header tells you how to interpret it. A CID not only describes how to get the resource, it also describes how to interpret it. Two different CIDs can be created for the same block’s multihash so that they are interpreted differently.
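To make the last point concrete, here is a minimal Python sketch that builds two binary CIDv1 values over the same block. It assumes the CIDv1 layout (version varint, codec varint, multihash) and the multicodec codes for `raw` (0x55), `json` (0x0200), and `sha2-256` (0x12). The helper names are made up for this example, and a real application would use a CID library instead.

```python
import hashlib

SHA2_256 = 0x12  # multihash code for sha2-256
RAW = 0x55       # multicodec code for raw binary
JSON = 0x0200    # multicodec code for json


def varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (LEB128)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def cid_v1(codec: int, block: bytes) -> bytes:
    """Build a binary CIDv1: <version><codec><multihash>."""
    digest = hashlib.sha256(block).digest()
    multihash = varint(SHA2_256) + varint(len(digest)) + digest
    return varint(1) + varint(codec) + multihash


block = b'{"hello":"world"}'
cid_as_raw = cid_v1(RAW, block)
cid_as_json = cid_v1(JSON, block)
```

The two CIDs end in the same 34-byte multihash and differ only in the codec prefix, so they address the same bytes while asking for different interpretations.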

Stebalien · 2018-06-26T23:00:58Z

Note: The "codec" on the CID just tells you how to interpret the binary data as a structured IPLD object. It should not be used as a MIME type.

vmx · 2018-06-27T07:34:28Z

@Stebalien Isn't there a huge overlap between MIME types and codecs? For me it makes sense to have a codec that tells me to interpret something like image/png.

mikeal · 2018-06-27T19:23:21Z

If you look at the existing list of multicodecs many of them already have registered mime types, so there's certainly overlap.

If the only codecs were for dag nodes and all edge nodes were raw then I could see the separation, but that just isn't the case right now, there are many registered codecs for edge nodes in formats that don't support links.

Here's a question that might shed some light on how to interpret this. If I'm building a fileserver on top of unixfs-v2 and the file name has an extension of .json but the CID has a codec of bson what do I set the content-type header to?

To me, someone clearly encoded the node into bson and just set the wrong file extension, so I would trust the CID's codec for interpretation.

I'll also note that projects like the IPLD graph viewer get much more interesting if we can signal mime-types for any edge node in a graph. It means that even the most abstract graphs people create that include images and other content can be interpreted and viewed much more easily.

Stebalien · 2018-06-27T20:18:06Z

> Isn't there a huge overlap between MIME types and codecs?

...

> If you look at the existing list of multicodecs many of them already have registered mime types, so there's certainly overlap.

...

> If the only codecs were for dag nodes and all edge nodes were raw then I could see the separation, but that just isn't the case right now, there are many registered codecs for edge nodes in formats that don't support links.

Yes. However, those aren't all IPLD formats.

So, the issue here is twofold:

1. We don't want to tie binary representation to interpretation.
2. We don't want to have to create a new IPLD format every time someone implements a new filetype. With normal MIME types, I can talk about some data without actually understanding the MIME type. With IPLD formats, I can't even talk about the data. If you tell me to pin some IPLD DAG that has nodes in a format I don't understand, I literally can't pin it because I have no idea how to find/follow the internal links.

Really, we want a type system in addition to IPLD formats. However, IPLD formats are not a type system. The important difference is that, as long as a tool understands all the relevant IPLD formats, it can traverse/transform arbitrary IPLD DAGs even if it doesn't understand the types.
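The distinction can be sketched in Python with a toy pinning routine. Everything here is hypothetical: the in-memory store, the `toy-dag` format name, and the extractor functions stand in for real IPLD formats and a real block store.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical block store: address -> (format name, decoded payload).
Store = Dict[str, Tuple[str, object]]


def links_in_toy_dag(payload) -> List[str]:
    """A made-up structured format that lists its links explicitly."""
    return list(payload["links"])


def links_in_raw(payload) -> List[str]:
    """Raw blocks never contain links."""
    return []


LINK_EXTRACTORS: Dict[str, Callable[[object], List[str]]] = {
    "toy-dag": links_in_toy_dag,
    "raw": links_in_raw,
}


def pin(root: str, store: Store, extractors) -> Set[str]:
    """Collect every block reachable from root, failing on unknown formats."""
    pinned: Set[str] = set()
    stack = [root]
    while stack:
        addr = stack.pop()
        if addr in pinned:
            continue
        fmt, payload = store[addr]
        if fmt not in extractors:
            # Without the format we cannot find the links, so we cannot pin.
            raise ValueError(f"cannot pin {addr}: unknown format {fmt!r}")
        pinned.add(addr)
        stack.extend(extractors[fmt](payload))
    return pinned


store: Store = {
    # The "type" field is never inspected by pin(); only the format matters.
    "a": ("toy-dag", {"type": "image/x-unknown", "links": ["b", "c"]}),
    "b": ("raw", b"chunk 1"),
    "c": ("raw", b"chunk 2"),
}
```

Here `pin("a", store, LINK_EXTRACTORS)` succeeds even though nothing understands `image/x-unknown`, while a single block in an unregistered format stops the traversal.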

Our current plan is to:

1. Extract type information from existing IPLD datastructures.
2. Allow users to explicitly specify types in future IPLD datastructures (requires support from the format).

Aside: Yes, I know we have a GitRaw codec. IMO, we shouldn't. I don't know how that snuck in but that shouldn't be there. However, it is slightly useful because raw git objects are a bit special (they use the broken SHA1 hash and may be arbitrarily large).

vmx · 2018-06-28T09:48:10Z

> Aside: Yes, I know we have a GitRaw codec. IMO, we shouldn't. I don't know how that snuck in but that shouldn't be there. However, it is slightly useful because raw git objects are a bit special (they use the broken SHA1 hash and may be arbitrarily large).

Do you mean there should be a codec for each Git Object type (commit, tag, tree)? If yes, why not change it while we can?

Stebalien · 2018-07-02T22:42:09Z

> Do you mean there should be a codec for each Git Object type (commit, tag, tree)? If yes, why not changing it while we can?

No, no, I'm just confused. I saw GitRaw and assumed that only applied to blobs. Turns out GitRaw just means "non-blob git object" and blobs are stored using the Raw codec. This is correct and as it should be.

Now, really, there probably is a large overlap. Most files have some logical internal structure that could be decoded as a structured IPLD object. However, we have to be careful about adding new formats too eagerly as, again, we need to add support for those formats to every implementation.

jchris · 2018-09-09T19:28:10Z

One cool aspect of mime in the browser world is content negotiation. I don’t see mime in the multiformats project, but I’m still finding my way around. At first glance IPLD seems like a reasonable place to empower user agents to pick content types, especially since different files with the same content might only be linked at the application layer otherwise. Sniffing is fine for serving files when only one format is available, but that path doesn’t lead toward robust content negotiation.

mikeal · 2018-09-10T16:46:17Z

At the CID/Block level we can't really negotiate the content because we can't change out the underlying data. If I have a dag-json node you could interpret the same data as json, dag-json (JSON with links), and raw (binary). But you couldn't ask for a different content type because it would end up being a different hash.
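A small Python sketch of those three readings of one block follows. It assumes the dag-json convention of writing a link as a map with the single key `"/"`. The string `bafy-example` is a placeholder rather than a valid CID, and the function names are invented for this example.

```python
import json

# One block of bytes; the hash of these bytes is the same in all three readings.
data = b'{"name":"cat.png","file":{"/":"bafy-example"}}'


def as_raw(block: bytes) -> bytes:
    """raw: the block is just bytes."""
    return block


def as_json(block: bytes):
    """json: the block is a plain JSON value with no notion of links."""
    return json.loads(block)


def as_dag_json(block: bytes):
    """dag-json (simplified): JSON where {"/": "<cid>"} maps are links."""
    value = json.loads(block)
    links = []

    def walk(node):
        if isinstance(node, dict):
            if set(node) == {"/"} and isinstance(node["/"], str):
                links.append(node["/"])
            else:
                for child in node.values():
                    walk(child)
        elif isinstance(node, list):
            for child in node:
                walk(child)

    walk(value)
    return value, links
```

Only the dag-json reading reports a link, yet all three operate on identical bytes. A different content type would mean different bytes and therefore a different hash.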

Also, keep in mind that a CID is rarely an entire file. Files are written as a metadata node (dag-json, dag-cbor, dag-pb) with a bunch of links to the chunks of binary data for the actual file data. If we wanted to enable some kind of content negotiation we would need to encode it at that layer.

The current format doesn't support it, but it might be worth creating an issue in the unixfs-v2 spec. ipld/legacy-unixfs-v2#2

You'd need something like:

{
  type: 'dir',
  data: {
    'filename;image/png': CID(),
    'filename;image/svg+xml': CID()
  }
}

Or, alternatively, you could just use file extensions and a naming convention to do multiple formats of resources in a single directory and then write logic on top in order to pick which one is supported by the client.
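Either approach needs selection logic on top of the directory. Below is a minimal Python sketch of that logic for the `name;media-type` key layout. The function name, the placeholder CID strings, and the ordered preference list are assumptions for illustration, not part of any unixfs spec.

```python
from typing import Dict, List, Optional, Tuple


def pick_entry(
    entries: Dict[str, str], name: str, accepted: List[str]
) -> Optional[Tuple[str, str]]:
    """Return (media_type, cid) for the first accepted type available for name.

    `accepted` is ordered by client preference, most preferred first.
    """
    available = {}
    for key, cid in entries.items():
        base, sep, media_type = key.partition(";")
        if sep and base == name:
            available[media_type] = cid
    for media_type in accepted:
        if media_type in available:
            return media_type, available[media_type]
    return None


# Placeholder strings stand in for real CIDs.
entries = {
    "logo;image/png": "cid-of-png",
    "logo;image/svg+xml": "cid-of-svg",
}
choice = pick_entry(entries, "logo", ["image/webp", "image/svg+xml", "image/png"])
```

Each variant keeps its own CID, so the negotiation happens in the directory node and the blocks themselves never change.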

da2x · 2018-10-04T02:35:51Z

Media types (formerly MIME types) can contain more information than the suggested format allows.

type "/" [tree "."] subtype ["+" suffix] *[";" parameter]

Some examples:

text/plain; charset=utf-16 (UTF-16 encoded text)

application/atom+xml (XML structured Atom document)

text/csv; header=present (comma separated values; first line contains column headers)
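To show how much structure that grammar carries, here is a simplified Python parser sketch. It is not a complete implementation: it splits on every `;`, so quoted parameter values containing `;` are not handled, and it treats everything before the first `.` in the subtype as the tree. The function name is made up for this example.

```python
def parse_media_type(value: str) -> dict:
    """Split a media type into type, tree, subtype, suffix, and parameters."""
    essence, *raw_params = [part.strip() for part in value.split(";")]
    top_level, _, rest = essence.partition("/")

    # The structured-syntax suffix follows the last "+".
    suffix = None
    if "+" in rest:
        rest, _, suffix = rest.rpartition("+")

    # Simplification: the registration tree precedes the first ".".
    tree = None
    if "." in rest:
        tree, _, rest = rest.partition(".")

    parameters = {}
    for raw in raw_params:
        if not raw:
            continue
        name, _, val = raw.partition("=")
        parameters[name.strip().lower()] = val.strip().strip('"')

    return {
        "type": top_level.lower(),
        "tree": tree,
        "subtype": rest.lower(),
        "suffix": suffix,
        "parameters": parameters,
    }
```

A compact identifier such as `mime[audio/aac]` would need to carry or drop each of these fields, which is the concern raised above.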

jonnycrunch · 2019-03-04T14:48:05Z

just adding a link for the verifiable credentials discussion. w3c/vc-data-model#421

mikeal closed this as completed Aug 12, 2019
