Need more support mime-types. #2164

Open
ghost opened this issue Jan 5, 2016 · 25 comments
Labels
kind/enhancement A net-new feature or improvement to an existing feature need/community-input Needs input from the wider community topic/gateway Topic gateway

Comments

@ghost

ghost commented Jan 5, 2016

IPFS could be well suited to publishing whole HTML pages, but there is a problem: MHTML files cannot be published usefully, because IPFS is not able to serve this format with the MIME type message/rfc822.

For example: http://gateway.ipfs.io/ipfs/QmfHtsEyXGdJm6Yo4frKLdKyDKT5G6ubECfVLnvkVUkscM

Is there any solution to this problem?

Downloading the file and then viewing it is not an acceptable option, because ease of use requires opening documents directly in the browser.

@whyrusleeping
Member

we implemented some code to make this work, i'll try and hack at something and report back. This may be a fairly simple fix

@whyrusleeping
Member

So, the mime type detection used by go-ipfs gateway is from the go standard library here: https://golang.org/src/net/http/sniff.go
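For illustration, here is a minimal Go sketch of what that sniffer does with MHTML-like input. The sample bytes are an invented stand-in for the start of an MHTML file, not the file linked above. The sniffer appears to have no signature for message/rfc822, so mail-style headers fall through to the plain-text case.

```go
package main

import (
	"fmt"
	"net/http"
)

// sniff returns the content type that Go's standard sniffer assigns to data.
// http.DetectContentType considers at most the first 512 bytes.
func sniff(data []byte) string {
	return http.DetectContentType(data)
}

func main() {
	// Hypothetical start of an MHTML file: it begins with mail-style headers.
	mhtml := []byte("MIME-Version: 1.0\r\nContent-Type: multipart/related; boundary=\"x\"\r\n\r\n")
	html := []byte("<!DOCTYPE html><html><body>hi</body></html>")

	fmt.Println(sniff(mhtml)) // text/plain; charset=utf-8
	fmt.Println(sniff(html))  // text/html; charset=utf-8
}
```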

@rht
Contributor

rht commented Jan 21, 2016

http.DetectContentType is already used when http.ServeContent is invoked.

#2230 does this explicitly, however, and includes charset detection through https://golang.org/x/net/html/charset.

@jbenet
Member

jbenet commented Jan 22, 2016

If we save the MIME type in the unixfs object (file or metadata), we could set it explicitly instead of just relying on the mime sniffer.
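A rough sketch of how setting it explicitly could look at the HTTP layer, assuming the stored MIME type has already been read from the object. The `storedMIME` parameter and the `serve` helper are hypothetical and not part of go-ipfs. It relies on the fact that http.ServeContent only sniffs when no Content-Type header has been set.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// serve writes data through http.ServeContent and returns the Content-Type
// a client would see. storedMIME stands in for a MIME type saved with the
// file (hypothetical); pass "" to fall back to sniffing.
func serve(data []byte, storedMIME string) string {
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if storedMIME != "" {
			// ServeContent leaves an already-set Content-Type header untouched.
			w.Header().Set("Content-Type", storedMIME)
		}
		// Empty name: there is no file extension to consult, so sniffing
		// is the only fallback.
		http.ServeContent(w, r, "", time.Time{}, bytes.NewReader(data))
	})
	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Result().Header.Get("Content-Type")
}

func main() {
	xhtml := []byte(`<?xml version="1.0"?><html xmlns="http://www.w3.org/1999/xhtml"></html>`)
	fmt.Println(serve(xhtml, "application/xhtml+xml")) // explicit type wins
	fmt.Println(serve(xhtml, ""))                      // sniffed type
}
```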

@rht
Contributor

rht commented Jan 22, 2016

The mime detect/sniffing is still required, but moved to the earlier part of the chain.
By explicit, where should it be put in the object? Does this require ipld?

The File API (https://www.w3.org/TR/FileAPI/) does have a type attribute. Is this equivalent?

@jbenet
Member

jbenet commented Jan 22, 2016

unixfs in IPLD may change; see https://github.com/ipfs/ipld-examples/blob/master/unixfs. The MIME type can go in the file there.

In the current formats the MIME type should maybe go in a Metadata object, but that may be more annoying right now than useful.

@jefft0
Contributor

jefft0 commented Feb 10, 2016

Hello. This is my first posting on the project, but I need to jump in. IPFS is attractive because the multihash of an important file (like a measurements file supporting a scientific paper) depends only on the data, not on arbitrary decisions by the publisher such as the filename or original location. Two different publishers holding the same data file should compute the same multihash. If you insert the MIME type in the object itself, this is also an arbitrary decision by the publisher. For example, is the measurements file text/plain, text/csv or application/vnd.ms-excel? I urge us to keep metadata like the MIME type external to the data itself, so that a multihash link is just about the data.

@em-ly em-ly added kind/enhancement A net-new feature or improvement to an existing feature need/community-input Needs input from the wider community labels Aug 25, 2016
@singpolyma

I agree that having the hash of the same data (regardless of mime type or file name) be the same is important, but also having an associated MIME type available using a different hash is important so that /ipfs/* links can be directly used by MIME-aware applications (including browsers)

@davux

davux commented Mar 8, 2018

@jefft0,

"Two different publishers holding the same data file should compute the same multihash."

I'd like to (kindly) challenge that point. The scenario you're talking about here is:

  • Publisher A has a data file locally.
  • Publisher B has the same data file somewhere else.
  • Both save the file in IPFS but indicate a different MIME type.
  • For some reason:
    • it is important that the two hashes be the same
    • it is not important that the MIME types be different.
    • A needs to re-get the file by the URL provided by B

Can you illustrate that situation with a real-life example?

Generally, I don't think it's a big deal that a file has two different representations. It might slightly affect the overall efficiency of the network, but then we should also be worried about empty newlines at the end of text files, etc.

At any rate, if content integrity does matter but metadata doesn't, then the users can always throw away the metadata and compute a hash of the content itself, it doesn't have to be done through multihashes. (Following the newline analogy, if two users want to make sure two text files are identical and trailing newlines don't matter, they can trim() and then compute a hash of the result.)

@justinmchase

justinmchase commented May 15, 2018

I agree with @davux: if the two publishers both have the same file, they would be likely to use the same MIME type when publishing, especially if the MIME type were produced automatically.

If they didn't have the same MIME type for some reason, then having two different hashes would seem like a reasonable thing to do. No?

The other side of the problem is that people who are trying to receive the published document use different MIME type sniffing logic and one gets it wrong. Who's more likely to get it wrong, or who should have the onus of getting the MIME type right, the publisher or the reader?

@singpolyma

It seems we're headed towards raw leaves as the default, which seems good. So there are two places the mime could go: in the directory (where filename already is) or in an intermediate IPLD object that has metadata like mime and a pointer to the leaf
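The second option can be sketched as follows. This is only an illustration: SHA-256 over JSON stands in for real CIDs and IPLD encoding, and the `metaNode` type and its field names are invented for the example. It shows that two publishers who disagree on the MIME type still share the hash of the raw leaf, while the wrapping objects hash differently.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// metaNode is a hypothetical intermediate object: metadata plus a pointer
// to the raw leaf.
type metaNode struct {
	MimeType string `json:"mimeType"`
	Leaf     string `json:"leaf"` // hash of the raw data, standing in for a CID
}

// hashHex returns the hex SHA-256 of b, a stand-in for a real multihash.
func hashHex(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// wrap returns the leaf hash and the hash of a metadata node pointing at it.
func wrap(data []byte, mimeType string) (leaf, node string) {
	leaf = hashHex(data)
	encoded, err := json.Marshal(metaNode{MimeType: mimeType, Leaf: leaf})
	if err != nil {
		panic(err) // not expected for a struct of plain strings
	}
	return leaf, hashHex(encoded)
}

func main() {
	data := []byte("t,value\n0,1.5\n1,1.7\n")
	leafA, nodeA := wrap(data, "text/csv")
	leafB, nodeB := wrap(data, "text/plain")
	fmt.Println("same leaf hash:", leafA == leafB) // true
	fmt.Println("same node hash:", nodeA == nodeB) // false
}
```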

@justinmchase

That sounds good. If there is a compromise where the MIME type is stored somewhere rather than inferred, but the data in the leaves can still be raw, then it seems like everyone is happy.

@ivan386
Contributor

ivan386 commented Aug 4, 2019

I tried to use XHTML as index.html, but the gateway returns the wrong MIME type for it. UnixFS has a Metadata object, and it has a MimeType field.

I hand-crafted a test identity link with metadata, but I get content-type: text/plain; charset=utf-8 instead of the content-type: application/xhtml+xml that is in the metadata block.

I also tried to use a raw block as the link in the Metadata block, but I get the error: expected protobuf dag node

@lidel
Member

lidel commented Aug 12, 2019

I don't think Metadata from unixfsv1 is used in this case.
AFAIK Gateway exposed by go-ipfs does mime-sniffing via net/http/sniff.go

Hardcoding explicit content-type will be possible with unixfsv2 (more details in ipld/legacy-unixfs-v2#11).

@ivan386
Contributor

ivan386 commented Aug 12, 2019

@lidel The gateway does not read the MimeType field from the metadata block. The gateway must use it if it is set.

@lidel
Member

lidel commented Aug 12, 2019

Correct. We need the following, which is missing right now:

  1. ability to set content type in metadata (eg. during ipfs add)
  2. make Gateway code use it, if present
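The second item could look roughly like the sketch below. It is not the gateway's actual code. `chooseContentType` and its `storedMIME` argument are hypothetical names, and the validation step is an added assumption, on the reasoning that a publisher-supplied value should parse as a media type before it is trusted.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
	"strings"
)

// chooseContentType prefers a MIME type stored with the file and falls
// back to sniffing when none is stored or the stored value is malformed.
func chooseContentType(storedMIME string, data []byte) string {
	if storedMIME != "" {
		// Trust the stored value only if it parses as type/subtype.
		mediaType, _, err := mime.ParseMediaType(storedMIME)
		if err == nil && strings.Contains(mediaType, "/") {
			return storedMIME
		}
	}
	return http.DetectContentType(data)
}

func main() {
	doc := []byte("hello")
	fmt.Println(chooseContentType("application/xhtml+xml", doc)) // application/xhtml+xml
	fmt.Println(chooseContentType("", doc))                      // text/plain; charset=utf-8
	fmt.Println(chooseContentType("not a type", doc))            // text/plain; charset=utf-8
}
```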

@lidel
Member

lidel commented Sep 4, 2019

Summarized potential solutions in ipfs/in-web-browsers#152
One is to embed content-type in DAG metadata; another is specific to the HTTP Gateway and proposes content-type overrides via drop-in config files similar to .gitattributes.
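To make the drop-in config idea concrete, here is a minimal sketch. The file format (one `<glob> <content-type>` pair per line, `#` for comments) is invented for this example and is not taken from the linked proposal; `parseOverrides` and `lookup` are hypothetical helpers.

```go
package main

import (
	"bufio"
	"fmt"
	"path"
	"strings"
)

// rule maps a glob pattern on the file name to a content type.
type rule struct {
	pattern     string
	contentType string
}

// parseOverrides reads lines of the hypothetical form "<glob> <content-type>".
// Blank lines and lines starting with '#' are ignored.
func parseOverrides(config string) []rule {
	var rules []rule
	scanner := bufio.NewScanner(strings.NewReader(config))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) != 2 {
			continue // this sketch skips malformed lines
		}
		rules = append(rules, rule{pattern: fields[0], contentType: fields[1]})
	}
	return rules
}

// lookup returns the content type of the last rule whose pattern matches
// the file name, or "" if no rule matches.
func lookup(rules []rule, filePath string) string {
	result := ""
	name := path.Base(filePath)
	for _, r := range rules {
		if ok, err := path.Match(r.pattern, name); err == nil && ok {
			result = r.contentType
		}
	}
	return result
}

func main() {
	rules := parseOverrides("# overrides\n*.xhtml application/xhtml+xml\n*.mht message/rfc822\n")
	fmt.Println(lookup(rules, "docs/index.xhtml")) // application/xhtml+xml
	fmt.Println(lookup(rules, "page.mht"))         // message/rfc822
}
```

Letting later rules win is a design choice of this sketch. A content type with parameters that contain spaces would need a richer format than two whitespace-separated fields.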

@lanzafame lanzafame added the topic/gateway Topic gateway label Sep 10, 2019
@kevincox

kevincox commented Mar 9, 2020

One cheap solution is to allow extensions to be used in the gateway. This way we don't have to rely on file sniffing, only on an extension -> mime mapping (like Apache and nginx have been doing for years).

Example would be https://ipfs.io/ipfs/bafkreiajjehupljsltknxzdcenhrmcarjuagybwx7ht2bcyyon6s3ayn2m.css is served as text/css. This isn't as nice as the mime type traveling with the data but is very quick to implement and doesn't change the network, just the gateway.
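A sketch of that mapping using Go's standard library, assuming the gateway would look at the extension of the last path segment and keep sniffing as a fallback. `contentTypeFor` and the example paths are hypothetical. The exact string returned for a known extension can depend on the system's MIME tables, so no output is shown for that case.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
	"path"
)

// contentTypeFor picks a content type from the extension of the request
// path and falls back to sniffing when the extension is missing or unknown.
func contentTypeFor(requestPath string, data []byte) string {
	if ct := mime.TypeByExtension(path.Ext(requestPath)); ct != "" {
		return ct
	}
	return http.DetectContentType(data)
}

func main() {
	css := []byte("body { color: red; }")
	// With a .css suffix the extension table decides (a text/css type).
	fmt.Println(contentTypeFor("/ipfs/examplecid.css", css))
	// Without a suffix the sniffer decides, and CSS looks like plain text.
	fmt.Println(contentTypeFor("/ipfs/examplecid", css))
}
```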

@Stebalien
Member

@kevincox

kevincox commented Mar 9, 2020

For CSS that works fairly well but for things like SVG it annoyingly sets the Content-Disposition header.

@Stebalien
Member

Hm. Yeah, it's designed for specifying the download name. Is it not possible to wrap your file in a directory?

@ivan386
Contributor

ivan386 commented Mar 9, 2020

@lidel First we need to make the Gateway code use MimeType from the metadata block, if present. Then we can add the ability to set the content type in metadata.

@kevincox

It's possible to wrap it in a directory, but I would prefer not to because:

  1. I now need to give it a name, which feels wrong in a content-addressed storage system.
  2. The client now needs to fetch the directory before the file, which adds latency.

@Stebalien
Member

For now, you can use inline CIDs for that (not ideal, but a workaround):

https://ipfs.io/ipfs/bafyaanysgefciakvciqassipi6wtexgu3psgei2pcyebctianqdnp6phucfrq435fwbq3uysavtc4y3tommnnzacbibaqai/f.css

> mkdir toadd
> mv myStylesheet.css toadd/f.css
> ipfs add -q -r --raw-leaves --cid-version=1 --inline --inline-limit=64 toadd
bafyaanysgefciakvciqassipi6wtexgu3psgei2pcyebctianqdnp6phucfrq435fwbq3uysavtc4y3tommnnzacbibaqai

@ivan386
Contributor

ivan386 commented Mar 10, 2020

This will be shorter:

ipfs add --cid-base base64url -Q -w --raw-leaves --cid-version 1 --inline --inline-limit 64 --stdin-name .css < myStylesheet.css

https://ipfs.io/ipfs/uAXAANhIwCiQBVRIgCUkPR60yXNTb5GIjTxYIEU0AbAbX-eegixhzfS2DDdMSBC5jc3MY1uQCCgIIAQ/.css
