-
Notifications
You must be signed in to change notification settings - Fork 110
Add Content-Disposition http header #945
Add Content-Disposition http header #945
Conversation
Add Content-Disposition http header
…sphere#527, ethersphere#944) Changed swarm/api.Upload: - when using WaitGroup no reason to use done counter. - f.Close() must be called in Defer - otherwise panic or future added early return will cause leak of file descriptors - use common DetectContentType method - one error was suppressed
…n fallback to extension (ethersphere#944)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution 👍 🍺
Please see my comments in the code.
awg := &sync.WaitGroup{} | ||
|
||
for i, entry := range list { | ||
if i >= dcnt+maxParallelFiles { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you have removed important I/O throttling code here. either convert to a worker pattern or preserve as is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay! I will revert it.
cmd/swarm/upload.go
Outdated
@@ -119,7 +118,7 @@ func upload(ctx *cli.Context) { | |||
} | |||
defer f.Close() | |||
if mimeType == "" { | |||
mimeType = detectMimeType(file) | |||
mimeType = api.DetectContentType(file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder if we really need to open a new file handle inside of api.DetectContentType
. at this point we already have f
as an open file handle. maybe we could use that instead. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found problem:
- If DetectContentType func will accept *File, then it must current preserve position in file
- Here we working with
*swarm.File
not*os.File
which has no Seek() method to save and restore position
It doesn't look possible to get content probe from *swarm.File
Maybe you have an idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, *swarm.File it's local file, then i can cast it to Seeker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can also try to change the swarm.File
spec to be
Reader,
Closer,
Seeker
instead of just ReadCloser
. that might make your life easier. but you also might need to move the type around a bit because this will probably generate a circular dependency when you'll try to reference client
from api
:)
swarm/api/content_type.go
Outdated
"github.com/ethereum/go-ethereum/log" | ||
) | ||
|
||
// detect content type by file content |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment should follow godoc style:
// DetectContentType by file content with fallback
// to file extension and "application/octet-stream"
swarm/api/content_type.go
Outdated
// otherwise by file extension | ||
// returns "application/octet-stream" in worst case | ||
func DetectContentType(file string) string { | ||
var contentType = "application/octet-stream" // default value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extract to const MIME_OCTET_STREAM = "application/octet-stream"
, then reuse across the code (DRY)
swarm/api/content_type.go
Outdated
|
||
f, err := os.Open(file) | ||
if err != nil { | ||
log.Warn("detectMimeType: can't open file", "file", file, "err", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log.Error
- err != nil
also - you're not returning on error (the rest of the code below will continue executing - resulting in an error). if you choose to have another handle opened here - please return an empty string in this code block - the function has to return here.
as mentioned above - i would remove this part and already use the open handle that is created in the enclosing call
swarm/api/content_type_test.go
Outdated
} | ||
|
||
func testDetectCTCreateFile(t *testing.T, fileName, content, expectedContentType string) { | ||
path := os.TempDir() + fileName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use ioutil.TempFile
instead
swarm/api/content_type_test.go
Outdated
testDetectCTCreateFile(t, "file.css", "Lorem Ipsum", "text/css; charset=utf-8") | ||
} | ||
|
||
func testDetectCTCreateFile(t *testing.T, fileName, content, expectedContentType string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
content
param not used - can be removed
swarm/api/http/server.go
Outdated
@@ -201,6 +201,7 @@ func (s *Server) HandleBzzGet(w http.ResponseWriter, r *http.Request) { | |||
defer reader.Close() | |||
|
|||
w.Header().Set("Content-Type", "application/x-tar") | |||
w.Header().Set("Content-Disposition", fmt.Sprintf("attachment; filename=\"%s.tar\"", path.Base(r.URL.Path))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two issues with this:
- do not use r.URL as it will contain the entire requested URI e.g.
/bzz:/somename.eth/something
etc. please use theuri
variable above which comes from the middleware that parses the bzz uri. it also has aPath
member which you can use to get to the path inside the bzz uri. - there could be a situation where you would be downloading a tarball but the filename might be empty. in that case, you should fallback to the
Addr
fromuri
(oruntitled
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about case:
- when i upload 1 file, then download it by hash only (without file name in url)
Is there known way to get file name? (As i understand - only way is manifest retrieve)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even if its one file - if there's an enclosing manifest, i think there should be a filename there too (need to double check. in any case - there should be - let's assume that)
swarm/api/http/server.go
Outdated
@@ -862,6 +881,10 @@ func (s *Server) HandleGetFile(w http.ResponseWriter, r *http.Request) { | |||
return | |||
} | |||
|
|||
if contentType == "" { | |||
contentType = "application/octet-stream" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use const
swarm/api/http/server.go
Outdated
@@ -891,6 +914,7 @@ func (s *Server) HandleGetFile(w http.ResponseWriter, r *http.Request) { | |||
} | |||
|
|||
w.Header().Set("Content-Type", contentType) | |||
w.Header().Set("Content-Disposition", fmt.Sprintf("attachment; filename=\"%s\"", path.Base(r.URL.Path))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please amend this - use uri
from context instead of request URI. also - provide the filename fallback
…eReview (ethersphere#527, ethersphere#944) Fixes for ethersphere#945 - revert back channel which handle I/O throttling - add tests on real files - add tests when file extension doesn't match content - move code to existing files, no real reason to have dedicated file for such functions - move http headers validation to swarm/api/http package - swarm.Open now checking content of file to detect Content Type
Fixed all comments, but I need carefully revisit code about " I/O throttling" tomorrow. Found edge case: |
hmmm this is starting to suck. we should probably provide an override for all web types because browsers need to have them correctly served, or rather, just have it the way you originally implemented (first file extension, then content based, then fallback to |
I will look around to existing webservers tonight. For example Nginx has mime configuration file, default is: https://github.com/nginx/nginx/blob/master/conf/mime.types For deeper content check https://github.com/h2non/filetype may help, but it looks overkill for me. |
Here are my thoughts. About content sniff:
About extension-based checks:
|
We don't use it. When fetching content from Swarm - we usually use
Ignore? this is a response header.
Definitely, I also think that adding tests for utf8 content (filenames included) is a must. I'll open an issue for that :)
I completely agree. The list from nginx looks pretty exhaustive for this point in time. |
…phere#944) - add hardcoded mime types list - follow stdlib way of ContentType checking
Ok. Added. Not solved parts yet:
|
This is due to another problem and is outside the scope of this issue. I'm already have this issue at mind and it will be taken care of as part of ongoing efforts to refactor and cleanup swarm manifests. |
swarm/api/api.go
Outdated
// MimeOctetStream default value of http Content-Type header | ||
const MimeOctetStream = "application/octet-stream" | ||
|
||
var once sync.Once // guards initMime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not needed - use init()
instead
swarm/api/api.go
Outdated
".avi": "video/x-msvideo", | ||
} | ||
|
||
func initMime() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to init()
swarm/api/api.go
Outdated
|
||
// DetectContentType by file content, or fallback to file extension | ||
func DetectContentType(f *os.File) (string, error) { | ||
once.Do(initMime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
swarm/api/api_test.go
Outdated
testDetectContentType(t, "file-no-extension", "Lorem Ipsum", "text/plain; charset=utf-8") | ||
testDetectContentType(t, "file-no-extension-no-content", "", "text/plain; charset=utf-8") | ||
|
||
// TODO: research, define behavior when content doesn't mach file extension. Now just rely on extension. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove comment
swarm/api/filesystem.go
Outdated
return | ||
} | ||
|
||
if err := wait(ctx); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather have this wait()
before the mime detection happens, as in the original code
swarm/api/http/server.go
Outdated
@@ -891,6 +914,7 @@ func (s *Server) HandleGetFile(w http.ResponseWriter, r *http.Request) { | |||
} | |||
|
|||
w.Header().Set("Content-Type", contentType) | |||
w.Header().Set("Content-Disposition", fmt.Sprintf("attachment; filename=\"%s\"", path.Base(r.URL.Path))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please amend this - use uri
from context instead of request URI. also - provide the filename fallback
swarm/api/http/server.go
Outdated
@@ -942,3 +972,16 @@ func (lrw *loggingResponseWriter) WriteHeader(code int) { | |||
func isDecryptError(err error) bool { | |||
return strings.Contains(err.Error(), api.ErrDecrypt.Error()) | |||
} | |||
|
|||
func ValidateContentTypeHeader(r *http.Request) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lmars what is your input on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really sure what the purpose of this code is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
validating incoming mime type on upload from header, to make sure that the input we're getting is correct before storing it in a manifest - just to make sure we are on par with IETF RFCs
swarm/api/server.go
Outdated
@@ -0,0 +1 @@ | |||
package api |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove file
…where possible (ethersphere#527, ethersphere#944) - simplified I/O throttling by semaphore primitive - don't use URL from request, use parsed Swarm uri for file name detection
Fixed all comments. Question: do i need to delete force of "application/octet-stream" in |
Great work on this! :) |
Thank you for on-boarding. |
swarm/api/api.go
Outdated
// MimeOctetStream default value of http Content-Type header | ||
const MimeOctetStream = "application/octet-stream" | ||
|
||
// builtinTypesLower stores copy of https://github.com/nginx/nginx/blob/master/conf/mime.types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally feel this mime.types
file should be vendored (or just copied into a Go file) to make it easy to keep in-sync with the upstream file.
I also think it should be clear why this is here (if I saw this I'd probably just think it was unnecessary because the Go mime package does the right thing in pretty much all cases).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By vendoring of mime.types file - you mean write some go generate
command to parse it and gen code? Same approach as here https://github.com/bradrydzewski/go-mimetype (it based on mailcap instead of nginx, but there is same mime file) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like that, yes, which would make it much easier to keep in sync, and more obvious where the types come from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
swarm/api/api.go
Outdated
|
||
func init() { | ||
for ext, t := range builtinTypesLower { | ||
mime.AddExtensionType(ext, t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should panic if there is an error (in case some one adds an extension or mime type that doesn't parse)
swarm/api/api.go
Outdated
} | ||
} | ||
|
||
// DetectContentType by file content, or fallback to file extension |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is not consistent with what the method actually does.
swarm/api/api_test.go
Outdated
if err != nil { | ||
t.Fatal(err) | ||
} | ||
defer os.Remove(f.Name()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could avoid the need for a temp file by modifying DetectContentType
to take (filename string, data io.ReadSeeker)
and just use a bytes.Reader
in the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed, it will simplify everything
swarm/api/api_test.go
Outdated
|
||
func TestDetectContentType(t *testing.T) { | ||
// internally use http.DetectContentType, so here are test cases only about fallback to file extension check | ||
testDetectContentType(t, "file.css", "body {background-color: orange}", "text/css; charset=utf-8") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should use t.Run
to isolate each test.
swarm/api/http/server.go
Outdated
if found := path.Base(uri.Path); found != "" && found != "." && found != "/" { | ||
fileName = found | ||
} | ||
w.Header().Set("Content-Disposition", fmt.Sprintf("attachment; filename=\"%s\"", fileName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure we want to make all files attachments? If I go to bzz://mysite.eth/index.html
in my browser, I expect to see the HTML in the browser, not download it to a local file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, how much you right. Right now i will change it to:
Content-Disposition: inline; filename="example.html"
But i will rise question in Gitter - maybe exist some cases when attachment
make sense, for example in S3 owner can add any headers to it's file/folder by UI, and Content-Disposition: attachment
too.
swarm/api/http/server.go
Outdated
@@ -942,3 +972,16 @@ func (lrw *loggingResponseWriter) WriteHeader(code int) { | |||
func isDecryptError(err error) bool { | |||
return strings.Contains(err.Error(), api.ErrDecrypt.Error()) | |||
} | |||
|
|||
func ValidateContentTypeHeader(r *http.Request) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really sure what the purpose of this code is?
7c3c4c1
to
b632a2b
Compare
…on only since go1.11, a bit of copy-paste (ethersphere#527, ethersphere#944)
2 more changes:
|
…on only since go1.11, a bit of copy-paste (ethersphere#527, ethersphere#944) - added text/markdown content type
d3e2886
to
afb750a
Compare
@lmars about |
…ion from attachment to inline (ethersphere#527, ethersphere#944) - If I go to bzz://mysite.eth/index.html in my browser, I expect to see the HTML in the browser, not download it to a local file.
08bc172
to
1e87e3d
Compare
@nizsheanez @justelad status? shall we move this to main repo? |
@zelig let's first resolve all issues here then squash+reopen on upstream |
Status: fixed all CR comments except one:
|
swarm/api/http/server_test.go
Outdated
t.Fatalf("Content-Type header expected: application/x-tar, got: %s", h) | ||
} | ||
|
||
// See header to suggest file name for Browsers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove comment
if found := path.Base(uri.Path); found != "" && found != "." && found != "/" { | ||
fileName = found | ||
} | ||
w.Header().Set("Content-Disposition", fmt.Sprintf("inline; filename=\"%s.tar\"", fileName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am wondering - maybe in this case we should actually send an attachment
header. @lmars?
if found := path.Base(uri.Path); found != "" && found != "." && found != "/" { | ||
fileName = found | ||
} | ||
w.Header().Set("Content-Disposition", fmt.Sprintf("inline; filename=\"%s\"", fileName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lmars I think this should be:
if contentType == 'application/*' - return attachment
else return inline.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worry about "application/pdf" and "application/rss+xml", I'm personally fine if .pdf opening in browser (re-checked, browsers do download .pdf file with Content-Disposition: attachment; file="name.pdf"
header, and only Safari on my Mac auto-open it in Preview).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i've seen the rss issue too but i think its a non-issue as its a machine readable format and is not supposed to be read by the user as-is (AFAIK)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On my MacBook:
With inline
:
- Firefox: has Built-in RSS reader. it displays RSS in readable way and has "subscribe" button.
- Safari: forward link to installed RSS Reader app.
With attachment
:
- Firefox: download file
- Safari: forward link to installed RSS Reader app
Looks like we breaking some UX here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok - keep it as it is for now (inline
it)
Added generator based on mime.types file as lmars recommended. |
// mime.types file provided by mailcap, which follow https://www.iana.org/assignments/media-types/media-types.xhtml | ||
// | ||
// Get last version of mime.types file by: | ||
// docker run --rm -v $(pwd):/tmp alpine:edge /bin/sh -c "apk add -U mailcap; mv /etc/mime.types /tmp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This now seems to do exactly what the package you linked to does (i.e. https://github.com/bradrydzewski/go-mimetype), should we not just use that package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That package looks unmaintained, but let me try to rise PR there, here is list of differences:
go-mimetype
doesn't fail ifmime.AddExtensionType
returns error- outdated mime.types file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I don't mean for you to go back and forth on this. I assumed you didn't want to use that package because you wanted to use nginx mime types, but if we want to use the same method as that package then we should try and PR upstream (the maintainer looks active on other projects)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, just that mime.types file easier to parse. File from nginx has nginx.config format you can see there are some bracers: https://github.com/nginx/nginx/blob/master/conf/mime.types
No problem, already created PR: bradrydzewski/go-mimetype#1
Anyway we still discussing about Content-Disposition
…ator added, based on mailcap mime.types file (ethersphere#527, ethersphere#944)
e89f0ef
to
5c00804
Compare
@nizsheanez @justelad it seems this PR is covering two things:
It seems we've got to a good solution for improving What is driving the decision to add this |
@lmars this solves the problem that the browser does not always know under which filename to save swarm content and this is rather annoying. |
I agree that Adding @justelad , now I need to do squash and open new PR with good description? |
@nizsheanez many thanks for this comprehensive PR! Please squash all commits and reopen this PR against |
For #527