Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Big file upload #133

Closed
lacravate opened this issue Mar 5, 2018 · 21 comments
Closed

Big file upload #133

lacravate opened this issue Mar 5, 2018 · 21 comments
Assignees

Comments

@lacravate
Copy link

Hi,

i tried my luck with the other issues, sorry if this is a duplicate.

Is it possible to upload files without loading the entire content into memory ?

Cheers !

@jeevatkm jeevatkm self-assigned this Mar 5, 2018
@jeevatkm
Copy link
Member

jeevatkm commented Mar 5, 2018

@lacravate
Copy link
Author

@jeevatkm

Thanks a lot for the very fast answer. I should have said i am close to an absolute beginner in go (but not at all in development).

I tried SetFileReader, with an io.Pipe() though, before asking, but didn't get the expected result.

I suppose your answer tells me i muste persevere...

Thanks again, cheers.

@jeevatkm
Copy link
Member

jeevatkm commented Mar 5, 2018

@lacravate You're welcome. I happened to be online. Will be heading to bed soon.

Please let me know how it goes. I will check this thread tomorrow.

@lacravate
Copy link
Author

lacravate commented Mar 5, 2018

@jeevatkm

Good morning.
Thanks for your time.

i tried this (dead simple) :

package main

import (
    "os"
    "gopkg.in/resty.v1"
)

func main() {
    f, err := os.Open("my_movie.mp4")
    if err != nil { panic(err) }
    defer f.Close()
    rest := resty.R().SetFileReader("paramName", "my_movie.mp4", f)

    resp, err := rest.Post("http://myhost:4001/my_endpoint")
    if resp.StatusCode() != 200 { panic(err) }
}

--

file name, params, endpoints, obviously changed but it compiles and uploads the file.
But the given file is ~ 2.5 GB, and RAM goes crazy when sending the request, as high as three times the file size.

Out of luck, and out of my league. For the time being.

Cheers !

@jeevatkm
Copy link
Member

jeevatkm commented Mar 5, 2018

@lacravate Thank you, sorry to hear that. Let me check the request flow and get back to you.

@lacravate
Copy link
Author

@jeevatkm

Thanks again. Most sincerely. In that kind of circumstances, i am generally the one at fault. But even that would be a plus to know : i need something to be sure of, a firm ground, to mend my code, even if that sure is thing is that i s... at Go for the moment.

Cheers !

@jeevatkm
Copy link
Member

jeevatkm commented Mar 6, 2018

@lacravate I had a look, there is buffer in-between before the actual request is fired.

I have questions for you-

  • File upload via multipart/form-data? OR
  • File upload via request body?

For a first one - Multipart data - Parts and boundary values are involved. Its bit complex for bringing generalized solution in resty. Technically feasible, I need more time for implementation.

For a second one, I can quickly add support for SetBody(io.Reader) scenario without intermediate buffer, large file won't be an issue - The downsides is (quickly came to my mind) resty won't set Content-Length header - I believe it doesn't concern your use case.

@lacravate
Copy link
Author

lacravate commented Mar 6, 2018

@jeevatkm

Here's what i have been up to.

Along with trying to have a bottom-line code that's working, i wanted to make sure the problem was coming from Go/Resty/Net-http.

With my somewhat simple case, we are nevertheless at some edges of common use cases, so there was a chance the problem was coming from my server code.
So i got rid of it to put up a netcat in lsitening mode and, with resty, as much as with Go net/http, using the following code the problem occurs.

package main

import (
  "bytes"
  "fmt"
  "io"
  "mime/multipart"
  "net/http"
  "os"
  "path/filepath"
)

func main() {
  path := "my_file.mp4"
  file, err := os.Open(path)
  if err != nil { panic(err) }

  defer file.Close()

  body := &bytes.Buffer{}

  writer := multipart.NewWriter(body)
  part, _ := writer.CreateFormFile("colligere[profile]", filepath.Base(path))

  var n int64
  n = 1024
  for n == 1024 {
    n, _ = io.CopyN(part, file, 1024)
  }

  writer.Close()

  req, err := http.NewRequest("POST", "http://myhost/my_endpoint", body)
  if err != nil { panic(err) }

  client := &http.Client{}
  client.Do(req)
}

Even CopyN doesn't do it (provided i used it properly). Memory consumption is three times the file size (so i would guess there other hidden buffers on the way), to a nice 7 GB RAM consumption in my case.

So, despite my strong preference for am multipart solution, it's a lost cause, unless you implement a workaround of the Go net/http library. Doable, sure, but...

So if you can come up with a handy, simple solution, with the file in the request body, i'll be happy to continue to use Resty; get this vital yet entirely technical and not functionnal phase of my workflow done. I'll perhaps have to adapt server code (much easier in comparison), and manually set the content length with one line of code and a file stat.

Sounds a good deal to me, if you're willing to set yourself to the task.

Thanks again, for your time and for witnessing and assisting my first, very hard, steps at Go.

Cheers !

@lacravate
Copy link
Author

well, that's probably mime/multipart that would have to be patched/worked around, but you got my point...

Cheers !

@jeevatkm
Copy link
Member

jeevatkm commented Mar 6, 2018

@lacravate Yeah, I got your point. Typically its best to use request body for large file upload. Multipart approach is overhead for large file upload.

I will be implementing request body support (as I described in the second point), once done I will ping you then you can give it a try.

@jeevatkm
Copy link
Member

jeevatkm commented Mar 6, 2018

@lacravate I have implemented it on branch large-file-support-on-body. This branch contains buffer less request body writing.

I believe you're using prod version gopkg.in/resty.v1, follow below step to get the branch code:

cd $GOPATH/src/gopkg.in/resty.v1
git pull && git checkout large-file-support-on-body

Sample Code:

package main

import (
	"fmt"
	"log"
	"os"

	"gopkg.in/resty.v1"
)

func main() {
	f, err := os.Open("my_movie.mp4")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// get the file info
	fileInfo, err := f.Stat()
	if err != nil {
		log.Fatal(err)
	}

	resp, err := resty.R().
		SetBody(f).
		SetHeader("Content-Type", "video/mp4").
		SetHeader("Content-Length", fmt.Sprintf("%v", fileInfo.Size())).
		Post("http://myhost:4001/my_endpoint")

	if err != nil {
		log.Fatal(err)
	}

	fmt.Println("Response Code:", resp.StatusCode())
}

Please give it try and let me know.

@jeevatkm jeevatkm added this to the v1.4 Milestone milestone Mar 6, 2018
@lacravate
Copy link
Author

@jeevatkm

Neat, thanks. I'm on another subject right now for the next few hiours. But i will sure test this today, and sure report it to you right after.

Cheers !

@lacravate
Copy link
Author

@jeevatkm

sorry for the mess in my answers, but i have to leave now, i had the time to launch a very limited test and contrary to my first attempt it looks fine, no extra memory consumption.

I'll get back at it in a few hours and let you know...

Cheers !

@lacravate
Copy link
Author

@jeevatkm

i did not come right back at you because the results of my tests are mixed up, and i am not sure what to make out of them.

I giive you what i have, and am ready to answer for precision if need be.

Context :

  • client : go Resty

  • server : ruby sinatra

  • backend server tested : webrick, puma, thin

  • webrick : just worked, correct file size, entered application code

  • thin : never worked, gets a 500, server indicates a too long header, never started the upload

  • puma : never worked with resty, upload was almost complete, but never finished, hence it never got to application code

Side notes :

  • all solutions tested with ruby rest-client performing the same upload, ended up with conclusive results
  • i didn't stick to webrick solution because the memory gained by your work is lost to the (faulty) server
  • your implementation on the point of view of the feature of avoiding memory consumption has always been beyond reproach in all the tests
  • i managed with puma to reach application code with using another copy strategy (IIRC with io Pipe in a go func(), but even then there was the mention of malformed http request)

Conclusion (possibly) :

  • while server side code is not your business
  • problems with the transmitted header possibly is
    (the manual setting of the body may need to be wrapped in some code that ensures header is properly set and closed)
  • not fulfilling the upload may be a problem of the sample code, but if you want resty to remain "cool and easy" you possibly need to wrap the use of the passed io reader interface in some code that makes sure it is propoerly drained

HTH,
thanks again for your time,
i will find a solution, resty or not, and will tell you of it, letting you if it is useful to you,
cheers.

@jeevatkm
Copy link
Member

jeevatkm commented Mar 8, 2018

@lacravate Thanks for getting back.

One thing I can assure, the branch I gave you does zero buffer on sending request body.

Let's get into facts:

  • The sample code sends a file on request body not a multipart request. Server implementation should handle that (as a file data on the request body since its not a multipart)
  • Sample code sends only one header called Content-Type
  • As for as io.Reader concern its caller responsibility to close/release it. For example: as you can see on sample code defer f.Close() line the releases the file resource.
  • Typically on large file upload, server endpoint which handles an upload should higher read timeout.

all solutions tested with ruby rest-client performing the same upload, ended up with conclusive results

What do you mean? Resty does the same behavior.

As always I'm glad to improve resty. Please let me know your inputs.

@lacravate
Copy link
Author

@jeevatkm

i may come back with much more matter than now, but you're right : for the moment, facts.

So in facts, is there a way to manually specify the content-length ?

I tried to set the header, with no luck, i tried other stuff, to no avail. Is there a simple way, in my case, meaning a body specified otherwise than a buffer in bodyBuf, to explicitly set the content length ?

Cheers !

@jeevatkm
Copy link
Member

jeevatkm commented Mar 8, 2018

@lacravate Yes you can set manually. I have updated the sample code for header Content-Length.

@lacravate
Copy link
Author

@jeevatkm

i am pretty sure i tried that one.

Here is the header and first line(s) of data gotten with a copy/paste of sample code :

POST /princeps/colligeres.json HTTP/1.1
Host: dev.aude_sophro:4001
User-Agent: go-resty v1.4-edge - https://github.com/go-resty/resty
Transfer-Encoding: chunked
Accept: video/mp4
Content-Type: video/mp4
Accept-Encoding: gzip

1

8000
ftypmp42\x00\x00\x00\x00isommp42\x9Do\xA2\xFBmdat!\x11E\x00\x14P\x01F\xFF\xF1\n

If you have any other way, to get to that result...

Thanks again, and anyway.

Cheers !

@lacravate
Copy link
Author

@jeevatkm

don't bother with my last question, i found a way. If there's any question left, if ts in middleware.go code, where you write stuff to enforce Content-Length while i think net/http overrides this entirely. This part, according to my understanding (yet to be bettered), is void code.

I'll come back later, if i get a finalized solution and tell you of it. Maybe that'll be of use to you.

Cheers !

@jeevatkm
Copy link
Member

jeevatkm commented Mar 9, 2018

@lacravate Okay, Thank you.

@jeevatkm
Copy link
Member

jeevatkm commented Mar 9, 2018

Closing it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

2 participants