File upload hangs for large zip file (>= ~1 GB) #3634

Closed
tdilauro opened this issue Feb 17, 2017 · 13 comments

@tdilauro
Contributor

To enable upload of an intact target zip file, we encapsulate it in another zip file (the containing zip), so we are uploading a zip (target) within a zip (containing). When uploading a large one (~1 GB or larger), the UI seems to hang with the progress bar at complete (behavior similar to that described in #2643, now consolidated under #2482). The unencapsulated target zip ends up in the
${dataverse.files.directory}/temp directory, but the upload seems not to complete and the target zip never appears in the uploaded files box below, so the files cannot be "saved". No error is reported in the UI.
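
For reference, the containing zip is just one more level of zipping around the target, e.g. (filenames here are placeholders, not the actual data):

$ zip filename_double.zip filename.zip    # wrap the target zip so Dataverse stores it intact rather than unpacking it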

This error was observed in Dataverse 4.6, but may occur in earlier releases.

[Screenshot: upload hang]

@pdurbin
Member

pdurbin commented Feb 17, 2017

@tdilauro what if you upload via the SWORD API instead? Any difference? Sorry to hear about your trouble.

@tdilauro
Contributor Author

@pdurbin I tried it with SWORD and got an internal server error. I have replaced the filename and DOI in the command below. $DV01_DVA contains the dataverseAdmin API key. I haven't used the SWORD API before, so please let me know if I'm "doing it wrong."

Command

$ curl --insecure -u $DV01_DVA: --data-binary @filename_double.zip -H "Content-Disposition: filename=filename.zip" -H "Content-Type: application/zip" -H "Packaging: http://purl.org/net/sword/package/SimpleZip" https://archive.data.jhu.edu/dvn/api/data-deposit/v1.1/swordv2/edit-media/study/doi:10.xxxx/xx/XXXXXX

When the command completed, there was a "SWORD-{uuid}" file in the ${dataverse.files.directory}/sword directory. Its filesize matched that of the file referenced above.

Response

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or
misconfiguration and was unable to complete
your request.</p>
<p>Please contact the server administrator at
 root@localhost to inform them of the time this error occurred,
 and the actions you performed just before this error.</p>
<p>More information about this error may be available
in the server error log.</p>
</body></html>

Corresponding log messages

NB: The command ran from 17:32:03 until 17:33:35. The second entry might not be associated with this transaction, as it is logged at 17:33:50, 15 seconds after the curl command ended.

[#|2017-02-20T17:33:05.133-0500|SEVERE|glassfish 4.1|edu.harvard.hul.ois.jhove|_ThreadID=50;_ThreadName=jkconnector(4);_TimeMillis=1487629985133;_LevelValue=1000;|
  Testing SEVERE level|#]

[#|2017-02-20T17:33:50.279-0500|SEVERE|glassfish 4.1|edu.harvard.hul.ois.jhove|_ThreadID=50;_ThreadName=jkconnector(4);_TimeMillis=1487630030279;_LevelValue=1000;|
  Testing SEVERE level|#]

Any suggestions on configuration/logging changes to get a better idea of what's going on?
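For example, would bumping the GlassFish log level for the Dataverse classes surface more detail? I'm thinking of something along these lines (just a guess on my part, assuming a standard GlassFish 4.1 setup):

$ asadmin list-log-levels | grep dataverse     # check what is currently set
$ asadmin set-log-levels edu.harvard.iq.dataverse=FINE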

@djbrooke
Contributor

@tdilauro - thanks for sending along the test file. I'll try it on this side and let you know what happens.

@tdilauro
Contributor Author

@djbrooke, @pdurbin: An update. In spite of the error I received above during the SWORD upload, the extracted zip file eventually appeared as a DataFile and the uploaded doubly zipped file disappeared from the SWORD directory.

I'm gonna try to upload the other large files for this dataset. I'll report back.

@tdilauro
Contributor Author

tdilauro commented Feb 22, 2017

@djbrooke, @pdurbin: I ran a SWORD upload for the last three problem files. The response document is still the status 500 Internal Server Error page, but the container zip files land in the {datafiles}/sword directory and their contents eventually get converted into DataFiles and appear in the file inventory for the draft dataset.

Here's the output of the run with a few things redacted and the duplicate error responses summarized. We are out of the woods for now, but there's still the issue of the 500 response for SWORD and the hanging UI.

$ date; time ~/bin/dv-sw-upload --api-key "$DV01_DVA" --pid 'doi:10.xxx/xx/XXXXXX' 6-[345]*double.zip ; date
Tue Feb 21 19:55:14 EST 2017
The following datafiles will be processed: 
- 6-3_Data-changing_relax_T170_res_test65-80_double.zip 
- 6-4_Data-changing_relax_T170_res_test70-75_double.zip 
- 6-5_Data-changing_relax_T170_res_test75-80_double.zip

--- SWORD upload - file: '6-3_Data-changing_relax_T170_res_test65-80_double.zip' ...  ...done. ---
--- Result ---
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or
misconfiguration and was unable to complete
your request.</p>
<p>Please contact the server administrator at
 root@localhost to inform them of the time this error occurred,
 and the actions you performed just before this error.</p>
<p>More information about this error may be available
in the server error log.</p>
</body></html>

--- SWORD upload - file: '6-4_Data-changing_relax_T170_res_test70-75_double.zip' ...  ...done. ---
--- Result ---
... 500 Internal Server Error document, as above ...
 
--- SWORD upload - file: '6-5_Data-changing_relax_T170_res_test75-80_double.zip' ...  ...done. ---
--- Result ---
... 500 Internal Server Error document, as above ...

real    6m47.415s
user    0m47.530s
sys     0m9.750s
Tue Feb 21 20:02:01 EST 2017
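
For anyone curious, dv-sw-upload is essentially a thin wrapper around the curl invocation shown earlier in this thread; a rough sketch of the core loop (option parsing and error handling omitted, endpoint and headers assumed to match that command):

#!/bin/bash
# Sketch only -- the real dv-sw-upload differs in its option handling.
API_KEY="$1"; PID="$2"; shift 2
for f in "$@"; do
    echo "--- SWORD upload - file: '$f' ..."
    curl --insecure -u "$API_KEY:" --data-binary @"$f" \
         -H "Content-Disposition: filename=$(basename "$f")" \
         -H "Content-Type: application/zip" \
         -H "Packaging: http://purl.org/net/sword/package/SimpleZip" \
         "https://archive.data.jhu.edu/dvn/api/data-deposit/v1.1/swordv2/edit-media/study/$PID"
    echo " ...done. ---"
done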

@djbrooke
Contributor

Thanks @tdilauro.

I just checked on the file that I attempted to upload through the UI, and it doesn't appear that it was successful.

Let me know if you had more success with SWORD.

@djbrooke
Contributor

@tdilauro I think we were typing at the same time! That's not a great experience. I wonder if the work done in #1612 will provide a better experience for non-SWORD use.

@kcondon
Contributor

kcondon commented Feb 23, 2017

Also see #3645, where uploading large FITS files also hangs at the end of the upload meter. I'm closing that one as related, but it contains specific examples and a couple of other minor issues.

@pdurbin
Member

pdurbin commented Mar 7, 2017

@tdilauro out of curiosity, what if you had a workaround where you manually placed the large file in question on disk using scp or some other means and then ran an API endpoint to tell Dataverse to read the file and enter it into the Dataverse database? Any interest in this or is it too much of a hack? I'm only bringing this up because I think this endpoint was included in #3497, which was recently merged. We talked about leaving it in the branch, at least. 😄

@tdilauro
Contributor Author

tdilauro commented Mar 8, 2017

@pdurbin That is not practical for us, especially since I'm no longer on that team for support. I wrote a script and made some shell start-up changes to make it easier for our Data Management Consultants to do large file (and small file -- why not? :) deposits via the SWORD API.

@djbrooke We still get the 500 error mentioned above, but the datafiles in question do make it into the draft object. I've warned our consultants about the spurious message and that they should simply verify that the files appear online and that the checksums match.
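
The check itself is simple; e.g., assuming the default MD5 fixity that our installation displays:

$ md5sum filename.zip     # local checksum of the target (inner) zip; filename is a placeholder
# ...then compare against the checksum Dataverse shows on the file page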

@pdurbin
Member

pdurbin commented Mar 10, 2017

@tdilauro ah, well, I'm glad the SWORD API is a workaround. In Dataverse 4.6.1 you could also try the "native" file upload (and replace) that got merged as pull request #3579.
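
If you want to give that a try, it should look roughly like this (a sketch based on the native API docs; the exact path and parameters for 4.6.1 may differ, and the token and DOI below are placeholders):

$ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
$ curl -H "X-Dataverse-key:$API_TOKEN" -X POST -F "file=@filename.zip" \
    "https://archive.data.jhu.edu/api/datasets/:persistentId/add?persistentId=doi:10.xxxx/xx/XXXXXX"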

@pdurbin
Member

pdurbin commented Jun 23, 2017

@tdilauro I'm going to close this but if you feel like the SWORD API workaround isn't sufficient, please let us know!

@pdurbin added the "User Role: Depositor (Creates datasets, uploads data, etc.)" label Jul 4, 2017
@pdurbin
Member

pdurbin commented Jan 25, 2018

Huh. I guess I said I was going to close this back in June and I never did. I'll go ahead and close it now, especially since we now have a new related issue over at #4433 that people can track.

@tdilauro I'm not sure who's on the support team these days for the installation of Dataverse at Johns Hopkins but please pass along that we are happy to help them try to resolve any issues they're having.

@pdurbin closed this as completed Jan 25, 2018