Duplicate upload of the same file #357
Original Redmine Comment Elizabeth Quigley wrote:
I'm curious what we want users to see. "Copy of file1"? Or should they get an error?
See also some discussion with @jwhitney and @posixeleni about duplicate files in the context of the Data Deposit API as it's implemented in DVN 3.x: http://irclog.iq.harvard.edu/dataverse/2014-07-14
What do we want this to do? Should we have the system check for:
Example of a way Spotify does something like this when it recognizes a duplicate file:
OK, in its current form, the ticket still doesn't say how exactly this should be handled. Elizabeth, per your last comment:
What do you think? If you would rather just have a warning, I could do that too. (But that would let them ignore the warning and add a file with the same name but different content, which kinda sounds like a mess?)
@landreev The way duplicate filenames are handled in DVN 3.* works for me. Since a user can edit a file name after it's uploaded, they can always change a duplicate filename then, so there's no need for a warning.
Great, thanks.
What do we want to happen if users attempt to upload the same files via SWORD? As https://redmine.hmdc.harvard.edu/issues/3301 explains, right now an error states, "Filename 50by1000.tab already exists."
Phil, are you seeking a separate processing route for the Deposit API for some
On 9/9/2014 8:15 AM, Philip Durbin wrote:
Akio Sone
@akio-sone In DVN 3.x, the SWORD code duplicates code found elsewhere in the system, unfortunately. Refactoring it to use common code would have been too much effort. In Dataverse 4.0, I would like SWORD to use the same code path as the GUI as much as possible. @landreev and I have already talked about how I should switch the SWORD code to his new back-end method (#611, I think) for expanding zip files.
While I agree with having distinct file names, please note that
On Mon, Sep 8, 2014 at 3:54 PM, landreev wrote:
If an uploaded file appears to be a duplicate of an existing file *by content* (i.e., by md5), a warning message will be displayed and the matching md5s highlighted. The user then has the option of canceling the entire upload, checking the delete checkboxes next to the files they don't want before hitting 'save', or simply proceeding with adding the files as they are, if they have some unusual reason to keep multiple identical files in the dataset.
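A minimal sketch of the content check described above, assuming files are compared by their MD5 digests. The function names and the in-memory `filename -> bytes` inputs are hypothetical illustrations, not the Dataverse IngestService API. Each upload is also registered as it is seen, so the same file added several times in one batch (the case in the original report) is flagged too.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of a file's contents."""
    return hashlib.md5(data).hexdigest()


def find_md5_duplicates(existing: dict[str, bytes],
                        uploaded: dict[str, bytes]) -> dict[str, list[str]]:
    """Map each uploaded filename to the earlier files with identical content.

    `existing` holds files already in the dataset and `uploaded` the new
    batch; both map filename -> contents (a hypothetical representation).
    An empty result means no warning needs to be shown.
    """
    # Index files already in the dataset by checksum.
    seen: dict[str, list[str]] = {}
    for name, data in existing.items():
        seen.setdefault(md5_hex(data), []).append(name)

    duplicates: dict[str, list[str]] = {}
    for name, data in uploaded.items():
        digest = md5_hex(data)
        matches = seen.get(digest, [])
        if matches:
            # Copy so later appends to `seen` don't change this entry.
            duplicates[name] = list(matches)
        # Register this upload so repeats within the same batch are caught.
        seen.setdefault(digest, []).append(name)
    return duplicates
```

For example, `find_md5_duplicates({"a.txt": b"hello"}, {"b.txt": b"hello", "c.txt": b"other", "d.txt": b"other"})` flags `b.txt` (matching `a.txt`) and `d.txt` (matching `c.txt`). The page could then highlight those names and leave the decision to the user.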
OK, for this week's beta push, this is going to be implemented as agreed upon yesterday:
Phil, Akio, to answer your question: in 4.0, the part above, where identical or already existing file names are modified until unique, is done in the IngestService (not on the dataset page). Both the page and the SWORD deposit call the service to process the files being uploaded, so the Deposit API will match the page's default behavior (names modified until unique). If anybody has suggestions or ideas for the GUI part of this, we may revisit it in Beta 8, when/if the dataset page is rebuilt to use search for file display.
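"Modified until unique" can be sketched as a loop that keeps trying suffixed names. This assumes the `-1`, `-2`, ... suffix is inserted before the extension (so `50by1000.tab` becomes `50by1000-1.tab`); the exact suffix format and the names `make_unique_filename` and `assign_unique_names` are hypothetical, not the actual IngestService methods.

```python
import os


def make_unique_filename(name: str, taken: set[str]) -> str:
    """Return `name` unchanged if it is free; otherwise insert -1, -2, ...
    before the extension until the result is not in `taken`."""
    if name not in taken:
        return name
    stem, ext = os.path.splitext(name)
    counter = 1
    while f"{stem}-{counter}{ext}" in taken:
        counter += 1
    return f"{stem}-{counter}{ext}"


def assign_unique_names(existing: list[str], uploaded: list[str]) -> list[str]:
    """Rename a batch of uploads against the dataset's existing names.

    Each assigned name is added to the taken set, so two uploads with the
    same name in one batch also end up distinct.
    """
    taken = set(existing)
    result = []
    for name in uploaded:
        unique = make_unique_filename(name, taken)
        taken.add(unique)
        result.append(unique)
    return result
```

With an existing `data.csv`, uploading `data.csv`, `data.csv`, and `notes.txt` yields `data-1.csv`, `data-2.csv`, and `notes.txt`. Doing this in a shared service, rather than on the page, is what lets SWORD deposits get the same behavior.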
Phil: yes, you should definitely switch to the ingest service method that supports the "file spawning model", where one uploaded file may result in several datafiles being created (currently supported cases are zip files and geo shape files).
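The "file spawning model" means a single upload returns a list of datafiles rather than exactly one. Below is a minimal sketch covering only the zip case, using Python's standard `zipfile` module; the function name `spawn_datafiles` and the `(name, bytes)` return shape are hypothetical, and the geo shape file case is not modeled.

```python
import io
import zipfile


def spawn_datafiles(filename: str, data: bytes) -> list[tuple[str, bytes]]:
    """Return the (name, contents) pairs that one uploaded file turns into.

    A zip archive is expanded into its member files (directory entries are
    skipped); any other file maps to a single datafile, itself.
    """
    if filename.lower().endswith(".zip") and zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return [
                (info.filename, archive.read(info))
                for info in archive.infolist()
                if not info.is_dir()
            ]
    return [(filename, data)]
```

Because the result is always a list, the duplicate-name and duplicate-content checks can run over every spawned file the same way, whether it came from a zip or a plain upload.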
Basically works, but it doesn't detect a duplicate filename if the file was previously ingested as subsettable: ingest 50by1000.dta, then upload it again. The name isn't automatically changed to -1. We think this is because a subsettable file's name changes to .tab after ingest, whereas subsequent uploads should be compared against the original filename.
Good find. Yeah, the filename check wasn't working on tabular data files (because foobar.dta becomes foobar.tab once ingested!).
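The fix amounts to checking new uploads against both the current and the original (pre-ingest) names of existing files. A minimal sketch, assuming each stored file records its original upload name; the `StoredFile` record and the function names are hypothetical, not Dataverse's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredFile:
    """Hypothetical record for a file already in the dataset."""
    name: str                            # current name, e.g. "foobar.tab" after ingest
    original_name: Optional[str] = None  # name as uploaded, e.g. "foobar.dta"


def names_in_use(files: list[StoredFile]) -> set[str]:
    """Collect every name an upload should be checked against: the current
    name and, for ingested tabular files, the original one."""
    names: set[str] = set()
    for f in files:
        names.add(f.name)
        if f.original_name:
            names.add(f.original_name)
    return names


def is_duplicate_name(upload_name: str, files: list[StoredFile]) -> bool:
    """True if `upload_name` collides with a current or original filename."""
    return upload_name in names_in_use(files)
```

With `files = [StoredFile("50by1000.tab", "50by1000.dta")]`, re-uploading `50by1000.dta` is now detected as a duplicate, whereas a check against current names alone (`{"50by1000.tab"}`) would miss it.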
@sbarbosadataverse Have you gotten complaints about this? According to @landreev, this is how it's already done in 3.*, so we won't be implementing a change.
@eaquigley, @sbarbosadataverse:
OK, seems to be working now. Closing.
Author Name: Elizabeth Quigley (@eaquigley)
Original Redmine Issue: 3772, https://redmine.hmdc.harvard.edu/issues/3772
Original Date: 2014-03-25
Original Assignee: Leonid Andreev
Noticed during testing that a user can upload a file multiple times at once without issues.