The same filename with the same checksum (i.e. MD5) shouldn't be able to appear twice in the same dataset #3571

Closed
pdurbin opened this issue Jan 13, 2017 · 10 comments

@pdurbin
Member

pdurbin commented Jan 13, 2017

The same filename with the same checksum (i.e. MD5) shouldn't be able to appear twice in the same dataset. That's my understanding anyway.

This was tested on v. 4.6 build 62-c913662 on https://demo.dataverse.org

Here's a screenshot:

[Screenshot: "screen shot 2017-01-13 at 5 01 53 pm"]

@pdurbin
Member Author

pdurbin commented Jan 13, 2017

Related issues about duplicate files: #2955, #2467, #357.

@landreev
Contributor

Wow, yes, it looks like this bug is in production v. 4.6.
Specifically, it won't upload the same file twice within the same upload session. But it does allow a duplicate when the file already exists in the dataset and you then go into "Upload Files" and upload it again.

The duplicate check is there; it just appears to be failing for whatever reason. I'll investigate.

@landreev
Contributor

The good news is, I believe I have fixed this, in the 2290 branch.

tl;dr version:
This was another case where it mattered whether the dataset already had an existing DRAFT version, or whether a brand-new DRAFT was being created as part of the current upload. The duplicate check was working properly when there was an existing DRAFT, but it was failing in the latter case, in a newly created DRAFT.
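
To make the DRAFT distinction concrete, here is a hedged sketch of the entity states involved (the accessors follow the real FileMetadata / DataFile entities; the scenario variables are illustrative):

// Existing DRAFT: the draft version and its FileMetadata rows are already persisted.
FileMetadata fmExisting = existingDraft.getFileMetadatas().get(0);
fmExisting.getId();               // non-null: the metadata row is in the database
fmExisting.getDataFile().getId(); // non-null: the file is in the database

// Brand-new DRAFT: a fresh working copy created for this upload session.
// Its FileMetadata objects have not been saved yet, but they can still
// point at DataFiles that already exist in the database.
FileMetadata fmNew = newDraft.getFileMetadatas().get(0);
fmNew.getId();                    // null: the draft hasn't been saved yet
fmNew.getDataFile().getId();      // non-null: the file itself already exists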

I accept full responsibility for this one. I was testing this stuff extensively - but I guess I must have been working with existing drafts only (?).

@landreev
Contributor

I was also told about a similar issue with file replace: that it was possible to replace a file with a duplicate of a file already in the dataset. That I tested, but COULD NOT reproduce. I tested published files with no drafts and published files with existing drafts... let me know if I'm missing something.

Note that, if real, the file-replace case would indeed be a "similar", parallel issue, but not the same thing I just fixed, because file replace has its own duplicate logic, separate from the code path that regular uploads go through.

@landreev
Contributor

landreev commented Jan 17, 2017

The two older issues that were tagged as similar, #2955 and #2467, really should be closed for good. Both were opened before the upload improvements in 4.6, where the duplicate messages were specifically addressed.

@pdurbin
Member Author

pdurbin commented Jan 17, 2017

Since 505fba0 appears in the 2290-file-replace branch, I'm adding the 4.6.1 milestone and moving this into Development at https://waffle.io/IQSS/dataverse.

@landreev
Contributor

OK, I'm moving this into code review.
Since the fix for this issue went into the #2290 branch, there isn't going to be a dedicated pull request for this issue.
The de facto procedure for code review in this scenario is to review the actual code change (see below), add a comment here saying "reviewed - looks ok" or "reviewed, have comments/suggestions", and move the issue back into the dev. column; and perhaps add another comment saying "this issue should be added to the pull request for 2290 once it is made, and then moved into QA once 2290 goes into QA".

The actual fix was a one-line change. In EditDatafilesPage.java, I changed this:

if (fm.getId() != null && fm.getDataFile() != null) {

to

if (fm.getDataFile() != null && fm.getDataFile().getId() != null) {

  • Note the subtle difference. The intent of the check was to look only at existing files, i.e., files already in the dataset. But going about it by checking whether the FileMetadata ID was not null was of course WRONG: a null FileMetadata ID merely indicates that the current DRAFT version is brand new and hasn't been saved yet. It can still reference an existing DataFile that is already saved in the database. So the correct way to go about it is to check whether the DataFile ID is not null.
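
For context, here is a minimal sketch of what that check amounts to (a hypothetical helper, not the actual EditDatafilesPage code; the checksum accessor name is an assumption):

static boolean isDuplicateOfExistingFile(DatasetVersion workingVersion, String newFileChecksum) {
    for (FileMetadata fm : workingVersion.getFileMetadatas()) {
        // Pre-fix, `fm.getId() != null && fm.getDataFile() != null` skipped
        // every file in a brand-new DRAFT, because none of its FileMetadata
        // rows are persisted yet (all of their ids are still null).
        //
        // Post-fix, an "existing" file is one whose DataFile has a database
        // id, regardless of whether the DRAFT itself has been saved:
        if (fm.getDataFile() != null && fm.getDataFile().getId() != null) {
            if (newFileChecksum.equals(fm.getDataFile().getChecksumValue())) {
                return true; // same checksum as a file already in the dataset
            }
        }
    }
    return false;
}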

@pdurbin
Member Author

pdurbin commented Jan 18, 2017

I'm grabbing this issue to do the code review.

@pdurbin
Member Author

pdurbin commented Jan 18, 2017

Ok, the fix looks good, makes sense (as @landreev explained), and even works in a quick test. 😄 This issue will be parked in Development until the pull request for #2290 is made.

@kcondon
Contributor

kcondon commented Jan 26, 2017

OK, looks good, closing.
