Is there an option to batch file replace? #5924
Comments
@amberleahey I have this problem too. If you compare version 2 and version 3 of my dataset below you'll see that the file names are very similar. I'm lazy so this is what I've done so far:
What I think I'd rather do since there's currently no support in the UI for batch file replace is:
This is enough work that I haven't bothered. Lately I've been talking about an idea for what I started calling "sidecar" files. What if I could upload a zip file that contains something like this:
The idea here is that Dataverse would be smart enough while unpacking the zip file to populate file descriptions based on the filemetadata.json sidecar. This would be a fix for #723. If the file is the same file, an automatic File Replace could happen. If there's provenance information for a file, great, add it. Maybe something like BagIt does some of this stuff already? Or ORE? I've been meaning to ask @qqmyers about this. Anyway, here are versions 2 and 3 of my dataset so you can see what I'm talking about in terms of opportunity to use the File Replace feature, if I weren't so lazy. 😄 🛌
[Screenshots: Version 2, Version 3 (similar files)]
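To make the sidecar idea a bit more concrete, here is a minimal Python sketch of what "populate file descriptions while unpacking" could look like. Only the name filemetadata.json comes from the comment above; the JSON layout (a map from file name to an object with a "description" key) and both function names are assumptions made for illustration, not anything Dataverse implements.

```python
import io
import json
import zipfile

# The file name comes from the comment above; the JSON layout is an assumption.
SIDECAR_NAME = "filemetadata.json"


def read_sidecar_descriptions(zip_bytes: bytes) -> dict[str, str]:
    """Return {data file name: description} for every data file in the zip."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        members = archive.namelist()
        sidecar = {}
        if SIDECAR_NAME in members:
            sidecar = json.loads(archive.read(SIDECAR_NAME))
    # Skip the sidecar itself and directory entries; files without a
    # sidecar entry get an empty description.
    data_files = [m for m in members if m != SIDECAR_NAME and not m.endswith("/")]
    return {name: sidecar.get(name, {}).get("description", "") for name in data_files}


def build_example_zip() -> bytes:
    """Build a small in-memory zip with two data files and a sidecar (hypothetical data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("data.csv", "a,b\n1,2\n")
        archive.writestr("README.txt", "hello")
        archive.writestr(
            SIDECAR_NAME,
            json.dumps({"data.csv": {"description": "Survey results"}}),
        )
    return buffer.getvalue()


if __name__ == "__main__":
    print(read_sidecar_descriptions(build_example_zip()))
```

A real design would also need to decide how provenance and "same file" detection are expressed in the sidecar; this sketch covers descriptions only.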
Yes, my sense too. It would be nice to see DV get smarter about automatically replacing files (if newer) via the regular file upload (which would also support batch and zip upload, and replacing on unpacking). I think it's hard to tell when DV will accept new versions of files via the regular file upload, so replace is nice to have. A batch file replace option from the main dataset landing page could also work. For now, I've recommended, as you say, deleting and uploading all new files (this doesn't remove files from previously published versions viewed under 'versions').
"This PR adds a /replaceFiles api call to allow bulk direct upload/out-of-band upload replace operations." Also...
I'm still making this recommendation because it's way easier than other options. Most recently @atrisovic asked about this and implemented a delete step in her GitHub Action uploader: IQSS/dataverse-uploader@3e5c567 . In that context the uploader (the client, basically) could take its best guess of which files are being replaced (probably based on filenames) but I'm sure there could be tricky edge cases. Git does something similar based on the content of the file but even it gets confused. Perhaps the focus should be on the most straightforward case: all filenames match exactly. If not, tell the user that bulk replace is not available. It would be a step in the right direction, at least. On more thought, in addition to the GitHub Action, there are two other clients, both created by @qqmyers, where the logic could be placed:
Anyway, the point is that perhaps the logic could be figured out in some client code first, and then maybe Dataverse itself could follow.
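The "most straightforward case" described above (all filenames match exactly, otherwise tell the user bulk replace is not available) could be sketched in client code roughly as follows. The function name, the shape of the inputs (a map from existing file name to file id), and the rule that the upload must cover every existing file are assumptions for illustration; none of this is taken from an existing client.

```python
def plan_bulk_replace(existing_files: dict[str, int], new_filenames: list[str]) -> dict[str, int]:
    """Map each uploaded file name to the id of the file it would replace.

    Raises ValueError unless the uploaded names match the dataset's
    current file names exactly (the only case this sketch handles).
    """
    incoming = set(new_filenames)
    if len(incoming) != len(new_filenames):
        raise ValueError("Bulk replace is not available: duplicate filenames in upload.")
    existing = set(existing_files)
    if incoming != existing:
        missing = sorted(existing - incoming)
        unexpected = sorted(incoming - existing)
        raise ValueError(
            f"Bulk replace is not available: missing {missing}, unexpected {unexpected}."
        )
    return {name: existing_files[name] for name in new_filenames}


if __name__ == "__main__":
    # Hypothetical file ids for a two-file dataset.
    print(plan_bulk_replace({"a.csv": 1, "b.csv": 2}, ["b.csv", "a.csv"]))
```

Renamed files, files in different folders with the same name, and partial replacements are exactly the tricky edge cases mentioned above; this sketch refuses them rather than guessing.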
@amberleahey - once #9018 is merged, should we consider this closed? (#9018 provides a bulk replace for s3/direct upload)
Potentially yes. Can anyone run the API call? If it's not as important to have this available in the UI, then we will promote the API call. Thanks for flagging!
It's an option in the S3 direct upload API (which is a sequence of 3 calls). Those are open to anyone (who can upload to a given dataset). It could eventually be added to DVUploader, etc. I raised the question because I was asked if merging should auto-close this issue and didn't want to just say yes and have this issue disappear. Up to you whether you think #9018 covers what you wanted, or not, or perhaps it means this issue should close and a new issue be opened for just a UI option, etc.
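For anyone wanting to try the API route, here is a rough Python sketch that only builds the JSON body for a bulk replace; it makes no network calls. The field names (fileToReplaceId, forceReplace, storageIdentifier, fileName, mimeType, checksum) follow my reading of the direct upload guide's example and should be checked against the guide for your Dataverse version. The helper names, file ids, storage identifiers and checksum values are all made up for illustration.

```python
import json


def build_replace_entry(file_to_replace_id, storage_identifier, file_name,
                        mime_type, sha1, force_replace=False):
    """Describe one replacement: which existing file id gets which uploaded object."""
    return {
        "fileToReplaceId": file_to_replace_id,
        "forceReplace": force_replace,
        "storageIdentifier": storage_identifier,
        "fileName": file_name,
        "mimeType": mime_type,
        "checksum": {"@type": "SHA-1", "@value": sha1},
    }


def build_replace_files_json(entries):
    """Serialize the list of replacement entries as one JSON array."""
    return json.dumps(entries)


if __name__ == "__main__":
    # All values below are placeholders, not real ids or checksums.
    entries = [
        build_replace_entry(10, "s3://example-bucket:aaa111", "roads.gdbtable",
                            "application/octet-stream", "0000"),
        build_replace_entry(11, "s3://example-bucket:bbb222", "roads.gdbtablx",
                            "application/octet-stream", "1111"),
    ]
    print(build_replace_files_json(entries))
```

As I understand it, the resulting string would be sent as the jsonData form field of the /replaceFiles call, after the files themselves have been uploaded to storage in the earlier direct upload steps.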
yes close it :)
@amberleahey you're the boss! Closing.
Hello, related to file replace, we are wondering if there is a way to replace ALL files in a dataset at once?
"My question is how do I replace a dataset that has over 47 files? I have deposited my dataset in geodatabase format which has over 46 separate files in the structure. Is there a way to do a bulk replace?"
For now we see that individual files can be replaced using the file replace feature, but is there a better way to replace many files? Deleting the replaced files and uploading new files is, I guess, another option.
Any other thoughts?
Thanks,
Amber