Multiple formats under "Download All" dropdown #4000
Comments
I do see your point. The rationale behind this behavior has been to take the user's data, that may have been in some proprietary format - like SPSS or Stata - and change it to tab-delimited, for archival purposes, since it's a format that's guaranteed to be readable without any special software... And then it makes sense that this format becomes the default. Admittedly, it probably makes less sense with files that were originally CSV (now that we support converting CSV files into tabular data...). If nothing else, CSV is just as good an "archival format" as tab-delimited...
So, once again, I do see your point. Still, this is very, very ancient legacy and it would honestly be difficult for us to just change this behavior, without upsetting or at least confusing many existing users. But we should still be able to make it less frustrating for users like you.
First of all, we already have an issue that's very close to the head of the dev. queue that will add an option for the user to opt out of converting a file to tabular data in the first place. Kind of a nuclear option, really - because then you would not be able to do things that require tabular metadata. So, not sure if that will help with your use case.
And then we can make it configurable for individual files. As in, keep the default behavior as it is now - tab. is the default download format; but make it possible to specify, per file, which format should be the default.
Also, it sounds like you were talking about downloading files programmatically, via API calls. I'm assuming you were able to work around this, using our API methods. It should be relatively easy to first look up a file and determine if it's tabular or not, and if it is, ask for the original, instead of the default file. Still some extra stuff to do, of course - but doable.
To summarize, we are open to suggestions, and we should be able to make the download API better suited to your needs - via extra options/features, etc. But just changing the default behavior for every existing tabular file may not be an option, for legacy reasons.
Cheers. |
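For anyone scripting the workaround described above (look up the file, check whether it is tabular, and ask for the original if so), a minimal Python sketch might look like this. The `format=original` query parameter on the data access endpoint and the metadata field names (`dataFile`, `id`, `originalFileFormat`) are assumptions here and should be checked against your installation's API guide; the server URL is a placeholder.

```python
from urllib.parse import urlencode


def download_url(base_url, file_meta):
    """Build a download URL that prefers the original format for tabular files.

    `file_meta` is one entry from a dataset's file listing. The field names
    below are assumptions about that JSON, not a confirmed schema.
    """
    data_file = file_meta["dataFile"]
    url = f"{base_url.rstrip('/')}/api/access/datafile/{data_file['id']}"
    # Assumption: only ingested (tabular) files carry an original-format field.
    if "originalFileFormat" in data_file:
        url += "?" + urlencode({"format": "original"})
    return url
```

Files that were never converted get the plain URL, so the same loop can be used for every file in a dataset.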
Ah I see!
Purpose 1: archiving. .rdata is likely not a great archive format, so I see the need to convert here. I think the *right* approach is to convert all data.frames in an .rdata file to tabular, not just the first one. That leaves open the question of what to do with R objects that are not data.frames.
Purpose 2: downloading. I’ve been using the website, not downloading programmatically. The main point of friction for me is when I use the check box to select all files in an archive. That’s when it would be especially useful to have the default be the original file format. And I think this point is not R-specific: it’s been frustrating when the replication .do files call for the .dta versions of things but I only have the .tab!
But since changing these defaults appears to be hard for legacy reasons, perhaps the moment to fix things is to have a drop-down on the “download” button associated with the multi-file download that says, “download all files as original.”
I also like users being able to *set* the default download type per-file. I would use such a feature.
Thanks very much for your response,
Alex
… On Jul 17, 2017, at 6:10 AM, landreev ***@***.***> wrote:
|
Hello! Is there any chance of letting users have a choice within the "download all" menu between download all in tabular format and downloading all in original formats? That would seem to address your concern about legacy users. You could then possibly run some analytics, and if it turns out that 98% of users prefer to download in .tab vs original format, we will all know definitively that this was a fringe issue; if it turns out the other way, we will know that there was demand for original file formats. If this won't be possible, could there be some documentation on repositories for which data files have been converted? I ask because recently some colleagues were trying to download files from dataverse to replicate a study and were unable to do so, and had no idea even what the source of the problem was or how to address it based on the repository they had navigated to. Perhaps a pop-up menu when you select "download all" explaining the issue? |
Regarding the last request: This would be very easy to achieve on the API side (i.e. to make that API method that zips up multiple file bundles accept an extra "format=original" option; that would make it use the originals for the files that were converted to tabular data...) On the dataset page, can we add an extra checkbox ("use originals"?) next to that download-multiple-files-button? (I mean, of course we can add a checkbox - but can we do it without making the whole thing more, rather than less confusing?) |
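To make the proposal above concrete, here is a rough Python sketch of how a client might build the request for a zipped bundle with the suggested `format=original` option. The option is only proposed in this comment, and the `/api/access/datafiles/{ids}` path and the helper name `bundle_url` are assumptions for illustration.

```python
def bundle_url(base_url, file_ids, use_originals=False):
    """Build a URL asking for several files as one zipped bundle.

    With `use_originals=True`, the hypothetical `format=original` option asks
    the server to use the originals for files that were converted to tabular data.
    """
    ids = ",".join(str(file_id) for file_id in file_ids)
    url = f"{base_url.rstrip('/')}/api/access/datafiles/{ids}"
    if use_originals:
        url += "?format=original"
    return url
```

A "use originals" checkbox on the dataset page would then only need to toggle `use_originals` when redirecting to this endpoint.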
This checkbox would make my life a lot easier, thank you! |
Also, we already have a GitHub issue open for giving the dataset owner an easy way to "un-ingest" a tabular data file, i.e. to convert it back to the original. Let's finally implement it. It should be easy. And for a researcher whose needs are primarily archival (like providing replication data to the research community), and who doesn't need or care about running online data exploration/analysis on the site, this by itself would solve an issue like this one. |
Yep. Good old #3766. |
What's the actual action item here? |
@oscardssmith good question. You could bring this up during backlog grooming to get a "definition of done". |
No worries, I'll bring it to backlog grooming once it's a priority and there is some consensus on an approach. |
We just discussed this issue in our weekly design meeting. For this issue, our goal is to allow users to easily “download all” files in a dataset in their original format using our UI. On the dataset page's "Download all" button, we want to add two dropdown options:
But we're open to suggestions; leave a comment if you have any thoughts on this solution. |
Do we want to have logic to only give these options in the case where you have ingested files? Also to consider: currently we only ingest tabular files, but we have discussed the idea of other types of ingest, e.g. ingesting zip files as a dataverse "package". Not sure if this affects the design at this stage or if it's a bridge we should cross later. |
Yes, we only want to offer these options in cases where the distinction matters, i.e. when the dataset has at least one ingested file. |
That's what I assumed. Just wanted to make sure it got tracked in the issue. Thanks! |
Is #4464 a duplicate of this ticket? |
assigning to @dlmurphy to talk about this at next estimation session |
@matthew-a-dunlap For the purposes of 4000, I feel like we probably should revert to checking the permissions as we generate the zipped stream. We may still end up using the current implementation when working on #4576; but let's think about it then. The 207 code may be useful to have, for the API users (even though, it looks like it was never specifically requested in #4576 - it was something we offered along the way); but the UI users will Also, c) for the UI users, we are already checking the permissions on the UI side (and are warning the users there, via a popup, that we are dropping some files that they cannot download); and, per #4576, we'll be doing the same for the files that have to be dropped because of the size limit. Meaning, when this API receives a call that's a redirect from the UI, it will only contain the file ids that the user is in fact allowed to download. We do of course want to double check that it is indeed the case; but there is no need to do it in a separate first pass, before generating any output. So we may want the API to do both things. I.e., handle it the way you have it implemented now, by default: run the full check; if any files have to be dropped, generate 207; only then generate output. But also support some kind of a "start streaming asap" flag, to be used when we redirect the user to that API from the UI. But, again, we should probably address that when we work on #4576. |
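The two behaviors discussed above could be sketched in Python roughly as follows. This is only an illustration of the idea, not the actual implementation: `zip_entries`, `can_download`, and the `stream_asap` flag name are hypothetical, and the real code would write a zipped stream rather than return file ids.

```python
def zip_entries(file_ids, can_download, stream_asap=False):
    """Return (status, iterator over the file ids to put in the zip).

    Default mode: check every file first, report 207 if any were dropped,
    and only then produce output.
    stream_asap mode: commit to 200 up front and check permissions lazily,
    one file at a time, as the stream is generated.
    """
    if stream_asap:
        # Permissions are still checked, but per file while streaming.
        return 200, (fid for fid in file_ids if can_download(fid))
    # Separate first pass over all files before any output is generated.
    allowed = [fid for fid in file_ids if can_download(fid)]
    status = 207 if len(allowed) < len(file_ids) else 200
    return status, iter(allowed)
```

In the `stream_asap` case the status has to be chosen before the checks finish, which is why it only suits calls redirected from the UI, where the file list has already been filtered.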
Tests still need fixing and expansion
This may not need another round of review, but I am dragging it back in in case someone wants to put eyes on it. |
I just want to express my excitement and enthusiasm for this ticket. Thanks for all the hard work! We're just moving our datasets to Dataverse and it probably would have been a dealbreaker if this wasn't in progress! |
Since many repositories include code that expects data files to be in a particular format, it's frustrating that dataverse defaults to downloading data files as .tab.
IMHO, the default should be the original file format, with options for all the others.