Extending the File > Export > Export Dataset dialogue #5590

Closed
rdstern opened this issue Nov 25, 2019 · 16 comments · Fixed by #5873

@rdstern
Collaborator

rdstern commented Nov 25, 2019

Some small changes are being made to this dialogue currently, see #5585.
There are further improvements that may be easy to implement and will also more than justify extending the dialogue.
This uses the export dialogue from the rio package and we will continue to use that.

  1. It currently can export to Excel (xlsx files).
  2. There are some formats that could be added to the file list. In particular it exports to MATLAB and SAS, which we don't have in our list, and also to OpenDocument (ods) files for OpenOffice.
  3. The big one is that it can export multiple data frames to Excel, rdata, HTML and (I assume) rds.
  4. There are other features that we may wish to explore. In particular exporting directly as a zip file, exporting with labels (serialize) and also appending to an existing file, rather than overwriting.

But I suggest the big one is the facility to export multiple data frames.

(It would also be very useful for our climatic work if we could take daily data, then produce (say) annual summaries and then export all these easily to Excel.)

So I propose the main change is to add one of our famous new-style radio buttons at the top. They could have an initial label Data Frame:. Then the first button is Single and the second is Multiple.

If Single, then the dialogue is initially roughly as now. It would be useful to add a checkbox Add new sheet (Excel only), which would add the argument which="sheet name". (I think this is ok, i.e. it does no harm and is simply ignored if the format isn't Excel or there isn't a file of the given name to add to. This needs to be checked.)
A second checkbox would be labelled zip file. This just adds .zip to the file extension and results in a zipped file. This checkbox would be for both single and multiple.

If Multiple, the single data frame control is replaced by a multiple one. The main part of the dialogue would look like the one for File > Export > Export R workspace, which can do roughly the same thing. The check-boxes would be replaced with the one for zip.

In the Multiple case, the call to the Save dialogue has a very reduced list of formats with xlsx as the default.

Here is the code that seems to work for 2 sheets to be exported:

# Code generated by the dialog, Export Datasets

dodoma <- data_book$get_data_frame(data_name="dodoma")
survey <- data_book$get_data_frame(data_name="survey")
rio::export(x=list(dodoma=dodoma,survey= survey),file="C:/Users/RogerStern/Dropbox (SSD)/Roger/Temp/dodoma.xlsx")
rm(dodoma,survey)
  

The only problem is the separate get_data_frame call for each data frame, which could get long if there are many data frames. We need to check with @dannyparsons or @volloholic.
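One way to avoid writing a separate line per data frame is to build the named list in a single call. The sketch below is only an illustration: get_data_frame here is a hypothetical stand-in for data_book$get_data_frame so that the code runs on its own, and the sample values are made up. The list names are what rio::export would use as sheet names.

```r
# Hypothetical stand-in for data_book$get_data_frame(), so the sketch is self-contained.
get_data_frame <- function(data_name) {
  switch(data_name,
    dodoma = data.frame(year = 2001:2003, rain = c(510, 620, 480)),
    survey = data.frame(id = 1:2, yield = c(3.1, 2.7)),
    stop("unknown data frame: ", data_name)
  )
}

data_names <- c("dodoma", "survey")

# lapply() keeps the names of its input, so naming the vector by itself
# gives a list whose element names match the data frame names.
frames <- lapply(setNames(data_names, data_names), get_data_frame)
names(frames)

# With rio installed, the whole list could then be exported in one call:
# rio::export(x = frames, file = "dodoma.xlsx")
```

The generated script then stays the same length however many data frames are selected; only data_names grows.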

@Patowhiz
Contributor

@rdstern your last issue can easily be solved by setting a limit on the number of data frames that can be selected. @Wycklife I can work on this if you are yet to start work on it.

@rdstern
Collaborator Author

rdstern commented Jun 30, 2020

@Patowhiz that would be great.

@rdstern rdstern assigned Patowhiz and unassigned maxwellfundi and Wycklife Sep 1, 2020
@rdstern
Collaborator Author

rdstern commented Sep 1, 2020

@Patowhiz was this on your current list? For me the important feature is to be able to export multiple sheets to Excel. This would be very useful as part of our quality control work for climatic data.

@Patowhiz
Contributor

Patowhiz commented Sep 5, 2020

@rdstern do we really need the radio buttons? The dialog could be intelligent enough to detect whether a single data frame or multiple data frames are selected, and automatically change the available file types and options accordingly.

@rdstern
Collaborator Author

rdstern commented Sep 6, 2020

@Patowhiz I looked at your File > Export Objects dialogue where you have our "ordinary" data selector and then a multiple receiver and that would be fine. As you say, we don't need radio buttons. And I like the idea of having the data selector where you obviously view all the data frames, compared to the current arrangement with a drop down. Good idea.

@Patowhiz
Contributor

Patowhiz commented Sep 7, 2020

Addressing item 3.

From my analysis of the rio package with regard to exporting to Excel, there are important things that I would like to point out:

  1. It cannot add a list of new sheets to an existing workbook (and I'll be very happy if someone else finds this possible). You can only add a single sheet name using the which parameter; which cannot be a vector or a list of sheet names (I was surprised, because the same parameter in import_list accepts a vector).
     # The code below throws an error: which only accepts a single string value. Any alternative to this?
     rio::export(list(b = mtcars, c = iris), "multisheet1.xlsx", which = c("a","b"))
  2. It cannot overwrite an existing sheet in an existing workbook. If the sheet name indicated by the which parameter already exists, an error is thrown.
     # The code below throws an error if run twice
     rio::export(iris, "multisheet2.xlsx", which = "iris")

@rdstern from my analysis above (corrections are welcome if I missed anything), appending sheets to an existing workbook (Excel file) will be a bit tricky. It will entail manually checking whether the sheet name exists: if it doesn't, call the rio command; if it does, delete the sheet and then call the rio command. This is necessary to avoid confusing users with errors.

An easier and more straightforward alternative is to always overwrite the existing workbook (Excel file), i.e. take the list of selected data frames (whether multiple or single) and write them to a single file; if the file already exists, just overwrite it.

@rdstern what's your opinion on this? Which implementation works best for you? Both are doable. Personally I like the second option. It fits with the warning prompt that VB.NET gives when you type a file name that already exists, so the user already knows what to expect.

If you agree with my preferred option, then I think a sensible checkbox text would be something like "Export as single file", checked by default. The big question is what happens when it is unchecked. Ideally the browse button should then open a built-in directory prompt that allows the user to select the directory/folder in which the data frames will be saved. The next stage would be specifying the file type or extension to apply to all the files. We could decide to always save the files as Excel, rather than giving the user the freedom to choose among the allowable file types: xlsx, rdata, rds, html.

Below is a partial screenshot of how I think the dialog will look (the file type feature/control excluded).

[Screenshot: export multi]

@rdstern
Collaborator Author

rdstern commented Sep 7, 2020

I really like this layout. I assume the checkbox on save as a single file is disabled (or invisible) if only one sheet is selected?

While here, the Export File control (and Save File control) used to look as though you could type into the space. I think this has been mentioned before. I am happy for it to become even wider than this, because the path is sometimes long, and for it to be possible to left- or right-justify.

In your comments above, if I have understood correctly:
a) As we discussed, if it is complicated to add to an Excel book, then just omit that option. It writes a new workbook/file, or overwrites an existing one, but asks you first.
b) If a single sheet could be added, but not a set of sheets, that would be great. It would be worth having, if it isn't too complicated. That's what I sometimes do with a copy and paste between Excel workbooks, and just for one sheet at a time. (If the sheet name has been used before, then either it overwrites, or it gives the new sheet an extra name, e.g. adds a 1 to the existing name.)
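The renaming rule in (b), adding a 1 to a sheet name that is already in use, can be sketched in base R with make.unique. This is only a sketch under assumptions: unique_sheet_name is a hypothetical helper, the existing sheet names are passed in as a plain character vector (reading them from a workbook is not shown), and Excel's own rules such as the 31-character limit and case-insensitive names are not handled.

```r
# Hypothetical helper: return new_name unchanged if it is free,
# otherwise append a number so that it does not clash with existing sheets.
unique_sheet_name <- function(new_name, existing) {
  # make.unique() leaves the first occurrence of each name alone and
  # numbers later duplicates; with sep = "" the number is appended directly.
  candidates <- make.unique(c(existing, new_name), sep = "")
  candidates[length(candidates)]
}

unique_sheet_name("iris", "mtcars")  # "iris": no clash, name kept
unique_sheet_name("iris", "iris")    # "iris1": clash, a 1 is added
```

The returned name could then be passed as which="..." in the single-sheet export, avoiding the error that occurs when the sheet already exists.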

@dannyparsons
Contributor

I don't think we should worry about trying to append to existing Excel files. That's messy in all sorts of ways, as you point out, and would need quite a bit of work to be done well.

I think we should just have one option, to create a new Excel file. If you choose an existing file, then it is replaced, and the file dialog warns you about this.

I am not sure about getting rid of the radio buttons for single/multiple. I think they are probably different enough ideas that the user should make a more conscious choice about what they want to do. We can be clever and detect this ourselves, but I don't think that is more helpful to the user. We would have to automatically switch file types depending on how many data frames are chosen and I think this is overly confusing for limited benefit.

I am not sure what other option there will be for multiple other than Excel? Maybe RData which is a workspace export, although that is best done in the Export Workspace dialog which has more options. RDS is for a single structure so only one data frame. This makes me feel more that it is an option for the user to choose and not detected automatically.

I think the dialog should always export a single file, at least initially. There's a lot more to consider about choosing file names and overwriting once you export multiple files which I'm not sure is worth the work needed at this stage.

@Patowhiz
Contributor

Patowhiz commented Sep 7, 2020

> I am not sure about getting rid of the radio buttons for single/multiple. I think they are probably different enough ideas that the user should make a more conscious choice about what they want to do. We can be clever and detect this ourselves, but I don't think that is more helpful to the user. We would have to automatically switch file types depending on how many data frames are chosen and I think this is overly confusing for limited benefit.

@dannyparsons this is what I initially thought too. However, I found that the rio package has been updated to handle this well, and it's also very easy to detect this in R-Instat. The rio package already clearly explains the file types that can be appended to, and it matches what @rdstern proposes.

> I am not sure what other option there will be for multiple other than Excel? Maybe RData which is a workspace export, although that is best done in the Export Workspace dialog which has more options. RDS is for a single structure so only one data frame. This makes me feel more that it is an option for the user to choose and not detected automatically.

@dannyparsons the options are the same. The .rds file format now accepts more than a single structure (they are just serialised objects anyway), and rio is able to import multiple structures from a single rds file. We have already discussed and implemented this in another issue, #5602.

> I think the dialog should always export a single file, at least initially. There's a lot more to consider about choosing file names and overwriting once you export multiple files which I'm not sure is worth the work needed at this stage.

@dannyparsons other than the issues I have outlined in my previous comments, I didn't find any. We now have a custom control for saving files, and we can easily extend that control to warn the user that existing files will be overwritten.
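As a rough illustration of the overwrite check when saving several data frames as separate files, the sketch below builds one target path per data frame and flags those that already exist. It is an assumption-laden sketch, not the R-Instat implementation: export_targets is a hypothetical helper, each file is assumed to be named after its data frame, and a temporary folder stands in for the user's chosen directory.

```r
# Hypothetical helper: one target file per data frame, named after it,
# with a flag showing which targets already exist on disk.
export_targets <- function(data_names, dir, ext = "xlsx") {
  paths <- file.path(dir, paste0(data_names, ".", ext))
  data.frame(
    name = data_names,
    path = paths,
    exists = file.exists(paths),
    stringsAsFactors = FALSE
  )
}

# Temporary folder standing in for the directory chosen in the dialog.
out_dir <- tempfile("export_")
dir.create(out_dir)
file.create(file.path(out_dir, "dodoma.xlsx"))  # simulate an earlier export

targets <- export_targets(c("dodoma", "survey"), out_dir)

# Names the dialog would need to warn about before overwriting.
targets$name[targets$exists]
```

A control could show this list of names in a single confirmation prompt instead of asking once per file.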

Please test my partial implementation of this feature in PR #5873. The generated commands are generally the same for both single files and multiple files (saved into a single file).

@Patowhiz
Contributor

Patowhiz commented Sep 7, 2020

> I really like this layout. I assume the checkbox on save as a single file is disabled (or invisible) if only one sheet is selected?

@rdstern yes. If only 1 sheet is selected, it becomes invisible. If several are selected then it becomes visible.

> While here, the Export File control (and Save File control) used to look as though you could type into the space. I think this has been mentioned before. I am happy for it to become even wider than this, because the path is sometimes long, and for it to be possible to left- or right-justify.

@rdstern I'm happy to make it wider, but there is a limit to how wide it can be. Would you welcome a popup similar to the one in the comments text box? That would ensure that, no matter how long the file path is, the user can easily see the whole of it if he/she decides to.

> b) If a single sheet could be added, but not a set of sheets, that would be great. It would be worth having, if it isn't too complicated. That's what I sometimes do with a copy and paste between Excel workbooks, and just for one sheet at a time. (If the sheet name has been used before, then either it overwrites, or it gives the new sheet an extra name, e.g. adds a 1 to the existing name.)

I can look into this.

@dannyparsons
Contributor

I am not sure you understood my point on the radio buttons. I would like them so that the user first makes a choice between a single or multiple export, because they are quite different in terms of options and choices. I know we can detect this easily and the R code is easy, but I don't think that actually makes it easy for the user. In fact, I think that without the radio buttons we are adding confusion by silently changing the file types available as the number of data frames selected changes.

An RDS file doesn't save multiple structures, it can save a list as a structure i.e. a list of data frames, which is what rio does, and I can see that could be a useful option different to saving an RData workspace.
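The point about RDS can be seen with base R alone. The sketch below saves a named list of data frames as one RDS object and reads it back; the built-in mtcars and iris data sets and the temporary file are just stand-ins for illustration.

```r
# A named list of data frames is a single R object, so it fits in one RDS file.
frames <- list(mtcars = head(mtcars), iris = head(iris))

rds_path <- tempfile(fileext = ".rds")
saveRDS(frames, rds_path)

# Reading it back returns the list, not separate objects in the workspace.
restored <- readRDS(rds_path)
names(restored)
is.data.frame(restored$iris)
```

This is the difference from an RData file, where load() would restore each saved object under its own name.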

I think what would make the dialog clearer is to have a drop down to choose the file type being saved. I think this should be an explicit option on the dialog instead of hidden in the file dialog. This way we could also explain what the options were for multiple. e.g. "Excel (multiple sheets)", "RDS (list of data frames)", "RData (separate data frames)"

I don't see an easy way of exporting multiple files while allowing the user to give a name for each of the files that will be exported, and checking whether each of these files already exists and will be overwritten. This is all done automatically by the built-in file dialog for a single file, but we would have to do a lot to replicate it for multiple files. The R code is easy. I'm not worried about that; I'm worried about usability.

@rdstern
Collaborator Author

rdstern commented Sep 7, 2020

I don't feel strongly either way. Happy to have the button if Danny feels strongly about it.
What I do like is our usual selector, rather than the drop down we used to have. When using the dialogue repeatedly, I have sometimes not noticed that it had remained on the previous data frame rather than the current one, and so exported the wrong one.

So I like our "usual" selector as we have elsewhere, with a single receiver or multiple receiver.

@Patowhiz
Contributor

Patowhiz commented Sep 7, 2020

> I think what would make the dialog clearer is to have a drop down to choose the file type being saved. I think this should be an explicit option on the dialog instead of hidden in the file dialog. This way we could also explain what the options were for multiple. e.g. "Excel (multiple sheets)", "RDS (list of data frames)", "RData (separate data frames)"

> I don't see an easy way of exporting multiple files while allowing the user to give a name for each of the files that will be exported, and checking whether each of these files already exists and will be overwritten. This is all done automatically by the built-in file dialog for a single file, but we would have to do a lot to replicate it for multiple files. The R code is easy. I'm not worried about that; I'm worried about usability.

@dannyparsons thanks, I agree, these are concrete concerns. Let me think about this in the design. I like the option of explicitly telling the user what to expect, in both saving multiple data frames into a single file and saving multiple data frames as separate files.

@rdstern
Collaborator Author

rdstern commented Sep 7, 2020

I give in. There are too many cooks here. I would just like multiple data frames to be exportable to Excel when needed. The details of the dialogue and other options when there are multiple data frames are all bonuses. I would also like it to be done quickly. There is a lot more to be done.

@Patowhiz
Contributor

Below are screenshots of what I have come up with.

Exporting single data frame
[Screenshot: single data frame]

Exporting multiple data frames to a single file
[Screenshot: single file]

Exporting multiple data frames as separate files
[Screenshot: multiple separate files]

@Patowhiz
Contributor

Patowhiz commented Sep 10, 2020

> There are further improvements that may be easy to implement and will also more than justify extending the dialogue.
> This uses the export dialogue from the rio package and we will continue to use that.
>
>   1. It currently can export to Excel (xlsx files).
>   2. There are some formats that could be added to the file list. In particular it exports to MATLAB and SAS, which we don't have in our list, and also to OpenDocument (ods) files for OpenOffice.
>   3. The big one is that it can export multiple data frames to Excel, rdata, HTML and (I assume) rds.
>   4. There are other features that we may wish to explore. In particular exporting directly as a zip file, exporting with labels (serialize) and also appending to an existing file, rather than overwriting.

@rdstern I have been able to enhance the dialog to include all the items except item 4, on which I would like more clarification:
i. With regard to the zip file, do you mean exporting something like survey.csv.zip, which is documented by rio?
ii. What do you mean by exporting with labels (serialize)?
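For item i, a name like survey.csv.zip carries two pieces of information: the data format (csv) and the zip wrapper. The base R sketch below shows how a dialog might split such a name before deciding what to offer; split_export_name is a hypothetical helper, and only the .zip suffix is handled as an assumption.

```r
# Hypothetical helper: separate the data format from an optional .zip suffix.
split_export_name <- function(file) {
  zipped <- tolower(tools::file_ext(file)) == "zip"
  # Drop the .zip suffix (if any) so the inner extension gives the data format.
  inner <- if (zipped) tools::file_path_sans_ext(file) else file
  list(
    base = tools::file_path_sans_ext(basename(inner)),
    format = tolower(tools::file_ext(inner)),
    zipped = zipped
  )
}

str(split_export_name("survey.csv.zip"))
str(split_export_name("survey.csv"))
```

With this split, a zip file checkbox would only need to append .zip to whatever file name the Save dialogue returns.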

@rdstern rdstern modified the milestones: 0.7.x, 0.6.7 Mar 9, 2021