-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improving the Describe > One Variable > Summarise dialogue #7123
Conversation
@Ivanluv well done. That looks great for a start. |
@Ivanluv it now seems to get the correct summaries, but in the wrong order. a) You can see that Blue Sky remains at the top in the results, but its summary is at the bottom! And, may need @lilyclements help - it doesn't work for factor variables. etc. I would like it to do something for variables of any type. Not sure what to do here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments given for changes
@lilyclements any suggestions here |
Updating branch
@lilyclements I have updated this branch and its now working okay |
@lilyclements do you have any ideas how we could fix this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Ivanluv this is now working fine. Now I tried with other types of data (not-numeric) and it doesn't like that.
Now I thought you might need help from @lilyclements for this. But then I tried with our summary system and that handles all types of data perfectly - even character variables. And dates come out fine as dates in the summary. So I wonder if that can provide the code to be used before going into mmtable?
I treid with dodoma data, so there is a date variable and also month as a factor. Then I made month into character. It just gives NA when it can't work out the summary!
On the actual dialogue could you have the Variables as rows checked by default.
@rdstern by the "summary system" do you mean summary tables? |
No, I meant Prepare > Data: Reshape > Column Summaries. The first receiver is Variable(s) to Summarise and it allows all types of variable - as does your Describe > One Variable > Summarise. And it also has the same sub-dialogue as yours, and copes with any summary for any type! Just what we need in your One Variable dialogue! |
@rdstern should we have the results stored as it is being done in the Describe > One Variable > Summarise? |
@Ivanluv you ask - " should we have the results stored as it is being done in the Describe > One Variable > Summarise?" I assume you mean as is done in the Prepare > One Variable > Column Summaries" I would not, at least for now. Once we are able to do sensible summaries for variables of all types, then I would like to merge and then reflect and consider. What is done here is a great update, and very useful. I am more concerned here with the tables of results. And they are able to use any of the summaries we have in our system. There will be more to do, but that can come later. We need it to stay simple, but could usefully add the options for the missing values - now that we have them in a package. Then (as you say) there will be 2 further directions: We also apply these ideas to the 2-way and 3-way tables - if we have the 3-way - I think so! Then in simple graphs and tables (which IS descriptive statistics after all) we have a package to be envied! And I continue while I am thinking what's needed. I use this, because one-variable summaries is soooooo simple. Who would have thought you could do so much with it! In the presentation of the results we can choose the order of the summaries in the general case, but not here. And, even in the general case, we can't yet choose the labels for those summaries - but we will in the General Summaries. Then the challenge in this dialogue is to make it as attractive and flexible as possible - but not complicated. Here it must stay also simple for the user. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Ivanluv not sure what has happened here? This now seems like the old dialogue again?
@rdstern Am still making the changes for the dialogue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Ivanluv I assume these changes are "work in progress?", but I had a quick look.
a) The options are now visible all the time. Should not be, for the default where the summary is used.
b) The default of ticking the Variables as rows is still to be done.
c) The summary gives an error now.
@rdstern could you have a look |
@Ivanluv if the user selects to modify/choose the summary statistics to display, you are currently using
In PR #7230, the code is fixed to use date variables in the summary functions. Once this is merged in, then update your master and hopefully that will work! (Or you can merge it into this PR if you think that's more appropriate). |
@Ivanluv I'm not sure if this will fix the proportions problem. If it is still a problem after changing the function then I can look into it then |
@Ivanluv how is this work going? |
@lilyclements am waiting for PR #7230 to be merged |
@rdstern @lilyclements I have merged PR #7230 into this and made the changes as suggested by @lilyclements i.e using the After the error the table won't be produced even if one of the selected columns supports the summary |
Note sensible summaries for date variables. And I made month into an ordered factor, so it can get the min and max! That's nice to explain to people! But we do need to solve the problem that @Ivanluv mentioned, namely that it can't cope with some summary statistics - like sum, or mean, for a factor. When values are missing, and that option isn't given, then that cell is blank. We need the same for those statistics for a factor variable, etc. One small change if @Ivanluv can't solve the current error, is to move the Missing option to below the other 2 checkboxes. Then it will be in the same relative position as in the summary dialogue. |
So the sum/mean for factor is a problem in I've tried to work on getting this into the |
@@ -1418,28 +1423,29 @@ DataBook$set("public", "summary_table", function(data_name, columns_to_summarise | |||
for (facts in power_sets_summary) { | |||
if (length(facts) == 0) facts <- c() | |||
summary_margins_df <- data_book$get_data_frame(data_name = data_name) %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change data_book
to say self
? Realised I missed these earlier.
dplyr::select(c(factors, columns_to_summarise)) %>% | ||
tidyr::pivot_longer(cols = columns_to_summarise) | ||
dplyr::select(c(tidyselect::all_of(factors), tidyselect::all_of(columns_to_summarise))) %>% | ||
tidyr::pivot_longer(cols = columns_to_summarise, values_transform = list(value = as.character)) | ||
data_book$import_data(data_tables = list(summary_margins_df = summary_margins_df)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here
dplyr::select(c(factors, columns_to_summarise)) %>% | ||
tidyr::pivot_longer(cols = columns_to_summarise) | ||
dplyr::select(c(tidyselect::all_of(factors), tidyselect::all_of(columns_to_summarise))) %>% | ||
tidyr::pivot_longer(cols = columns_to_summarise, values_transform = list(value = as.character)) | ||
data_book$import_data(data_tables = list(summary_margins_df = summary_margins_df)) | ||
summary_margins[[length(summary_margins) + 1]] <- data_book$calculate_summary(data_name = "summary_margins_df", columns_to_summarise = "value", summaries = summaries, factors = facts, store_results = FALSE, drop = drop, na.rm = na.rm, return_output = TRUE, weights = weights, result_names = result_names, percentage_type = percentage_type, perc_total_columns = perc_total_columns, perc_total_factors = perc_total_factors, perc_total_filter = perc_total_filter, perc_decimal = perc_decimal, margin_name = margin_name, additional_filter = additional_filter, perc_return_all = FALSE, ...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here
dplyr::select(c(factors, columns_to_summarise)) %>% | ||
tidyr::pivot_longer(cols = columns_to_summarise) | ||
dplyr::select(c(tidyselect::all_of(factors), tidyselect::all_of(columns_to_summarise))) %>% | ||
tidyr::pivot_longer(cols = columns_to_summarise, values_transform = list(value = as.character)) | ||
data_book$import_data(data_tables = list(summary_margins_df = summary_margins_df)) | ||
summary_margins[[length(summary_margins) + 1]] <- data_book$calculate_summary(data_name = "summary_margins_df", columns_to_summarise = "value", summaries = summaries, factors = facts, store_results = FALSE, drop = drop, na.rm = na.rm, return_output = TRUE, weights = weights, result_names = result_names, percentage_type = percentage_type, perc_total_columns = perc_total_columns, perc_total_factors = perc_total_factors, perc_total_filter = perc_total_filter, perc_decimal = perc_decimal, margin_name = margin_name, additional_filter = additional_filter, perc_return_all = FALSE, ...) | ||
data_book$delete_dataframes(data_names = "summary_margins_df") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
finally here. Thanks!
|
Calculation summary edits
@lilyclements I have pulled you changes from PR #7338 and am getting the following error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Ivanluv is village or size a character variable? If so then the following should fix it. If not, then the following will fix another potential bug so can you implement the following anyway:
property = "class") | ||
# note: important that there *is* a space between | for str_detect function | ||
if (any(stringr::str_detect("factor | Date", col_data_type))){ | ||
curr_data_list[[c_data_label]] <- curr_data_list[[c_data_label]] %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Ivanluv can you change “factor | Date” to
“factor | character | Date”
it is important to have a space between the | too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lilyclements the bug is now fixed
@Ivanluv was this supposed to work now? I tried with a minimum for a factor and it said that isn't allowed. |
@Ivanluv and @lilyclements I am hoping this will be included in the Version 0.7.5 update? That's in 2 weeks time. OK? |
@rdstern which combinations do we not want to work? Factors and characters: Ordered factors: Dates: I have not yet put in that min/max (amongst others in the list above) shouldn't work for factors or characters, but that is very quick to add. I'll add any other combinations when you've responded to this so I know if there's any I'm missing here. Thanks! |
@lilyclements and @Ivanluv I'd prefer to look at this the other way round, namely what is sensible for different types. I am just keen that we have an option to mix the data types in this dialogue, rather than having to think carefully what to put in the the receiver. |
@lilyclements what is you suggestion on how to proceed ? |
Updating Branch
fixes #7081
@africanmathsinitiative/developers this is ready for review
@rdstern In this PR I have added the functionality to control the orientation, i.e. choosing whether the variables or the summaries go down in the summary table