Describe > Two/Three Variables > Summarise #4952
Next proposal for these 2 dialogues:
On the Describe > Two Variable > Summarise there is currently a sub-dialogue - I guess there will be! On the Describe > Two Variable > Graphics there is: So, suddenly you do have to understand this idea! |
@rdstern is this still awaiting further discussion? |
I would like to make some progress on improving this. Here are some suggestions:
For the summarise dialog the summaries will be:
This is sort of how it works now, so I think this was discussed before. Questions
|
I wonder, with the two variable situation, whether we (at least) have two radio buttons called By and 2 Variables? The By is simply the one variable dialogue with results By a second variable. This would be like the grouped data frame idea in dplyr.
I suggest we consider the three variable dialogues (and possibly add a 4 level item to the menu at the same time). I suggest this will be useful, and continue David's initial idea. So the three-level could (at least initially) be simply 2 By, and By. For consistency we might include a 3rd button which is 3 Variables, but this would be disabled for now. Here the 2 By is the same as the one Variable multiple receiver, split by 2 factor variables. The By is all the 2-variable options split by one factor. If it looks useful, then we could add the 4 Variables situation, which might just be the 2-variable summaries split by 2 factors.
Of course there are other options for 4 variables, but many analyses seem to stop at 2-way tables, etc. (and there is a fair bit to teach here), and we do want to encourage users to move to the more general situation. So, at least for now, I suggest we don't worry about too many 3-variable tables, etc. This split will mean that the one-variable By will allow the multiple receiver to permit any, or all, variables, while the two-variable summaries can restrict the multiple receiver to be of a single type. Allowing the By up to 2 factors also fits well with the graphics, where the default can be for a By to be a facet. |
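As a rough illustration of the By and 2 By ideas described above, here is a minimal sketch in R. It uses base R's `aggregate()` so that it runs without extra packages. The data frame and its column names are invented for illustration; this is not R-Instat code.

```r
# Illustrative data only: one numeric variable and two factors.
df <- data.frame(
  yield   = c(2.1, 2.5, 3.0, 3.4, 1.8, 2.2, 2.9, 3.1),
  variety = factor(c("A", "A", "B", "B", "A", "A", "B", "B")),
  village = factor(c("X", "X", "X", "X", "Y", "Y", "Y", "Y"))
)

# "By": a one-variable summary (here the mean) split by one factor.
by_one <- aggregate(yield ~ variety, data = df, FUN = mean)

# "2 By": the same one-variable summary split by two factors.
by_two <- aggregate(yield ~ variety + village, data = df, FUN = mean)

print(by_one)
print(by_two)
```

With dplyr loaded, the By case would correspond to something like `df %>% group_by(variety) %>% summarise(mean(yield))`, which is the grouped data frame idea mentioned above.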
How should date columns be treated? We had thought like a numeric column, but you can't do correlations with them like a numeric column, and they also can't be used as the response variable in an ANOVA table. |
Interesting. Is this a logic question or an R question? I can think of examples where correlations or regression could be useful.
Is this more a question of a suitable origin being needed, or perhaps there is usually a logical start, so it is a difference in dates that is being used? In many studies there is a natural origin (often zero), while dates have an arbitrary origin. I get this problem when looking at trends in temperature, with year as the x, but it could be daily with a date as the x. Then (with year) the origin is year zero, which is a long time ago! In a practical sense this can mess up the regression modelling, so it is better to have a more sensible origin. Is the paper by Cox useful here? Perhaps the issue of it being a date is less relevant than the fact that there are instances where just making it numeric is not sensible, because the variability of the different dates/times may be very low compared to the size of the observations? |
The question was sort of both. In R these give an error, but you can convert to numeric to get days since 1970/1/1. As you say, I think the origin is arbitrary, since it's usually the differences and not the actual values that are of interest. |
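To make the point about the origin concrete, here is a small sketch in R with made-up data (the dates and measurements are illustrative only). It shows the direct call failing, the conversion to days since 1970-01-01, and the effect of choosing a different origin.

```r
# Illustrative data only: ten consecutive days and a made-up measurement.
dates <- as.Date("2020-01-01") + 0:9
temp  <- c(21.3, 21.9, 22.4, 22.1, 23.0, 23.6, 23.2, 24.1, 24.8, 24.5)

# cor() rejects Date input, so the direct call fails.
direct <- try(cor(dates, temp), silent = TRUE)
direct_failed <- inherits(direct, "try-error")

# as.numeric() on a Date gives days since 1970-01-01.
days_epoch <- as.numeric(dates)
# A more natural origin: days since the first observation.
days_start <- as.numeric(dates - dates[1])

# The correlation and the slope do not depend on the origin...
r_epoch <- cor(days_epoch, temp)
r_start <- cor(days_start, temp)
fit_epoch <- lm(temp ~ days_epoch)
fit_start <- lm(temp ~ days_start)

# ...but the intercept does: it is the fitted value at the origin.
print(c(r_epoch = r_epoch, r_start = r_start))
print(coef(fit_epoch))
print(coef(fit_start))
```

The intercept from `fit_epoch` is an extrapolation back to 1970, which is why a more sensible origin tends to give a model that is easier to interpret.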
I assume this is the place to comment on the 2-variable summarise. I am using @dannyparsons' new version. Currently you only get the Counts. You don't get the margins, I think. The margins are also interesting. Sometimes you want one, but not both. Initially I would be happy with a single checkbox, so you either get the margins or not. Ideally there would eventually also be another checkbox, perhaps only visible when you ask for one of the percents. It could be labelled as "Counts for 100%". (This is what Genstat does as default.) |
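For reference, the margin and percentage options discussed above can be sketched in base R as follows. The data are invented for illustration, and the last step is only one possible reading of the "Counts for 100%" label, not a description of what R-Instat or Genstat actually produces.

```r
# Illustrative data only: two categorical variables.
sex   <- factor(c("F", "F", "M", "M", "M", "F"))
group <- factor(c("a", "b", "a", "a", "b", "b"))

counts <- table(sex, group)

# Counts with both margins (row and column totals).
with_margins <- addmargins(counts)

# Only one margin: margin = 1 appends a "Sum" row of column totals.
column_totals_only <- addmargins(counts, margin = 1)

# Row percentages, so each row adds up to 100.
row_pct <- prop.table(counts, margin = 1) * 100

# Assumed reading of "Counts for 100%": show the count behind each 100%.
row_pct_with_n <- cbind(round(row_pct, 1), n = rowSums(counts))

print(with_margins)
print(column_totals_only)
print(row_pct_with_n)
```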
@Ivanluv the suggestions just above are for the situation with categorical by categorical. There are no options and the display is pretty awful! |
@rdstern should I use the |
@Ivanluv now I see at least one example of the questions where you expected an answer - and I missed it, and you didn't remind me! Also perhaps that you would like more specific direction - though (as a programmer) you are asking more detail from me than (as a user) I have. |
Using mmtable2 here seems like a good solution, and implementation-wise (I assume) is very similar to the work already done in the summaries dialog. |
@lilyclements the object produced by |
It cannot be passed to Out of interest, how does |
I wonder if this starts to raise the more general question of whether we have a separate Describe > Specific > Frequency and Summary tables dialogue? Should we consider having a Tables dialogue with a frequency and Summary button at the top? The main differences in the frequency tables is just the need for the percentages. |
That sounds sensible. |
@lilyclements many thanks for that. With your nice neat layout above why not have another summary table from Categorical by Numeric by Categorical? So it is the same as Categorical by Categorical by Numeric? |
@rdstern I've amended my table to reflect those changes :) |
@derekagorhom can you try this one? Perhaps even share with Raphael, if he is ready? It is a good one to build carefully on the 2-variable code and includes a lot of statistics too! It would be good for Sabi to test. |
@derekagorhom I can understand why you have been quiet on this one, given all you have been doing concerned with the AIMS course. Are you happy to work on this one, once that is over, or do you have too many other tasks just now? |
@rdstern sorry for the late reply. Yes, I will work on it next week, but if someone else would like to attempt it, that is fine with me. |
@derekagorhom this is an important dialog we need to get working. I am starting to be concerned that you may be spending too long helping on the new visualise dialog, which is fun but much less important. I had hoped that work on this one might have started while Lily was visiting. Now it could involve @fran2or for support. I'd be happy for him to be spending a bit more time on R-Instat, and this one is also now in the climatic menu as well as in describe. |
@derekagorhom you have been very quiet these last 2 weeks. Is everything ok? |
@rdstern sorry for being quiet on this issue. |
@derekagorhom that's great - many thanks. I was only concerned if the work hadn't started. |
@Vitalis95 I have yet to check your recent pull request. But I had a good discussion with @volloholic and am now ready to list the way I suggest this dialog, with our summary methods, should work.
That's stage 1 for 2 variables. Notice we have lost the correlations option. Don't delete that code, because I suggest we still need it, see below. So two (new) improvements:
a) Numeric by Numeric: we could also have the correlations. So add a checkbox Correlations. Default unchecked. If checked, then it gives the ANOVA anyway, plus the correlations. (Later we may add another checkbox, perhaps saying Model, where we give the formula for the regression line. Again default is unchecked.)
c) And another change, maybe later. (But I think it is a real "goodie" and the first steps can be done now!) Add a checkbox saying Swap y and x. Default unchecked. For now make it disabled.
d) In the variables for this Summary (top radio button) add (y) to the name, so it becomes
I would like to merge initially at this stage. Then continue with the rest below: Initially I am just interested in |
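A minimal sketch of the proposed Correlations checkbox for the Numeric by Numeric case, written in plain R. The function `summarise_numeric_by_numeric()` and its `correlations` argument are hypothetical names used only for illustration; they are not part of R-Instat.

```r
# Hypothetical helper, not R-Instat code.
# 'correlations' mimics the proposed checkbox (default unchecked).
summarise_numeric_by_numeric <- function(y, x, correlations = FALSE) {
  fit <- lm(y ~ x)
  out <- list(anova = anova(fit))   # the ANOVA table is always returned
  if (correlations) {
    out$correlation <- cor(x, y)    # added only when the box is checked
  }
  out
}

# Illustrative data only.
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.0, 2.9, 4.2, 4.8, 6.1, 6.9)

default_result <- summarise_numeric_by_numeric(y, x)
checked_result <- summarise_numeric_by_numeric(y, x, correlations = TRUE)
print(checked_result)
```

A later Model checkbox could, under the same assumptions, add `coef(fit)` to the returned list to give the regression line.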
@rdstern, @lilyclements a clarification on the following: |
If the If the |
@Vitalis95 we can chat today. I think you are correct and that's what I posted last week. You may want to read that post again? |
This still needs the 3 variable, so I'm re-opening |
@lilyclements, for the 3 variables, when the
We get the following error: Please can you also add the |
@lilyclements, here is the code:
|
@Vitalis95 thanks for this. To fix this, can you amend the (Really simple - just changing the line
to
) If it's easier: The entire function should now be:
|
There is a lot to improve here.
I suggest a few steps, so (at least) the dialogues are consistent. But then I think discussion is needed with @dannyparsons and @volloholic to check on the strategy. Once that's agreed, and I would like to write the discussion points in this issue, then I suggest it could be an interesting and important task largely for @Muthenya , with support from the others?
So an initial suggestion. The 2 dialogues remain inconsistent and should not be. Summarise has a single receiver first and then a multiple receiver. Graph has the same, but the other way round.
I suggest that most people will be considering 2 specific variables when they visit this dialogue. So could we have the same idea as in the specific graphs, namely
a) The first receiver is always a single receiver.
b) The second one is also a single receiver (by default), but with the same button as on the initial receiver for the Describe > Specific > Boxplot, etc. So it says Single, and can be changed to Multiple, in which case it becomes a Multiple receiver.
I don't necessarily expect @dannyparsons and @volloholic to agree, even with this, but propose that the extra button will allow the idea of the dialogue to be explained clearly.