Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented Updates Describe > Two Variable Summarise #9074

Merged
merged 27 commits into from
Oct 14, 2024

Conversation

Vitalis95
Copy link
Contributor

@Vitalis95 Vitalis95 commented Jul 16, 2024

Fixes partly #4952
@rdstern , have a look at the progress made

@rdstern
Copy link
Collaborator

rdstern commented Jul 19, 2024

@Vitalis95 that's looking very promising. Here is the option for 2 variables when Numeric by Numeric

image

a) In the 3 variable case, with the first and second numeric could you add the Means and Sig Level as well and also the Swap and the misssing values option.
b) @lilyclements may want to comment on this point. I suggest, when we have Means (both 2 variable and 3 Variable) we should change the word Means to Model. The Model is the means when a variable is a factor, and it is the Slope (or Slopes) for a Variate. So that's what we should be giving there?
c) In 3 variable with ANOVA could there be another option, with now some radio buttons. There will be more, but this one to start with is Interaction. If chosen then it gives the ANOVA with the interaction term.
d) And a third radio button could we have Compare as the label. This would not have the option of means. It would include sig level and the swap. I forget the command, but it would give the ANOVA line with just the second variable, then the third and then the intereaction. So one could find the decrease in the residual when we add the third variable (so the second in the x of the model, and then the extra when we also add the interaction term.

@Vitalis95
Copy link
Contributor Author

@Vitalis95 that's looking very promising. Here is the option for 2 variables when Numeric by Numeric

image

a) In the 3 variable case, with the first and second numeric could you add the Means and Sig Level as well and also the Swap and the misssing values option. b) @lilyclements may want to comment on this point. I suggest, when we have Means (both 2 variable and 3 Variable) we should change the word Means to Model. The Model is the means when a variable is a factor, and it is the Slope (or Slopes) for a Variate. So that's what we should be giving there? c) In 3 variable with ANOVA could there be another option, with now some radio buttons. There will be more, but this one to start with is Interaction. If chosen then it gives the ANOVA with the interaction term. d) And a third radio button could we have Compare as the label. This would not have the option of means. It would include sig level and the swap. I forget the command, but it would give the ANOVA line with just the second variable, then the third and then the intereaction. So one could find the decrease in the residual when we add the third variable (so the second in the x of the model, and then the extra when we also add the interaction term.

@lilyclements , what is your comment on this? The interaction term is not in the $anova_tables2 function

@rdstern
Copy link
Collaborator

rdstern commented Sep 23, 2024

@lilyclements and @Vitalis95 I was delaying on this dialog. But it has now become urgen, because the merged version of this dialog, doesn't do the ANOVA at all. It gives a bug. The 2-way ANOVA is working on this dialog, so let's merge it as soon as we can.
If @lilyclements has time this week to improve the function, then let's do this too? If so, then here are the 2 items it would be good to add for the 2-variable ANOVA options. Lily may have more?
a) Add a Total checkbox, default unchecked.
b) Change the Means option to Means/Model. Lily, currently it gives the means whatever the x. Here is an example:

image

When x is numeric it should give the intercept and slope.

In this example the means are even more ridiculous if you swap the variables.

c) The categorical by numeric should give summaries. This is fine in the current version, but has gone (wrongly) to yet more ANOVA tables in this pull request.

Here is the running version:

image

If we can get these 2-variable options working well, then either attack the 3 variable stuff quickly as well - ideal, or disable the 3 variables and merge this. Then attack the 3 variable case.

@lilyclements
Copy link
Contributor

@rdstern thanks for this - I've fixed (a) and (b) in this PR #9152

@rdstern
Copy link
Collaborator

rdstern commented Sep 23, 2024

@Vitalis95 @lilyclements has now added trhe option for the Total and changed the Means into Means/Model. That's great.
On layout (you may have better ideas. I wonder about 2 rows of these options now
Checkbox Total instead of Means and then Sig Level as now.
Then the Means Checkbox on the line below, by itself, and now written as Means/Model. I assume that it will give either the means or the model, automatically, depending on whether the x is numeric or a factor.

@Vitalis95
Copy link
Contributor Author

@rdstern about c), you had suggested earlier the the Swarp First/Second variables checkbox be visible, when checked it gives ANOVA, when unchecked it gives summaries that is when we have categorical by numeric

@rdstern
Copy link
Collaborator

rdstern commented Sep 24, 2024

@Vitalis95 that's right. Did I get that wrong in my checking? That would be a relief. So now, does categorical by numeric still give summary tables? And when you swap it gives ANOVA? If so, that's great and please disregard c).

@Vitalis95
Copy link
Contributor Author

@Vitalis95 @lilyclements has now added trhe option for the Total and changed the Means into Means/Model. That's great. On layout (you may have better ideas. I wonder about 2 rows of these options now Checkbox Total instead of Means and then Sig Level as now. Then the Means Checkbox on the line below, by itself, and now written as Means/Model. I assume that it will give either the means or the model, automatically, depending on whether the x is numeric or a factor.

@lilyclements , following this comment, could the total be passed like the sign_level and mean arguments, such that when the Total checkbox is checked it produces the totals

@rdstern
Copy link
Collaborator

rdstern commented Sep 24, 2024

@Vitalis95 that's exactly what I was hoping for with the Total line.

@@ -4524,8 +4524,7 @@ DataSheet$set("public", "has_labels", function(col_names) {
}
)

DataSheet$set("public", "anova_tables2", function(x_col_names, y_col_name, signif.stars = FALSE, sign_level = FALSE, means = FALSE) {

DataSheet$set("public", "anova_tables2", function(x_col_names, y_col_name, total = TRUE, signif.stars = FALSE, sign_level = FALSE, means = FALSE) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Vitalis95 apologies. I have now added total as a parameter> I've set it to be TRUE by default but if you want it as FALSE by default then set this line here to be total = FALSE instead of total = TRUE

Make sure you pull changes into your branch to avoid potential conflicts too.

@Vitalis95
Copy link
Contributor Author

@lilyclements , I pulled the changes and now there is an error,

image

@lilyclements
Copy link
Contributor

@Vitalis95 thanks for this - yes, this is a great point. I forgot to update anova_tables2 in the instat_object after updating in the data_object file. This is done here and ready for you to merge into this branch if you are happy with it.

@rdstern
Copy link
Collaborator

rdstern commented Sep 25, 2024

@Vitalis95 and @lilyclements this is looking great!
a) Vitalis I remember Lily mentioned that the default was to give the total, but you could change that, in her function. Pleasec could you do that. Currently it gives the total whatever I do. The default is not to give the total, because that's what R does.
b) Please could you move Sig Level up, so it is after the total. (They are both concerned with the anova table, so they come above the Means/Model.
c) @lilyclements everything is there now, but can the output be improved? I hope that's not too time consuming. Could Francis help if needed? Here is an example:

image

  1. It isn't a Means table of yield. I presume we could say just Model?
  2. Could there be blanks instead of NA?
  3. Could the F probability be formatted better?
  4. And do we need to know it is a tibble and have all the [[1]] and list the type of each variable?

@Vitalis95 Vitalis95 marked this pull request as ready for review September 25, 2024 08:43
@Vitalis95
Copy link
Contributor Author

Vitalis95 commented Sep 25, 2024

@rdstern , let me include the above comments then you will check

@rdstern
Copy link
Collaborator

rdstern commented Sep 25, 2024

@Vitalis95 looks really good now. I'll just wait to see if @lilyclements makes any changes?

lilyclements
lilyclements previously approved these changes Oct 9, 2024
Copy link
Contributor

@lilyclements lilyclements left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Vitalis95 looks good to me too!

@lilyclements
Copy link
Contributor

@Vitalis95 @rdstern apologies didn't mean to approve. I will make some changes to the R code

Removing lists to remove the square brackets, and adding cat to print
@lilyclements
Copy link
Contributor

@rdstern I made the changes you requested. You can now test it

Copy link
Collaborator

@rdstern rdstern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lilyclements you now give the means twice, and not the ANOVA table!
If possible could you, or @Vitalis95 also improve the spacing of the controls, when there are both numeric variables

@lilyclements
Copy link
Contributor

@rdstern so I did, apologies. I've fixed that now (hopefully!). Very sorry for that

Copy link
Collaborator

@rdstern rdstern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lilyclements that's looking much better now.
image

a) Great that the F-prob is like that now. Thanks
b) Could you swap the order back, so the ANOVA comes before the means or the model? I see the ANOVA as the basic result, optionally followed by the model or means when needed.
c) Can you get rid of the extra tibble details, and the [[1]], etc? If that's too big to do now, then we could approve with these, and then improve the presentation, when we add the 3-variable stuff in a new pull request?

Copy link
Contributor

@lilyclements lilyclements left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Vitalis95 I've updated map to be walk for two classes/functions?

Are these classes functions only used for anova_table2 bits?

walk means that it will not print output, so will not give the list output!

@Vitalis95 are you happy with these changes? (Make sure you pull changes too next time you work on this - I've made changes now sorry!)

@rdstern does this now work for you how you would like it?

rdstern
rdstern previously approved these changes Oct 9, 2024
Copy link
Collaborator

@rdstern rdstern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lilyclements I was not sure what has changed. There is still the tibble-like output.

I am approving, because it is now so much better than the vurrent merged one. We will add another pull request for the 3 variable case, and that also gives another opportunity to improve the formatting of this output.

@N-thony over to you to check, and hopefully merge.

Copy link
Collaborator

@N-thony N-thony left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Vitalis95 I made a few comments. I'm working now with Chris to have our first 64 bits only version. If you can make the change quickly, we can get this included in this version.

Comment on lines 859 to 874
'ElseIf IsFactorByFactorByNumeric Then
' ucrReorderSummary.Visible = True
' cmdSummaries.Visible = True
' ucrSaveTable.Visible = True
' ucrChkSummariesRowCol.Visible = True
' ucrSaveTable.Location = New Point(23, 440)
' ucrBase.clsRsyntax.SetBaseROperator(clsJoiningPipeOperator)
' ucrSaveTable.SetPrefix("summary_table")
' ucrSaveTable.SetSaveType(RObjectTypeLabel.Table, strRObjectFormat:=RObjectFormat.Html)
' ucrSaveTable.SetAssignToIfUncheckedValue("last_table")
' ucrSaveTable.SetCheckBoxText("Save Table")
' clsJoiningPipeOperator.SetAssignToOutputObject(strRObjectToAssignTo:="last_table",
' strRObjectTypeLabelToAssignTo:=RObjectTypeLabel.Table,
' strRObjectFormatToAssignTo:=RObjectFormat.Html,
' strRDataFrameNameToAddObjectTo:=ucrSelectorDescribeTwoVar.strCurrentDataFrame,
' strObjectName:="last_table")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 661 to 662
'ucrChkDisplayMargins.Visible = IsFactorByFactorByFactor()
'ucrInputMarginName.Visible = ucrChkDisplayMargins.Checked AndAlso IsFactorByFactorByFactor()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 646 to 650
'ElseIf rdoThreeVariable.Checked Then
' ucrChkOmitMissing.Visible = IsFactorByNumericByNumeric() OrElse IsNumericByNumericByFactor()
'Else
' ucrChkOmitMissing.Visible = False
' cmdMissingOptions.Visible = False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 1233 to 1239
'If IsNumericByNumericByFactor() Then
' ' If ucrChkSwapXYVar.Checked Then
' ' clsCombineSwapAnova2Table.AddParameter("y", ucrReceiverThreeVariableThirdVariable.GetVariableNames(True), iPosition:=2, bIncludeArgumentName:=False)
' ' Else
' ' clsCombineAnova2Function.AddParameter("y", ucrReceiverThreeVariableThirdVariable.GetVariableNames(True), iPosition:=2, bIncludeArgumentName:=False)

' ' End If
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@Vitalis95
Copy link
Contributor Author

@N-thony , have a look at it

Copy link
Collaborator

@N-thony N-thony left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdstern over to you

@rdstern rdstern changed the title Implemented Describe > Three Variable Summarise Implemented Updates Describe > Two Variable Summarise Oct 13, 2024
Copy link
Collaborator

@rdstern rdstern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@N-thony I am approving, as this is an important advance. It will be good for it to be merged.
@Vitalis95 please can you make a new pull request for the Three Variable part. That will be even more exciting. And remind @lilyclements about the R bit, for parallel and full, as well as making the result neater for all the ANOVA.

@N-thony N-thony merged commit 243f37e into IDEMSInternational:master Oct 14, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants