Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added geom_col and added color parameter to geom_smooth #5069

Merged
merged 6 commits into from
Oct 29, 2018

Conversation

Ogik99
Copy link
Contributor

@Ogik99 Ogik99 commented Oct 23, 2018

fixes #5067 and fixes #4895
@africanmathsinitiative/developers kindly review

@@ -1476,4 +1504,8 @@ Public Class ucrGeom
Public Function GetGeomName() As String
Return ucrInputGeoms.GetText()
End Function

Private Sub lblTypeofLayer_Click(sender As Object, e As EventArgs) Handles lblTypeofLayer.Click
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to have been added and can be removed.

@dannyparsons dannyparsons changed the title Added color layer parameter to geom_smooth. Added geom_col and added color parameter to geom_smooth Oct 23, 2018
@dannyparsons
Copy link
Contributor

@Ogik99 Thanks for this. I have commented on a small change that I think was made accidentally and can be removed. It's good to always check through your changes in VS before committing to see if there are any unwanted changes to undo.

I also changed the title of your pull request, it only mentioned geom_smooth and not the addition of geom_col. It's important the title reflects the changes so that reviewers know what has been done.

I slightly modified your comment so that "fixes" appears twice. It needs to be there for each issue to be linked automatically to it. You can tell if this has happened because fixes will have .... under it to show which issue it will close.

It will be good for someone to review this now if it is ready.

@maxwellfundi
Copy link
Contributor

Will be reviewing this afternoon.

clsgeom_col.AddLayerParameter("size", "numeric", "0.5", lstParameterStrings:={1, 0}) ''Varies the size of outline. Note: negative size gives size 0 in general, but 'Warning: sometimesgive errors...

lstAllGeoms.Add(clsgeom_col)

End Sub
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ogik99 This needs to be moved to the correct alphabetical position, else it will always be the last on the list and we do not want that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

geom_col is now correctly placed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, this looks good.

@maxwellfundi
Copy link
Contributor

This looks good and working well. @Ogik99 just move the geom_col code to the correct position and will be ready to be merged to master.


clsgeom_col.SetGeomName("geom_col")
'Mandatory Aesthetics
clsgeom_col.AddAesParameter("x", strIncludedDataTypes:=({"numeric", "factor"}), bIsMandatory:=True)
Copy link
Contributor

@maxwellfundi maxwellfundi Oct 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i thought this was just factors and y numerics?

@maxwellfundi
Copy link
Contributor

@dannyparsons This looks good now!

clsgeom_col.SetGeomName("geom_col")
'Mandatory Aesthetics
clsgeom_col.AddAesParameter("x", strIncludedDataTypes:=({"factor"}), bIsMandatory:=True)
clsgeom_col.AddAesParameter("y", strIncludedDataTypes:=({"numeric"}), bIsMandatory:=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried a modified example from the geom_col help with x as numeric and y as factor and it worked. So this doesn't look correct yet.

df <- data.frame(trt = c("a", "b", "c"), outcome = c(2.3, 1.9, 3.2))
ggplot(df, aes(x = outcome, y = trt)) +
    geom_col()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried testing using dodoma data and for the x variable for example, both character and numeric data types are acceptable. The y variable only accepts numeric data types. This conforms to the above code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My example shows that the y variable can be a factor.

Copy link
Contributor Author

@Ogik99 Ogik99 Oct 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true, however it does not give meaningful results.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've taken your comment as a challenge! Here's an example which I claim is meaningful. A data frame with the ages of 10 objects and their quality from low, medium, high. To compare age and quality I have produced a bar chart with age (numeric) on the x axis and quality (factor) on the y axis. This looks perfectly meaningful and sensible to me.

df <- data.frame(age = c(1:10), quality = factor(sample(c("low", "medium", "high"), replace = TRUE, size = 10), levels = c("low", "medium", "high")))
ggplot(df, aes(x = age, y = quality)) + geom_col() + scale_x_continuous(breaks = 1:10)

image

Even if we couldn't think of an example which we thought was "meaningful" or even sensible, what we are trying to do when we implement geoms from ggplot2 is to allow them to be used as flexibly as possible (like in R) but without causing problems. So, if having x as numeric meant that ggplot2 gave an error then we would prevent x being numeric. However, if we only thought that it wasn't sensible to have as numeric then we would still allow it for two reasons:

  1. We want the general graphics system to be as flexible and functional as possible so we only prevent users doing something if it causes a problem (like an error) even if we don't think doing that is sensible.
  2. We might think that something isn't sensible and no one should do, but we can't possibly know what the many, many different uses of this that there might be and there may be an application which is rare, but would be perfectly sensible. It would then be very annoying if this wasn't allowed by us.

So for the general system of using geoms we are making it as flexible as possible, we don't need to consider what might or might not be sensible. The basic rule is that if ggplot2 allow it then we try to allow it.

For main dialogs it might be different as our main aim is to make them easy to use, and therefore we might restrict functionality to make it easier to get something "sensible", but we would often try to keep the flexibility through the (more advanced) sub dialogs.

Copy link
Contributor Author

@Ogik99 Ogik99 Oct 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that a factor variable can be a character string or a numeric value. In the above example, age is a numeric value but its actually a factor (a categorical variable). So in this case having x (age) as a numeric value is okay but in essence it is actually a factor (I stand to be corrected).

I agree that different users may have different needs and what is sensible to one person may not be sensible to another. I am working on removing the restrictions on the data types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are good points to bring up and there are lots of different ideas in here that are mixed up together. Let's be clear on variable types. When you define the aesthetics for a geom you are setting the types of variables that are valid for each parameter. These are the R variable types. Each variable has a type. It can be factor, numeric or character (or others) but it can't be more than one of those 3 types at the same time, so you're not quite correct to say that a factor can be a character string or numeric value. A factor is a factor only.

A factor could be created from a character or numeric column but then it is no longer numeric or character, it's factor. R will only recognise it as factor even if it originally came from a numeric. For example, in R these two columns are exactly the same type:

x <- factor(c(1, 2, 3))
y <- factor(c("a", "b", "c"))

To R, x and y are the same type of variable: factor. You can check this by doing class(x) or class(y). Even though x was made from a numeric vector and y from a character vector that is unknown to R now.

You are right that in my example I was treating x as a categorical variable because there is one bar for each distinct value of x. And there would be lots of numeric columns where treating them as categorical wouldn't make sense. However, categorical is not a variable type in R. What we've discovered is that geom_col will treat the x aesthetic as categories even if it isn't a factor. So if I want to use age on the x axis I don't need to convert it to factor. So in essence age it is being treated as categorical for this graph but it is definitely not a factor.

Categorical and factor are very similar terms but in R factor is a technical term for a type of variable it stores. Any variable is either a factor or not which can be checked with class(). Anything can be made into a factor, even if its not sensible to do so. Whereas categorical is a statistical term. You could say that age is categorical even though its stored in R as numeric. There is no way in R to say that a variable is categorical without converting it to a factor. So when we restrict something to factors only we need to remember that this will exclude any non factor columns, even if they are thought of as "categorical", like the age.

For implementing geoms its much easier because we use the simple rule that if ggplot2 allow it, then we try to allow it. For geoms, we want maximum functionality and so we don't need to consider what is sensible and what is not (which is much harder to decide).

I'm not sure if that will clear things up or make things more confusing, but this was a good point so please keep raising these when you have thoughts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definitely clears a lot of things. Thank you Danny.

@Ogik99
Copy link
Contributor Author

Ogik99 commented Oct 29, 2018

@dannyparsons kindly review

@dannyparsons dannyparsons merged commit 1a7f6f2 into IDEMSInternational:master Oct 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing parameter(s?) from geom_smooth Add geom_col to the set of geoms
3 participants