8. Using `ggplot2`, create a scatter plot of the average sentiment score for each album (y-axis) and the album release data along the x-axis. Make the size of each point the album sales in millions.
9. Add a horizontal line at y-intercept=0.
10. Write 2-3 sentences interpreting the plot answering the question "How has the sentiment of Taylor Swift's albums have changed over time?". Add a title, subtitle, and useful axis labels.
I've reproduced the question here:
For the graph in Part 2E (8-10), although my table shows that the album Lover has sales in the 1-million range, and the graph correctly displays a point of that size, the legend for Album Sales (in millions) only shows sizes from 2 to 7. Even when using scale_size_continuous(breaks = c(1, 2, 3, 4, 5, 6, 7)), the situation does not change. Could you please explain what might be causing this problem?
Exploration and solution
Using the data from project 3, I was able to reproduce the issue described by @manamiueshima, so I played a bit more with the arguments of ggplot2::scale_size_continuous() documented at https://ggplot2.tidyverse.org/reference/scale_size.html. Overall, the answer is that ggplot2 does not show every unique value of aes(size) in the legend by default, and it drops any requested break that falls outside the limits of the scale, which default to the range of the data. In the example below the smallest sales value is 1.1, so a break at 1 is silently dropped. In a case like this one, where we want to control which values are shown, we need to use 2 arguments of ggplot2::scale_size_continuous(), namely limits and breaks, as you'll see further below.
Here's the R code with some comments, but you might want to scroll down to see the reprex::reprex() output.
Best,
Leo
library(ggplot2)
library(magrittr) # provides the %>% pipe used below

set.seed(20241009)
df <- data.frame(
  sales = c(1.1, 2.5, 4.3, 7.21),
  released = as.Date(c(
    "2020-01-01", "2021-07-01", "2023-10-01", "2024-08-01"
  )),
  sentiment_score = rnorm(4, 2, 4)
)

## Basic plot with automatic range, limits, and breaks for the "sales"
df %>%
  ggplot(aes(x = released, y = sentiment_score, size = sales)) +
  geom_point() +
  ylab("Sentiment") +
  geom_hline(yintercept = 0)

## Let's try to specify breaks at the unique (rounded down) sales we do have
sort(floor(df$sales))
# [1] 1 2 4 7
df %>%
  ggplot(aes(x = released, y = sentiment_score, size = sales)) +
  geom_point() +
  ylab("Sentiment") +
  geom_hline(yintercept = 0) +
  scale_size_continuous(breaks = c(1, 2, 4, 7))

## It didn't work. Let's also try to set the range.
## Note: "range" sets the minimum and maximum size of the drawn points,
## not the limits of the data shown in the legend.
range(df$sales)
# [1] 1.10 7.21
df %>%
  ggplot(aes(x = released, y = sentiment_score, size = sales)) +
  geom_point() +
  ylab("Sentiment") +
  geom_hline(yintercept = 0) +
  scale_size_continuous(range = range(df$sales), breaks = c(1, 2, 4, 7))

## Or use the limits instead of the range
df %>%
  ggplot(aes(x = released, y = sentiment_score, size = sales)) +
  geom_point() +
  ylab("Sentiment") +
  geom_hline(yintercept = 0) +
  scale_size_continuous(limits = range(df$sales), breaks = c(1, 2, 4, 7))

## It doesn't work yet. But what if we round down the lowest value of "sales"
## and then round up the highest value of "sales"?
##
## Voilà! This worked!
df %>%
  ggplot(aes(x = released, y = sentiment_score, size = sales)) +
  geom_point() +
  ylab("Sentiment") +
  geom_hline(yintercept = 0) +
  scale_size_continuous(
    limits = c(floor(min(df$sales)), ceiling(max(df$sales))),
    breaks = c(1, 2, 4, 7)
  )

## Using the above strategy with "range" instead of "limits" doesn't work
df %>%
  ggplot(aes(x = released, y = sentiment_score, size = sales)) +
  geom_point() +
  ylab("Sentiment") +
  geom_hline(yintercept = 0) +
  scale_size_continuous(
    range = c(floor(min(df$sales)), ceiling(max(df$sales))),
    breaks = c(1, 2, 4, 7)
  )

## Here are some other "solutions" though they include code we don't need.
##
## For example, here the line about "range" is not needed
df %>%
  ggplot(aes(x = released, y = sentiment_score, size = sales)) +
  geom_point() +
  ylab("Sentiment") +
  geom_hline(yintercept = 0) +
  scale_size_continuous(
    range = range(df$sales),
    limits = c(floor(min(df$sales)), ceiling(max(df$sales))),
    breaks = c(1, 2, 4, 7)
  )

## Similarly in this case, the code for "range" is also not needed
df %>%
  ggplot(aes(x = released, y = sentiment_score, size = sales)) +
  geom_point() +
  ylab("Sentiment") +
  geom_hline(yintercept = 0) +
  scale_size_continuous(
    range = c(floor(min(df$sales)), ceiling(max(df$sales))),
    limits = c(floor(min(df$sales)), ceiling(max(df$sales))),
    breaks = c(1, 2, 4, 7)
  )

## R Session info
options(width=120)
sessioninfo::session_info()
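
To check why only the version with widened limits keeps the break at 1, it can help to look at the scale object directly instead of reading the legend off a plot. The snippet below is a minimal sketch and not part of the original reprex: it relies on the `train()` and `get_breaks()` methods of ggplot2 scale objects, which are closer to internals than to the documented user interface, so their exact return values may differ between ggplot2 versions. The names `sales`, `wanted_breaks`, `default_scale`, and `wide_scale` are made up for illustration.

```r
library(ggplot2)

sales <- c(1.1, 2.5, 4.3, 7.21)
wanted_breaks <- c(1, 2, 4, 7)

# Default limits: the scale learns them from the data (here 1.10 to 7.21),
# so the requested break at 1 falls just outside the limits.
default_scale <- scale_size_continuous(breaks = wanted_breaks)
default_scale$train(sales)
default_breaks <- default_scale$get_breaks()

# Widened limits: round the minimum down and the maximum up (here 1 to 8),
# so every requested break falls inside the limits.
wide_scale <- scale_size_continuous(
  limits = c(floor(min(sales)), ceiling(max(sales))),
  breaks = wanted_breaks
)
wide_breaks <- wide_scale$get_breaks()

print(default_breaks)
print(wide_breaks)
```

With the default limits, the break at 1 should no longer be among the usable breaks (it is either flagged as missing or absent), while the widened limits keep all four requested breaks.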
Question
The original question was posted at https://courseplus.jhu.edu/core/index.cfm/go/bbs:topic.view/bbsTopicID/187436/coid/21836/ by @manamiueshima in relation to Project 3 (specifically jhustatcomputing/projects/project-3/index.qmd, lines 212 to 214 in 8cdb278).
reprex::reprex() output
Created on 2024-10-09 with reprex v2.1.1
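
As a follow-up, the working combination of limits and breaks can be wrapped in a small helper so the same approach applies to other sales data. This is only a sketch under stated assumptions: `sales_size_scale()` is a hypothetical helper, not a ggplot2 function, the sentiment scores are made-up values for illustration, and the integer breaks are simply the rounded-down sales, which may not be the best choice for every data set.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): build a size scale whose limits
# enclose integer breaks derived from the data.
sales_size_scale <- function(sales, name = "Album sales (in millions)") {
  limits <- c(floor(min(sales)), ceiling(max(sales)))
  breaks <- sort(unique(floor(sales)))
  scale_size_continuous(name = name, limits = limits, breaks = breaks)
}

df <- data.frame(
  sales = c(1.1, 2.5, 4.3, 7.21),
  released = as.Date(c(
    "2020-01-01", "2021-07-01", "2023-10-01", "2024-08-01"
  )),
  sentiment_score = c(-1.5, 0.8, 2.3, 3.1) # made-up values for illustration
)

size_scale <- sales_size_scale(df$sales)

p <- ggplot(df, aes(x = released, y = sentiment_score, size = sales)) +
  geom_point() +
  geom_hline(yintercept = 0) +
  labs(x = "Release date", y = "Sentiment") +
  size_scale
```

Printing `p` in an interactive session should draw the plot with a legend entry for each of the four rounded-down sales values.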