Consider normalizing by total population of the country or region #30
Comments
Support this. Also, perhaps by country size (land area) as a rough measure of distance between population. I would be happy to help get the data for the above. Good work on this. Will be useful for future outbreaks. |
The number of new cases is already being divided by a population: that of those infected. The total population of a country only controls the maximum number of infections that could occur, not the rate of infection. This is already evident from the graph: countries follow the same trajectory regardless of vast differences in population. |
Well, the current scaling of new cases / total cases normalizes the angle of the curve, but not the overall scale. So if you compare the curves of China and South Korea, for example, the current scaling makes it look like the South Korean curve dipped down "sooner" (yes, I realize the X-axis is not time) than China's, after about 6k cases vs 74k. In terms of comparing the impact of containment measures on the curve shape this is misleading, though, because China has about 27 times the population of South Korea. If you instead look at it as (new cases per 1M population) / (total cases per 1M population), the curve for China dips at about 53 cases/1M compared to 117 cases/1M for South Korea. Essentially the ask is to have the option of having the X-axis represent the proportion of the overall population that is infected, while retaining the ability to compare trajectories. |
Put another way, I think an X axis of absolute case numbers is meaningful as long as cases are essentially localized -- if there's a handful of cases, it doesn't matter how large the uninfected population surrounding those cases is. But once you get to a point where infections are essentially everywhere within a country, the numbers start to scale with the size of the overall population, and looking at it as cases per population starts making more sense. |
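The per-million figures quoted above for China and South Korea can be checked with a few lines of Python. This is only a rough sketch: the populations are approximate assumptions (not taken from the project's data), the case counts are the rounded values from the comment, and `per_million` and the weekly figure of 500 are hypothetical, chosen for illustration.

```python
# Case counts are the rounded figures quoted in the comment above.
# Populations are rough assumptions, not values from the dataset.
POPULATION = {"China": 1_400_000_000, "South Korea": 51_500_000}
CASES_AT_DIP = {"China": 74_000, "South Korea": 6_000}


def per_million(count, population):
    """Scale an absolute count to a rate per one million inhabitants."""
    return count / population * 1_000_000


for country, cases in CASES_AT_DIP.items():
    rate = per_million(cases, POPULATION[country])
    print(f"{country}: about {rate:.0f} cases per 1M")

# Dividing both axes by the same population leaves new/total unchanged,
# so on a log-log plot the curve keeps its shape and only shifts.
new, total = 500, 6_000  # hypothetical weekly and cumulative counts
pop = POPULATION["South Korea"]
ratio_absolute = new / total
ratio_per_capita = per_million(new, pop) / per_million(total, pop)
print(ratio_absolute, ratio_per_capita)
```

With these assumed populations the results land close to the 53 and 117 cases/1M mentioned above; the exact values depend on which population estimate is used.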
I support this request as a toggleable option. The main reason for normalizing by total population is that it gives you an estimate of the real impact on the country. A million cases in a country with 2 million people is not the same impact as a million cases in a country of 300 million. I think it is a cheap transformation, doesn't need any extra data and would give a better way to compare apples to apples. |
+1 |
I'm not yet convinced that we should add per-capita numbers, although I am hearing this requested a lot (especially in my email & mentions). Overall I'm inclined to agree with John Burn-Murdoch on this. Laying out my thinking on this for now. As others have pointed out, essentially this would change the ordering (but not the shape) of the graph so that smaller-population countries would shift further up the graph and larger-population countries would move further down. Some of my concerns:
Some alternate workarounds:
There are clearly situations when it makes sense to ask per-capita questions, but I'm not yet convinced this is one of them. As of now, I'm not sure the benefits of this view outweigh the costs. I'm open to changing my thinking on this down the road. I'm also totally open to anyone who wants to fork this repo and create their own parallel one with per capita statistics. |
@aatishb First of all, thanks for the great project and your dedication to continuing work on it! I feel that there is no one graph that can show everything. Each approach is a different point of view, a different piece of the puzzle. Absolute numbers, per capita, per country area, and per test are all showing different sides of the same story. That's why I'd love to see all those options as switches – perhaps with additional info on which view accents which issue. I know there are a lot of these graphs out there, but what's unique to this project (as far as I know), is not putting time on an axis, which gives a unique view – and I believe, adding those options would only deepen that insight. On the other hand, these are just my opinions, I'm no mathematician, so I might be wrong and in the end, I will value your opinion over my own. Thanks again! |
@aatishb Thanks for the thoughtful response, those are good points. I guess personally I've been looking at this from the point of view of a small country (I live in New Zealand), and am wondering if we're really doing well and if interventions actually happened "early", or if we just have comparatively low numbers because we're a small country. The lack of correlation between deaths and population you linked is interesting. Intuitively it seems like there are a number of factors that should scale with population size though -- e.g. the number of "imported" cases should correlate with the number of people travelling abroad, which should be broadly proportional to total population (of course there are other factors too). |
Thank you for reminding us that phase-space plots are useful, we don't have to graph versus Date. TL;DR
If the goal isn't to answer the question as to whether ROK turned the corner "better" than PRC, but "have we turned yet," this construction is the best I've seen yet. For those who want to see a population-scaled version of a phase-space plot, the other phase-space plot (that publicly launched the same day!) over at NYT Upshot, which instead of the classic (I haven't seen Their choice of Growth Rate is interesting but perhaps problematic for communications. The subtle differences among 7%, 9%, 12%, 15%, 19%, 26% daily Growth Rates are the differences between doubling in 10, 8, 6, 5, 4, 3 days (respectively), rather less subtle and important. The latter is a figure of merit that doesn't need a logarithmic scale to see the important distinctions; if we want either a non-time-series plot or even a time-series plot that doesn't need that problematically delightful log scale that confounds the innumerate, Doubling Days shouts the exponential nature of the epidemic (with neither confusing subtle steep tangents w/o log scale nor confusing log scales with). And who wants to plot Percentage on semi-log? (Which opens up another can of worms!) I am happily using BOTH your and their phase-space charts to track my state's county reports. Yours (X vs X') makes the Weekend Effect re new cases reported on Sunday much more obvious (in Mass. counties data) than it is on their Growth % vs Pop affected ‰ design. A sharp dip and Monday reverse looks like a head-fake towards safety. While I have worries about data completeness, your lovely web demo reassures that this X vs X' phase-space diagram finds the signal in even the presumably incomplete data from China and is clearly signalling that the turn has been achieved in Spain and Italy now, good! Getting a signal out of noisy data is effectively statistical power. That's good. Thank you again. (And Henry too for letting us know.) |
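The correspondence between daily growth rates and doubling times mentioned above follows from the standard compound-growth relation, doubling time = ln 2 / ln(1 + r). A minimal sketch, assuming constant daily growth; the function name `doubling_days` is made up for illustration:

```python
import math


def doubling_days(daily_growth_rate):
    """Days needed to double at a constant daily growth rate (0.07 means 7%)."""
    return math.log(2) / math.log(1 + daily_growth_rate)


# Growth rates quoted in the comment above.
for pct in (7, 9, 12, 15, 19, 26):
    days = doubling_days(pct / 100)
    print(f"{pct}% per day -> doubles in about {days:.1f} days")
```

Rounded to whole days, these rates give 10, 8, 6, 5, 4 and 3 days, matching the figures in the comment.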
I agree that total numbers are important, should be primary, and should remain the default. As MinutePhysics expertly explains, your chart was designed to show a specific thing really well: right now, which countries are still on the exponential growth line, and which have dropped off? That is something I as a layman can intuitively understand, and your chart does a great job of highlighting it. As I look at the countries climbing up and to the right, knowing they're vastly different population sizes, I have a new, different question: right now, how "severe" or "saturated" is each country? That's why I disagree with you about whether per-capita is too abstract. I think it would give us a clear view into a question that seems intuitive and self-evident when I look at the chart. Here's something that might be revealed by looking at the data this way. We know that confirmed / reported cases are a fraction of total infections, but we don't know how small a fraction. Perhaps very small. If we start to see a trend in several countries where they seem to gradually taper off once they reach a certain per-capita death rate, that could tell us something we might not know otherwise. And even if not, at minimum it will counter the opinion that "most people are already infected." If true, we'd see it start to taper off. But, nope, still on the rise. So I'd like to see it as an option. Or, your proposal of listing large geographical regions I think would also work well. |
Thx a lot for providing these graphics, I think they allow a unique way to look at the development. Two suggestions
Again thx for your work kindest regards Jo |
Agree with gruenix above. Both options would be helpful in visualizing what is happening.
All of this of course depends on whether or not the data is good, and that might be a stretch, especially with regard to China. Thanks for your work! |
Hello, this is a really nice graph animation, thanks! But I do think an option to show cases per capita (or per million people) would be very instructive. For example, I'm in New Zealand and there's a lot of discussion about whether we're doing better than Australia, who have a less severe lockdown policy. If we could view this on a per capita basis I think it would show that both countries are doing very similarly, but as it stands you can't really tell because Australia has 5x as many people. |
Hi, we’ve all tried it, but he has his reasons... Unfortunately I’m not able to fork and adapt it, even though it would be reasonably easy to get the number of inhabitants together... My programming skills don’t even deserve the name and mainly ended with my Commodore C20 back then :-). And these days all I do is script Filemaker or so... |
@aatishb I don't understand your concern that per-capita (or per 100,000) over-emphasizes data from small countries over big ones. I think it's actually the opposite: you over-emphasize data from big countries when you don't normalize. Borders are totally subjective. You compare apples to oranges when you choose to compare absolute values for the USA with Belgium, but also when comparing absolute values from N-Y with Europe. It's only when you divide per capita that you can compare objectively. (When comparing pace instead of values, per-capita is useless, of course.) |
Agreed! US and Belgium might be obvious for everyone; it’s worse when comparing e.g. Germany to France or the UK, where a smaller difference in population may not be so obvious. But it still seriously distorts the picture. Even I have to look up the absolute population for some of our neighbors (and my own country) and need to keep reminding myself. And I’m not sure about the pace; even the "case density" changing over time would be interesting. |
Thanks to all for this and to @jwosty for the per capita addition |
From the source below. Seems like an issue with the underlying data. COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
|
DIVOC-91 simply provides the option to view the data absolute or population normalized. BTW, the graph is the best idea ever. |
@jwosty: Thanks for the per-capita view! I think there's an error in the scale, though. For example, on 7 May New Zealand has 1,490 confirmed cases and 11 weekly cases. On the per-capita graph this is shown as "Total Confirmed Cases per 100,000: 30,848.548452552597". Apart from the excessive number of decimal places :-) the number isn't correct. NZ has a population of about 4.8m, so the number of cases per 100,000 is 1490/4800000*100000 = 31. In other words, the graph is showing the number of cases per 100 million people, not per 100,000. PS it's only in the US that "5/7/20" means 7 May; everywhere else in the world it means 5 July! |
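The arithmetic in the comment above can be reproduced directly. This is a hedged sketch: `cases_per_100k` is a hypothetical helper, not the function used in the per-capita branch, and the factor of 1,000 is only what the reported numbers suggest, not a confirmed diagnosis of that code.

```python
def cases_per_100k(cases, population):
    """Scale an absolute case count to a rate per 100,000 inhabitants."""
    return cases / population * 100_000


nz_cases = 1_490
nz_population = 4_800_000  # approximate figure quoted in the comment

correct = cases_per_100k(nz_cases, nz_population)

# Assumed reproduction of the symptom: scaling by 1e8 instead of 1e5
# yields a value about 1,000 times too large.
suspect = nz_cases / nz_population * 100_000_000

print(f"per 100,000:     {correct:.1f}")
print(f"per 100 million: {suspect:,.0f}")
```

Rounding the displayed value to one decimal place (or to a whole number) would also address the excessive decimal places noted in the comment.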
https://github.com/mm0hgw/electoral-analysis/blob/dev/epidemic/out/charts/New.vs.Active.2.png |
Thanks very much to aatish. This is my go-to source for understanding what's going on. Please could we have an EU28 button to select all the countries in the European Union plus the UK. It would save me a lot of selecting tickboxes! |
@wrhite Grouping all European countries for auto-selection would be a different issue. The current issue is about normalizing the infection count by total population. I am not sure whether an auto-selection request has been brought up before. Best to search the issues and, if it is not already there, file a separate feature request. |
Is this one actually working? It would be nice to change the axis labels to what they really are rather than absolute values. Also, when I hover it shows things like 113,000 new cases per week per 100,000 people (!). That would only be possible if the whole population were getting it, plus some getting it twice in the same week. http://raw.githack.com/jwosty/covidtrends/per-capita/index.html |
While the log vs log plot is great for comparing the shape of the trajectory of individual countries, wouldn't it make sense to be able to normalize by the total population, so it becomes possible to compare between countries more accurately?