Unclear provenance of crimea.json dataset ('Nightingale's Rose') #594

Closed
dsmedia opened this issue Aug 15, 2024 · 10 comments · Fixed by #648
Comments

@dsmedia
Collaborator

dsmedia commented Aug 15, 2024

The information carried by Florence Nightingale's Rose Diagram isn't just a dataset - it's a pivotal moment in the history of data visualization and public health. Given the historical nature of this dataset, it's important that we represent it well. I've noticed some discrepancies that I believe we should address:

Current Situation

  1. Our dataset doesn't seem to match Nightingale's original published data. Nor does it match a Protovis example by Mike Bostock. This makes it challenging to update SOURCES.md.
  2. Bostock's visualization methods to reproduce the chart have been critiqued as not accurately reflecting Nightingale's original technique. This creates an opportunity to properly construct an example using vega or vega-lite.

Details

The dataset in crimea.json appears to be derived from or inspired by a famous polar area diagram from Florence Nightingale's "A contribution to the sanitary history of the British army during the late war with Russia", which was later featured in this Protovis example by Mike Bostock.

As noted by @kgryte:

Bostock's implementation, while visually similar to Nightingale's visualization, is wrong. First, the data is not correct. You can verify this in Nightingale's original work. Second, Bostock directly maps the wedge radius to deaths. This mistake is common. Instead, Nightingale represents deaths in terms of area, thus requiring the radius for each wedge to be calculated (for more information, see Understanding Uncertainty's The Mathematics of Coxcombs). This discrepancy would be apparent if one displayed polar axes and allowed reading of radial values.
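
To make the area-versus-radius point concrete, here is a minimal sketch, not Nightingale's or Bostock's actual code. It assumes each wedge is a circular sector spanning one twelfth of a circle and that the plotted value should equal the sector's area. The function name `wedge_radius` and the 12-wedge default are illustrative assumptions; the two sample values are the Protovis-side disease counts for January and April 1855 that appear in the diff later in this thread.

```python
import math


def wedge_radius(value: float, wedges_per_circle: int = 12) -> float:
    """Radius of a circular sector whose area equals `value`.

    Hypothetical helper: assumes equal-angle wedges, one per month.
    """
    theta = 2 * math.pi / wedges_per_circle  # angle of one wedge, in radians
    # Sector area A = theta * r**2 / 2, so r = sqrt(2 * A / theta).
    return math.sqrt(2 * value / theta)


# Disease counts for two months (Protovis side of the diff in this thread).
jan_1855, apr_1855 = 2761, 477

# Mapping the value straight to the radius keeps the raw ratio ...
ratio_linear = jan_1855 / apr_1855
# ... while mapping it to the area makes the radius grow with the square root.
ratio_area = wedge_radius(jan_1855) / wedge_radius(apr_1855)

print(f"radius ratio, value -> radius: {ratio_linear:.2f}")
print(f"radius ratio, value -> area:   {ratio_area:.2f}")
```

With a direct radius mapping the January wedge is roughly 5.8 times as long as the April one; with an area mapping it is only roughly 2.4 times as long, which is why the two charts diverge once radial values can be read off.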

The crimea.json hosted here does not match either Bostock's dataset or the Nightingale table that appears to sit behind her original polar area diagram.

Questions

  1. Can we trace the exact provenance of our current crimea.json dataset?
  2. Does crimea.json generate a similarly shaped diagram as the Bostock version? Perhaps it was just constructed to create a similar effect?
  3. Should a Vega or Vega-Lite example be created that addresses @kgryte's critique?
  4. Should a provisional description be added to the README.md file that notes this apparent discrepancy, or should we wait until this is resolved?
  5. Should crimea.json eventually be modified to capture the actual data referenced in the original Nightingale source, for the sake of accuracy? It seems possible that at the time this dataset was uploaded, the full text of the Nightingale paper was difficult to access, while today the full text including the dataset is easily retrieved online.
@domoritz
Member

@jheer @arvind do you have pointers here?

@arvind
Member

arvind commented Aug 15, 2024

It looks like crimea.json dates back to the very first commit to vega-lite from @jheer :) So perhaps he knows?

@dsmedia
Collaborator Author

dsmedia commented Dec 15, 2024

Having rolled out our shiny new datapackage.md, I thought it might be a good time to bump this to see whether there is any collective memory of the origin of crimea.json. Any help from @jheer or others would be appreciated.

The crimea.json hosted here does not match either Bostock's dataset or the Nightingale table that appears to sit behind her original polar area diagram.

@dangotbanned
Member

dangotbanned commented Dec 16, 2024

#594 (comment)

@dsmedia
I haven't got an answer for you, but wanted to highlight some points that I've only picked up on a second look at this issue.

Usage

From a quick code search and viewing the example galleries for Vega, Vega-Lite and Vega-Altair - I did not find a single use of this dataset.

Differences

I've added a diff here, after applying some formatting to both crimea.json and what the equivalent would be for protovis/ex/crimea.js.

Current vs protovis

diff --git a/data/crimea.json b/data/protovis_crimea.json
index 04e1641..321182f 100644
--- a/data/crimea.json
+++ b/data/protovis_crimea.json
@@ -2,145 +2,145 @@
   {
     "date": "1854-04-01",
     "wounds": 0,
-    "other": 110,
-    "disease": 110
+    "other": 5,
+    "disease": 1
   },
   {
     "date": "1854-05-01",
     "wounds": 0,
-    "other": 95,
-    "disease": 105
+    "other": 9,
+    "disease": 12
   },
   {
     "date": "1854-06-01",
     "wounds": 0,
-    "other": 40,
-    "disease": 95
+    "other": 6,
+    "disease": 11
   },
   {
     "date": "1854-07-01",
     "wounds": 0,
-    "other": 140,
-    "disease": 520
+    "other": 23,
+    "disease": 359
   },
   {
     "date": "1854-08-01",
-    "wounds": 20,
-    "other": 150,
-    "disease": 800
+    "wounds": 1,
+    "other": 30,
+    "disease": 828
   },
   {
     "date": "1854-09-01",
-    "wounds": 220,
-    "other": 230,
-    "disease": 740
+    "wounds": 81,
+    "other": 70,
+    "disease": 788
   },
   {
     "date": "1854-10-01",
-    "wounds": 305,
-    "other": 310,
-    "disease": 600
+    "wounds": 132,
+    "other": 128,
+    "disease": 503
   },
   {
     "date": "1854-11-01",
-    "wounds": 480,
-    "other": 290,
-    "disease": 820
+    "wounds": 287,
+    "other": 106,
+    "disease": 844
   },
   {
     "date": "1854-12-01",
-    "wounds": 295,
-    "other": 310,
-    "disease": 1100
+    "wounds": 114,
+    "other": 131,
+    "disease": 1725
   },
   {
     "date": "1855-01-01",
-    "wounds": 230,
-    "other": 460,
-    "disease": 1440
+    "wounds": 83,
+    "other": 324,
+    "disease": 2761
   },
   {
     "date": "1855-02-01",
-    "wounds": 180,
-    "other": 520,
-    "disease": 1270
+    "wounds": 42,
+    "other": 361,
+    "disease": 2120
   },
   {
     "date": "1855-03-01",
-    "wounds": 155,
-    "other": 350,
-    "disease": 935
+    "wounds": 32,
+    "other": 172,
+    "disease": 1205
   },
   {
     "date": "1855-04-01",
-    "wounds": 195,
-    "other": 195,
-    "disease": 560
+    "wounds": 48,
+    "other": 57,
+    "disease": 477
   },
   {
     "date": "1855-05-01",
-    "wounds": 180,
-    "other": 155,
-    "disease": 550
+    "wounds": 49,
+    "other": 37,
+    "disease": 508
   },
   {
     "date": "1855-06-01",
-    "wounds": 330,
-    "other": 130,
-    "disease": 650
+    "wounds": 209,
+    "other": 31,
+    "disease": 802
   },
   {
     "date": "1855-07-01",
-    "wounds": 260,
-    "other": 130,
-    "disease": 430
+    "wounds": 134,
+    "other": 33,
+    "disease": 382
   },
   {
     "date": "1855-08-01",
-    "wounds": 290,
-    "other": 110,
-    "disease": 490
+    "wounds": 164,
+    "other": 25,
+    "disease": 483
   },
   {
     "date": "1855-09-01",
-    "wounds": 355,
-    "other": 100,
-    "disease": 290
+    "wounds": 276,
+    "other": 20,
+    "disease": 189
   },
   {
     "date": "1855-10-01",
-    "wounds": 135,
-    "other": 95,
-    "disease": 245
+    "wounds": 53,
+    "other": 18,
+    "disease": 128
   },
   {
     "date": "1855-11-01",
-    "wounds": 100,
-    "other": 140,
-    "disease": 325
+    "wounds": 33,
+    "other": 32,
+    "disease": 178
   },
   {
     "date": "1855-12-01",
-    "wounds": 40,
-    "other": 120,
-    "disease": 215
+    "wounds": 18,
+    "other": 28,
+    "disease": 91
   },
   {
     "date": "1856-01-01",
-    "wounds": 0,
-    "other": 160,
-    "disease": 160
+    "wounds": 2,
+    "other": 48,
+    "disease": 42
   },
   {
     "date": "1856-02-01",
     "wounds": 0,
-    "other": 100,
-    "disease": 100
+    "other": 19,
+    "disease": 24
   },
   {
     "date": "1856-03-01",
     "wounds": 0,
-    "other": 125,
-    "disease": 90
+    "other": 35,
+    "disease": 15
   }
]

Summary

For this comparison, I'm considering only the fields: "wounds", "other", "disease":

  • Only 6/24 records have a single correct value
    • All of these are "wounds": 0
  • 24/24 records have incorrect values for "other", "disease"
    • I cannot see any relation between the figures
    • Most of them are wildly different (110 -> 5, 110 -> 1)
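
The counts above can be reproduced with a small script. The following is a rough sketch rather than the tooling actually used for this comparison: `count_matches` is a hypothetical helper, and the two lists hold only the first five records from each side of the diff, so the printed counts cover that excerpt and not the full files.

```python
FIELDS = ("wounds", "other", "disease")


def count_matches(current, reference, fields=FIELDS):
    """Count, per field, how many records agree between two datasets.

    Records are paired by their "date" value; dates missing from
    `reference` are skipped.
    """
    ref_by_date = {row["date"]: row for row in reference}
    matches = {field: 0 for field in fields}
    for row in current:
        ref = ref_by_date.get(row["date"])
        if ref is None:
            continue
        for field in fields:
            if row[field] == ref[field]:
                matches[field] += 1
    return matches


# Excerpt: first five records of crimea.json as shown in the diff.
current = [
    {"date": "1854-04-01", "wounds": 0, "other": 110, "disease": 110},
    {"date": "1854-05-01", "wounds": 0, "other": 95, "disease": 105},
    {"date": "1854-06-01", "wounds": 0, "other": 40, "disease": 95},
    {"date": "1854-07-01", "wounds": 0, "other": 140, "disease": 520},
    {"date": "1854-08-01", "wounds": 20, "other": 150, "disease": 800},
]

# Excerpt: the same five months from the Protovis side of the diff.
protovis = [
    {"date": "1854-04-01", "wounds": 0, "other": 5, "disease": 1},
    {"date": "1854-05-01", "wounds": 0, "other": 9, "disease": 12},
    {"date": "1854-06-01", "wounds": 0, "other": 6, "disease": 11},
    {"date": "1854-07-01", "wounds": 0, "other": 23, "disease": 359},
    {"date": "1854-08-01", "wounds": 1, "other": 30, "disease": 828},
]

result = count_matches(current, protovis)
print(result)  # {'wounds': 4, 'other': 0, 'disease': 0}
```

Pairing records by date instead of by list position keeps the comparison meaningful even if one file is reordered or has missing months.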

Suggestion

It seems to me that fully replacing crimea.json would not impact any projects in https://github.com/vega.
I can't see how one could reproduce the original chart with the current data.

Note

If anyone can find usage of crimea.json that depends on the current values, then I'd take a less bold stance.

@domoritz
Member

Would be cool to make the example in Vega eventually.

https://github.com/stdlib-js/datasets-nightingales-rose/blob/main/data/data.json also uses the Protovis data, it seems.

@dangotbanned
Member

Would be cool to make the example in Vega eventually.

stdlib-js/datasets-nightingales-rose@main/data/data.json also uses the Protovis data, it seems.

Interesting find @domoritz!

That repo has a datapackage.json, and lists the source as https://curiosity.lib.harvard.edu/contagion/catalog/36-990101646750203941

Sadly that seems to be less precise than the link @dsmedia provided (https://iiif.lib.harvard.edu/manifests/view/drs:7420433$21i)

@kgryte

kgryte commented Dec 16, 2024

@dangotbanned Not sure I follow. What is the issue with what stdlib lists as the source in its datapackage.json? If there is an issue, feel free to submit a PR against stdlib.

@dangotbanned
Member

@dangotbanned Not sure I follow. What is the issue with what stdlib lists as the source in its datapackage.json? If there is an issue, feel free to submit a PR against stdlib.

Hey @kgryte, so I'm not sure I'd call it an issue - but to clarify what I meant in #594 (comment)

The stdlib source is listed as (https://curiosity.lib.harvard.edu/contagion/catalog/36-990101646750203941), which lands here:

[Screenshot of the linked page]

The other link I mentioned (https://iiif.lib.harvard.edu/manifests/view/drs:7420433$21i), lands here:

[Screenshot of the linked page]

Both work (and either would be better than what we have right now 😉), but the latter goes directly to the table.

@dsmedia
Collaborator Author

dsmedia commented Dec 17, 2024

Quick summary/synthesis. Pardon the morbid theme but it's just the data...

  1. The Bostock and stdlib versions match 👍 each other for all values of disease, wounds, and other deaths, and these column values also match what is printed in the Nightingale book.

  2. There appears to be only one discrepancy between Bostock and stdlib, and that is for army size (called total in the Bostock version) for July 1854. Using the original book as the accurate source, the Bostock version is incorrect (28772) while the stdlib version is accurate (28722). The vega-datasets version of crimea.json does not include the column for total army size.

  3. For the columns disease, wounds and other, the vega-datasets crimea.json 👀 does not tie out with other sources, and also lacks the total army size column present in the others. Without a clear justification to maintain it as is, we should probably revise it. This will allow us to introduce a vega example linked to this repo per @domoritz's suggestion. We should consider whether/when to introduce an additional column for army size (to match stdlib) into this repo's version.

A quick clarification question for @kgryte. As noted in the initial issue, a long while back you expressed concern about the Bostock dataset for this example. Aside from the visualization, were there other discrepancies to note here?

As noted by @kgryte:

Bostock's implementation...is wrong. First, the data is not correct. You can verify this in Nightingale's original work.

@domoritz
Member

Happy to update this dataset. Sounds like the Stdlib one is good.
