
Commit

Merge pull request #144 from chnm/blog/hernan-text-analysis
Updated italics in posts
hepplerj authored Sep 26, 2024
2 parents b4880c6 + 2d09fb0 commit 4cf7ba2
Showing 2 changed files with 6 additions and 6 deletions.
2 changes: 1 addition & 1 deletion bom-website/content/blog/2024-08-28-plague-spikes.md
@@ -9,7 +9,7 @@ categories:
- "analysis"
---

-Following 1636's outbreak, the plague cast a shadow over London’s life for almost ten years. Data collected from the Bills of Mortality by the Death By Numbers Project suggests that most summers witnessed a plague flare-up between 1638 and 1647. Though in the late 1630s these summer spikes were mild, the occurrence of the plague increased in intensity in the early 1640s up to 1647.[^1] Indeed, each summer during the 1640s, weekly deaths in London consistently reached into the hundreds, peaking at 250 in the years 1646 and 1647.[^2] Moreover, during the consecutive years of 1641-1642 and 1646-1647, the yearly outbreaks adopted a bi-annual cyclical pattern. Londoners endured the threat of the plague year-round, with fewer deaths in winter and a larger number of casualties during late summer. By the 1650s, however, the plague had nearly vanished, only to return forcefully during the notorious Great Plague of 1665-1666.
+Following 1636's outbreak, the plague cast a shadow over London’s life for almost ten years. Data collected from the Bills of Mortality by the _Death By Numbers_ Project suggests that most summers witnessed a plague flare-up between 1638 and 1647. Though in the late 1630s these summer spikes were mild, the occurrence of the plague increased in intensity in the early 1640s up to 1647.[^1] Indeed, each summer during the 1640s, weekly deaths in London consistently reached into the hundreds, peaking at 250 in the years 1646 and 1647.[^2] Moreover, during the consecutive years of 1641-1642 and 1646-1647, the yearly outbreaks adopted a bi-annual cyclical pattern. Londoners endured the threat of the plague year-round, with fewer deaths in winter and a larger number of casualties during late summer. By the 1650s, however, the plague had nearly vanished, only to return forcefully during the notorious Great Plague of 1665-1666.

This blog post explores seven of the nine years between 1638 and 1647 by applying time-to-event analysis, in the same way I did in the previous 1636’s outbreak [post](https://deathbynumbers.org/2023/12/04/death-on-two-legs-analyzing-the-initial-20-weeks-of-the-1636-london-plague-outbreak-using-time-to-event-analysis/).[^3] The analysis shows that the spatial pattern from the 1636 outbreak continued, where, despite plague deaths occurring in all parish groups, the spread within London's walled parishes was notably slower. In contrast, the plague death spikes intensified notably in the city's other areas, including the parishes outside the walls, those in Middlesex and Surrey, and the outer parishes of Westminster. Even when plague deaths occurred year-round, such as in years 1642, 1644, 1646 and 1647, the parishes within the walls did not experience severe plague flare-ups, though their situation deteriorated as the decade progressed.

10 changes: 5 additions & 5 deletions bom-website/content/blog/2024-09-18-death-by-words.md
@@ -2,20 +2,20 @@
title: "Death by Words: Textual Geography of Suicides, Drownings and Killings in the Bills of Mortality "
author:
- Hernan Adasme
-date: "2024-09-18"
+date: "2024-09-26"
tags:
- adasme
categories:
- "analysis"
---

-Alongside quantitatively documenting plague outbreaks in Early Modern London, the Bills of Mortality also provide textual descriptions of causes of death. The _Death by Numbers_ project is transcribing and making available to the public not only the plague numbers but also dozens of recorded causes of death found in the verso of the bills, which include accidents, killings, suicides, and drownings. This will eventually create a considerable --although not massive-- corpus of textual data suitable for the application of several text analysis techniques, as a way to automate the extraction of information. For this blog post, I am using three datasets on causes of death compiled by the 'Death by Numbers' transcription team. The first dataset covers the period from 1636 to 1649, the second from 1649 to 1659, and the third from 1659 to 1677. It's important to note that while these datasets cover most years within these ranges, there are some gaps in the data.
+Alongside quantitatively documenting plague outbreaks in Early Modern London, the Bills of Mortality also provide textual descriptions of causes of death. The _Death by Numbers_ project is transcribing and making available to the public not only the plague numbers but also dozens of recorded causes of death found in the verso of the bills, which include accidents, killings, suicides, and drownings. This will eventually create a considerable --although not massive-- corpus of textual data suitable for the application of several text analysis techniques, as a way to automate the extraction of information. For this blog post, I am using three datasets on causes of death compiled by the _Death by Numbers_ transcription team. The first dataset covers the period from 1636 to 1649, the second from 1649 to 1659, and the third from 1659 to 1677. It's important to note that while these datasets cover most years within these ranges, there are some gaps in the data.

For the purpose of this blog entry, I will not focus on text as text, but only on text as a container of relevant information for the exploration of drownings as a historical phenomenon. I will use basic text analysis tools to retrieve the locations from descriptions of drownings, killings, and suicides, as opposed to manually counting the locations. With this data, I will create maps to spatially represent the counts. I won’t analyze text to discover hidden patterns or underlying meanings of a corpus of texts — although I am planning to try that in a future post. Eventually, the techniques explored in this blog post will be a good fit to extract other data recorded as unstructured text in the London Bills of Mortality

## Methodology and Workflow

-The transcription process in Death by Numbers using DataScribe generates a dataset where each record of a weekly bill occupies a row, and the causes of death are represented as columns. Descriptions of drownings, killings, suicides, people found dead, and accidents are recorded as unstructured text, typically including a count, the location, and occasionally a brief account of the incident. For instance, drowning's descriptive text highlights the location (_at Christ Church in Surry; at the London Bridge; at the River Lee),_ a brief characterization of circumstances (_accidentally; by misfortune; in a ditch; in a tub of soap suds_), and occasionally some data about the drowned person (_an unknown man; two brothers; a boy_). To both capture and provide structure to the text, our transcription team fills out two fields in the DataScribe transcription form: one for the count and one for the text.
+The transcription process in _Death by Numbers_ using DataScribe generates a dataset where each record of a weekly bill occupies a row, and the causes of death are represented as columns. Descriptions of drownings, killings, suicides, people found dead, and accidents are recorded as unstructured text, typically including a count, the location, and occasionally a brief account of the incident. For instance, drowning's descriptive text highlights the location (_at Christ Church in Surry; at the London Bridge; at the River Lee),_ a brief characterization of circumstances (_accidentally; by misfortune; in a ditch; in a tub of soap suds_), and occasionally some data about the drowned person (_an unknown man; two brothers; a boy_). To both capture and provide structure to the text, our transcription team fills out two fields in the DataScribe transcription form: one for the count and one for the text.

{{< figure src="/images/adasme-fig_1-words.png" caption="Fig 1. View of the transcription form fields designed to capture the text about drowning deaths." alt="Fig 1. View of the transcription form fields designed to capture the text about drowning deaths." >}}

@@ -25,7 +25,7 @@ Observing the text reveals that locations typically follow the preposition "at."

## Mapping Killings, Suicides and Drownings

-As mentioned before, The Death by Numbers Project builts data sets with textual descriptions of killings. However, it is not exactly a copy and paste process but an interpretive procedure. We built an aggregate category in which we include any death involving human agency, such as murder, shooting, stabbing, or being run over by a cart, etc. These deaths show up in the Bills in different parts of the causes listed. The total killings for the three data sets are 126, 114, 228, the last number being larger due to a lengthier set of weeks. The parishes with the highest number of human-caused deaths across the three datasets were St. Giles in the Fields, St. Mary Whitechapel, St. Martin in the Fields, St. Giles Cripplegate, St. Sepulchre's Parish, and to the south of the Thames River, St. Saviour's Southwark. Among the 97 parishes within the Walls of London, those located along the Thames had the highest number of killings, with Allhallows Great, Allhallows Less, St. Mary Somerset, St. Magnus Parish, and St. Dunstan East being the most frequently mentioned locations.
+As mentioned before, The _Death by Numbers Project_ builds data sets with textual descriptions of killings. However, it is not exactly a copy and paste process but an interpretive procedure. We built an aggregate category in which we include any death involving human agency, such as murder, shooting, stabbing, or being run over by a cart, etc. These deaths show up in the Bills in different parts of the causes listed. The total killings for the three data sets are 126, 114, 228, the last number being larger due to a lengthier set of weeks. The parishes with the highest number of human-caused deaths across the three datasets were St. Giles in the Fields, St. Mary Whitechapel, St. Martin in the Fields, St. Giles Cripplegate, St. Sepulchre's Parish, and to the south of the Thames River, St. Saviour's Southwark. Among the 97 parishes within the Walls of London, those located along the Thames had the highest number of killings, with Allhallows Great, Allhallows Less, St. Mary Somerset, St. Magnus Parish, and St. Dunstan East being the most frequently mentioned locations.

{{< figure src="/images/adasme-fig_2-words.png" caption="Fig 2. Spatial Representation of Killings in each of the three data sets analyzed." alt="Fig 2. Spatial Representation of Killings in each of the three data sets analyzed." >}}

@@ -39,6 +39,6 @@ Drowning deaths are both more concentrated and more frequent than suicides and k

## Conclusion

-The Bills of Mortality not only provide quantitative records of plague outbreaks but also contain valuable textual descriptions of various causes of death. The 'Death by Numbers' project is making these records accessible, including accidents, killings, suicides, and drownings, creating a dataset ripe for future text analysis. For this blog post, I focused on using basic text analysis tools to extract locations from these descriptions, specifically in relation to drownings, killings, and suicides, and visualized the data spatially. However, a flaw in the analysis is that this post does not yet incorporate demographic data to normalize these raw numbers obtained with text analysis techniques. Additionally, it would be valuable to corroborate the text extraction process with a manual review of the data to ensure accuracy, though the whole idea in this post is to skip the time-consuming and error-prone task of counting the mentioned one by one. While this post centers on location extraction, there remains great potential for deeper analysis of the textual descriptions that were left outside in this analysis: found dead, starved, executed, and also accidental deaths, which I intend to explore in future posts. The techniques applied here will continue to prove useful in extracting structured information from the rich, unstructured text found in the London Bills of Mortality.
+The Bills of Mortality not only provide quantitative records of plague outbreaks but also contain valuable textual descriptions of various causes of death. The _Death by Numbers_ project is making these records accessible, including accidents, killings, suicides, and drownings, creating a dataset ripe for future text analysis. For this blog post, I focused on using basic text analysis tools to extract locations from these descriptions, specifically in relation to drownings, killings, and suicides, and visualized the data spatially. However, a flaw in the analysis is that this post does not yet incorporate demographic data to normalize these raw numbers obtained with text analysis techniques. Additionally, it would be valuable to corroborate the text extraction process with a manual review of the data to ensure accuracy, though the whole idea in this post is to skip the time-consuming and error-prone task of counting the mentioned one by one. While this post centers on location extraction, there remains great potential for deeper analysis of the textual descriptions that were left outside in this analysis: found dead, starved, executed, and also accidental deaths, which I intend to explore in future posts. The techniques applied here will continue to prove useful in extracting structured information from the rich, unstructured text found in the London Bills of Mortality.

[^1]: For an introductory overview of how St. Katharine's Tower became home to the new docks in the early 1800s, see <https://www.thehistoryoflondon.co.uk/the-new-london-docks-of-the-early-19th-century/>.
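The "Death by Words" post reports that locations in the drowning, killing, and suicide descriptions typically follow the preposition "at". This commit does not show the post's actual extraction code. What follows is only a minimal Python sketch of that idea, assuming each description is a free-text string like the phrases the post quotes. The sample strings and the names `LOCATION_PATTERN`, `extract_locations`, and `count_locations` are hypothetical illustrations, not the project's data or tooling.

```python
import re
from collections import Counter

# Hypothetical descriptions modeled on phrases quoted in the post
# ("at Christ Church in Surry", "at the London Bridge", ...).
# They are illustrative strings, not records from the project's datasets.
descriptions = [
    "at Christ Church in Surry",
    "accidentally at the London Bridge",
    "two brothers at the River Lee",
    "by misfortune at the London Bridge",
    "a boy in a tub of soap suds",
]

# Assumption: a location is the phrase right after the word "at", running up to
# the next semicolon or comma (or the end of the text). A leading "the" is dropped
# so "the London Bridge" and "London Bridge" are counted together.
LOCATION_PATTERN = re.compile(r"\bat\s+(?:the\s+)?([^;,]+)", re.IGNORECASE)


def extract_locations(text: str) -> list[str]:
    """Return every location phrase that follows 'at' in one description."""
    return [match.strip() for match in LOCATION_PATTERN.findall(text)]


def count_locations(texts: list[str]) -> Counter:
    """Tally location mentions across descriptions (mentions, not deaths)."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(extract_locations(text))
    return counts


if __name__ == "__main__":
    print(count_locations(descriptions).most_common(1))  # [('London Bridge', 2)]
```

The sketch counts mentions and ignores the separate count field in the transcription form. Weighting each location by that count would need the record structure, which this commit does not show. Descriptions that name no place after "at", like the soap-suds example, yield no location. That gap is one reason a manual review of the extracted output remains worthwhile.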
