Named Entity Recognition: SNAP and Recogito (February 24)
February 24, 2016: 17h00-18h15 CET
Gabriel Bodard (ICS London) and Chiara Palladino (University of Bari and Leipzig)
Aim of the session
This session aims to provide a framework for manual techniques of Named Entity Recognition (NER), focusing on two particular categories: personal names and place-names. The lesson will focus especially on the manual annotation of ancient sources, presenting two interfaces in development for this purpose: SNAP (for personal names) and Recogito (for place-names). The teachers will introduce the theoretical framework of these projects within the wider contexts of prosopography and geography, followed by a practical introduction to Recogito. This will show how to annotate place-names in ancient texts and maps from the Pelagios database (in English or in other languages). The process of geotagging will then be presented and analyzed in depth, to show how to associate an annotated name with a place on the map.
In the final part of the session, some advanced applications of the annotated data will be presented, showing how geotagged texts can be manipulated with a few basic features to give a better understanding of their underlying spatial concepts and models.
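The two steps described above, marking a name in a text and then geotagging it, can be pictured as a small data model. The following is a minimal sketch, not how Recogito or SNAP store their data: the `Annotation` class, the `annotate` and `geotag` helpers, and the gazetteer ID `gaz:0001` are all hypothetical, and the automatic string search merely stands in for a human annotator's choice.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    """A place-name span marked in a text, with character offsets."""
    start: int
    end: int
    toponym: str
    gazetteer_id: Optional[str] = None  # filled in only once the name is geotagged


def annotate(text: str, toponym: str) -> list[Annotation]:
    """Mark every occurrence of `toponym` in `text` (a stand-in for manual tagging)."""
    spans = []
    pos = text.find(toponym)
    while pos != -1:
        spans.append(Annotation(pos, pos + len(toponym), toponym))
        pos = text.find(toponym, pos + len(toponym))
    return spans


def geotag(ann: Annotation, gazetteer: dict[str, str]) -> Annotation:
    """Link an annotated name to a gazetteer record; leave it unresolved if unknown."""
    ann.gazetteer_id = gazetteer.get(ann.toponym)
    return ann


# Hypothetical gazetteer: the ID is a placeholder, not a real gazetteer URI.
GAZETTEER = {"Burdigala": "gaz:0001"}

text = "The route runs from Burdigala to Arelate."
tagged = [geotag(a, GAZETTEER) for name in ("Burdigala", "Arelate") for a in annotate(text, name)]
for a in tagged:
    print(a.toponym, a.start, a.end, a.gazetteer_id)
```

Keeping annotation and resolution separate mirrors the workflow in the session: a name can be annotated but still unresolved (here, Arelate keeps `gazetteer_id=None`).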
Outline of the class
- Introduction to NER (15 min)
- The Pelagios project (5 min)
- The SNAP:DRGN project (10 min)
- Pelagios and Recogito practical session (20 min)
- What can you do with geotagged data? (15 min)
- Exercise (10 min)
Required reading
- Mark Depauw and Bart Van Beek (2009), “People in Greek Documentary Papyri. First Results of a Research Project.” Journal of Juristic Papyrology 39, pp. 31-47. Available: http://www.trismegistos.org/ref/depauw_vanbeek.pdf
- Tom Elliott and Sean Gillies (2009), “Digital Geography and Classics”, Digital Humanities Quarterly 3.1. Available: http://digitalhumanities.org/dhq/vol/3/1/000031/000031.html
- David Nadeau and Satoshi Sekine (2007), “A survey of named entity recognition and classification”, Lingvisticae Investigationes 30.1. Available: http://www.islab.ece.ntua.gr/attachments/article/71/NER-SURVEY.pdf
Further reading
- Elton Barker, Leif Isaksen et al. (2013), “On using digital resources for the study of an ancient Greek text: the case of Herodotus’ Histories”, in Stuart Dunn and Simon Mahony (eds.), The Digital Classicist 2013. Bulletin of the Institute of Classical Studies Supplement (122), Institute of Classical Studies, University of London, pp. 45-62. Available: http://oro.open.ac.uk/34498/8/Barker_etal2013_Hestia_BICS.pdf
- Laurie Pearce and Patrick Schmitz (2014), “Berkeley Prosopography Services.” ISAW Papers 7.19. Available: http://dlib.nyu.edu/awdl/isaw/isaw-papers/7/pearce-schmitz/
- Rainer Simon, Elton Barker, et al. (2015). “Linking early geospatial documents, one place at a time: annotation of geographic documents with Recogito.” e-Perimetron, 10.2, pp. 49–59. Available: http://oro.open.ac.uk/43613/1/Simon_et_al.pdf
- Ryan Horne (2014), “Beyond Maps as Images at the Ancient World Mapping Center”, ISAW Papers 7.9. Available: http://dlib.nyu.edu/awdl/isaw/isaw-papers/7/horne/
- For general linked open data projects and more bibliography, see the other papers in:
  - Thomas Elliott, Sebastian Heath and John Muccigrosso (eds.) (2014), Current Practice in Linked Open Data for the Ancient World, ISAW Papers 7. Available: http://dlib.nyu.edu/awdl/isaw/isaw-papers/7/
- Tim Berners-Lee (2006), “Linked Data”, W3C Design Issues. Available: https://www.w3.org/DesignIssues/LinkedData.html
Essay title
With reference to either prosopography (people) or geography (places) or another type of ancient data, describe and assess the value Linked Open Data currently has for ancient studies.
Exercise
As a possible assignment, students will create their own small dataset by annotating and geotagging a chosen text or image in Recogito.
- Familiarise yourself with the Pelagios Annotation Principles
- Read and work through the Recogito Beginner’s Tutorial
- Select a dataset in a chosen language (text or map image) in Recogito (probably with guidance from your tutor)
- Annotate roughly 50-100 place-names in a text, or as many place-names on a map as you can (ideally 15-30). NB: follow the Annotation Principles carefully.
- Work through the names you have annotated, and georesolve or flag as many as you can (ideally at least half of them). NB: follow the beginner’s tutorial carefully
- Optional: Look at the maps, document stats and other visualizations in Recogito for the material you have added. Try downloading the geodata as a CSV file, and visualize it in QGIS following the tutorial provided. What can you learn from the data you have annotated?
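A quick check on the exported CSV can tell you whether you reached the target of georesolving at least half of your annotated names. The following is a minimal sketch under stated assumptions: the column names `toponym`, `lat` and `lng` are hypothetical (inspect the header of your actual Recogito export and adjust), and the coordinates are approximate modern values used only for illustration.

```python
import csv
import io

# Hypothetical excerpt of a Recogito CSV export: the column names are assumptions,
# and the coordinates are approximate, for illustration only.
SAMPLE = """toponym,lat,lng
Burdigala,44.84,-0.58
Arelate,43.68,4.63
Unresolved name,,
"""


def resolution_summary(csv_text: str) -> dict:
    """Count annotated names and how many of them carry coordinates."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # A row counts as resolved only if both coordinate fields are non-empty.
    resolved = sum(1 for r in rows if r.get("lat") and r.get("lng"))
    total = len(rows)
    return {"total": total, "resolved": resolved, "share": resolved / total if total else 0.0}


summary = resolution_summary(SAMPLE)
print(summary)
print("At least half resolved:", summary["share"] >= 0.5)
```

To use it on real data, read the downloaded file's contents into a string and pass it to `resolution_summary` in place of `SAMPLE`.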
Recogito QGIS Tutorial (by Leif Isaksen)
This tutorial is intended to give a flavour of the potential of working with data downloaded from Recogito in a GIS. It is not intended to be exhaustive, nor is it specific to QGIS.
Preparation:
- If necessary, download and install QGIS
- Download your chosen CSV file from Recogito:
- Within QGIS, activate the OpenLayers and Qgis2threejs plugins and ensure they are up to date.
Add base layer:
- Create a new project in QGIS (Project | New).
- Change the canvas unit to metres (Project | Project Properties…)
- Add an aerial base layer map (Web | OpenLayers plugin | MapQuest | MapQuest Open Aerial).
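Web base layers of this kind are typically served in the Web Mercator projection (EPSG:3857), whose units are metres, which is presumably why the canvas unit is switched to metres; QGIS can then reproject the EPSG:4326 points from Recogito onto that canvas. As an illustration only (QGIS does this for you), here is a minimal sketch of the standard spherical Web Mercator formulas, x = R·λ and y = R·ln tan(π/4 + φ/2); the function name is hypothetical and the Bordeaux coordinates are approximate.

```python
import math

# Sphere radius used by Web Mercator (EPSG:3857): the WGS 84 semi-major axis, in metres.
EARTH_RADIUS_M = 6378137.0


def to_web_mercator(lon: float, lat: float) -> tuple[float, float]:
    """Project WGS 84 longitude/latitude (degrees) to Web Mercator x/y (metres)."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


# Approximate position of Bordeaux (Burdigala), for illustration only.
print(to_web_mercator(-0.58, 44.84))
```

Note that y grows without bound towards the poles, which is why Web Mercator maps are normally cut off at about 85° north and south.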
Load Recogito data:
- Add a csv file to the project (Layer | Add Delimited Text Layer…)
- Select your CSV file (this tutorial uses the Bordeaux itinerary as its example) and check that the delimiters are set correctly, that the first line contains the field names, and that the x and y fields hold longitude (E-W) and latitude (N-S) respectively.
- Select EPSG:4326 (WGS 84) as the CRS. You should now see the places on the map.
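The most common mistake at this step is swapping the x and y fields. The following minimal sketch mimics what the delimited-text import does with each row (x = longitude, y = latitude, rows without coordinates skipped) and adds a range check for EPSG:4326. The column names `lng` and `lat` are assumptions, and the check is only partial: a swap that stays within range (for example Bordeaux's -0.58, 44.84 read the wrong way round) will not be caught, so still look at where the points land.

```python
import csv
import io

# Hypothetical columns: "lng" is read as x (longitude) and "lat" as y (latitude).
SAMPLE = """toponym,lat,lng
Burdigala,44.84,-0.58
Arelate,43.68,4.63
Unresolved name,,
"""


def load_points(csv_text: str, x_field: str = "lng", y_field: str = "lat") -> list[tuple[str, float, float]]:
    """Return (toponym, lon, lat) tuples for the rows that have coordinates."""
    points = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row.get(x_field) or not row.get(y_field):
            continue  # unresolved names carry no coordinates, so there is nothing to plot
        lon, lat = float(row[x_field]), float(row[y_field])
        # EPSG:4326 bounds: longitude in [-180, 180], latitude in [-90, 90].
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            raise ValueError(f"{row.get('toponym')}: ({lon}, {lat}) is out of range; are x and y swapped?")
        points.append((row.get("toponym", ""), lon, lat))
    return points


print(load_points(SAMPLE))
```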
Symbolize the data:
- Double-click the layer (or right-click | Properties).
- In the Labels tab, select Label this layer with and choose the toponym field.
- In the Style tab, change the Single Symbol drop down menu to Categorized
- Under Column, select tags.
- Click Classify. Each distinct tag entry is now assigned a colour. QGIS cannot split a field that holds several tags, so an entry combining multiple tags is treated as a category of its own: either give such entries the same colour by hand, or use filtering to create a separate layer for each type of feature.
- Choose an individual feature class and change its symbol by double-clicking it or by clicking Change…; you can change several classes at once. Try representing different categories in different styles, and experiment with different combinations of colour and shape.
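The filtering workaround for multi-valued tags amounts to splitting each tag field and grouping places under every tag they carry. The following is a minimal sketch of that idea outside QGIS, assuming a hypothetical `tags` column whose values are separated by commas (check the separator in your own export); the function name is also hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export where one row can carry several tags in a single field.
SAMPLE = '''toponym,tags
Burdigala,"city,port"
Arelate,city
Unresolved name,
'''


def features_by_tag(csv_text: str, tag_field: str = "tags", sep: str = ",") -> dict[str, list[str]]:
    """Split multi-valued tag fields and group toponyms under each individual tag."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        tags = [t.strip() for t in (row.get(tag_field) or "").split(sep) if t.strip()]
        # Rows without any tag are collected under a placeholder group.
        for tag in tags or ["(untagged)"]:
            groups[tag].append(row["toponym"])
    return dict(groups)


for tag, names in features_by_tag(SAMPLE).items():
    print(tag, names)
```

Each resulting group corresponds to one per-tag layer. Inside QGIS itself, something like a layer filter `"tags" LIKE '%port%'` on a duplicated layer achieves a similar split, though a plain substring match can also catch tags that merely contain the search word.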