A Python version of Carmen, a library for geolocating tweets.
Given a tweet, Carmen will return Location objects that represent a physical location.
Carmen uses both coordinates and other information in a tweet to make
geolocation decisions.
Carmen is not perfect, but this approach greatly increases the number of geolocated
tweets over what Twitter provides.
To install, simply run:
$ python setup.py install
To run the Carmen frontend, see:
$ python -m carmen.cli --help
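Carmen can also be driven from Python instead of the command-line frontend. The sketch below shows one way to run a resolver over a file of tweets, one JSON object per line. It assumes the resolver exposes a `resolve_tweet(tweet)` method that returns either `None` or a `(provisional, location)` pair; the helper name `geolocate_lines` and the file name `tweets.json` are made up for illustration and are not part of Carmen.

```python
import json


def geolocate_lines(lines, resolver):
    """Yield (tweet, location) pairs for every tweet the resolver can place.

    `lines` is an iterable of JSON strings, one tweet per line.
    `resolver` is any object with a `resolve_tweet(tweet)` method that
    returns None or a (provisional, location) pair (assumed interface).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        result = resolver.resolve_tweet(tweet)
        if result is None:
            continue  # the resolver could not place this tweet
        _provisional, location = result
        yield tweet, location


# With Carmen installed, a resolver would be built roughly like this
# (assumed API; check the package before relying on these names):
#
#     import carmen
#     resolver = carmen.get_resolver()
#     resolver.load_locations()
#     with open("tweets.json") as f:  # hypothetical input file
#         for tweet, location in geolocate_lines(f, resolver):
#             print(tweet.get("id"), location)
```

Passing the resolver in as an argument keeps the loop independent of how the resolver is configured, so the same code works with either location database.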
We are excited to release the improved Carmen Twitter geotagger, Carmen 2.0! We have implemented the following improvements:
- A new location database derived from the open-source GeoNames geographical database. This multilingual database improves the coverage and robustness of Carmen as shown in our analysis paper "Changes in Tweet Geolocation over Time: A Study with Carmen 2.0".
- Compatibility with Twitter API V2.
- A geocode resolver that is up to 10x faster.
We provide two different location databases.
- carmen/data/geonames_locations_combined.json is the new GeoNames database introduced in Carmen 2.0. It is derived by swapping the arbitrary IDs used in the original version of Carmen for GeoNames IDs. This database is used by default.
- carmen/data/locations.json is the default database in the original Carmen. It is faster but less powerful than the new database. You can use the --locations flag to switch to this database for backward compatibility.
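As a rough illustration of switching databases, the sketch below builds the command line for the Carmen frontend with or without the --locations flag. It assumes that --locations takes the path to a database file and that the frontend accepts the input and output files as positional arguments; confirm both with `python -m carmen.cli --help`. The helper name `carmen_cli_command` is hypothetical, and the database path is relative to the repository root.

```python
import sys

# Path of the original database, as listed above (relative to the repo root).
LEGACY_DB = "carmen/data/locations.json"


def carmen_cli_command(input_path, output_path, legacy=False):
    """Build an argv list for the Carmen frontend.

    Assumes input and output paths are positional arguments and that
    --locations expects a database path (check --help to confirm).
    """
    cmd = [sys.executable, "-m", "carmen.cli"]
    if legacy:
        # Override the default GeoNames database with the original one.
        cmd += ["--locations", LEGACY_DB]
    cmd += [input_path, output_path]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Leaving `legacy` at its default omits the flag, so the GeoNames database is used.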
We refer readers to the Carmen 2.0 paper repo for more details on the GeoNames mapping: https://github.com/AADeLucia/carmen-wnut22-submission
To build and publish the package:
- In the repo root folder, run
  $ python setup.py sdist bdist_wheel
  to create the wheels in the dist/ directory.
- Run
  $ python -m twine upload --repository testpypi dist/*
  to upload to TestPyPI.
- Create a brand new environment, and run
  $ pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple carmen
  to make sure it can be installed correctly from TestPyPI.
- After checking correctness, run
  $ python -m twine upload dist/*
  to publish on the actual PyPI.
If you use the Carmen 2.0 package, please cite the following papers:
@inproceedings{zhang-etal-2022-changes,
title = "Changes in Tweet Geolocation over Time: A Study with Carmen 2.0",
author = "Zhang, Jingyu and
DeLucia, Alexandra and
Dredze, Mark",
booktitle = "Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022)",
month = oct,
year = "2022",
address = "Gyeongju, Republic of Korea",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.wnut-1.1",
pages = "1--14",
abstract = "Researchers across disciplines use Twitter geolocation tools to filter data for desired locations. These tools have largely been trained and tested on English tweets, often originating in the United States from almost a decade ago. Despite the importance of these tools for data curation, the impact of tweet language, country of origin, and creation date on tool performance remains largely unknown. We explore these issues with Carmen, a popular tool for Twitter geolocation. To support this study we introduce Carmen 2.0, a major update which includes the incorporation of GeoNames, a gazetteer that provides much broader coverage of locations. We evaluate using two new Twitter datasets, one for multilingual, multiyear geolocation evaluation, and another for usage trends over time. We found that language, country origin, and time does impact geolocation tool performance.",
}
@inproceedings{dredze_carmen_2013,
title = {Carmen: A Twitter Geolocation System with Applications to Public Health},
shorttitle = {Carmen},
url = {https://github.com/mdredze/carmen},
abstract = {Public health applications using social media often require accurate, broad-coverage location information. However, the standard information provided by social media APIs, such as Twitter, cover a limited number of messages. This paper presents Carmen, a geolocation system that can determine structured location information for messages provided by the Twitter API. Our system utilizes geocoding tools and a combination of automatic and manual alias resolution methods to infer location structures from GPS positions and user-provided profile data. We show that our system is accurate and covers many locations, and we demonstrate its utility for improving influenza surveillance.},
language = {en},
urldate = {2020-06-13},
publisher = {Association for the Advancement of Artificial Intelligence},
author = {Dredze, Mark and Paul, Michael J. and Bergsma, Shane and Tran, Hieu},
year = {2013},
keywords = {geotagging, privacy, twitter, twitter tool},
}