Geolocation for Twitter.
A Python version of Carmen, a library for geolocating tweets.
Given a tweet, Carmen will return Location objects that represent a physical location.
Carmen uses both coordinates and other information in a tweet to make
geolocation decisions.
Carmen is not perfect, but this approach greatly increases the number of geolocated
tweets over what Twitter provides.
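The idea of combining coordinates with other tweet information can be illustrated with a small, self-contained sketch. This is not Carmen's implementation or API: the function name `pick_location_signal` is hypothetical, and the priority order (GPS point, then tagged place, then profile text) is an assumption made for illustration. The field names (`coordinates`, `place`, `user.location`) follow the standard Twitter tweet JSON.

```python
# Illustrative sketch only: NOT Carmen's implementation or API.
# It shows the general idea of falling back from precise location
# signals in a tweet to weaker ones. The priority order is an assumption.
from typing import Optional


def pick_location_signal(tweet: dict) -> Optional[tuple]:
    """Return (source, value) for the strongest location signal in a tweet dict."""
    # 1. Exact GPS point. GeoJSON order is [longitude, latitude].
    coords = (tweet.get("coordinates") or {}).get("coordinates")
    if coords:
        lon, lat = coords
        return ("coordinates", (lat, lon))

    # 2. Place attached to the tweet.
    place = tweet.get("place") or {}
    if place.get("full_name"):
        return ("place", place["full_name"])

    # 3. Free-text location from the author's profile.
    profile = ((tweet.get("user") or {}).get("location") or "").strip()
    if profile:
        return ("profile", profile)

    return None  # nothing usable in this tweet


example = {"coordinates": None, "place": None, "user": {"location": " Baltimore, MD "}}
print(pick_location_signal(example))
```

A real geolocation system would then map the selected signal to a structured location in its database. That step is omitted here.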
To install, simply run:
$ python setup.py install
To run the Carmen frontend, see:
$ python -m carmen.cli --help
We are excited to release the improved Carmen Twitter geotagger, Carmen 2.0! We have implemented the following improvements:
We provide two different location databases.
carmen/data/geonames_locations_combined.json
is the new GeoNames database introduced in Carmen 2.0. It was derived by replacing the arbitrary location IDs used in the original version of Carmen with GeoNames IDs. This database is used by default.
carmen/data/locations.json
is the default database in the original Carmen. It is faster but less powerful than the new database. You can use the --locations
flag to switch to this database for backward compatibility.
We refer readers to the Carmen 2.0 paper repo for more details on the GeoNames mapping: https://github.com/AADeLucia/carmen-wnut22-submission
To build and publish a release to PyPI:
python setup.py sdist bdist_wheel
to create the wheels in the dist/ directory
python -m twine upload --repository testpypi dist/*
to upload to TestPyPI
pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple carmen
to make sure it can be installed correctly from TestPyPI
python -m twine upload dist/*
to publish on the actual PyPI
If you use the Carmen 2.0 package, please cite the following papers:
@inproceedings{zhang-etal-2022-changes,
title = "Changes in Tweet Geolocation over Time: A Study with Carmen 2.0",
author = "Zhang, Jingyu and
DeLucia, Alexandra and
Dredze, Mark",
booktitle = "Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022)",
month = oct,
year = "2022",
address = "Gyeongju, Republic of Korea",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.wnut-1.1",
pages = "1--14",
abstract = "Researchers across disciplines use Twitter geolocation tools to filter data for desired locations. These tools have largely been trained and tested on English tweets, often originating in the United States from almost a decade ago. Despite the importance of these tools for data curation, the impact of tweet language, country of origin, and creation date on tool performance remains largely unknown. We explore these issues with Carmen, a popular tool for Twitter geolocation. To support this study we introduce Carmen 2.0, a major update which includes the incorporation of GeoNames, a gazetteer that provides much broader coverage of locations. We evaluate using two new Twitter datasets, one for multilingual, multiyear geolocation evaluation, and another for usage trends over time. We found that language, country origin, and time does impact geolocation tool performance.",
}
@inproceedings{dredze_carmen_2013,
title = {Carmen: A Twitter Geolocation System with Applications to Public Health},
shorttitle = {Carmen},
url = {https://github.com/mdredze/carmen},
abstract = {Public health applications using social media often require accurate, broad-coverage location information. However, the standard information provided by social media APIs, such as Twitter, cover a limited number of messages. This paper presents Carmen, a geolocation system that can determine structured location information for messages provided by the Twitter API. Our system utilizes geocoding tools and a combination of automatic and manual alias resolution methods to infer location structures from GPS positions and user-provided profile data. We show that our system is accurate and covers many locations, and we demonstrate its utility for improving influenza surveillance.},
language = {en},
urldate = {2020-06-13},
publisher = {Association for the Advancement of Artificial Intelligence},
author = {Dredze, Mark and Paul, Michael J. and Bergsma, Shane and Tran, Hieu},
year = {2013},
keywords = {geotagging, privacy, twitter, twitter tool},
}