A command line tool (and Python library) for archiving Twitter JSON
twarc is a command line tool and Python library for collecting and archiving Twitter JSON
data via the Twitter API. It has separate commands, twarc and twarc2, for working with the older
v1.1 API and the newer v2 API (including Academic Access), respectively.
twarc has been developed with generous support from the Mellon Foundation.
New features for twarc are welcome and encouraged. However, to keep the core twarc library and command line tool sustainable, we will evaluate new functionality with the following principles in mind:
For features and approaches that fall outside these principles, external packages can hook into the twarc2 command line tool via click-plugins. This means that if you want to propose new functionality, you can create your own package without coordinating with the core twarc project.
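As a rough illustration, a plugin is an ordinary click command that twarc2 discovers through a setuptools entry point. The sketch below is hypothetical: the command name `tagcount`, the module name `twarc_tagcount`, and the `twarc.plugins` entry-point group are assumptions to check against twarc's own plugin documentation. It also assumes each input line is a full v2 API response containing a `data` list of tweets, which is how twarc2 writes its output by default.

```python
# twarc_tagcount.py: a hypothetical twarc2 plugin (illustrative names only).
#
# Registration (assumption, verify against twarc's docs) in the plugin's setup.py:
#   entry_points={"twarc.plugins": ["tagcount = twarc_tagcount:tagcount"]}
import json
from collections import Counter

import click


@click.command()
@click.argument("infile", type=click.File("r"), default="-")
@click.argument("outfile", type=click.File("w"), default="-")
def tagcount(infile, outfile):
    """Count hashtags in twarc2 output, most frequent first."""
    counts = Counter()
    for line in infile:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        response = json.loads(line)
        # Assumes one API response per line, with tweets under "data".
        for tweet in response.get("data", []):
            for tag in tweet.get("entities", {}).get("hashtags", []):
                counts[tag["tag"].lower()] += 1  # case-insensitive counting
    for tag, n in counts.most_common():
        click.echo(f"{tag}\t{n}", file=outfile)
```

Once a package like this is installed, the command would be expected to show up alongside the built-in ones, e.g. `twarc2 tagcount tweets.jsonl`. Keeping input and output as click `File` arguments defaulting to `-` lets the command read from stdin and write to stdout, so it composes with other tools in a pipeline.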
The documentation is hosted on ReadTheDocs. If you would like to improve it, you can edit the Markdown files in the docs directory
or add new ones. Then send a pull request and we can add it.
To view the documentation locally, you should be able to:
pip install -r requirements-mkdocs.txt
pip install -e .
mkdocs serve
open http://127.0.0.1:8000/
If you prefer, you can create a page on the wiki to workshop the documentation, and then, when you think it’s ready to be merged into the documentation, create an issue. Please feel free to create whatever documentation is useful in the wiki area.
If you are interested in adding functionality to twarc or fixing something that’s broken, here are the steps for setting up your development environment:
git clone https://github.com/docnow/twarc
cd twarc
pip install -r requirements.txt
Create a .env file that includes the Twitter App keys to use during testing:
BEARER_TOKEN=CHANGEME
CONSUMER_KEY=CHANGEME
CONSUMER_SECRET=CHANGEME
ACCESS_TOKEN=CHANGEME
ACCESS_TOKEN_SECRET=CHANGEME
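Before running the tests, it can help to confirm that the .env file really contains all five keys. The sketch below is a hypothetical stdlib-only helper, not part of twarc: `read_env_file` and `missing_keys` are illustrative names, and it assumes the simple KEY=VALUE format shown above (no quoting, multi-line values, or variable expansion).

```python
from pathlib import Path

# The five keys listed in the .env template above.
REQUIRED_KEYS = [
    "BEARER_TOKEN",
    "CONSUMER_KEY",
    "CONSUMER_SECRET",
    "ACCESS_TOKEN",
    "ACCESS_TOKEN_SECRET",
]


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values):
    """Return required keys that are absent, empty, or still CHANGEME."""
    return [k for k in REQUIRED_KEYS if values.get(k, "") in ("", "CHANGEME")]
```

For example, `missing_keys(read_env_file())` returns all five key names for the untouched template and an empty list once every placeholder has been replaced.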
Now run the tests:
python setup.py test
Add your code and some new tests, and send a pull request!