Server application to serve U.S. federal spending data via a RESTful API
This API is utilized by USAspending.gov to obtain all federal spending data, which is open source and provided to the public as part of the DATA Act.
Ensure the following dependencies are installed and working prior to continuing:
- docker, which will handle the other application dependencies
- docker-compose
- bash, or another Unix shell equivalent
- git
- make, for running build/test/run targets in the Makefile (run $ make for a list of targets)

If not using Docker, you'll need to install the app components on your own machine:
Using Docker is recommended since it provides a clean environment. Setting up your own local environment requires some technical abilities and experience with modern software tools.
- A command line package manager (Homebrew, apt-get, yum, apt, pacman, etc.)
- PostgreSQL version 13.x (with a dedicated data_store_api database)
- Elasticsearch version 7.1
- A Python version 3.8 environment, for example managed with pyenv (see the example below)
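For illustration only, here is a minimal sketch of preparing that Python 3.8 environment with pyenv, assuming pyenv is already installed (the exact patch version below is just an example):

$ pyenv install 3.8.16
$ pyenv local 3.8.16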
Now, navigate to the base file directory where you will store the USAspending repositories
$ mkdir -p usaspending && cd usaspending
$ git clone https://github.com/fedspendingtransparency/usaspending-api.git
$ cd usaspending-api
.env File

Copy the template .env file, which has local runtime environment variables defined. Change them as needed for your environment. This file is git-ignored and will not be committed by git if changed.
$ cp .env.template .env
A .env file is a common way to manage environment variables in a declarative file. Certain tools, like docker-compose, will read and honor these variables.
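As a quick sanity check (not a step from the original instructions), you can confirm docker-compose is picking up your .env values by rendering the resolved configuration:

$ docker-compose config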
.envrc File

direnv is a shell extension that automatically runs shell commands in a .envrc file (commonly env var export commands) when entering or exiting a folder containing that file.

Create a .envrc file in the repo root; it will be ignored by git. Change credentials and ports as needed for your local dev environment.
export DATABASE_URL=postgres://usaspending:usaspender@localhost:5432/data_store_api
export ES_HOSTNAME=http://localhost:9200
export DATA_BROKER_DATABASE_URL=postgres://admin:root@localhost:5435/data_broker
If direnv does not pick this up after saving the file, type:

$ direnv allow

Alternatively, you can skip direnv and just export these variables in your shell environment. Just make sure the env vars declared in your shell and in .env match, for a consistent experience inside and outside of Docker.
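One quick way to confirm the variables are actually loaded in your shell (whether via direnv or manual exports):

$ echo $DATABASE_URL $ES_HOSTNAME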
usaspending-backend Docker Image

This image is used as the basis for running application components and running containerized setup services.

$ docker-compose --profile usaspending build

‼️ Re-run this command if any python package dependencies change (in requirements/requirements-app.txt), since they are baked into the docker image at build-time.
A postgres database is required to run the app. You can run it in a postgres docker container (preferred), or run a PostgreSQL server on your local machine. In either case, it will be empty until data is loaded.
If using your own local PostgreSQL install, create a dedicated data_store_api database and give the role/user you created superuser privileges:

ALTER ROLE <<role/user you created>> WITH SUPERUSER;
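For illustration only, here is one way to do that from the command line, assuming a local server on the default port, an administrative postgres role, and the same example credentials used in the DATABASE_URL above (adjust names and passwords to your own setup):

$ psql -U postgres -c "CREATE ROLE usaspending WITH LOGIN PASSWORD 'usaspender' SUPERUSER;"
$ psql -U postgres -c "CREATE DATABASE data_store_api OWNER usaspending;"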
Also update your .env or .envrc files, if used, to be sure they reference your DB's user, password, host, and port where needed.

If not using your own local install…
$ docker-compose --profile usaspending up usaspending-db
… will create and run a Postgres database.
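As an optional check (not part of the original steps), if you have the psql client installed locally you can confirm the database is accepting connections using the DATABASE_URL from your .envrc:

$ psql "$DATABASE_URL" -c "SELECT version();"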
docker-compose run --rm usaspending-manage python3 -u manage.py migrate
will run Django migrations.
docker-compose run --rm usaspending-manage python3 -u manage.py matview_runner --dependencies
will provision the materialized views which are required by certain API endpoints.
To just get essential reference data, you can run:
docker-compose run --rm usaspending-manage python3 -u manage.py load_reference_data
will load essential reference data (agencies, program activity codes, CFDA program data, country codes, and others).

Alternatively, to download a fully populated production snapshot of the database (full or a subset) and restore it into PostgreSQL, use the pg_restore tool as described here: USAspending Database Download
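As a rough sketch of that restore (the dump path ./usaspending-db-dump is hypothetical; follow the linked guide for the authoritative steps), assuming the snapshot has been downloaded and unpacked and DATABASE_URL points at your target database:

$ pg_restore --jobs 8 --no-owner --dbname "$DATABASE_URL" ./usaspending-db-dump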
Executing individual data-loaders to load in data is also possible, but requires more familiarity with those ad-hoc scripts and commands, and also requires an external data source (Data Broker DB, or external file, etc.) from which to load the data.
Some of the API endpoints reach into Elasticsearch for data.
$ docker-compose --profile usaspending up usaspending-es
… will create and start a single-node Elasticsearch cluster as a docker container with data persisted to a docker volume.
The cluster should be reachable at http://localhost:9200 (“You Know, for Search”).
Optionally, to see log output, use docker-compose logs usaspending-es
(these logs are stored by docker even if you don’t use this).
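As a quick check, you can hit the cluster’s root endpoint; a healthy node responds with JSON that includes the “You Know, for Search” tagline:

$ curl http://localhost:9200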
The following will generate two base indexes, one for transactions and one for awards:
$ docker-compose run --rm usaspending-manage python3 -u manage.py elasticsearch_indexer --create-new-index --index-name 01-26-2022-transactions --load-type transaction
$ docker-compose run --rm usaspending-manage python3 -u manage.py elasticsearch_indexer --create-new-index --index-name 01-26-2022-awards --load-type award
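Afterwards, you can verify the new indexes exist with Elasticsearch’s cat indices API:

$ curl "http://localhost:9200/_cat/indices?v"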
docker-compose --profile usaspending up usaspending-api
… will bring up the Django app for the RESTful API
If you need to adjust configuration used by settings.py (buckets, elasticsearch, local paths), your changes will be mounted and used when you run this.

The application will now be available at http://localhost:8000.
Note: if the code was run outside of Docker then compiled Python files will potentially trip up the docker environment. A useful command to run for clearing out the files on your host is:
find . | grep -E "(__pycache__|\.pyc|\.pyo$)" | xargs rm -rf
In your local development environment, available API endpoints may be found at http://localhost:8000/docs/endpoints
Deployed production API endpoints and docs are found by following links here: https://api.usaspending.gov
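As a simple check that the local app is up (not part of the original steps), you can request the local docs page mentioned above and confirm an HTTP 200 response:

$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/docs/endpoints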
Build the base usaspending-backend
Docker image (the test container is based on this Docker image). In the parent usaspending-api directory run:
docker build -t usaspending-backend .
Start the Spark containers for the Spark-related tests:
docker-compose --profile spark up -d
To run all USAspending tests in the docker services, run:
docker-compose run --rm -e DATA_BROKER_DATABASE_URL='' usaspending-test
NOTE: If an env var named DATA_BROKER_DATABASE_URL is set, the Broker integration tests will attempt to run as well. If so, the Broker dependencies must be met (see below) or ALL tests will fail hard. Running the above command with -e DATA_BROKER_DATABASE_URL='' is a precaution to keep them excluded, unless you really want them (see below if so).
To run tests locally, and not in the docker services, you need the local environment components described in this README (PostgreSQL, Elasticsearch, and a Python dev environment) up and running.
NOTE: Running tests locally might require you to run export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES. As discussed here, there is an issue that can cause some of the Spark tests to fail without this environment variable set.
Once these are satisfied, run:
(usaspending-api) $ make tests
or, alternatively, to skip using make:
(usaspending-api) $ pytest
Set up python, a virtual environment, and pip dependencies, then check version info:
$ make local-dev-setup
$ source $(make activate)
Your prompt should then look as below to show you are in the virtual environment named usaspending-api
(to exit that virtual environment, simply type deactivate
at the prompt).
(usaspending-api) $
Some automated integration tests run against a Broker database. If certain dependencies to run such integration tests are not satisfied, those tests will bail out and be marked as Skipped.
(You can see messages about those skipped tests by adding the -rs
flag to pytest, like: pytest -rs
)
To satisfy these dependencies and include execution of these tests, do the following:
- Ensure the Broker source code is checked out alongside this repo at ../data-act-broker-backend
- Ensure you have Docker installed and running on your machine
- Ensure you have built the Broker backend Docker image by running:
  (usaspending-api) $ docker build -t dataact-broker-backend ../data-act-broker-backend
- Ensure you have the DATA_BROKER_DATABASE_URL environment variable set, and that it points to what will be a live PostgreSQL server (no database required) at the time tests are run.
- If invoking pytest within a docker container (e.g. using the usaspending-test container), you must mount the host's docker socket. This is declared already in the docker-compose.yml file services, but would be done manually with: -v /var/run/docker.sock:/var/run/docker.sock
NOTE: The Broker source code should be re-fetched and the image rebuilt to ensure the latest integration is tested.
Re-running the test suite using pytest -rs
with these dependencies satisfied should yield no more skips of the broker integration tests.
Example test invocations of just a few Broker integration tests (i.e. using -k):
From within a container (NOTE: DATA_BROKER_DATABASE_URL is set in the docker-compose.yml file, and could pick up .env values, if set):
(usaspending-api) $ docker-compose run --rm usaspending-test pytest --capture=no --verbose --tb=auto --no-cov --log-cli-level=INFO -k test_broker_integration
From the developer desktop (NOTE: DATA_BROKER_DATABASE_URL is set in the .envrc file and available in the shell):
(usaspending-api) $ pytest --capture=no --verbose --tb=auto --no-cov --log-cli-level=INFO -k test_broker_integration
To submit fixes or enhancements, or to suggest changes, see CONTRIBUTING.md.