USAspending API

This API is used by USAspending.gov to obtain all federal spending data, which is open source and provided to the public as part of the DATA Act.

Creating a Development Environment

Ensure the following dependencies are installed and working prior to continuing:

Requirements

  • docker which will handle the other application dependencies.
  • docker compose
  • bash or another Unix Shell equivalent
  • git
  • make for running build/test/run targets in the Makefile. (Run $ make for a list of targets.)

Using Docker is recommended since it provides a clean environment; setting up your own local environment requires some technical ability and experience with modern software tools. If not using Docker, you’ll need to install the following application components on your machine (a quick version-check sketch follows this list):

  • Command line package manager
    • Windows’ WSL bash uses apt
    • MacOS users can use Homebrew
    • Linux users already know their package manager (yum, apt, pacman, etc.)
  • PostgreSQL version 13.x (with a dedicated data_store_api database)
  • Elasticsearch version 7.1
  • Python version 3.10 environment
    • Highly recommended to use a virtual environment. There are various tools and associated instructions depending on preferences
    • See Required Python Libraries for an example using pyenv
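
Once those components are installed, a quick sanity check of the versions might look like this (a rough sketch; it assumes the PostgreSQL and Elasticsearch services are already running locally and on your PATH):

psql --version                                      # expect 13.x
curl -s http://localhost:9200 | grep '"number"'     # expect a 7.1.x Elasticsearch
python3 --version                                   # expect 3.10.x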

Cloning the Repository

Now, navigate to the directory where you will store the USAspending repositories:

mkdir -p usaspending && cd usaspending
git clone https://github.com/fedspendingtransparency/usaspending-api.git
cd usaspending-api

Environment Variables

Choose the option between .env and .envrc that best fits your preferred workflow. Pay close attention to the values of these environment variables, as the use of localhost vs. a container’s name differs between local setups.

Create Your .env File (recommended)

Copy the template .env file with local runtime environment variables defined. Change as needed for your environment. This file is git-ignored and will not be committed by git if changed.

cp .env.template .env

A .env file is a common way to manage environment variables in a declarative file. Certain tools, like docker compose, will read and honor these variables.
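
For example, when the API itself runs inside a docker compose container, it must reach the database by the compose service name rather than localhost. A hypothetical pair of entries illustrating the difference (credentials mirror the .envrc example below and may differ from your .env.template):

# API running directly on your host: use the published port on localhost
DATABASE_URL=postgres://usaspending:usaspender@localhost:5432/data_store_api
# API running inside docker compose: use the service name on the compose network
# DATABASE_URL=postgres://usaspending:usaspender@usaspending-db:5432/data_store_api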

Create Your .envrc File

direnv is a shell extension that automatically runs shell commands in a .envrc file (commonly env var export commands) when entering or exiting a folder containing that file.

Create a .envrc file in the repo root, which will be ignored by git. Change credentials and ports as-needed for your local dev environment.

export DATABASE_URL=postgres://usaspending:usaspender@localhost:5432/data_store_api
export ES_HOSTNAME=http://localhost:9200
export DATA_BROKER_DATABASE_URL=postgres://admin:root@localhost:5435/data_broker

If direnv does not pick this up after saving the file, type

direnv allow

Alternatively, you could skip using direnv and just export these variables in your shell environment.

Just make sure the env vars declared in your shell and in .env match, for a consistent experience inside and outside of Docker.

Build usaspending-backend Docker Image

This image is used as the basis for running application components and running containerized setup services.

docker compose --profile usaspending build

‼️ Re-run this command if any python package dependencies change (in requirements/requirements-app.txt), since they are baked into the docker image at build-time.

Database Setup

A postgres database is required to run the app. You can run it in a postgres docker container (preferred), or run a PostgreSQL server on your local machine. In either case, it will be empty until data is loaded.

  • ⚠️ If running your own PostgreSQL server, be sure to:
    1. Have a DB named data_store_api
    2. Have a superuser role (user), e.g. ALTER ROLE <<role/user you created>> WITH SUPERUSER;
    3. Cross-check your .env or .envrc file (if used) to be sure it references your DB's user, password, host, and port where needed (a minimal sketch follows this list)
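
As a rough sketch, assuming a role named usaspending with the credentials from the .envrc example above (adjust role names and passwords to your own setup):

createuser --superuser --pwprompt usaspending
createdb --owner usaspending data_store_api
psql -d data_store_api -c 'ALTER ROLE usaspending SET search_path TO public,raw,int,temp,rpt;'
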
Start the Postgres DB Container

If not using your own local install…

docker compose --profile usaspending up -d usaspending-db

… will create and run a Postgres database.

Use the following commands to create necessary users and set the usaspending user’s search_path

docker exec -it usaspending-db sh -c " \
    psql \
        -h localhost \
        -p 5432 \
        -U usaspending \
        -d data_store_api \
        -c 'CREATE USER etl_user;' \
        -c 'CREATE USER readonly;' \
        -c 'ALTER USER usaspending SET search_path TO public,raw,int,temp,rpt;' \
"
Bring DB Schema Up-to-Date
  • To run Django migrations.
    docker compose run --rm usaspending-manage python3 -u manage.py migrate
    
  • To provision the materialized views which are required by certain API endpoints.
    docker compose run --rm usaspending-manage python3 -u manage.py matview_runner --dependencies
    
Seeding and Loading Database Data

To load just the essential reference data (agencies, program activity codes, CFDA program data, country codes, and others), run:

    docker compose run --rm usaspending-manage python3 -u manage.py load_reference_data

Alternatively, to download a fully populated production snapshot of the database (full or a subset) and restore it into PostgreSQL, use the pg_restore tool as described here: USAspending Database Download
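
A hypothetical restore invocation might look like the following (the flags and dump path are illustrative only; follow the USAspending Database Download instructions for the exact procedure):

pg_restore --jobs 8 --no-owner --no-privileges --dbname "$DATABASE_URL" /path/to/usaspending-db-dump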

  • If you restore a snapshot this way, recreate the materialized views with the matview_runner command documented in the previous section

Executing individual data loaders is also possible, but requires more familiarity with those ad-hoc scripts and commands, and also requires an external data source (Data Broker DB, external file, etc.) from which to load the data.

  • For details on loading reference data, Data Accountability Broker Submissions, and current USAspending data into the API, see loading_data.md.
  • For details on how our data loaders modify incoming data, see data_reformatting.md.

Elasticsearch Setup

Some API endpoints reach into Elasticsearch for data.

docker compose --profile usaspending up -d usaspending-es

… will create and start a single-node Elasticsearch cluster as a docker container with data persisted to a docker volume.

  • The cluster should be reachable at http://localhost:9200 (“You Know, for Search”); see the quick check after this list.

  • Optionally, to see log output, use docker compose logs usaspending-es (these logs are stored by docker even if you don’t use this).
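
A quick way to confirm the cluster is up (a plain curl against the cluster root, nothing project-specific):

curl http://localhost:9200    # should return cluster info, including the "You Know, for Search" tagline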

While not required, it is highly recommended to also create the Kibana docker container for querying the Elasticsearch cluster.

docker compose --profile usaspending up usaspending-kibana-es

Generate Elasticsearch Indexes

The following will generate the indexes:

CURR_DATE=$(date '+%Y-%m-%d-%H-%M-%S')
docker compose run --rm usaspending-manage python3 -u manage.py elasticsearch_indexer --create-new-index --index-name "$CURR_DATE-transactions" --load-type transaction
docker compose run --rm usaspending-manage python3 -u manage.py elasticsearch_indexer --create-new-index --index-name "$CURR_DATE-awards" --load-type award
docker compose run --rm usaspending-manage python3 -u manage.py elasticsearch_indexer --create-new-index --index-name "$CURR_DATE-recipients" --load-type recipient
docker compose run --rm usaspending-manage python3 -u manage.py elasticsearch_indexer --create-new-index --index-name "$CURR_DATE-locations" --load-type location
docker compose run --rm usaspending-manage python3 -u manage.py elasticsearch_indexer --create-new-index --index-name "$CURR_DATE-subaward" --load-type subaward
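
To confirm the new indexes were created, you can use Elasticsearch’s standard cat API (not a project-specific command):

curl "http://localhost:9200/_cat/indices?v"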

Running the API

Run the following to bring up the Django app for the RESTful API:

docker compose --profile usaspending up usaspending-api

You can update environment variables in settings.py (buckets, elasticsearch, local paths) and they will be mounted and used when you run this.

The application will now be available at http://localhost:8000.

NOTE: If the code was previously run outside of Docker, compiled Python files can trip up the docker environment. A useful command for clearing them out of your host is:

find . | grep -E "(__pycache__|\.pyc|\.pyo$)" | xargs rm -rf

Using the API

In your local development environment, available API endpoints may be found at http://localhost:8000/docs/endpoints.

Deployed production API endpoints and docs are found by following links here: https://api.usaspending.gov.
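
As a quick local smoke test, you can curl one of the endpoints; the path below is just one example (check the endpoint list above for current routes) and assumes the reference data, materialized views, and Elasticsearch indexes from the earlier steps have been loaded:

curl -s "http://localhost:8000/api/v2/references/toptier_agencies/"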

Running Tests

Test Setup

  1. Build the base usaspending-backend Docker image (the test container is based on this Docker image). In the parent usaspending-api directory run:

    docker build -t usaspending-backend .
    
  2. Start the Spark containers for the Spark related tests

    docker compose --profile spark up -d
    
  3. To run all USAspending tests in the docker services run

    docker compose run --rm -e DATA_BROKER_DATABASE_URL='' usaspending-test
    

NOTE: If an env var named DATA_BROKER_DATABASE_URL is set, the Broker Integration tests will also attempt to run. If so, the Broker dependencies must be met (see below) or ALL tests will fail hard. Running the above command with -e DATA_BROKER_DATABASE_URL='' is a precaution to keep them excluded, unless you really want them (see below if so).

To run tests locally and not in the docker services, you need:

  1. Postgres: A running PostgreSQL database server (See Database Setup above)
  2. Elasticsearch: A running Elasticsearch cluster (See Elasticsearch Setup above)
  3. Environment Variables: Tell python where to connect to the various data stores (See Environment Variables)
  4. Required Python Libraries: Python package dependencies downloaded and discoverable (See below)

NOTE: Running tests locally might require you to run export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES. As discussed here, there is an issue that can cause some of the Spark tests to fail without this environment variable set.

Once these are satisfied, run:

make tests

or, alternatively, to skip using make:

pytest

Required Python Libraries

Set up Python, a virtual environment, and pip dependencies, then check version info:

make local-dev-setup
source $(make activate)

Your prompt should then look like the below, showing that you are in the virtual environment named usaspending-api (to exit that virtual environment, simply type deactivate at the prompt).

(usaspending-api) $

Including Broker Integration Tests

Some automated integration tests run against a Broker database. If certain dependencies to run such integration tests are not satisfied, those tests will bail out and be marked as Skipped.
(You can see messages about those skipped tests by adding the -rs flag to pytest, like: pytest -rs)

To satisfy these dependencies and include execution of these tests, do the following:

  1. Ensure the Broker source code is checked out alongside this repo at ../data-act-broker-backend

  2. Ensure you have Docker installed and running on your machine

  3. Ensure you have built the Broker backend Docker image by running:

    docker build -t dataact-broker-backend ../data-act-broker-backend
    
  4. Ensure you have the DATA_BROKER_DATABASE_URL environment variable set, and it points to what will be a live PostgreSQL server (no database required) at the time tests are run.

    1. WARNING: If this is set at all, then ALL of the above dependencies must be met or ALL tests will fail (Django will attempt this connection for every test run)
    2. This DB could be one you always have running in a local Postgres instance, or one you spin up in a Docker container just before tests are run (a sketch follows this list)
  5. If invoking pytest within a docker container (e.g. using the usaspending-test container), you must mount the host’s docker socket. This is declared already in the docker-compose.yml file services, but would be done manually with: -v /var/run/docker.sock:/var/run/docker.sock
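
For example, a throwaway Postgres container matching the DATA_BROKER_DATABASE_URL from the .envrc example above could be started like this (image tag and credentials are illustrative; use whatever your URL actually points at):

docker run -d --name broker-test-db -e POSTGRES_USER=admin -e POSTGRES_PASSWORD=root -p 5435:5432 postgres:13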

NOTE: The Broker source code should be re-fetched and the image rebuilt to ensure the latest integration is tested.

Re-running the test suite using pytest -rs with these dependencies satisfied should yield no more skips of the broker integration tests.

Example Test Invocations of Just a Few Broker Integration Tests: (i.e. using -k)

From within a container

NOTE: DATA_BROKER_DATABASE_URL is set in the docker-compose.yml file (and could pick up .env values, if set)

docker compose run --rm usaspending-test pytest --capture=no --verbose --tb=auto --no-cov --log-cli-level=INFO -k test_broker_integration

From Developer Desktop

NOTE: DATA_BROKER_DATABASE_URL is set in the .envrc file and available in the shell

pytest --capture=no --verbose --tb=auto --no-cov --log-cli-level=INFO -k test_broker_integration

Contributing

To submit fixes or enhancements, or to suggest changes, see CONTRIBUTING.md.