GoldenCheetahOpenData

R API and Local Database Management for the GoldenCheetah OpenData Project


output: github_document

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "80%",
  fig.width = 9,
  fig.height = 9
)

GoldenCheetahOpenData

CRAN_Status_Badge
R build status
Travis build status
Codecov test coverage
Licence
Licence

Introduction

The GoldenCheetahOpenData R package provides methods for querying
the GoldenCheetah OpenData project database <doi:
10.17605/OSF.IO/6HFPZ>, downloading workout data from it and managing
local workout databases. Methods are also provided for the
organization of the workout data into trackeRdata objects for
further data analysis and modelling in R using the infrastructure
provided by the trackeR R package.

Installation

You can install the released version of GoldenCheetahOpenData from CRAN with:

install.packages("GoldenCheetahOpenData")

The development version can be installed directly from GitHub with:

# install.packages("devtools")
devtools::install_github("ikosmidis/GoldenCheetahOpenData")

Workflow

GoldenCheetahOpenData implements a simple query-download-read workflow, that allows users to gradually build and maintain a local repository with workouts from the GoldenCheetah OpenData project.

Querying GoldenCheetah OpenData project mirrors

The first step in the GoldenCheetahOpenData R package workflow is to query the mirrors of the GoldenCheetah OpenData project for the available IDs. This is done with a call to get_athlete_ids(), which will create a gcod_db object.

library("GoldenCheetahOpenData")
ids <- get_athlete_ids()
class(ids)

gcod_db objects

gcod_db objects have two perspectives: a remote perspective describing the state of the GoldenCheetah OpenData project’s database, and a local describing the state of any local database (more on this later).

gcod_db objects that are produced by get_athlete_ids() have no records in their local perspective because none have been downloaded.

print(ids, txtplot = TRUE)

The output above also gives us a quick snapshot of GoldenCheetah OpenData project’s database. It currently has

format(total_size(ids), unit = "auto")

worth of compressed workout sessions for

n_ids(ids, perspective = "remote")

athletes. And it keeps growing!

Downloading workouts

gcod_db objects can directly be passed into the download_workouts() function for getting the workouts for the athlete IDs in the remote perspective or at least a few of them. Below we only get the workouts for those athletes whose IDs contain b7-9 (as a big fan of the Star Trek Voyager series, this choice made sense to me!).

ids_b79 <- download_workouts(ids,
                             pattern = "b7-9",
                             local_dir = "~/Downloads/GCOD-db/",
                             verbose = TRUE)

After the above code chunk was run, download_workouts() downloaded the requested workout data on my disk and placed them in the directory “~/Downloads/GCOD-db/”

dir("~/Downloads/GCOD-db")

The result from download_workouts() is a new gcod_db object, which has updated the local and remote perspectives of the original gcod_db object

ids_b79

You may save your gcod_db objects, but this is not really necessary. They can be reconstructed using the rebuild_bd() method, which reads the local workout database and makes appropriate queries to the GoldenCheetah OpenData project’s mirrors:

ids_dir <- rebuild_db("~/Downloads/GCOD-db")
ids_dir
ids_b79
athlete_id(ids_dir, perspective = "local")
athlete_id(ids_b79, perspective = "local")
athlete_id(ids_dir, perspective = "remote")
athlete_id(ids_b79, perspective = "remote")

Caution! A careless call to download_workouts() can easily instruct R to start downloading all workouts from the GoldenCheetah OpenData project, which is rarely what you want. Instead, I recommend downloading only a few at a time. This can be done in various ways, including using the prefix argument of get_athlete_ids(), the pattern argument of download_workouts() (as above) or — a bit more advanced — by directly subsetting the gcod_db object

ids_sub <- subset(ids, subset = grepl("b7-9", athlete_id(ids)), perspective = "remote")
athlete_id(ids_sub)

In this way, whenever you rebuild the gcod_db object form the local database you are getting access to all the workout files you have ever downloaded.

Reading workouts

Reading workouts involves: extracting the workout archives for each athlete ID, reading all the .csv files in the extracted directories, wrangling the information in them (e.g. inferring the workout timestamps, carrying out data quality checks, imputation, etc), and organizing the resulting data into objects that can be used for further analyses. The read_workouts() method can do all the above from a unified interface. For example (and this takes a while):

b79 <- read_workouts(ids_b79)

By default, read_workouts():

  • does not overwrite existing directories with extracted workouts (which can be bypassed by setting overwrite = TRUE)

  • deletes the extracted directories after everything has been read (which can be bypassed by setting clean_db = FALSE)

  • writes the processed trackeRdata objects (see ?saveRDS) in the same directory as the workout archives, using the convention <athlete_id>.rds.

Analyses of workout data using trackeR

b79 is now a list of trackeRdata objects, and trackeR can be used for exploration.

library("trackeR")
## Reading was not possible for athlete ID (see `warnings`)
which(is.na(b79))
## so we remove them 
b79 <- b79[!is.na(b79)]
## number of workout sessions per athlete ID
sapply(b79, nsessions)
## total duration per athlete ID in hours
sapply(b79, function(x) sum(session_duration(x, duration_unit = "h")))

Let’s explore further the workout sessions for athlete ID af3ab0e9-fc82-43b7-9d5b-60d496b77d70

athlete1 <- b79[["af3ab0e9-fc82-43b7-9d5b-60d496b77d70"]]
## Number of sessions
nsessions(athlete1)
## Total workout duration
athlete1_duration <- session_duration(athlete1)
sum(athlete1_duration)
## Only keep workout sessions with duration more than 10 min
athlete1 <- athlete1[athlete1_duration > 10/60]

The workout timeline and some workout views can be easily produced using methods from the trackeR R package

## Training times
timeline(athlete1)
## Power and heart_rate for the 80th to 84th workout
plot(athlete1, session = 40:45, what = c("power", "heart_rate"))

We can also compute and visualize summaries for the workout sessions (see, Section 5.2 of the trackeR vignette for details)

athlete1_summaries <- summary(athlete1)
##  Choose some features (see `?trackeR::plot.trackeRdataSummary`
##  for the names of the available summaries, or
##  names(data.frame(athlete1_summaries)))
features <- c("duration", "distance", "avgPower", "avgHeartRate", "total_elevation_gain", "wrRatio")
## Plot each feature longitudinally
plot(athlete1_summaries, what = features)

and explore the relationships between those summaries

## Plot all pairs of features
plot(data.frame(athlete1_summaries)[features])

And a bit more advanced analysis: the power concentration profiles (see, Section 5.5 of the trackeR vignette for details and definition of concentration profiles) for this athlete ID are

athlete1_cp <- concentration_profile(athlete1, what = c("power"))
plot(athlete1_cp, multiple = TRUE)

and a functional PCA on them gives that the first 2 eigenfunctions of the concentration profiles explain about 90% of the variation in the concentration profiles

athlete1_fpca <- funPCA(athlete1_cp, what = "power", nharm = 5)
round(athlete1_fpca$varprop[1:2] * 100, 2)

The principal components can then be used for further analyses or as features in further modelling

## Check which sessions have power data
has_power <- !is.na(athlete1_summaries$avgPower)
## Scatterplots of the first harmonic against session summaries (high correlation with distance and duration)
plot(cbind(data.frame(athlete1_summaries)[has_power, features], PC1 = athlete1_fpca$scores[, 1]))
## Scatterplots of the second harmonic against session summaries
plot(cbind(data.frame(athlete1_summaries)[has_power, features], PC2 = athlete1_fpca$scores[, 2]))
## etc

Issues

Please use the GoldenCheetahOpenData GitHub issue
page
to
report any issues or suggest enhancements or improvements.

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

References and resources

See the trackeR R package CRAN
page
for vignettes on the
use of trackeR.

Frick, H., Kosmidis, I. (2017). trackeR: Infrastructure for Running
and Cycling Data from GPS-Enabled Tracking Devices in R. Journal
of Statistical Software
, 82(7),
1–29. doi:10.18637/jss.v082.i07

Liversedge, M. (2020). GoldenCheetah OpenData
Project. OSF. doi:10.17605/OSF.IO/6HFPZ