srvyr

R package to add 'dplyr'-like Syntax for Summary Statistics of Survey Data



srvyr


srvyr brings parts of dplyr’s syntax to survey
analysis, using the survey
package.

srvyr focuses on calculating summary statistics from survey data, such as the
mean, total or quantile. It allows for the use of many dplyr verbs, such as
summarize, group_by, and mutate, the convenience of pipe-able functions,
rlang’s style of non-standard evaluation and more consistent return types
than the survey package.

You can try it out:

install.packages("srvyr")
# or for development version
# remotes::install_github("gergness/srvyr")

Example usage

First, describe the variables that define the survey’s structure with the function
as_survey(), using the bare column names of the variables that you would otherwise
pass to functions from the survey package like survey::svydesign(),
survey::svrepdesign() or survey::twophase().

library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")

dstrata <- apistrat %>%
   as_survey_design(strata = stype, weights = pw)
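The same pattern should carry over to other designs. As a rough sketch (the cluster-sample arguments below are assumptions based on the columns of the api datasets that ship with the survey package, not something this README spells out), as_survey() can stand in for as_survey_design(), and an object already built with survey::svydesign() can be converted:

```r
library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")

# Stratified sample: as_survey() is given design arguments,
# so it is expected to behave like as_survey_design() here
dstrata <- apistrat %>%
  as_survey(strata = stype, weights = pw)

# One-stage cluster sample with a finite population correction
# (assumes the dnum, pw and fpc columns of apiclus1)
dclus1 <- apiclus1 %>%
  as_survey_design(ids = dnum, weights = pw, fpc = fpc)

# A design created directly with the survey package can be converted too
dclus1_survey <- survey::svydesign(
  id = ~dnum, weights = ~pw, fpc = ~fpc, data = apiclus1
)
dclus1_converted <- as_survey(dclus1_survey)
```

Each of these objects can then be piped into the dplyr-style verbs described in the rest of this README.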

Now many of the dplyr verbs are available.

  • mutate() adds or modifies a variable.
dstrata <- dstrata %>%
  mutate(api_diff = api00 - api99)
  • summarise() calculates summary statistics such as mean, total, quantile or ratio.
dstrata %>% 
  summarise(api_diff = survey_mean(api_diff, vartype = "ci"))
  • group_by() and then summarise() creates summaries by groups.
dstrata %>% 
  group_by(stype) %>%
  summarise(api_diff = survey_mean(api_diff, vartype = "ci"))
  • Functions from the survey package are still available:
my_model <- survey::svyglm(api99 ~ stype, dstrata)
summary(my_model)
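To make the list above a little more concrete, here is a small sketch that combines grouped totals and ratios in one summarise() call and then checks an overall mean against survey::svymean(). The output column names (enroll_total, api_ratio and so on) are made up for illustration, and the comparison at the end is only a sanity check, not a statement about how srvyr computes its results internally.

```r
library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# Several statistics per school type in one call;
# each gets a companion standard-error column
by_type <- dstrata %>%
  group_by(stype) %>%
  summarise(
    enroll_total = survey_total(enroll, vartype = "se"),
    api_ratio = survey_ratio(api00, api99, vartype = "se")
  )

# The same design object can be handed to the survey package directly
srvyr_mean <- dstrata %>%
  summarise(api00 = survey_mean(api00, vartype = "se"))
survey_mean_obj <- survey::svymean(~api00, dstrata)

# Point estimates and standard errors should agree
all.equal(as.numeric(srvyr_mean$api00), as.numeric(coef(survey_mean_obj)))
all.equal(as.numeric(srvyr_mean$api00_se), as.numeric(survey::SE(survey_mean_obj)))
```

The srvyr result is a data frame with one row per group, which is usually easier to pass on to further dplyr or plotting code than the objects returned by the survey functions.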

Learning more

Here are some free resources put together by the community about srvyr:

Still need help?

I think the best way to get help is to form a specific question and ask it somewhere like Posit’s community website (known for its friendly community) or stackoverflow.com (maybe not known for being quite as friendly, but probably has more people). If you think you’ve found a bug in srvyr’s code, please file an issue on GitHub, but note that I’m not a great resource for helping with specific issues, both because I have limited capacity and because I do not consider myself an expert in the statistical methods behind survey analysis.

Have something to add?

These resources were mostly found via vanity searches on Twitter & GitHub. If you know of anything I missed, or have written something yourself, please let me know in this GitHub issue!

What people are saying about srvyr

minimal changes to my #r #dplyr script to incorporate survey weights, thanks to the amazing #srvyr and #survey packages. Thanks to @gregfreedman & @tslumley. Integrates soooo nicely into tidyverse

Brian Guay (@BrianMGuay on Jun 16, 2021)

Spending my afternoon using srvyr for tidy analysis of weighted survey data in #rstats and it’s so elegant. Vignette here: https://CRAN.R-project.org/package=srvyr/vignettes/srvyr-vs-survey.html

Chris Skovron (@cskovron on Nov 20, 2018)

  1. Yay!

Thomas Lumley, in the Biased and Inefficient blog

Contributing

I do appreciate bug reports, suggestions and pull requests! I started this as a
way to learn about R package development, and am still learning, so you’ll have
to bear with me. Please review the Contributor Code of Conduct, as all
participants are required to abide by its terms.

If you’re unfamiliar with contributing to an R package, I recommend the guides
provided by RStudio’s tidyverse team, such as Jim Hester’s blog post or Hadley
Wickham’s R packages book.