nplyr: a grammar of (nested) data manipulation :bird:
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%",
message = FALSE,
warning = FALSE
)
Author: Mark Rieke
License: MIT
{nplyr} is a grammar of nested data manipulation that allows users to perform dplyr-like manipulations on data frames nested within a list-col of another data frame. Most dplyr verbs have nested equivalents in nplyr. A (non-exhaustive) list of examples:
nest_mutate() is the nested equivalent of mutate()
nest_select() is the nested equivalent of select()
nest_filter() is the nested equivalent of filter()
nest_summarise() is the nested equivalent of summarise()
nest_group_by() is the nested equivalent of group_by()
As of version 0.2.0, nplyr also supports nested versions of some tidyr functions:
nest_drop_na() is the nested equivalent of drop_na()
nest_extract() is the nested equivalent of extract()
nest_fill() is the nested equivalent of fill()
nest_replace_na() is the nested equivalent of replace_na()
nest_separate() is the nested equivalent of separate()
nest_unite() is the nested equivalent of unite()
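As a minimal sketch of the tidyr-style helpers, the snippet below applies nest_drop_na() and nest_replace_na() to a small made-up tibble. The `toy` data and the `obs` column name are hypothetical and chosen only for illustration; the `replace` argument is assumed to take a named list, as it does in tidyr::replace_na().

```r
library(nplyr)
library(dplyr) # attached explicitly so the pipe is available

# hypothetical toy data with a few missing values
toy <- tibble::tibble(
  group = c("a", "a", "b", "b"),
  x = c(1, NA, 3, 4),
  y = c("u", "v", NA, "w")
)

# nest everything except `group` into a list-col named `obs`
toy_nest <- tidyr::nest(toy, obs = -group)

# drop rows containing any NA inside each nested tibble
dropped <- toy_nest %>%
  nest_drop_na(obs)

# replace missing values of `x` inside each nested tibble, leaving `y` untouched
filled <- toy_nest %>%
  nest_replace_na(obs, replace = list(x = 0))
```

Unnesting `dropped` or `filled` with tidyr::unnest(obs) is a quick way to inspect the result.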
nplyr is largely a wrapper for dplyr. For the most up-to-date information on dplyr, please visit dplyr’s website. If you are new to dplyr, the best place to start is the data transformation chapter in R for Data Science.
You can install the released version of nplyr from CRAN or the development version from GitHub with the devtools or remotes package:
# install from CRAN
install.packages("nplyr")
# install from GitHub
devtools::install_github("markjrieke/nplyr")
To get started, we’ll create a nested column for the country data within each continent from the gapminder dataset.
library(nplyr)
gm_nest <-
  gapminder::gapminder_unfiltered %>%
  tidyr::nest(country_data = -continent)
gm_nest
dplyr can perform operations on the top-level data frame, but with nplyr, we can perform operations on the nested data frames:
gm_nest_example <-
  gm_nest %>%
  nest_filter(country_data, year == max(year)) %>%
  nest_mutate(country_data, pop_millions = pop/1000000)
# each nested tibble is now filtered to the most recent year
gm_nest_example
# if we unnest, we can see that a new column for pop_millions has been added
gm_nest_example %>%
  slice_head(n = 1) %>%
  tidyr::unnest(country_data)
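Because nplyr is largely a wrapper for dplyr, a chain like the one above can be read as mapping ordinary dplyr verbs over each data frame in the list-col. The sketch below compares the two approaches on a small made-up tibble (`sales` and `region_data` are hypothetical names standing in for the gapminder data so the snippet runs on its own). It illustrates the idea only and is not a description of how nplyr is implemented internally.

```r
library(nplyr)
library(dplyr) # attached explicitly so the pipe is available

# hypothetical toy data standing in for gapminder
sales <- tibble::tibble(
  region = c("east", "east", "west", "west"),
  year = c(2020, 2021, 2020, 2021),
  pop = c(1e6, 2e6, 3e6, 4e6)
)
sales_nest <- tidyr::nest(sales, region_data = -region)

# nplyr: operate on the nested data frames directly
with_nplyr <- sales_nest %>%
  nest_filter(region_data, year == max(year)) %>%
  nest_mutate(region_data, pop_millions = pop / 1e6)

# by hand: map the same dplyr verbs over the list-col
by_hand <- sales_nest %>%
  dplyr::mutate(
    region_data = purrr::map(
      region_data,
      ~ .x %>%
        dplyr::filter(year == max(year)) %>%
        dplyr::mutate(pop_millions = pop / 1e6)
    )
  )
```

Both objects should unnest to the same rows. The nplyr version avoids the explicit purrr::map() call and the anonymous function.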
nplyr also supports grouped operations with nest_group_by():
gm_nest_example <-
  gm_nest %>%
  nest_group_by(country_data, year) %>%
  nest_summarise(
    country_data,
    n = n(),
    lifeExp = median(lifeExp),
    pop = median(pop),
    gdpPercap = median(gdpPercap)
  )
gm_nest_example
# unnesting shows summarised tibbles for each continent
gm_nest_example %>%
  slice(2) %>%
  tidyr::unnest(country_data)
More examples can be found in the package vignettes and function documentation.
If you notice a bug, want to request a new feature, or have recommendations on improving documentation, please open an issue in the package repository.