nplyr

nplyr: a grammar of (nested) data manipulation :bird:

118
3
R

output: github_document

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  message = FALSE,
  warning = FALSE
)

nplyr

Author: Mark Rieke

License: MIT

R-CMD-check
Lifecycle: experimental
CRAN status

Overview

{nplyr} is a grammar of nested data manipulation that allows users to perform dplyr-like manipulations on data frames nested within a list-col of another data frame. Most dplyr verbs have nested equivalents in nplyr. A (non-exhaustive) list of examples:

  • nest_mutate() is the nested equivalent of mutate()
  • nest_select() is the nested equivalent of select()
  • nest_filter() is the nested equivalent of filter()
  • nest_summarise() is the nested equivalent of summarise()
  • nest_group_by() is the nested equivalent of group_by()

As of version 0.2.0, nplyr also supports nested versions of some tidyr functions:

  • nest_drop_na() is the nested equivalent of drop_na()
  • nest_extract() is the nested equivalent of extract()
  • nest_fill() is the nested equivalent of fill()
  • nest_replace_na() is the nested equivalent of replace_na()
  • nest_separate() is the nested equivalent of separate()
  • nest_unite() is the nested equivalent of unite()

nplyr is largely a wrapper for dplyr. For the most up-to-date information on dplyr please visit dplyr’s website. If you are new to dplyr, the best place to start is the data transformation chapter in R for data science.

Installation

You can install the released version of nplyr from CRAN or the development version from github with the devtools or remotes package:

# install from CRAN
install.packages("nplyr")

# install from github
devtools::install_github("markjrieke/nplyr")

Usage

To get started, we’ll create a nested column for the country data within each continent from the gapminder dataset.

library(nplyr)

gm_nest <- 
  gapminder::gapminder_unfiltered %>%
  tidyr::nest(country_data = -continent)

gm_nest

dplyr can perform operations on the top-level data frame, but with nplyr, we can perform operations on the nested data frames:

gm_nest_example <- 
  gm_nest %>%
  nest_filter(country_data, year == max(year)) %>%
  nest_mutate(country_data, pop_millions = pop/1000000)

# each nested tibble is now filtered to the most recent year
gm_nest_example

# if we unnest, we can see that a new column for pop_millions has been added
gm_nest_example %>%
  slice_head(n = 1) %>%
  tidyr::unnest(country_data)

nplyr also supports grouped operations with nest_group_by():

gm_nest_example <- 
  gm_nest %>%
  nest_group_by(country_data, year) %>%
  nest_summarise(
    country_data, 
    n = n(),
    lifeExp = median(lifeExp),
    pop = median(pop),
    gdpPercap = median(gdpPercap)
  )

gm_nest_example

# unnesting shows summarised tibbles for each continent
gm_nest_example %>%
  slice(2) %>%
  tidyr::unnest(country_data)

More examples can be found in the package vignettes and function documentation.

Bug reports/feature requests

If you notice a bug, want to request a new feature, or have recommendations on improving documentation, please open an issue in the package repository.