Elegant Data Manipulation with Lenses
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
library(dplyr)
install.packages("lenses")
or
devtools::install_github("cfhammill/lenses")
When programming in R there are two fundamental operations we perform on our data.
We view
some piece of the data, or we set
some piece of the data to a particular
value. These two operations are so fundamental that R comes with many pairs
of view
and set
functions. A classic example would be names
. Names can be
viewed names(x)
and set names(x) <- new_names
. Lenses are an extension
of the idea of view
/set
pairs, offering the following advantages:
set
anything you can view
)set
/view
nested data)In this document, we’ll see a few common data manipulation operations and
how they can be improved with lenses.
Let’s take the iris data set for example, we want to perform
some manipulations on it.
head(iris)
We’re curious about the value of the 3rd element of the Sepal.Length
column.
Using base R we can view
it with:
iris$Sepal.Length[3]
we can update (set) the value by assigning into it:
iris$Sepal.Length[3] <- 100
head(iris, 3)
and we can perform some operation to update it:
iris$Sepal.Length[3] <- log(iris$Sepal.Length[3])
This works well, however, there are some problems.
The first problem comes with having our view
and set
functions separate.
Composing our operations isn’t easy, particularly when using pipes:
iris %>%
.$Sepal.Length %>%
`[<-`(3, 20)
Whoops, that’s not what we wanted. Here we see Sepal.Length
with
the third element replaced, but where did the rest of iris
go! So
we lose information when we pipe from a view
to a set
.
R’s set
/view
pairs also can’t be composed with function
compostion:
`[<-`(`$`(iris, `Sepal.Length`), 3, 20)
still not what we want. It has the same problem above.
This is a failure of “bidirectionality”, once you’ve chosen to use a
view
function, or a set
function, you are locked into that direction.
Lack of composability and bidirectionality means that you frequently have
to duplicate your code. For example, if you want to apply an operation to the
third element of “Sepal.Length”, you need to specify the chain of accessors
twice, once in view
mode, and once in set
mode, making your code messy
and cumbersome:
iris$Sepal.Length[3] <- iris$Sepal.Length[3] * 2
head(iris, 3)
We can fix both of these problems by using lenses.
Lenses give you all the power of R’s view
and set
functions plus
the advantages noted above. Especially important are the composition
and bidirectionality features. Each lens can be used with the
view
,
and set
functions.
Let’s revisit the operations we performed above using lenses.
The first thing we will do is construct a lens into the
third element of the Sepal.Length
component of a structure:
library(lenses)
sepal_length3 <- index("Sepal.Length") %.% index(3)
In the above code we’re creating two lenses, one
into Sepal.Length
and another into element 3, using the
index
function. We’re then composing these two lenses with
%.%
producing a new lens into our element of interest.
Note that this lens has no idea we’re going to apply it
to iris
. Lenses are constructed without knowing what data they
will be applied to.
Now that we have a lens into the third element of Sepal.Length
,
we can examine the appropriate element of the iris
dataset with the
view
function:
iris %>% view(sepal_length3)
We can update this element with the
set
function:
iris %>% set(sepal_length3, 50) %>% head(3)
And we can apply a function to change the data. To
do this we can apply a function
over
the lens:
iris %>% over(sepal_length3, log) %>% head(3)
Note that we never had to respecify what subpart
we wanted, the lens kept track for us. We saw that
the same lens can be used to both
view
and
set
,
and that they can be composed easily with
%.%
.
Now you have seen the main lens verbs and operations
view
:set
:over
:%.%
:Now if all lenses had to offer was more composable indexing of vectors,
you might not be interested in integrating them into your workflows.
But lenses can do a lot more than just pick and set elements in vectors.
For example, this package provides lens-ified version of dplyr::select
.
Unlike select
,
select_l
is bidirectional. This means you can
set
the results
of your selection.
let’s select columns between Sepal.Width
and Petal.Width
and increment them by 10:
iris %>%
over(select_l(Sepal.Width:Petal.Width)
, ~ . + 10
) %>%
head(3)
Not only does select_l
create the appropriate lens for you with dplyr::select
style column references,
but over
allows us
to declare anonymous functions like in purrr
.
At this point I can imagine you saying, all this is very clear, but what good is it,
I have mutate
. Well that is a good point. It is hard to beat the convenience
of mutate
. However, select_l
has an advantage, it can be used on any named
object:
iris %>%
as.list %>%
view(select_l(matches("Sepal")) %.%
index(1) %.%
index(1)
)
You can use it with vectors, lists, data.frames, etc.
If select_l
isn’t enticing enough, have you ever wanted to set
or modify the results of a filter
? This is not super easy to do in the dplyr
universe. But our lens
ified filter
,
filter_l
does this
with ease.
Let’s set all “Sepal” columns where the row number is less than three to
zero. And for fun let’s also change the column names to all upper case:
library(dplyr)
iris %>%
mutate(row_num = seq_len(n())) %>%
set(filter_l(row_num < 3) %.%
select_l(matches("Sepal"))
, 0) %>%
over(names_l, toupper) %>%
head(3)
You can even use mutate over
your filter_l
iris %>%
mutate(row_num = seq_len(n())) %>%
over(filter_l(row_num < 3)
, ~ mutate(., Sepal.Length = 0)) %>%
head(3)
As you can see, lenses can be smoothly integrated into your tidyverse
workflows, as well as your base R workflows. Giving you the powers
of compositionality and bidirectionality to improve your code.
Frequently we end up in situations where we want to modify each element
of a nested object. This is especially cumbersome without lenses. Let’s
imagine our data lives inside a larger structure. And additionally that
it isn’t a nice data frame, but a list.
packed_iris <- list(as.list(iris))
packed_iris %>% str(2)
say I want to add 10 to the first element of each column between Sepal.Length
and
Petal.Width
. Base R I might do something like:
els_of_interest <-
grep("Sepal|Petal", names(packed_iris[[1]]), value = TRUE)
packed_iris[[1]][1:4] <-
lapply(packed_iris[[1]][1:4]
, function(x){ x[1] <- x[1] + 10; x })
str(packed_iris, 2)
pretty ugly right?
To do this with lenses we can use the map_l
function to promote a lens
to
apply to each element of a list.
els_l <-
index(1) %.%
select_l(Sepal.Length:Petal.Width) %.%
map_l(index(1))
over_map(packed_iris, els_l, ~ . + 10) %>%
str(2)
Here we use the over_map
function to apply a function to each element,
you could equivalently use over
with lapply
as well. As you can see
setting and applying functions to multiple elements of nested data is
dramatically improved by using lenses.
You can make a lens from scratch (!) by passing view
and set
functions to
the lens
constructor:
first_l <- lens(view = function(d) d[[1]],
set = function(d, x) { d[[1]] <- x; d })
As you can see, the view
function must accept an element of data, while the set
function must accept such an element as well as the new value of the subpart,
and return the new data in its entirety - thus achieving composability -
without modifying the original.
In order to avoid unpleasant surprises or inconsistencies for users,
an author of a lens
(via lens
)
should ensure it obeys the following rules (the “Lenz laws”, here paraphrased
from a Haskell lens tutorial):
view
some data with a lens, and thenset
the data with that value, you get the input data back.set
a value with a lens,view
that value with the same lens, you get back what you put in.set
a value into some data with a lens, andset
another value with the same lens, it’s the same as onlyset
.“Lenses” which do not satisfy these properties should be documented accordingly.
By convention, the few such specimens in this library are suffixed by “_il” (“illegal lens”).
See the package reference for more.
As you can see from the lens
constructor, knowing how to implement view
and set
for a lens turns out to be sufficient to implement the other verbs such as over
and
%.%
).In our implementation, lenses are trivial. They simply store the provided
functions. A lens
under the hood is a two element list with an
element view
and an element set
.
There is nothing particularly new about the lenses appearing here.
For a fairly comprehensive (and highly technical) history of lenses, see
links here and
this blog post .
Thanks to Leigh Spencer Noakes, Zsu Lindenmaier, and Lily Qiu for reading
drafts of this document and providing very helpful feedback.