stars

Spatiotemporal Arrays, Raster and Vector Data Cubes

581

output: github_document

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#",
  fig.path = "man/figures/README-"
)
is_online = curl::has_internet()

Spatiotemporal Arrays: Raster and Vector Datacubes

Spatiotemporal data often comes in the form of dense arrays, with space and time being array dimensions. Examples include

socio-economic or demographic data,
environmental variables monitored at fixed stations,
raster maps,
time series of satellite images with multiple spectral bands,
spatial simulations, and
climate or weather model output.

This R package provides classes and methods for reading,
manipulating, plotting and writing such data cubes, to the extent
that there are proper formats for doing so.

Raster and vector data cubes

The canonical data cube most of us have in mind is that where two
dimensions represent spatial raster dimensions, and the third time
(or band), as e.g. shown here:

knitr::include_graphics("https://raw.githubusercontent.com/r-spatial/stars/main/images/cube1.png")

By data cubes however we also consider higher-dimensional cubes
(hypercubes) such as a five-dimensional cube where in addition to
time, spectral band and sensor form dimensions:

knitr::include_graphics("https://raw.githubusercontent.com/r-spatial/stars/main/images/cube2.png")

or lower-dimensional cubes such as a raster image:

library(dplyr)
library(stars)
tif = system.file("tif/L7_ETMs.tif", package = "stars")
read_stars(tif) |>
  slice(index = 1, along = "band") |>
  plot()

Raster data do not need to be regular and aligned with North/East,
and package stars supports besides regular also rotated,
sheared, rectilinear and curvilinear rasters:

x = 1:5
y = 1:4
d = st_dimensions(x = x, y = y, .raster = c("x", "y"))
m = matrix(runif(20),5,4)
r1 = st_as_stars(r = m, dimensions = d)

r = attr(d, "raster")
r$affine = c(0.2, -0.2)
attr(d, "raster") = r
r2 = st_as_stars(r = m, dimensions = d)

r = attr(d, "raster")
r$affine = c(0.1, -0.3)
attr(d, "raster") = r
r3 = st_as_stars(r = m, dimensions = d)

x = c(1, 2, 3.5, 5, 6)
y = c(1, 1.5, 3, 3.5)
d = st_dimensions(x = x, y = y, .raster = c("x", "y"))
r4 = st_as_stars(r = m, dimensions = d)

grd = st_make_grid(cellsize = c(10,10), offset = c(-130,10), n = c(8,5), crs = st_crs(4326))
r5 = st_transform(grd, "+proj=laea +lon_0=-70 +lat_0=35")

par(mfrow = c(2,3))
r1 = st_make_grid(cellsize = c(1,1), n = c(5,4), offset = c(0,0))
plot(r1, main = "regular")
plot(st_geometry(st_as_sf(r2)), main = "rotated")
plot(st_geometry(st_as_sf(r3)), main = "sheared")
plot(st_geometry(st_as_sf(r4, as_points = FALSE)), main = "rectilinear")
plot(st_geometry((r5)), main = "curvilinear")

Vector data cubes arise when we do not have two regularly discretized spatial dimensions, but a single dimension that points to distinct spatial feature geometries, such as polygons (e.g. denoting administrative regions):

knitr::include_graphics("https://raw.githubusercontent.com/r-spatial/stars/main/images/cube3.png")

or points (e.g. denoting sensor locations):

knitr::include_graphics("https://raw.githubusercontent.com/r-spatial/stars/main/images/cube4.png")

NetCDF’s CF-convention calls this a discrete axis.

NetCDF, GDAL

stars provides two functions to read data: read_ncdf and
read_stars, where the latter reads through GDAL. (In the future,
both will be integrated in read_stars.) For reading NetCDF files,
package RNetCDF is used, for reading through GDAL, package sf
provides the binary linking to GDAL.

For vector and raster operations, stars uses as much as possible
the routines available in GDAL and PROJ (e.g. st_transform,
rasterize, polygonize, warp). Read more about this in
the vignette on vector-raster conversions, reprojection,
warping.

Out-of-memory (on-disk) rasters

Package stars provides stars_proxy objects (currently only when
read through GDAL), which contain only the dimensions metadata and
pointers to the files on disk. These objects work lazily: reading
and processing data is postponed to the moment that pixels are
really needed (at plot time, or when writing to disk), and is done
at the lowest spatial resolution possible that still fulfills the
resolution of the graphics device. More
details are found in the stars proxy
vignette.

The following methods are currently available for stars_proxy objects:

methods(class = "stars_proxy")

Raster and vector time series analysis example

In the following, a curvilinear grid with hourly precipitation values of
a hurricane is imported and the first 12 time steps are plotted:

prec_file = system.file("nc/test_stageiv_xyt.nc", package = "stars")
(prec = read_stars(gdal_subdatasets(prec_file)[[1]]))
# or: (prec = read_ncdf(prec_file, curvilinear = c("lon", "lat"), ignore_bounds = TRUE))
sf::read_sf(system.file("gpkg/nc.gpkg", package = "sf"), "nc.gpkg") |> 
  st_transform(st_crs(prec)) -> nc # transform from NAD27 to WGS84
nc_outline = st_union(st_geometry(nc))
plot_hook = function() plot(nc_outline, border = 'red', add = TRUE)
prec |>
  slice(index = 1:12, along = "time") |>
  plot(downsample = c(3, 3, 1), hook = plot_hook)

and next, intersected with with the counties of North Carolina, where
the maximum precipitation intensity was obtained per county, and plotted:

a = aggregate(prec, by = nc, FUN = max)
plot(a, max.plot = 23, border = 'grey', lwd = .5)

We can integrate over (reduce) time, for instance to find out when
the maximum precipitation occurred. The following code finds the time
index, and then the corresponding time value:

index_max = function(x) ifelse(all(is.na(x)), NA, which.max(x))
b = st_apply(a, "geom", index_max)
b |> mutate(when = st_get_dimension_values(a, "time")[b$index_max]) |>
  select(when) |>
  plot(key.pos = 1, main = "time of maximum precipitation")

With package cubble, we can make a glyph map to see the magnitude and timings of county maximum precipitation:

library(cubble)
library(ggplot2)
a |> setNames("precip") |>
  st_set_dimensions(2, name = "tm") |>
  units::drop_units() |>
  as_cubble(key = id, index = tm) -> a.cb
a.cb |>
  face_temporal() |>
  unfold(long, lat) |>
  mutate(tm = as.numeric(tm)) |>
  ggplot(aes(x_major = long, x_minor = tm, y_major = lat, y_minor = precip)) +
  geom_sf(data = nc, inherit.aes = FALSE) +
  geom_glyph_box(width = 0.3, height = 0.1) +
  geom_glyph(width = 0.3, height = 0.1)

Other packages for data cubes

`gdalcubes`

Package gdalcubes can be used to create data cubes (or functions
from them) from image collections, sets of multi-band images with
varying

spatial resolution
spatial extent
coordinate reference systems (e.g., spread over multiple UTM zones)
observation times

and does this by resampling and/or aggregating over space and/or
time. It reuses GDAL VRT’s and gdalwarp for spatial resampling and/or
warping, and handles temporal resampling or aggregation itself.

`ncdfgeom`

ncdfgeom reads and writes vector data cubes from and to netcdf
files in a standards-compliant way.

`raster` and `terra`

Packages raster and its successor, terra are powerful packages
for handling raster maps and stacks of raster maps both in memory
and on disk, but do not address

non-raster time series,
multi-attribute rasters time series
rasters with mixed type attributes (e.g., numeric, logical, factor, POSIXct)
rectilinear or curvilinear rasters

A list of stars commands matching
existing raster commands is found in this
wiki.
A list of translations in the opposite direction (from stars to
raster or terra) still needs to be made.

A comment on the differences between stars and terra is found
here.

Other `stars` resources:

blog posts: first,
second,
third, and newer blog posts
vignettes
the original R Consortium proposal.

Acknowledgment

This project has been realized with financial
support
from the