sptemExp

The approach of ensemble spatiotemporal mixed models is to make reliable estimation of air pollutant concentrations at high resolutions. (1) Extraction of covariates from the satellite images such as GeoTiff and NC4 raster (e.g NDVI, AOD, and meteorological parameters); (2) Generation of temporal basis functions to simulate the seasonal trends in the study regions; (3) Generation of the regional monthly or yearly means of air pollutant concentration; (4) Generation of Thiessen polygons and spatial effect modeling; (5) Ensemble modeling for spatiotemporal mixed models, supporting multi-core parallel computing; (6) Integrated predictions with or without weights of the model's performance, supporting multi-core parallel computing; (7) Constrained optimization to interpolate the missing values; (8) Generation of the grid surfaces of air pollutant concentrations at high resolution; (9) Block kriging for regional mean estimation at multiple scales.


title: “Ensemble Spatiotemporal Mixed Models for Exposure Estimation”
output: github_document

knitr::opts_chunk$set(echo = TRUE)

sptemExp: a R package for spatiotemporal exposure estimation

Lianfa Li (Email: [email protected])

It is an ensemble spatiotemporal modeling tool with constrained optimization and also provides the functionality of grid mapping of air pollutants and parallel computing.

Specifically, it includes the following functionality:

  • Extraction of covariates from the satellite images such as geoTiff and nc4 raster (e.g NDVI, AOD, and meteorological parameters);
  • Generation of temporal basis functions to simulate the seasonal trends in the study regions;
  • Generation of Thiessen polygons and spatial effect modeling;
  • Ensemble modeling for spatiotemporal mixed models, supporting multi-core parallel computing;
  • Integrated prediction with or without weights of the model’s performance, supporting multi-core parallel computing;
  • Constrained optimization to interpolate the missing values;
  • Generation of the grid surfaces of air pollutant concentrations at high resolution;
  • Block kriging for regional mean estimation at multiple scales.

Extraction of the covariates from satellite images

The following codes show the functions, extractVTIF and extractVNC4 to obtain the values for the subject locations with their timeline from GEoTiff (left) and BC4 (right) image files.

data("gtifRst","samplepnt")
tvals=sptemExp::extractVTIF(samplepnt,gtifRst)
par(mfrow=c(1,2),mar=c(4,4,1,1))
raster::plot(gtifRst)
raster::plot(samplepnt,add=TRUE)

nc4File=file.path(system.file(package = "sptemExp"), "extdata", "ancdata.nc4")
ncin0=ncdf4::nc_open(nc4File)
extRes=sptemExp::extractVNC4(samplepnt,ncin0,"TLML")
raster::plot(extRes$img)

print("geotiff values:")
print(tvals[c(1:5)])
print("nc4 values:")
print(extRes$val[c(1:5)])

Generation of the temporal basis function

You can use the function, getTBasisFun to extract the temporal basis functions (usually the first two principle components).

data("shdSeries2014")
#Extract the temporal basis functions 
result=sptemExp::getTBasisFun(shdSeries2014,"siteid","date","obs",df=8,n.basis=2)
#Plot the results 
par(mfrow=c(1,2),mar=c(4,4,1,1))
plot(result$date,result$pv1,type="l",xlab="Date",ylab="1st Temporal Basis Function")
plot(result$date,result$pv2,type="l",xlab="Date",ylab="2nd Temporal Basis Function")

Generation of the Thiessen polygons for spatial effect modeling

We can generate the Thiessen polygons using the spatial points and border map. The function, tpolygonsByBorder is designed for this.

data("samplepnt","prnside")
x=samplepnt
sidepoly=prnside
par(mar=c(1,1,1,1))
tpoly=sptemExp::tpolygonsByBorder(x,sidepoly)$tpolys
raster::plot(samplepnt,add=T)

Ensemble modeling for spatiotemporal mixed models, supporting multi-core parallel computing

The base spatiotemporal mixed models can be trained using the bootstrap datasets with support of multiple cores. The following examples illustrate the use of the function, parSpModel to train 12 models and the trained models are saved in the appointed model path. In practical application, you can train more models like 80 or more to get better performance.

#Set the temporary path to save the models and their performance metrics: 
dPath=tempdir()
mPath=paste(dPath,"/models",sep="")
unlink(mPath,recursive = TRUE,force=TRUE)
dir.create(mPath)

#Load the dataset of Shandong PM2.5 
data("trainsample","bnd")

#The formula where sx(rid, bs = "mrf", map =bnd) is spatial random effect.  You can add unstructured item and other items.  See R2BayesX. 
formulaStrs=c(paste('logpm25 ~ sx(rid, bs = "mrf", map =bnd)+sx(monthAv,bs="rw2")+sx(ndvi,bs="rw2") + sx(aod,bs="rw2") ','+sx(wnd_avg,bs="rw2")',sep=""))

trainsample$tid=as.numeric(strftime(trainsample$date, format = "%j"))
trainsample$logpm25=log(trainsample$pm25)
tids=c(91) # the day of year for 2014 

#Train the model using 2 cores and construct 12 models: 
sptemExp::parSpModel(trainsample,bnd,formulaStrs,tidF="tid",tids,c=1,nM=10,
           mPath,idF="siteid",dateF="date",obsF="pm25")

Check the performance (rSquared and rmse) of every model and the total performance.

#Check the performance of everty model
mfile=paste(dPath,"/models/t_",tids[1],"_metrics.csv",sep="")
models_metrics=read.csv(mfile,row.names = NULL)
knitr::kable(models_metrics,caption="Performance of every model by bootstrap aggregation")
#Check the total performance  
mtfile=paste(dPath,"/models/t_",tids[1],"_total_metric.csv",sep="")
t_metrics=read.csv(mtfile,row.names = NULL)
knitr::kable(t_metrics,caption="Total performance by  bootstrap aggregation")

Integrate the predictions with or without weights of the model’s performance

Using the models trained in the above step to make the predictions for the new dataset covering Shandong Province. Here we just use the 2000 cells but you can set the value to be all the cells.


#Set the output path of the predictions: 
prePath=paste(dPath,"/preds",sep="")
dir.create(prePath)

#Load the prediction dataset of covariates 
amodelPath=paste(dPath,"/models/t_",tids[1],"_models",sep="")
data("shd140401pcovs","bnd")
shd140401pcovs_part=shd140401pcovs[c(1:2000),]

#cols lists the field names of covariates to be used in the models. Then call parATimePredict to implement parallel predictions (the argument, c is the core number)
cols=c("aod","ndvi","wnd_avg","monthAv")
sptemExp::parATimePredict(amodelPath,newPnts=shd140401pcovs_part,cols,bnd=bnd,c=2,prePath,idF="gid",ridF="rid")

Get the weighted averages by the function, weiA2Ens. You can also get the simple averages (not weighted) by the function, noweiAvg.


result=sptemExp::weiA2Ens(prePath,mfile,metrF="rmse","pre","gid","gid")
knitr::kable(result[c(1:5),],caption="Predicted result")

Constrained optimization to interpolate the missing values

Use the function of constrained optimization, inter2conOpt to get the values for the missing items.


data("allPre500","shdSeries2014")

#Get the temporal basis functions to be used in constrained optimization 
season_trends=sptemExp::getTBasisFun(shdSeries2014,idStr="siteid",dateStr="date",
                           valStr="obs",df=10,n.basis=2,tbPath=NA)
season_trends$tid=as.numeric(strftime(season_trends$date, format = "%j")) 

#Constrained optimization 
allPre_part_filled=sptemExp::inter2conOpt(tarPDf=allPre500[c(1:6),],pol_season_trends=season_trends,ncore=2)
knitr::kable(allPre_part_filled[c(1:6),],caption="")


Show the grids of concentration estimated

Generate the grid images of concentrations estimated.

data("spointspre","countylayer")
praster=sptemExp::points2Raster(spointspre,"d91")
dtStr=as.character(as.Date(91,origin=as.Date("2014-01-1")))
title=expression("PM"[2.5]*" Concentration Estimated")
#raster::plot(praster,col=terrain.colors(255),main=title,xlab=paste("Shandong Province, China (",dtStr,")",sep=""))
par(mar=c(4,4,1,1))
breakpoints = c(0,50,100,200,350,600)
colors=sptemExp::colorCusGrinf(breakpoints,c("darkgreen","yellow","darkred"))
raster::plot(praster,breaks=breakpoints,col=colors,
     main=title,xlab=paste("Shandong Province, China (",dtStr,")",sep=""))

Block kriging for regional mean estimation at multiple scales.

The function, getTidBKMean is for generation of the regional krged means, supporting parallel compution.


data("spointspre","countylayer")
tarF="d91" # target variable to be kriged 
regionName="NAME_3"
bkRes=sptemExp::getTidBKMean(spointspre,countylayer,regionName,tarF,2)

# In the functions, the variogram is automatically fitted using autovariogram in automap package. 

Show the result of the regional kriged means of PM2.5 concentration across the study region.

# The output from the above step has bkm_fill is the regional kriged estimate.  
bkRes=bkRes[!is.na(bkRes$bkm_fill),]
levels=c(30,60,100,150,250)
cr=sptemExp::colorGrinf(bkRes$bkm_fill,levels,colors=c("darkgreen","yellow","darkred"))
par(mar=c(1,1,1,1))
title=expression("Regional Block Kriged PM"[2.5]*" Concentration Estimated")
raster::plot(bkRes,col =cr$cols[cr$index],main=title)
legend("bottomright", fill =cr$cols, legend = cr$levels,col =cr$cols, cex=1,bty="n",bg="n")