The approach of ensemble spatiotemporal mixed models is to make reliable estimation of air pollutant concentrations at high resolutions. (1) Extraction of covariates from the satellite images such as GeoTiff and NC4 raster (e.g NDVI, AOD, and meteorological parameters); (2) Generation of temporal basis functions to simulate the seasonal trends in the study regions; (3) Generation of the regional monthly or yearly means of air pollutant concentration; (4) Generation of Thiessen polygons and spatial effect modeling; (5) Ensemble modeling for spatiotemporal mixed models, supporting multi-core parallel computing; (6) Integrated predictions with or without weights of the model's performance, supporting multi-core parallel computing; (7) Constrained optimization to interpolate the missing values; (8) Generation of the grid surfaces of air pollutant concentrations at high resolution; (9) Block kriging for regional mean estimation at multiple scales.
knitr::opts_chunk$set(echo = TRUE)
It is an ensemble spatiotemporal modeling tool with constrained optimization and also provides the functionality of grid mapping of air pollutants and parallel computing.
Specifically, it includes the following functionality:
The following codes show the functions, extractVTIF and extractVNC4 to obtain the values for the subject locations with their timeline from GEoTiff (left) and BC4 (right) image files.
data("gtifRst","samplepnt")
tvals=sptemExp::extractVTIF(samplepnt,gtifRst)
par(mfrow=c(1,2),mar=c(4,4,1,1))
raster::plot(gtifRst)
raster::plot(samplepnt,add=TRUE)
nc4File=file.path(system.file(package = "sptemExp"), "extdata", "ancdata.nc4")
ncin0=ncdf4::nc_open(nc4File)
extRes=sptemExp::extractVNC4(samplepnt,ncin0,"TLML")
raster::plot(extRes$img)
print("geotiff values:")
print(tvals[c(1:5)])
print("nc4 values:")
print(extRes$val[c(1:5)])
You can use the function, getTBasisFun to extract the temporal basis functions (usually the first two principle components).
data("shdSeries2014")
#Extract the temporal basis functions
result=sptemExp::getTBasisFun(shdSeries2014,"siteid","date","obs",df=8,n.basis=2)
#Plot the results
par(mfrow=c(1,2),mar=c(4,4,1,1))
plot(result$date,result$pv1,type="l",xlab="Date",ylab="1st Temporal Basis Function")
plot(result$date,result$pv2,type="l",xlab="Date",ylab="2nd Temporal Basis Function")
We can generate the Thiessen polygons using the spatial points and border map. The function, tpolygonsByBorder is designed for this.
data("samplepnt","prnside")
x=samplepnt
sidepoly=prnside
par(mar=c(1,1,1,1))
tpoly=sptemExp::tpolygonsByBorder(x,sidepoly)$tpolys
raster::plot(samplepnt,add=T)
The base spatiotemporal mixed models can be trained using the bootstrap datasets with support of multiple cores. The following examples illustrate the use of the function, parSpModel to train 12 models and the trained models are saved in the appointed model path. In practical application, you can train more models like 80 or more to get better performance.
#Set the temporary path to save the models and their performance metrics:
dPath=tempdir()
mPath=paste(dPath,"/models",sep="")
unlink(mPath,recursive = TRUE,force=TRUE)
dir.create(mPath)
#Load the dataset of Shandong PM2.5
data("trainsample","bnd")
#The formula where sx(rid, bs = "mrf", map =bnd) is spatial random effect. You can add unstructured item and other items. See R2BayesX.
formulaStrs=c(paste('logpm25 ~ sx(rid, bs = "mrf", map =bnd)+sx(monthAv,bs="rw2")+sx(ndvi,bs="rw2") + sx(aod,bs="rw2") ','+sx(wnd_avg,bs="rw2")',sep=""))
trainsample$tid=as.numeric(strftime(trainsample$date, format = "%j"))
trainsample$logpm25=log(trainsample$pm25)
tids=c(91) # the day of year for 2014
#Train the model using 2 cores and construct 12 models:
sptemExp::parSpModel(trainsample,bnd,formulaStrs,tidF="tid",tids,c=1,nM=10,
mPath,idF="siteid",dateF="date",obsF="pm25")
Check the performance (rSquared and rmse) of every model and the total performance.
#Check the performance of everty model
mfile=paste(dPath,"/models/t_",tids[1],"_metrics.csv",sep="")
models_metrics=read.csv(mfile,row.names = NULL)
knitr::kable(models_metrics,caption="Performance of every model by bootstrap aggregation")
#Check the total performance
mtfile=paste(dPath,"/models/t_",tids[1],"_total_metric.csv",sep="")
t_metrics=read.csv(mtfile,row.names = NULL)
knitr::kable(t_metrics,caption="Total performance by bootstrap aggregation")
Using the models trained in the above step to make the predictions for the new dataset covering Shandong Province. Here we just use the 2000 cells but you can set the value to be all the cells.
#Set the output path of the predictions:
prePath=paste(dPath,"/preds",sep="")
dir.create(prePath)
#Load the prediction dataset of covariates
amodelPath=paste(dPath,"/models/t_",tids[1],"_models",sep="")
data("shd140401pcovs","bnd")
shd140401pcovs_part=shd140401pcovs[c(1:2000),]
#cols lists the field names of covariates to be used in the models. Then call parATimePredict to implement parallel predictions (the argument, c is the core number)
cols=c("aod","ndvi","wnd_avg","monthAv")
sptemExp::parATimePredict(amodelPath,newPnts=shd140401pcovs_part,cols,bnd=bnd,c=2,prePath,idF="gid",ridF="rid")
Get the weighted averages by the function, weiA2Ens. You can also get the simple averages (not weighted) by the function, noweiAvg.
result=sptemExp::weiA2Ens(prePath,mfile,metrF="rmse","pre","gid","gid")
knitr::kable(result[c(1:5),],caption="Predicted result")
Use the function of constrained optimization, inter2conOpt to get the values for the missing items.
data("allPre500","shdSeries2014")
#Get the temporal basis functions to be used in constrained optimization
season_trends=sptemExp::getTBasisFun(shdSeries2014,idStr="siteid",dateStr="date",
valStr="obs",df=10,n.basis=2,tbPath=NA)
season_trends$tid=as.numeric(strftime(season_trends$date, format = "%j"))
#Constrained optimization
allPre_part_filled=sptemExp::inter2conOpt(tarPDf=allPre500[c(1:6),],pol_season_trends=season_trends,ncore=2)
knitr::kable(allPre_part_filled[c(1:6),],caption="")
Generate the grid images of concentrations estimated.
data("spointspre","countylayer")
praster=sptemExp::points2Raster(spointspre,"d91")
dtStr=as.character(as.Date(91,origin=as.Date("2014-01-1")))
title=expression("PM"[2.5]*" Concentration Estimated")
#raster::plot(praster,col=terrain.colors(255),main=title,xlab=paste("Shandong Province, China (",dtStr,")",sep=""))
par(mar=c(4,4,1,1))
breakpoints = c(0,50,100,200,350,600)
colors=sptemExp::colorCusGrinf(breakpoints,c("darkgreen","yellow","darkred"))
raster::plot(praster,breaks=breakpoints,col=colors,
main=title,xlab=paste("Shandong Province, China (",dtStr,")",sep=""))
The function, getTidBKMean is for generation of the regional krged means, supporting parallel compution.
data("spointspre","countylayer")
tarF="d91" # target variable to be kriged
regionName="NAME_3"
bkRes=sptemExp::getTidBKMean(spointspre,countylayer,regionName,tarF,2)
# In the functions, the variogram is automatically fitted using autovariogram in automap package.
Show the result of the regional kriged means of PM2.5 concentration across the study region.
# The output from the above step has bkm_fill is the regional kriged estimate.
bkRes=bkRes[!is.na(bkRes$bkm_fill),]
levels=c(30,60,100,150,250)
cr=sptemExp::colorGrinf(bkRes$bkm_fill,levels,colors=c("darkgreen","yellow","darkred"))
par(mar=c(1,1,1,1))
title=expression("Regional Block Kriged PM"[2.5]*" Concentration Estimated")
raster::plot(bkRes,col =cr$cols[cr$index],main=title)
legend("bottomright", fill =cr$cols, legend = cr$levels,col =cr$cols, cex=1,bty="n",bg="n")