Smoothing/Filtering an NDVI time series using a Savitzky-Golay filter and R

Coarse-scale NDVI (or FAPAR, EVI…) images have a high temporal frequency and are delivered as Maximum Value Composites (MVC) of several days: for each pixel, the highest value of the period is kept, assuming that clouds and other contamination lower the values. However, especially in areas with a rainy season, the 10-16 day composites still contain clouds and other disturbances. The figure below shows a raw 16-day MODIS NDVI MVC; several pixels are obviously contaminated, and any analysis will be affected by the noisy data.

unfiltered MODIS image

Datasets like GIMMS or GEOV1 are delivered already filtered, but e.g. MODIS, SPOT VGT, PROBA-V and AVHRR data are raw. The solution is to smooth the time series with a filter that calculates a smooth series and fills in the bad values using the previous and following images. Here is the same image, smoothed with a Savitzky-Golay filter.

filtered and smoothed MODIS image

The data usually come with quality rasters rating the quality of each pixel. These can be used either to filter the rasters and set bad-quality pixels to NA, or to weight the pixels: when the new time series is calculated, pixels with a low weight (i.e. contaminated ones) contribute less to the result. One option is the software TIMESAT (Eklundh & Jönsson, 2011), which offers different filter techniques. Here is an example of how TIMESAT smooths the time series with a Savitzky-Golay filter, omitting “bad” pixels and creating new rasters.

A Savitzky-Golay filter applied on MODIS using TIMESAT

Filtering is also possible in R, and it’s very simple. First one has to decide whether to work with the quality files or simply use the raw data; both are possible. GIS software like GRASS has modules which make using the quality files easy:

http://grass.osgeo.org/grass64/manuals/i.in.spotvgt.html

http://grass.osgeo.org/grass70/manuals/i.modis.qc.html
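
Within R, a minimal sketch of the masking approach could look like this, assuming the NDVI images are in a RasterBrick called MODIS and the corresponding quality flags in a brick qa of the same dimensions, where 0 marks a good pixel (both names and the flag coding are assumptions):

library(raster)
# hypothetical example: set all pixels with a non-zero quality flag to NA
MODIS.masked <- overlay(MODIS, qa, fun = function(v, q) {
  v[q != 0] <- NA
  v
})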

However, filtering without quality flags also gives reasonable results. Now assume we have a time series of MODIS data from 2005-2010, with 23 images per year, loaded in R as a raster stack or brick called MODIS, with bad values masked as NA. We load the libraries and create a function which uses the Savitzky-Golay filter from the signal package. The parameters of the function (p, n, ts) need to be adapted (http://www.inside-r.org/packages/cran/signal/docs/sgolayfilt), as does the time frame.

library(raster)
library(rgdal)
library(signal)
library(zoo)

fun <- function(x) {
  v <- as.vector(x)
  v[is.na(v)] <- mean(v, na.rm=TRUE)  # substitute NAs with the series mean, so sgolayfilt gets a gap-free series
  MODIS.ts2 <- ts(v, start=c(2005,1), end=c(2010,23), frequency=23)
  sgolayfilt(MODIS.ts2, p=1, n=3, ts=30)
}

MODIS.filtered <- calc(MODIS, fun)

MODIS.filtered is a new brick containing the smoothed time series. Compare the raw with the filtered time series:

l=cellStats(MODIS, stat=mean)
MODIS.raw.ts = ts(l, start=c(2005,1), end=c(2010,23), frequency=23)
plot(MODIS.raw.ts)
l=cellStats(MODIS.filtered, stat=mean)
MODIS.filt.ts = ts(l, start=c(2005,1), end=c(2010,23), frequency=23)
plot(MODIS.filt.ts)
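
To compare both in a single plot:

plot(MODIS.raw.ts, col="grey")                # raw series
lines(MODIS.filt.ts, col="red")               # smoothed series on top
legend("topright", c("raw", "filtered"), col=c("grey", "red"), lty=1)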

One can find well-fitting parameters by looking at the spatial mean of the whole area and playing with the parameters:

l=cellStats(MODIS, stat=’mean’)
MODIS.ts = ts(l, start=2005, end=c(2010,23), frequency=23)
sg=sgolayfilt(MODIS.ts, p=3, n=9, ts=20)
sg.ts = ts(sg, start=2005, end=c(2010,23), frequency=23)

plot(sg.ts)

Eklundh, L. & Jönsson, P. (2011): TIMESAT 3.1 Software Manual.

Pixel-wise regression between two raster time series (e.g. NDVI and rainfall)

Doing a pixel-wise regression between two raster time series can be useful for several reasons, for example:

  • find the relation between vegetation and rainfall for each pixel, e.g. a low correlation could be a sign of degradation
  • derive regression coefficients to model the dependent variable using the independent variable (e.g. model NDVI with rainfall data)
  • check the consistency between different sensors, e.g. MODIS and SPOT VGT
  • adjust the data recorded by different sensors to each other

It’s easy and fast in R. Let’s say we have two raster stacks or bricks, each containing 120 rasters: one with NDVI and one with rainfall (named “NDVI” and “rainfall”). We put everything in one single stack and derive rasters showing the slope, intercept and R² of the regression between the two time series as pixel values:

library(raster)

s <- stack(NDVI, rainfall)
fun=function(x) { if (is.na(x[1])){ NA } else { lm(x[1:120] ~ x[121:240])$coefficients[2] }}
slope <- calc(s, fun)
fun=function(x) { if (is.na(x[1])){ NA } else { lm(x[1:120] ~ x[121:240])$coefficients[1] }}
intercept <- calc(s, fun)
fun=function(x) { if (is.na(x[1])){ NA } else { m <- lm(x[1:120] ~ x[121:240]);summary(m)$r.squared }}
r.squared <- calc(s, fun)
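
The same pattern yields other regression statistics as well, for example the p-value of the slope (element 8 of the 2x4 coefficient matrix), useful to check where the relation is significant:

fun=function(x) { if (is.na(x[1])){ NA } else { m <- lm(x[1:120] ~ x[121:240]); summary(m)$coefficients[8] }}
p.value <- calc(s, fun)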

This only works if the two stacks have the same dimensions and the same NA pattern. If not, you can try something like this:

z = mask(NDVI, rainfall)
s <- stack(z, rainfall)

and do the analysis as described above.
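
Note that this only transfers the NA cells of rainfall to NDVI; to make the NA pattern identical in both directions, the mask can also be applied the other way around (a small extension of the line above):

z = mask(NDVI, rainfall)    # NDVI gets rainfall's NAs
z2 = mask(rainfall, NDVI)   # rainfall gets NDVI's NAs
s <- stack(z, z2)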

This example shows R² between annually summed NDVI (SPOT VGT) and TRMM rainfall from 1998-2010 for the Linguere district in Senegal. We see that major differences exist, which need to be examined further.

R² annual sums SPOT VGT NDVI and TRMM rainfall, 1998-2010 for Linguere, Senegal

Pixel-wise time series trend analysis with NDVI (GIMMS) and R

The GIMMS dataset is currently offline and the new GIMMS3g will soon be released, but it does not really matter which data are used for this purpose. It can also be SPOT VGT, MODIS or anything else, as long as the temporal resolution is high and the time frame is long enough to detect significant trends. The purpose is to do a pixel-wise trend analysis and extract only the significant trends over a certain period of time for a selected region. Everything is done using free and open R software. Input data are continuous NDVI images, in this case GIMMS with two images per month, so 24 per year for the period 1982-2006.

So let’s load all Geotiffs in the GIMMS directory (600 in this case: 25 years × 24 images):

library(raster)
setwd("~/GIMMS")
sg = stack(list.files(pattern='*.tif'))
gimms = brick(sg)
rm(sg)

First, let’s create annual sums, as autocorrelation might be present in the monthly values. To keep NDVI as the unit, the results are divided by 24:

fun <- function(x) {
  gimms.ts = ts(x, start=c(1982,1), end=c(2006,24), frequency=24)
  aggregate(gimms.ts)  # annual sums
}
gimms.sum <- calc(gimms, fun)
gimms.sum = gimms.sum/24  # scale back to NDVI units
plot(gimms.sum)

Then the slope is calculated to get the direction and magnitude of the trends, and multiplied by the number of years to get the total change in NDVI units:

time <- 1:nlayers(gimms.sum) 
fun=function(x) { if (is.na(x[1])){ NA } else { m = lm(x ~ time); summary(m)$coefficients[2] }}  # [2] = slope
gimms.slope=calc(gimms.sum, fun)
gimms.slope=gimms.slope*25  # 25 years of change in NDVI units
plot(gimms.slope)

Now we need to see which trends are significant, so we first extract the p-value:

fun=function(x) { if (is.na(x[1])){ NA } else { m = lm(x ~ time); summary(m)$coefficients[8] }}  # [8] = p-value of the slope
p <- calc(gimms.sum, fun=fun)
plot(p, main="p-Value")

Then mask all values > 0.05 to get a confidence level of 95%:

m = c(0, 0.05, 1, 0.05, 1, 0)  # 0-0.05 -> 1, 0.05-1 -> 0
rclmat = matrix(m, ncol=3, byrow=TRUE)
p.mask = reclassify(p, rclmat)
fun=function(x) { x[x<1] <- NA; return(x) }
p.mask.NA = calc(p.mask, fun)

And finally mask all insignificant values in the trend map, so we only keep the NDVI changes significant at the 95% level:

trend.sig = mask(gimms.slope, p.mask.NA)
plot(trend.sig, main="significant NDVI change")

The result could look like this:

significant NDVI change (1982-2006) using integrated GIMMS

Simple time series analysis with GIMMS NDVI and R

GIMMS NDVI time series for a selected pixel

Time series analysis with satellite-derived greenness indices (e.g. NDVI) is a powerful tool to assess environmental processes. AVHRR, MODIS and SPOT VGT provide global and daily imagery. Creating some plots is a simple task, and here is a rough guide to how it is done with GIMMS NDVI. All we need is the free software R.

I assume we already have the NDVI images in a folder, so the first three steps can be skipped. If not, the images can be downloaded with a simple command:

wget -r ftp://ftp.glcf.umd.edu/glcf/GIMMS/Albers/Africa/

Now unzip it:

find . -name "*.gz" -exec unp {} \;

As the data come in an Albers projection, I reproject the Tiffs to lat/long WGS84:

for i in *.tif; do gdalwarp -t_srs EPSG:4326 -r near -of GTiff $i /home/martin/Dissertation/GIMMS/GIMMS_lat/`basename $i`; done

I delete all 1981 images, as the year is not complete.
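
Assuming the year is part of the file names, a possible one-liner for this:

rm *1981*.tif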

Next, we start R, load the required packages and set the directory with the GIMMS Geotiffs:

library(raster)
library(rgdal)
setwd("~/GIMMS")

Now load the Tiffs as a raster brick:

sg = stack(list.files(pattern='*.tif'))
gimmsb = brick(sg)
rm(sg)

Now let’s choose a pixel and create a time series object. This can be done either by row and column:

i=20 # row
j=23 # col
gimms = gimmsb[i,j]
gimms.ts = ts(gimms, start=1982, end=c(2006,24), frequency=24)

or by coordinates:

xy <- cbind(-3.6249,14.3844) 
sp <- SpatialPoints(xy) 
data <- extract(gimmsb, sp)
data=as.vector(data)
gimms.ts = ts(data, start=1982, end=c(2006,24), frequency=24)

Now the time series can be used for further analysis or plotted:

plot(gimms.ts)

A regression line and a LOESS smoothing can be added:

gimms.time = time(gimms.ts) 
plot(gimms.ts)
abline(reg=lm(gimms.ts ~ gimms.time))
lines(lowess(gimms.ts, f=0.3, iter=5), col="blue")

Some playing with boxplots and annual values:

layout(1:2)
plot(aggregate(gimms.ts))
boxplot(gimms.ts ~ cycle(gimms.ts))

GIMMS time series analysis

Averaged over the study area:

g=cellStats(gimmsb, stat='mean')
g.ts = ts(g, start=1982, end=c(2006,24), frequency=24)
plot(g.ts)

There are great possibilities for time series analysis in R: the series can be smoothed, decomposed, etc. The zoo package provides further functions and can handle irregular series. See also the lattice package for better visual presentation. How pixel-wise analysis is conducted for time series of rasters, I’ll write in another post.
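
For instance, the pixel series from above can be decomposed into its trend, seasonal and remainder components, or smoothed with a one-year rolling mean from zoo (a minimal sketch):

plot(decompose(gimms.ts))  # trend, seasonal and remainder components
library(zoo)
gimms.rm = rollmean(as.zoo(gimms.ts), k=24)  # one-year moving average
plot(gimms.rm)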

Converting GPCC gridded rainfall 1901-2010 @ 0.5° to monthly Geotiffs

GPCC rainfall data is based on the largest precipitation database worldwide, with over 90,000 weather stations interpolated to a 0.5° grid. It includes S. Nicholson’s dataset for Africa and is thus a good source for gridded monthly precipitation. It covers the period 1901-2010; however, it comes in an awkward data format, and it is a long way until we get diagrams like this for the desired location:

GPCC annual rainfall averaged over my study area in northern Senegal

As I couldn’t handle the format provided at the DWD download site, I used the Climate Explorer to generate a NetCDF file. Use this link to get the global dataset: GPCC-V6.

Then this file is loaded into R. After loading the packages, the working directory containing the file is set:

library(raster)
library(rgdal)
library(ncdf)
setwd("~/Dissertation/GPCC/")

Then the variable “prcp” is loaded into R and the study area clipped. To find out the correct variable name, gdalinfo can be used.
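
For example, on the shell:

gdalinfo gpcc_V6_05.nc

The subdataset list in the output shows the available variable names. Back in R: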

gpcc = brick("gpcc_V6_05.nc", varname="prcp")
gpcc
plot(gpcc)
e <- extent(-20, 5, 9, 26)
gpccclip <- crop(gpcc, e)

Then it’s exported as a multilayer Geotiff:

writeRaster(gpccclip, filename="gpcc_v6_westafr.tif", format="GTiff")

Now I extract the individual rasters and create 1320 single files (110 years × 12 months). I exit R for this:

cd ~/Dissertation/GPCC/
gdalinfo gpcc_v6_westafr.tif
for ((i = 1; i <= 1320; i++)); do gdal_translate -b $i -of GTiff gpcc_v6_westafr.tif $i.tif; done

Finally the files are renamed. This may not be an elegant way, but it works:

for ((i=1;$i<=1320;i++)) do echo "mv $i.tif gpcc_v6_$(( ($i-1) / 12 + 1901 ))_$(( ($i-1) % 12 + 1 )).tif"; done >Umbenennungsscript.sh
sed -i 's/_1.tif/_01.tif/g' Umbenennungsscript.sh
sed -i 's/_2.tif/_02.tif/g' Umbenennungsscript.sh
sed -i 's/_3.tif/_03.tif/g' Umbenennungsscript.sh
sed -i 's/_4.tif/_04.tif/g' Umbenennungsscript.sh
sed -i 's/_5.tif/_05.tif/g' Umbenennungsscript.sh
sed -i 's/_6.tif/_06.tif/g' Umbenennungsscript.sh
sed -i 's/_7.tif/_07.tif/g' Umbenennungsscript.sh
sed -i 's/_8.tif/_08.tif/g' Umbenennungsscript.sh
sed -i 's/_9.tif/_09.tif/g' Umbenennungsscript.sh
chmod 0755 Umbenennungsscript.sh
./Umbenennungsscript.sh
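
For reference, the same renaming can be done in a single loop with zero-padded months via printf, without the sed calls (an equivalent sketch):

for ((i = 1; i <= 1320; i++)); do
  # year = (i-1)/12 + 1901, month = (i-1)%12 + 1, zero-padded to two digits
  printf -v name 'gpcc_v6_%d_%02d.tif' $(( (i-1)/12 + 1901 )) $(( (i-1)%12 + 1 ))
  mv $i.tif "$name"
done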

The final result is 1320 Geotiffs, named gpcc_v6_1901_01.tif to gpcc_v6_2010_12.tif, with monthly rainfall for West Africa, ready to be used in any GIS. Now we can easily create beautiful maps like this:

Mean annual rainfall 1950-2010 (source: GPCC v6)


Automatically downloading and processing TRMM rainfall data

TRMM rainfall data is perhaps the most accurate rainfall data derived from satellite measurements and a valuable source in regions with scarce weather stations. It has a good spatial (0.25°) and temporal (daily) resolution and is available since 1998. However, downloading and processing may be a lot of work if not scripted. The following script may be badly coded, but it works. All you need is the open source software R. This script is written for a Linux environment.

First, load some packages:

library(raster)
library(rgdal)
library(RCurl)

Then, download the data from the FTP server: ftp://disc2.nascom.nasa.gov/data/TRMM/Gridded/. This example downloads the monthly 3B43 data, version 7, for 2012. The first line has to be changed for other desired data:

url = "ftp://disc2.nascom.nasa.gov/data/TRMM/Gridded/3B43_V7/2012/"
filenames = getURL(url, ftp.use.epsv = FALSE, ftplistonly=TRUE, crlf=TRUE)
filenames = paste(url, strsplit(filenames, "\r*\n")[[1]], sep="")
# keep only files matching a certain pattern, e.g. certain months or days;
# "*precipitation.accum" selects the monthly accumulated rainfall files
filenames = filenames[grep(filenames, pattern="*precipitation.accum")]
filenames # list all files
mapply(download.file, filenames, basename(filenames)) # download the files

Now I create a virtual file, as the downloaded TRMM data come as binaries. The VRT file contains all the downloaded binary files with the appropriate geo-information. To automate the process, the following script generates the VRT file (TRMM.vrt) for all 2012 data. Change “3B43.12” according to the downloaded data. Save this example as trmm2012.sh.

#!/bin/bash
# VRT header: global 0.25 deg grid, 1440 x 400 pixels, top-left corner at -180/50
echo '<VRTDataset rasterXSize="1440" rasterYSize="400">
 <GeoTransform>-180, 0.25, 0, 50, 0, -0.25</GeoTransform>
 <SRS>WGS84</SRS>' > TRMM.vrt
b=1
for i in 3B43.12*; do
 echo '<VRTRasterBand dataType="Float32" band="'$b'" subClass="VRTRawRasterBand">
 <SourceFilename relativeToVRT="1">'$i'</SourceFilename>
 <ByteOrder>MSB</ByteOrder>
 <ImageOffset>0</ImageOffset>
 <PixelOffset>4</PixelOffset>
 <LineOffset>5760</LineOffset>
 </VRTRasterBand>' >> TRMM.vrt
 b=$((b+1))   # one band per downloaded file
done
echo '</VRTDataset>' >> TRMM.vrt

Within R, the script (trmm2012.sh) is executed and the virtual file (TRMM.vrt) loaded as a RasterBrick. This is flipped in the y direction and written as a multilayer Geotiff. The Geotiff contains all the layers for 2012 and can be opened in any GIS software.

system("./trmm2012.sh")
b = brick("TRMM.vrt")
b
trmm = flip(b, direction='y')
writeRaster(trmm, filename="trmm_acc_2012.tif", format="GTiff", overwrite=TRUE)
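
As a quick check, the multilayer Geotiff can be loaded back into R and plotted:

check = brick("trmm_acc_2012.tif")
check
plot(check)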