Data handling

Downloading and processing MODIS with fast and free tools

Here’s a simple way of processing MODIS: downloading, extracting the bands from HDF to GeoTIFF, merging tiles and reprojecting. Area extraction, NDVI calculation, cloud filtering and more can then be done in GRASS, for example. The workflow is Linux-based, but can easily be used on Windows as well. My way might not be the most elegant, but it’s fast and works fine.

The first step is to select the tiles, the time frame and the product at http://reverb.echo.nasa.gov/reverb/. I take MOD09Q1 (250 m, 8 day, collection 6 reflectance) as an example. Once ordered, we get a TXT file containing all file names. We download all files using wget. After the download I mass rename all files to remove the creation date (using e.g. Total Commander), so the files are named like: MOD09Q1.A2000065.h21v05.006.hdf
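If you prefer to stay on the command line, the mass rename can also be sketched in bash. This assumes the downloaded files follow the usual MODIS naming pattern with a production timestamp as the last field before the .hdf extension; check your file names first, as the pattern may differ for your order:

```shell
# strip the production timestamp, e.g.
# MOD09Q1.A2000065.h21v05.006.2015136143635.hdf -> MOD09Q1.A2000065.h21v05.006.hdf
for f in MOD09Q1.A*.h??v??.006.*.hdf; do
  [ -e "$f" ] || continue          # skip if the glob matched nothing
  mv "$f" "${f%.*.hdf}.hdf"        # drop the last dot-separated field before .hdf
done
```

The glob requires a field after the .006. collection number, so already-renamed files are not touched if you run it twice.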

Then I extract Band 1, Band 2 and the quality flag as GeoTIFF. “aus” is the folder where the GeoTIFFs will end up. Find out the names of the bands using gdalinfo.
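To see the exact subdataset names for your product, run gdalinfo on one of the HDF files and look at the SUBDATASET entries (the tile name below is just an example); the *_NAME strings are what gdal_translate expects:

```shell
# list the HDF4 subdatasets of one tile; copy the *_NAME strings into gdal_translate
gdalinfo MOD09Q1.A2000065.h21v05.006.hdf | grep 'SUBDATASET_.*_NAME'
```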

wget -i modis.txt
aus="/media/data/MODIS/tiffs/"
for i in *.hdf; do gdal_translate -of GTiff 'HDF4_EOS:EOS_GRID:"'$i'":MOD_Grid_250m_Surface_Reflectance:sur_refl_b01' $aus`basename $i .hdf`_REF_band1.tif; done
for i in *.hdf; do gdal_translate -of GTiff 'HDF4_EOS:EOS_GRID:"'$i'":MOD_Grid_250m_Surface_Reflectance:sur_refl_b02' $aus`basename $i .hdf`_REF_band2.tif; done
for i in *.hdf; do gdal_translate -of GTiff 'HDF4_EOS:EOS_GRID:"'$i'":MOD_Grid_250m_Surface_Reflectance:sur_refl_qc_250m' $aus`basename $i .hdf`_QA.tif; done

I did not find a clean way of merging, so here is my “dirty” way. It needs to be repeated for each year and band, which is about five minutes of copying, but you can reuse the lines again and again for the following years. Here’s an example for band 1, year 2000. If you use a 16-day product, every second line will give an error, which you can ignore.

# 2000
gdal_merge.py -o 2000001_band1.tif *2000001*
gdal_merge.py -o 2000009_band1.tif *2000009*
gdal_merge.py -o 2000017_band1.tif *2000017*
gdal_merge.py -o 2000025_band1.tif *2000025*
gdal_merge.py -o 2000033_band1.tif *2000033*
gdal_merge.py -o 2000041_band1.tif *2000041*
gdal_merge.py -o 2000049_band1.tif *2000049*
gdal_merge.py -o 2000057_band1.tif *2000057*
gdal_merge.py -o 2000065_band1.tif *2000065*
gdal_merge.py -o 2000073_band1.tif *2000073*
gdal_merge.py -o 2000081_band1.tif *2000081*
gdal_merge.py -o 2000089_band1.tif *2000089*
gdal_merge.py -o 2000097_band1.tif *2000097*
gdal_merge.py -o 2000105_band1.tif *2000105*
gdal_merge.py -o 2000113_band1.tif *2000113*
gdal_merge.py -o 2000121_band1.tif *2000121*
gdal_merge.py -o 2000129_band1.tif *2000129*
gdal_merge.py -o 2000137_band1.tif *2000137*
gdal_merge.py -o 2000145_band1.tif *2000145*
gdal_merge.py -o 2000153_band1.tif *2000153*
gdal_merge.py -o 2000161_band1.tif *2000161*
gdal_merge.py -o 2000169_band1.tif *2000169*
gdal_merge.py -o 2000177_band1.tif *2000177*
gdal_merge.py -o 2000185_band1.tif *2000185*
gdal_merge.py -o 2000193_band1.tif *2000193*
gdal_merge.py -o 2000201_band1.tif *2000201*
gdal_merge.py -o 2000209_band1.tif *2000209*
gdal_merge.py -o 2000217_band1.tif *2000217*
gdal_merge.py -o 2000225_band1.tif *2000225*
gdal_merge.py -o 2000233_band1.tif *2000233*
gdal_merge.py -o 2000241_band1.tif *2000241*
gdal_merge.py -o 2000249_band1.tif *2000249*
gdal_merge.py -o 2000257_band1.tif *2000257*
gdal_merge.py -o 2000265_band1.tif *2000265*
gdal_merge.py -o 2000273_band1.tif *2000273*
gdal_merge.py -o 2000281_band1.tif *2000281*
gdal_merge.py -o 2000289_band1.tif *2000289*
gdal_merge.py -o 2000297_band1.tif *2000297*
gdal_merge.py -o 2000305_band1.tif *2000305*
gdal_merge.py -o 2000313_band1.tif *2000313*
gdal_merge.py -o 2000321_band1.tif *2000321*
gdal_merge.py -o 2000329_band1.tif *2000329*
gdal_merge.py -o 2000337_band1.tif *2000337*
gdal_merge.py -o 2000345_band1.tif *2000345*
gdal_merge.py -o 2000353_band1.tif *2000353*
gdal_merge.py -o 2000361_band1.tif *2000361*
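The copy-paste block above can also be generated with a loop. This is a sketch assuming GNU seq is available; the 8-day compositing periods start on day of year 001, 009, ..., 361. It only prints the commands as a dry run, so you can check them (and add more years) before removing the echo:

```shell
# dry run: print one gdal_merge.py call per 8-day period; remove 'echo' to execute
for year in 2000; do
  for doy in $(seq -w 1 8 361); do
    echo gdal_merge.py -o ${year}${doy}_band1.tif *${year}${doy}*
  done
done
```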

# We can remove the other files now 
rm MOD*
# Then reproject everything to a geographic system, maybe into a new folder:
mkdir lat
aus="/media/data/MODIS/tiffs/lat/"

for i in *_band1.tif; do gdalwarp -t_srs EPSG:4326 $i $aus`basename $i .tif`.tif; done
for i in *_band2.tif; do gdalwarp -t_srs EPSG:4326 $i $aus`basename $i .tif`.tif; done
for i in ???????_QA.tif; do gdalwarp -t_srs EPSG:4326 $i $aus`basename $i .tif`.tif; done

What we have now can already be used, but you can continue in GRASS to calculate NDVI and filter the data.

 

Research results

Remote sensing of vegetation in drylands: Evaluating vegetation optical depth (VOD) using NDVI and in situ data over Sahel

Tian, F.; Brandt, M.; Liu, Y. Y.; Verger, A.; Tagesson, T.; Diouf, A. A.; Rasmussen, K.; Mbow, C.; Wang, Y.; Fensholt, R. Remote sensing of vegetation dynamics in drylands: Evaluating vegetation optical depth (VOD) using AVHRR NDVI and in situ green biomass data over West African Sahel. Remote Sensing of Environment 2016, 177, 265–276.

 

  • A long-term VOD dataset is evaluated against NDVI and in situ biomass observations.
  • Both VOD and NDVI reflect the spatio-temporal patterns of biomass in West Sahel.
  • VOD captures variations of woody plant foliage biomass better than NDVI.
  • VOD and NDVI seasonal metrics differ for optimal long-term monitoring of biomass.

Monitoring long-term biomass dynamics in drylands is of great importance for many environmental applications, including land degradation assessment and global carbon cycle modeling. Biomass has been extensively estimated based on the normalized difference vegetation index (NDVI) as a measure of vegetation greenness. The vegetation optical depth (VOD) derived from satellite passive microwave observations is mainly sensitive to the water content in the total aboveground vegetation layer. VOD therefore provides a complementary data source to NDVI for monitoring biomass dynamics in drylands, yet further evaluations based on ground measurements are needed for an improved understanding of its potential advantages.

In this study, we assess the capability of a long-term VOD dataset (1992–2011) to capture the temporal and spatial variability of in situ measured green biomass (herbaceous mass and woody plant foliage mass) in the semi-arid Senegalese Sahel.

Results show that the magnitude and peak time of VOD are sensitive to the woody plant foliage, whereas NDVI seasonality is primarily governed by the green herbaceous vegetation stratum in the study area. Moreover, when used as a proxy for vegetation productivity, VOD is found to be more robust against typical NDVI drawbacks, namely the saturation effect and the dependence on plant structure (herbaceous and woody composition). Finally, both VOD and NDVI reflect the spatial and inter-annual dynamics of the in situ green biomass data well; however, the seasonal metrics showing the highest degree of explained variance differ between the two data sources. While the observations in October (the period of in situ data collection) perform best for VOD (r2 = 0.88), the small growing season integral (sensitive to recurrent vegetation) has the highest correlation for NDVI (r2 = 0.90).

Overall, in spite of the coarse resolution of 25 km, the study shows that VOD is an efficient proxy for estimating biomass of the entire vegetation stratum in the semi-arid Sahel and likely also in other dryland areas.


Data handling

Be cautious using GIMMS3g v0 for trend analysis!

GIMMS3g NDVI is widely used to assess vegetation trends from local to global scale. Even though it is the best long-term dataset available right now (July 2015), people should be aware that a serious error affects the dataset in semi-arid areas like the Sahel, with severe impacts on trend analysis. The dry season values suddenly drop around 2004, which does not happen in reality. It is clearly related to the sensor change from NOAA16 to NOAA17, and the last change to NOAA18 also has an impact on the time series. The good news is that the wet season values should be usable. The developers are aware of this problem and a new version is on its way. To illustrate the issue, find attached the GIMMS3g NDVI (v0) averaged over the Sahel belt; note the drop at the sensor change (first blue line) and the recovery at the next sensor change:


To further illustrate how seriously this problem affects trend analysis, the same area is shown in the VOD dataset, which is found to be reliable:

A rough overlay highlights the different directions of the annual trends:


The problem can be solved by excluding the dry season and using the small integral, calculated in TIMESAT:


See further literature:

  • Tian, F., Fensholt, R., Verbesselt, J., Grogan, K., Horion, S., Wang, Y., 2015. Evaluating temporal consistency of long-term global NDVI datasets for trend analysis. Remote Sensing of Environment 163, 326–340. doi:10.1016/j.rse.2015.03.031
  • Horion, S., Fensholt, R., Tagesson, T., Ehammer, A., 2014. Using earth observation-based dry season NDVI trends for assessment of changes in tree cover in the Sahel. International Journal of Remote Sensing 35, 2493–2515. doi:10.1080/01431161.2014.883104
  • Pinzon, J.E., Tucker, C.J., 2014. A Non-Stationary 1981–2012 AVHRR NDVI3g Time Series. Remote Sensing 6, 6929–6960. doi:10.3390/rs6086929
  • Liu, Y.Y., van Dijk, A.I.J.M., McCabe, M.F., Evans, J.P., de Jeu, R.A.M., 2013. Global vegetation biomass change (1988–2008) and attribution to environmental and human drivers. Global Ecology and Biogeography 22, 692–705. doi:10.1111/geb.12024
  • Jönsson, P., Eklundh, L., 2004. TIMESAT - a program for analyzing time-series of satellite sensor data. Computers & Geosciences 30, 833–845.

Data handling

Smoothing/filtering an NDVI time series using a Savitzky-Golay filter and R

Coarse-scale NDVI (or FAPAR, EVI, ...) images have a high temporal frequency and are delivered as Maximum Value Composites (MVC) over several days, which means the highest value is taken, assuming that clouds and other contaminated pixels have low values. However, especially in areas with a rainy season, the composites over 10-16 days still contain clouds and other disturbances. This figure shows a raw 16-day MODIS NDVI MVC; several pixels are obviously contaminated, and analyses will clearly be affected by the noisy data.

unfiltered MODIS image

Datasets like GIMMS or GEOV1 already provide filtered data, but e.g. MODIS, SPOT VGT, PROBA-V and AVHRR data are raw. The solution is to smooth the time series using a filter, which calculates a smooth time series and interpolates the bad values from the previous and following images. Here is the same image, but smoothed with a Savitzky-Golay filter.

filtered and smoothed MODIS image

The data are mostly delivered with quality rasters rating the quality of each pixel. These can be used either to filter the raster and set bad quality pixels to NA, or to weight the pixels: when the new time series is calculated, low-weighted (i.e. contaminated) pixels are considered less in the calculation. One possibility is the software TIMESAT (Eklundh & Jönsson, 2011), which offers different filter techniques. Here is an example of how TIMESAT smooths the time series with a Savitzky-Golay filter, omitting “bad” pixels and creating new rasters.

A Savitzky-Golay filter applied on MODIS using TIMESAT

Filtering is also possible in R, and it’s very simple. First one has to decide whether to work with quality files or simply use the raw data; both are possible. GIS software like GRASS has modules which allow an easy use of the quality files:

http://grass.osgeo.org/grass64/manuals/i.in.spotvgt.html

http://grass.osgeo.org/grass70/manuals/i.modis.qc.html

However, filtering without quality flags also gives reasonable results. Now we assume we have a time series of MODIS data from 2005-2010, with 23 images per year. The data are loaded in R into a raster stack or brick called MODIS; bad values are masked as NA. We load the libraries and create a function which uses the Savitzky-Golay filter from the signal package. The parameters of the function (p, n, ts) need to be adapted (http://www.inside-r.org/packages/cran/signal/docs/sgolayfilt), as does the time frame.

library(raster)
library(rgdal)
library(signal)
library(zoo)
library(timeSeries) # provides substituteNA()

fun <- function(x) {
  v <- as.vector(x)
  # replace NA with the series mean; substituteNA() is from the timeSeries package
  z <- substituteNA(v, type = "mean")
  MODIS.ts2 <- ts(z, start = c(2005, 1), end = c(2010, 23), frequency = 23)
  sgolayfilt(MODIS.ts2, p = 1, n = 3, ts = 30)
}

MODIS.filtered <- calc(MODIS, fun)

MODIS.filtered is a new brick containing the smoothed time series. Compare the raw with the filtered time series:

l=cellStats(MODIS, stat=mean)
MODIS.raw.ts = ts(l, start=c(2005,1), end=c(2010,23), frequency=23)
plot(MODIS.raw.ts)
l=cellStats(MODIS.filtered, stat=mean)
MODIS.filt.ts = ts(l, start=c(2005,1), end=c(2010,23), frequency=23)
plot(MODIS.filt.ts)

One may find well-fitting parameters by looking at the mean of the whole area and playing with the parameters:

l=cellStats(MODIS, stat='mean')
MODIS.ts = ts(l, start=2005, end=c(2010,23), frequency=23)
sg=sgolayfilt(MODIS.ts, p=3, n=9, ts=20)
sg.ts = ts(sg, start=2005, end=c(2010,23), frequency=23)

plot(sg.ts)

Eklundh, L.; Jönsson, P. Timesat 3.1 Software Manual. 2011.
Data handling

Pixel-wise regression between two raster time series (e.g. NDVI and rainfall)

Doing a pixel-wise regression between two raster time series can be useful for several reasons, for example:

  • find the relation between vegetation and rainfall for each pixel, e.g. a low correlation could be a sign of degradation
  • derive regression coefficients to model the dependent variable using the independent variable (e.g. model NDVI with rainfall data)
  • check the consistency between different sensors, e.g. MODIS and SPOT VGT
  • adjust the data recorded by different sensors to each other

It’s easy and fast in R. Let’s say we have two raster stacks or bricks which contain 120 rasters each, one with NDVI and one with rainfall (named “NDVI” and “rainfall”). We put everything in one single stack and derive rasters showing the slope, intercept and R² of the two time series as pixel values:

library(raster)

s <- stack(NDVI, rainfall)
fun=function(x) { if (is.na(x[1])){ NA } else { lm(x[1:120] ~ x[121:240])$coefficients[2] }}
slope <- calc(s, fun)
fun=function(x) { if (is.na(x[1])){ NA } else { lm(x[1:120] ~ x[121:240])$coefficients[1] }}
intercept <- calc(s, fun)
fun=function(x) { if (is.na(x[1])){ NA } else { m <- lm(x[1:120] ~ x[121:240]);summary(m)$r.squared }}
r.squared <- calc(s, fun)

It only works if your two stacks have the same dimensions and NA values; if not, you can try something like this:

z = mask(NDVI, rainfall)
s <- stack(z, rainfall)

and do the analysis as described above.

This example shows R² between annually summed NDVI (SPOT VGT) and TRMM rainfall from 1998-2010 for the Linguere district in Senegal; we see that major differences exist, which need to be examined further.

R² annual sums SPOT VGT NDVI and TRMM rainfall, 1998-2010 for Linguere, Senegal
Environmental change in the Sahel, Research results

Environmental change in time series – An interdisciplinary study in the Sahel of Mali and Senegal

Our next article is now online in the Journal of Arid Environments:

It is available here: http://www.sciencedirect.com/science/article/pii/S0140196314000536

Abstract: Climatic changes and human activities have caused major environmental change in the Sahel. Remote sensing studies detect various vegetation trends; however, explanations are rarely studied in detail. We present a methodology using time series, high-resolution imagery and fieldwork to validate trend analyses for two regions in the Sahel of Mali and Senegal. Both study areas show significant greening trends from 1982 to 2010. Reasons can be very site-specific, but several factors are valid for both research areas: (1) farmer-managed agro-forestry, (2) planting programs and protection laws, (3) widespread dispersion of robust species, which replace the former diverse woody vegetation and simulate a greening which conceals a shift in biodiversity and (4) an increase of annual rainfall. However, the situation is still far from the pre-drought conditions, which are reconstructed by Corona imagery (1965) and interviews with the local population. Rather a transformation is observed: a decrease in natural vegetation, tree density and diversity. Reasons are climatic and anthropogenic: (1) drought events, less rain and higher temperatures, (2) increased demand for cropping areas and wood, especially in times of droughts. Our example validates that climatic factors are important drivers of change, but much of today’s environment and vegetation composition is controlled by humans.

Data handling

Pixel-wise time series trend analysis with NDVI (GIMMS) and R

The GIMMS dataset is currently offline and the new GIMMS3g will soon be released, but it does not really matter which dataset is used for this purpose. It can also be SPOT VGT, MODIS or anything else, as long as the temporal resolution is high and the time frame is long enough to detect significant trends. The purpose is to do a pixel-wise trend analysis and extract only significant trends over a certain period of time for a selected region. Everything is done using open and free R software. Input data are continuous NDVI images; in this case it’s GIMMS with two images per month, so 24 per year for the period 1982-2006.

So let’s load all GeoTIFFs in the GIMMS directory (600 in this case):

library(raster)
setwd("~/GIMMS")
sg = stack(list.files(pattern='*.tif'))
gimms = brick(sg)
rm(sg)

First, let’s create annual sums, as autocorrelation might be present with monthly values. To keep NDVI as the unit, the results are divided by 24:

fun <- function(x) { 
 gimms.ts = ts(x, start=c(1982,1), end=c(2006,24), frequency=24)
 x <- aggregate(gimms.ts) 
 }
gimms.sum <- calc(gimms, fun)
gimms.sum=gimms.sum/24
plot(gimms.sum)

Then the slope is calculated to get the direction and magnitude of the trends, multiplied by the number of years (25) to get the change in NDVI units:

time <- 1:nlayers(gimms.sum) 
fun=function(x) { if (is.na(x[1])){ NA } else { m = lm(x ~ time); summary(m)$coefficients[2] }}
gimms.slope=calc(gimms.sum, fun)
gimms.slope=gimms.slope*25
plot(gimms.slope)

Now we need to see which trends are significant, so we first extract the p-value:

fun=function(x) { if (is.na(x[1])){ NA } else { m = lm(x ~ time); summary(m)$coefficients[8] }}
p <- calc(gimms.sum, fun=fun)
plot(p, main="p-Value")

Then mask all values >0.05 to get a confidence level of 95%:

m = c(0, 0.05, 1, 0.05, 1, 0)
rclmat = matrix(m, ncol=3, byrow=TRUE)
p.mask = reclassify(p, rclmat)
fun=function(x) { x[x<1] <- NA; return(x)}
p.mask.NA = calc(p.mask, fun)

And finally mask all insignificant values in the trend map, so we only get NDVI change significant at the 95% level:

trend.sig = mask(gimms.slope, p.mask.NA)
plot(trend.sig, main="significant NDVI change")

The result could look like this:

significant NDVI change (1982-2006) using integrated GIMMS