Downloading and processing MODIS with fast and free tools

Here’s a simple way of processing MODIS: downloading, extracting the bands from HDF as GEOTIFF, merging tiles and reprojecting. Area extraction, NDVI calculation, cloud filtering and more can be done in GRASS for example. It’s Linux based, but can easily used in Windows. My way might not be the most elegant, but it’s fast and works fine.

The first step is to select the tiles, the time frame and product at I take MOD09Q1 (250 m, 8 day, collection 6 reflectance) as example. Once orderd, we get a TXT file containing all file names. We download all files using wget. After download i mass rename all files to remove the creation date (using e.g. total commander) so the files are named like: MOD09Q1.A2000065.h21v05.006.

Then I extract Band 1, Band 2 and the Quality flag as GeoTiff. “aus” is the folder were the TIFFS will end up. Find out the name of the bands using gdalinfo.

wget -i modis.txt
for i in *.hdf; do gdal_translate -of GTiff 'HDF4_EOS:EOS_GRID:"'$i'":MOD_Grid_250m_Surface_Reflectance:sur_refl_b01' $aus`basename $i .hdf`_REF_band1.tif; done
for i in *.hdf; do gdal_translate -of GTiff 'HDF4_EOS:EOS_GRID:"'$i'":MOD_Grid_250m_Surface_Reflectance:sur_refl_b02' $aus`basename $i .hdf`_REF_band2.tif; done
for i in *.hdf; do gdal_translate -of GTiff 'HDF4_EOS:EOS_GRID:"'$i'":MOD_Grid_250m_Surface_Reflectance:sur_refl_qc_250m' $aus`basename $i .hdf`_QA.tif; done

I did not find a clean way for merging, so i suggest my “dirty” way, it needs to be repeated for each year and band, this is 5 minutes of copying but you can use it again and again and again for the next years then….here’s an example for band 1 year 2000. If you use a 16 day product every second line will give an error, but that you can ignore.

# 2000 -o 2000001_band1.tif *2000001* -o 2000009_band1.tif *2000009* -o 2000017_band1.tif *2000017* -o 2000025_band1.tif *2000025* -o 2000033_band1.tif *2000033* -o 2000041_band1.tif *2000041* -o 2000049_band1.tif *2000049* -o 2000057_band1.tif *2000057* -o 2000065_band1.tif *2000065* -o 2000073_band1.tif *2000073* -o 2000081_band1.tif *2000081* -o 2000089_band1.tif *2000089* -o 2000097_band1.tif *2000097* -o 2000105_band1.tif *2000105* -o 2000113_band1.tif *2000113* -o 2000121_band1.tif *2000121* -o 2000129_band1.tif *2000129* -o 2000137_band1.tif *2000137* -o 2000145_band1.tif *2000145* -o 2000153_band1.tif *2000153* -o 2000161_band1.tif *2000161* -o 2000169_band1.tif *2000169* -o 2000177_band1.tif *2000177* -o 2000185_band1.tif *2000185* -o 2000193_band1.tif *2000193* -o 2000201_band1.tif *2000201* -o 2000209_band1.tif *2000209* -o 2000217_band1.tif *2000217* -o 2000225_band1.tif *2000225* -o 2000233_band1.tif *2000233* -o 2000241_band1.tif *2000241* -o 2000249_band1.tif *2000249* -o 2000257_band1.tif *2000257* -o 2000265_band1.tif *2000265* -o 2000273_band1.tif *2000273* -o 2000281_band1.tif *2000281* -o 2000289_band1.tif *2000289* -o 2000297_band1.tif *2000297* -o 2000305_band1.tif *2000305* -o 2000313_band1.tif *2000313* -o 2000321_band1.tif *2000321* -o 2000329_band1.tif *2000329* -o 2000337_band1.tif *2000337* -o 2000345_band1.tif *2000345* -o 2000353_band1.tif *2000353* -o 2000361_band1.tif *2000361*

# We can remove the other files now 
rm MOD*
# Then reproject everything to a geographic system, maybe into a new folder:
mkdir lat

for i in *_band1.tif; do gdalwarp -t_srs EPSG:4326 $i $aus`basename $i .tif`.tif; done
for i in *_band2.tif; do gdalwarp -t_srs EPSG:4326 $i $aus`basename $i .tif`.tif; done
for i in ???????_QA.tif; do gdalwarp -t_srs EPSG:4326 $i $aus`basename $i .tif`.tif; done

What we have now can already be used, but you can continue in GRASS to calculate NDVI and filter the data.


Be cautious using GIMMS3g v0 for trend analysis!

GIMMS3g NDVI is widely used to assess vegetation trends from local to global scale. And even though it is the best long term dataset available right now (July 2015), people should be aware that a serious error affects the dataset in semi arid areas, like the Sahel, which has severe impacts on trend analysis. The dry season values around 2004 suddenly drop, which is not happening in reality. It is definitely related to the sensor change from NOAA16 to NOAA17, and also the last change to NOAA18 has an impact on the time series. The good thing is that the wet season values should be usable. The developers are aware about this problem and a new version is on it’s way. To illustrate the issue, find attached the GIMMS3g NDVI (v0) averaged over the Sahel belt, note the drop at the sensor change (first blue line) and the recovery at the next sensor change:


To further illustrate how serious this problem affects trend analysis, the same area is shown in the VOD dataset, which is found to be reliable:vod

a rough overlay highlights the different directions of annual trends:


The problem can be solved by excluding the dry season and using the small integral, calculated in TIMESAT:


see further literature:

  • Tian, F., Fensholt, R., Verbesselt, J., Grogan, K., Horion, S., Wang, Y., 2015. Evaluating temporal consistency of long-term global NDVI datasets for trend analysis. Remote Sensing of Environment 163, 326–340. doi:10.1016/j.rse.2015.03.031
  • Horion, S., Fensholt, R., Tagesson, T., Ehammer, A., 2014. Using earth observation-based dry season NDVI trends for assessment of changes in tree cover in the Sahel. International Journal of Remote Sensing 35, 2493–2515. doi:10.1080/01431161.2014.883104
  • Pinzon, J.E., Tucker, C.J., 2014. A Non-Stationary 1981–2012 AVHRR NDVI3g Time Series. Remote Sensing 6, 6929–6960. doi:10.3390/rs6086929
  • Liu, Y.Y., van Dijk, A.I.J.M., McCabe, M.F., Evans, J.P., de Jeu, R.A.M., 2013. Global vegetation biomass change (1988–2008) and attribution to environmental and human drivers. Global Ecology and Biogeography 22, 692–705. doi:10.1111/geb.12024
  • Jonsson, P., Eklundh, L., 2004. TIMESAT–a program for analyzing time-series of satellite sensor data* 1. Computers & Geosciences 30, 833–845.

Smoothing/Filtering a NDVI time series using a Savitzky Golay filter and R

Coarse scaled NDVI (or FAPAR, EVI…) images have a high temporal frequency and are delivered as Maximum Value Composites (MVC) of several days, which means the highest value is taken, assuming that clouds and other contaminated pixels have low values. However, especially in areas with a rainy season, the composites over 10-16 days still contain clouds and other disturbances. This figure illustrates a raw MODIS NDVI MVC of 16 days, and it’s obvious that several pixels are contaminated, and it’s also obvious that analyses will be affected by the noisy data.

unfiltered MODIS image

unfiltered MODIS image

Datasets like GIMMS or GEOV1 provide already filtered data, but e.g. MODIS, SPOT VGT, PROBA-V and AVHRR data are raw. The solution is to smooth the time series, using a filter, which calculates a smooth time series and interpolates the bad values using the previous and following images. Here is the same image, but smoothed with a Savitzky Golay filter.

filtered and smoothed MODIS image

filtered and smoothed MODIS image

The data are mostly delivered with quality rasters, rating the quality of each pixel. This can be used to either filter the raster and set bad quality pixels to  NA, or to weight the pixels. When the new time line is calculated, low weighted (i.e. contaminated) pixels are less considered in the calculation process. One possibility is the software TIMESAT (Eklundh & Jönsson, 2011), which offers different filter techniques. Here is an example how timesat smoothes the time line with a Savitzky Golay filter, omitting “bad” pixles and creating new rasters.

A Savitzky Golay filter applied on MODIS using TIMESAT

A Savitzky Golay filter applied on MODIS using TIMESAT

Filtering is also possible in R, and it’s very simple. First one has to decide if one wants to work with quality files, or simply use the raw data, both is possible. GIS software like GRASS has modules which allow an easy use of the quality files:

However, filtering without quality flags also provides reasonable results. Now we assume we have a time series of MODIS data from 2005-2010, with 23 images each year. This data is loaded in R in a raster stack or brick, called MODIS, bad values are masked as NA. We load the libraries, and create a function which uses the Savitzky Golay filter from the signal package. The parameters of the function need to be adapted (p, n, ts) (, also the time frame.



fun <- function(x) {
z=substituteNA(v, type=”mean”)
MODIS.ts2 = ts(z, start=c(2005,1), end=c(2010,23), frequency=23)
x=sgolayfilt(MODIS.ts2, p=1, n=3, ts=30)

MODIS.filtered <- calc(MODIS, fun)

MODIS.filtered is a new brick containing the smoothed time series. Compare the raw with the filtered tims series:

l=cellStats(MODIS, stat=mean)
MODIS.raw.ts = ts(l, start=c(2005,1), end=c(2010,23), frequency=23)
l=cellStats(MODIS.filtered, stat=mean)
MODIS.filt.ts = ts(l, start=c(2005,1), end=c(2010,23), frequency=23)

One may find out the perfect fitting parameters by looking at the whole area, playing with the parameters:

l=cellStats(MODIS, stat=’mean’)
MODIS.ts = ts(l, start=2005, end=c(2010,23), frequency=23)
sg=sgolayfilt(MODIS.ts, p=3, n=9, ts=20)
sg.ts = ts(sg, start=2005, end=c(2010,23), frequency=23)


Eklundh, L.; Jönsson, P. Timesat 3.1 Software Manual. 2011.

Pixel-wise regression between two raster time series (e.g. NDVI and rainfall)

Doing a pixel-wise regression between two raster time series can be useful for several reasons, for example:

  • find the relation between vegetation and rainfall for each pixel, e.g. a low correlation could be a sign of degradation
  • derive regression coefficients to model the depending variable using the independend variable (e.g. model NDVI with rainfall data)
  • check the consistency between different sensors, e.g. MODIS and SPOT VGT
  • adjust the data recorded by different sensors to each other

It’s easy and fast in R. Let’s say we have two raster stacks or bricks which contain 120 rasters each, one with NDVI and one with rainfall (named “NDVI” and “rainfall”). We put everything in one single stack and derive rasters showing the slope, intercept and R² of the two time series as pixel values:


s <- stack(NDVI, rainfall)
fun=function(x) { if ([1])){ NA } else { lm(x[1:120] ~ x[121:240])$coefficients[2] }}
slope <- calc(s, fun)
fun=function(x) { if ([1])){ NA } else { lm(x[1:120] ~ x[121:240])$coefficients[1] }}
intercept <- calc(s, fun)
fun=function(x) { if ([1])){ NA } else { m <- lm(x[1:120] ~ x[121:240]);summary(m)$r.squared }}
r.squared <- calc(s, fun)

It only works if your two stacks have the same dimensions and NA values, if not, you can try something like that:

z = mask(NDVI, rainfall)
s <- stack(z, rainfall)

and do the analysis as described above.

This example shows R² between annually summed NDVI (SPOT VGT) and TRMM rainfall from 1998-2010 for the Linguere district in Senegal, and we see that major differences exist, which need to further examined.

R² annual sums SPOT VGT NDVI and TRMM rainfall, 1998-2010 for Linguere, Senegal

R² annual sums SPOT VGT NDVI and TRMM rainfall, 1998-2010 for Linguere, Senegal

Renaming, converting, clipping: script based raster time series processing

When working with remotely sensed time series data (e.g. MODIS, AVHRR, GIMMS, etc.), bulk processing can save a lot of time. Using a terminal in a Linux environment, simple scripts can process thousends of files in short time. Here are some basic hints on how to start, gdal has to be installed. For Windows users or users who prefer an interface, QGIS provides a graphical interface for most of the gdal modules. Also renaming can be done more user friendly.


Original data-names can be very long annoying. So first we rename all files to numbers from 1 to XX (like 1, 2, 3…). To do so, navigate to the folder with the files, and these lines will rename all files originally starting with MOD (like MOD13Q1203837383922837284.hdf):

for fname in MOD* 
mv $fname `printf $x`

Then the following lines will create a script that renames all files numbered from 1 to 60. the final files will include year and date. So we then have data named MODIS_2001_01 until MODIS_2005_12. This can be adopted to every name and numbering. The script is then executed.

for ((i=1;$i<=60;i++)) do echo "mv $i MODIS_$(( ($i-1) / 12 + 2001 ))_$(( ($i-1) % 12 + 1 )).tif" ; done >
sed -i 's/_1/_01/g'
sed -i 's/_2/_02/g'
sed -i 's/_3/_03/g'
sed -i 's/_4/_04/g'
sed -i 's/_5/_05/g'
sed -i 's/_6/_06/g'
sed -i 's/_7/_07/g'
sed -i 's/_8/_08/g'
sed -i 's/_9/_09/g'
chmod 0755


This works fine to convert all kind of formats, also very useful to change HDF files (like MODIS, VGT…) to GeoTiff. This example converts GeoTiffs to ENVI format.

# define in and out folder and output format


# convert all Tiffs to ENVI format, use float32 as output here

cd $in
for i in *.tif; do gdal_translate -of $format -ot Float32 $i $out`basename $i`; done


# define in and out folder


# define the window which should be clipped

range="-17.6 16.25 -2.72 13.16"

# do the clipping for all Tiffs

cd $in 
for i in *.tif; do gdal_translate -projwin $range $i $out`basename $i`; done

For more information, see

Pixel-wise time series trend anaylsis with NDVI (GIMMS) and R

The GIMMS dataset is currently offline and the new GIMMS3g will soon be released, but it does not really matter which type of data is used for this purpose. It can also be SPOT VGT, MODIS or anything else as long as the temporal resolution is high and the time frame is long enough to detect significant trends. The purpose is to do a pixelwise trend analyis and extract only significant trends over a certain period of time fora selected region. Everything is done using open and free R software. Input data are continuous NDVI images, in this case it’s GIMMS with bi-monthly images, so 24 per year for the period 1982-2006.

so let’s load all Geotiffs lying in the GIMMS directory (should be  600 in this case):

sg = stack(list.files(pattern='*.tif'))
gimms = brick(sg)

first, let’s create annual sums, as autocorrelation might be present with monthly values. To keep NDVI as the unit, the results are devided by 24:

fun <- function(x) { 
 gimms.ts = ts(x, start=c(1982,1), end=c(2006,24), frequency=24)
 x <- aggregate(gimms.ts) 
gimms.sum <- calc(gimms, fun)

then the slope is calculated to get the direction and magnitude of trends, multiplied by the number of years to get the change in NDVI units:

time <- 1:nlayers(gimms.sum) 
fun=function(x) { if ([1])){ NA } else { m = lm(x ~ time); summary(m)$coefficients[2] }}
gimms.slope=calc(gimms.sum, fun)

now we need to see which trends are significant. Thus we first extract the p-value:

fun=function(x) { if ([1])){ NA } else { m = lm(x ~ time); summary(m)$coefficients[8] }}
p <- calc(gimms.sum, fun=fun)
plot(p, main="p-Value")

then mask all values >0.05 to get a confidence level of 95%:

m = c(0, 0.05, 1, 0.05, 1, 0)
rclmat = matrix(m, ncol=3, byrow=TRUE)
p.mask = reclassify(p, rclmat)
fun=function(x) { x[x<1] <- NA; return(x)}
p.mask.NA = calc(p.mask, fun)

and finaly mask all insignificant values in the trend map, so we only get NDVI change significant at the 95% level:

trend.sig = mask(gimms.slope, p.mask.NA)
plot(trend.sig, main="significant NDVI change")

The result could look like that:

significant NDVI change (1982-2006) using integrated GIMMS

significant NDVI change (1982-2006) using integrated GIMMS

Simple time series analysis with GIMMS NDVI and R

GIMMS NDVI time series for a selected pixel

GIMMS NDVI time series for a selected pixel

Time series analysis with satellite derived greenness indexes (e.g. NDVI) is a powerfull tool to assess environmental processes. AVHRR, MODIS and SPOT VGT provide global and daily imagery. Creating some plots is a simple task, and here is a rough start how it is done with GIMMS NDVI. All we need is the free software R.

I assume we already have the NDVI images in a folder, so the first 3 steps can be skipped, if not, they can be downloaded with a simple command:

wget -r

now unzip it:

find . -name "*.gz" -exec unp {} \;

As it comes in Albers projection, I reproject the Tiffs to a Lat Long WGS84 projection:

for in in *tif; do gdalwarp -s_srs EPSG:9001 -t_srs EPSG:4326 -r near -of GTiff $i /home/martin/Dissertation/GIMMS/GIMMS_lat/`basename $i`; done

I delete all 1981 images, as the year is not complete.

next, we start R, load the raster package and set the directory with the GIMMS Geotiffs:


now load the Tiffs as a raster brick:

sg = stack(list.files(pattern='*.tif'))
gimmsb = brick(sg)

now let’s chose a pixel and create a time series object. It can either be done by Row and Col:

i=20 # row
j=23 # col
gimms = gimmsb[i,j]
gimms.ts = ts(gimms, start=1982, end=c(2006,24), frequency=24)

or by coordinates

xy <- cbind(-3.6249,14.3844) 
sp <- SpatialPoints(xy) 
data <- extract(gimmsb, sp)
gimms.ts = ts(data, start=1982, end=c(2006,24), frequency=24)

now the time series can be used for further analysis or plotted:


A regression line and a LOESS smoothing can be added:

gimms.time = time(gimms.ts) 
abline(reg=lm(gimms.ts ~ gimms.time))
lines(lowess(gimms.ts, f=0.3, iter=5), col="blue")

some playing with boxplots and annual values:

boxplot(gimms.ts ~ cycle(gimms.ts))
GIMMS time series analysis

GIMMS time series analysis

averaged over the sudy area:

g=cellStats(gimmsb, stat='mean')
g.ts = ts(g, start=1982, end=c(2006,24), frequency=24)

There are great possibilities for time series analysis in R, e.g. the series can be smoothed, decomposed…etc. The zoo package provides further functions and can handle irregular series. See also the lattice package for better visual presentation. How pixelwise analyis is conducted for time series of rasters, I’ll write in another post.

Converting GPCC gridded rainfall 1901-2010@0.5° to monthly Geotiffs

GPCC rainfall data contains the largest database worldwide, with over 90 000 weather-stations interpolated to a 0.5° grid. It includes S. Nicholsons dataset for Africa and is thus a good source for gridded monthly precipitation. It covers the period 1901-2010, however, it comes in a strange dataformat and it is a long way until we get such diagrams for the desired location:

GPCC annual rainfall averaged over my study area in northern Senegal

GPCC annual rainfall averaged over my study area in northern Senegal

As I couldn’t handle with the format provided at the DWD download site, I used the climate explorer to generate a ncdf-file. Use this link to get the global dataset: GPCC-V6.

Then this file is loaded into R. After loading the packages, the working directory which contains the file is set:


Then the variable “prcp” is loaded into R and my study area clipped. To find out the correct variable, use gdalinfo e.g.

gpcc = brick("", varname="prcp")
e <- extent(-20, 5, 9, 26)
gpccclip <- crop(gpcc, e)

Then it’s exported as a multilayer Geotiff:

writeRaster(gpccclip, filename="gpcc_v6_westafr.tif", format="GTiff")

now I extract the individual rasters and create 1320 single files. I exit R for this:

cd ~/Dissertation/GPCC/
gdalinfo gpcc_v6_westafr.tif
for ((i = 1; i<= 1320; i++)) do gdal_translate -b $i -of GTiff gpcc_v6_westafr.tif `basename $i`.tif; done

Finaly the files are renamed. This may not be an elegant way, but it works:

for ((i=1;$i<=1320;i++)) do echo "mv $i.tif gpcc_v6_$(( ($i-1) / 12 + 1901 ))_$(( ($i-1) % 12 + 1 )).tif"; done >
sed -i 's/_1.tif/_01.tif/g'
sed -i 's/_2.tif/_02.tif/g'
sed -i 's/_3.tif/_03.tif/g'
sed -i 's/_4.tif/_04.tif/g'
sed -i 's/_5.tif/_05.tif/g'
sed -i 's/_6.tif/_06.tif/g'
sed -i 's/_7.tif/_07.tif/g'
sed -i 's/_8.tif/_08.tif/g'
sed -i 's/_9.tif/_09.tif/g'
chmod 0755

The final result are 1320 Geotiffs, named gpcc_v6_1901_01.tif until gpcc_v6_2010_12.tif with monthly rainfall for West Africa, ready to be used in any GIS. Now we can easily create beatiful maps like this:

Mean annual rainfall 1950-2010 (source: GPCC v6)

Mean annual rainfall 1950-2010 (source: GPCC v6)


Automatically downloading and processing TRMM rainfall data

TRMM rainfall data maybe is the most accurate rainfall data derived from satellite measurements and a valuable source in regions with scarse weather-stations. It has a good spatial (0.25°) and temporal (daily) resolution and is available since 1998. However, downloading and processing may be a lot of work, if not scripted. The following script may be badly coded, but it works. All you need is the open source software R. This script is written for a Linux environment.

First, load some packages:


Then, download the data from the FTP server: This example downloads 3B43 monthly data version 7 for 2012. The first line has to be changed for the desired data:

url = ""
filenames = getURL(url, ftp.use.epsv = FALSE, ftplistonly=TRUE, crlf=TRUE)
 filenames = paste(url, strsplit(filenames, "\r*\n") [[1]], sep="")
filenames = filenames[grep(filenames, pattern="*precipitation.accum")] # this line allows to download only files with a certain pattern, e.g. only certain months or days. "*precipitation.accum" means monthly accumulated rainfall here.
filenames # list all files
mapply(download.file, filenames, basename(filenames)) # download files

Now I create a virtual file as the downloaded TRMM data come as binarys. The VRT-file contains all the downloaded binary files with the appropriate geo-informations. To automate the process, the following script generates the VRT-file (TRMM.vrt) for all 2012 data. Change “3B43.12” accordingly to the downloaded data. Save this example as

echo '<VRTDataset rasterXSize="1440" rasterYSize="400"> 
 <Geotransform>-180,0.25,0,50, 0, -0.25</Geotransform> 
 <SRS>WGS84</SRS>' >TRMM.vrt;
for i in 3B43.12*
do echo '<VRTRasterBand dataType="Float32" band="1" subClass="VRTRawRasterBand"> 
 <SourceFilename relativeToVRT="1">'$i'</SourceFilename> 
 </VRTRasterBand>' >>TRMM.vrt
echo '</VRTDataset>' >>TRMM.vrt

Within R, the script ( is executed and the virtual-file (TRMM.vrt) loaded as a rasterbrick. This is flipped in y direction and the files written as multilayer Geotiff. This Geotiff contains all the layers for 2012 and can be opened in every GIS software.

 trmm = flip(b, direction='y')
 writeRaster(trmm, filename="trmm_acc_2012.tif", format="GTiff", overwrite=TRUE)