Skip to content

030 NASA MODIS Earthdata

Introduction

Purpose

In this notebook, we will use high-level codes from geog0111 to familiarise ourselves with downloading and interpreting NASA MODIS datasets from NASA EarthData. We will also be visualising these data in this notebook.

We will be introducing NASA MODIS land products, and viewing the MODIS LAI product as an example. This notebook should serve as an introduction to accessing similar products from Earthdata.

For further work on NASA datasets, a student might consider using an explicit API to access the data. If all you want is to get hold of some data product for some defined location and time, then you might use an API such as Appeears.

Prerequisites

Since the idea of this section is to get used to the datasets, the requirement for coding is quite low. You should have some familiarity with the following:

Test

You should run a NASA account test if you have not already done so.

MODIS LAI product

To introduce geospatial processing, we will use a dataset from the MODIS LAI product over the UK.

The data product MOD15 LAI/FPAR has been generated from NASA MODIS sensors Terra and Aqua data since 2002. We are now in dataset collection 6 (the data version to use).

LAI is defined as the one-sided green leaf area per unit ground area in broadleaf canopies and as half the total needle surface area per unit ground area in coniferous canopies. FPAR is the fraction of photosynthetically active radiation (400-700 nm) absorbed by green vegetation. Both variables are used for calculating surface photosynthesis, evapotranspiration, and net primary production, which in turn are used to calculate terrestrial energy, carbon, water cycle processes, and biogeochemistry of vegetation. Algorithm refinements have improved quality of retrievals and consistency with field measurements over all biomes, with a focus on woody vegetation.

We use such data to map and understand about the dynamics of terrestrial vegetation / carbon, for example, for climate studies.

The raster data are arranged in tiles, indexed by row and column, to cover the globe:

MODIS tiles

Exercise 1

The pattern on the tile names is hXXvYY where XX is the horizontal coordinate and YY the vertical.

  • use the map above to work out the names of the two tiles that we will need to access data over the UK
  • set the variable tiles to contain these two names in a list

For example, for the two tiles covering Madagascar, we would set:

tiles = ['h22v10','h22v11']

Accessing NASA MODIS URLs

Warning: The NASA data servers tend to be down for maintainance on Wednesday morning EST

Although you can access MODIS datasets through the NASA Earthdata interface, there are many occasions that we would want to just automatically pull datasets from a server. As we note above, we could use some existing API for this, such as Appeears, but we are aiming here at being able to ultimately develop codes that do this from a lower-level perspective.

Automation has many roles, and is particularly useful when you want a time series of data that might involve many files. For example, for analysing LAI or other variables over space/time) we will want to write code that pulls the time series of data.

If you visit the site https://e4ftl01.cr.usgs.gov/MOTA/MCD15A3H.006, you will see 'date' style links (e.g. 2018.09.30) through to sub-directories.

In these, e.g. https://e4ftl01.cr.usgs.gov/MOTA/MCD15A3H.006/2018.09.30/ you will find URLs of a set of files.

The files pointed to by the URLs are the MODIS MOD15 4-day composite 500 m LAI/FPAR product MCD15A3H.

There are links to several datasets on the page, including 'quicklook files' that are jpeg format images of the datasets, e.g.:

MCD15A3H.A2018273.h17v03

as well as xml files and hdf datasets.

Data Products

If we look at the dataserver we have specified https://e4ftl01.cr.usgs.gov, we will see that a number of sub-directories exist. Each of these 'server directories' points to a different data stream:

[DIR] ASTT/                   2019-08-05 07:54    -   
[DIR] COMMUNITY/              2020-06-02 08:45    -   
[DIR] ECOSTRESS/              2020-04-09 10:30    -   
[DIR] GEDI/                   2020-02-10 09:58    -   
[DIR] MEASURES/               2020-03-17 10:55    -   
[DIR] MOLA/                   2020-06-01 09:20    -   
[DIR] MOLT/                   2020-04-14 08:06    -   
[DIR] MOTA/                   2019-12-27 06:49    -   
[DIR] VIIRS/                  2020-06-23 10:26    -

For example, we might notice VIIRS which takes us to the VIIRS data products, or GEDI spaceborne lidar data. Each of these data streams will have their own properties that we need to appreciate before using them.

MOTA

The URL we have used above, https://e4ftl01.cr.usgs.gov/MOTA/MCD15A3H.006/2018.09.30/ starts with a call to the server directory MOTA, so we can think of https://e4ftl01.cr.usgs.gov/MOTA as the base level URL.

MOTA refers to combined MODIS Terra and Aqua datasets. Similarly, MOLA and MOLT refer to datasets generated from single MODIS sensors of Aqua and Terra, respectively.

The rest of the directory information MCD15A3H.006/2018.09.30 tells us:

  • the product name MCD15A3H
  • the product version 006
  • the date of the dataset 2018.09.30

There are several ways we could specify the date information. The most 'human readable' is probably YYYY.MM.DD as given here, but we may also use day of year. Recall from 023 Plotting how to convert date formats.

We will mainly use geog0111.im_display to do simple image plots. This is essentially the same as the code you have developed in an exercise.

If we look at the product specification page for MCD15A3H we see that the data product has multiple data layers. In the case of MCD15A3H, this is:

SDS Name Description Units Data Type Fill Value No Data Value Valid Range Scale Factor
Fpar_500m Fraction of Photosynthetically Active Radiation Percent 8-bit unsigned integer 249 to 255 N/A 0 to 100 0.01
Lai_500m Leaf Area Index m²/m² 8-bit unsigned integer 249 to 255 N/A 0 to 100 0.1
FparLai_QC Quality for FPAR and LAI Class Flag 8-bit unsigned integer 255 N/A 0 to 254 N/A
FparExtra_QC Extra detail Quality for FPAR and LAI Class Flag 8-bit unsigned integer 255 N/A 0 to 254 N/A
FparStdDev_500m Standard deviation of FPAR Percent 8-bit unsigned integer 248 to 255 N/A 0 to 100 0.01
LaiStdDev_500m Standard deviation of LAI m²/m² 8-bit unsigned integer 248 to 255 N/A 0 to 100 0.1

We will use a simple recipe for accessing a dataset and plotting it. All examples below will follow the same pattern.

First, we use the Modis class in geog0111.modis to get the dataset. Recall from 024 Image display that many of these datasets are cached on the system for you. If it takes a long time to retrieve a dataset, it is probably because you are downloading it, rather than using a cached copy. If it is Wednesday afternoon UK time, you will not be able to download new datasets, and can only use cached copies, as the NASA server is down for maintenance.

Notice that the LAI data have a valid range 0 to 100, and that a scale factor of 0.1 must be applied to the datas to interpret as LAI.

from geog0111.modisUtils import modisAnnual
from osgeo import gdal

#######################
# specify what we want
# in a dictionary
#######################
# UK tiles

kwargs = {
    'tile'      :    ['h17v03','h18v03','h17v04','h18v04'],
    'product'   :    'MCD15A3H',
    'sds'       :    ['Lai_500m'],
    'doys'       : [41],
    'year'      : 2019,
}

filename,bandname = modisAnnual(force=True,verbose=False,**kwargs)

data_MCD15A3H = {}
for f,v in filename.items():
    g = gdal.Open(v)
    if g:
        data_MCD15A3H[f] = g.ReadAsArray()
from geog0111.im_display import im_display

im_display(data_MCD15A3H,['Lai_500m'],x_size=16,y_size=8,vmax=100)

png

masking

We can see from the product specification page for MCD15A3H that values above 100 for Lai_500m are invalid. In addition, a scale factor of 0.1 should be applied.

We will learn more of this later, but for now, note that we can use code such as the following to apply the scaling factor and mask invalid numbers. We need to import the numpy package for this first (import numpy as np).

from geog0111.modisUtils import modisAnnual
from osgeo import gdal

#######################
# specify what we want
# in a dictionary
#######################
# UK tiles

kwargs = {
    'tile'      :    ['h17v03','h18v03','h17v04','h18v04'],
    'product'   :    'MCD15A3H',
    'sds'       :    ['Lai_500m'],
    'doys'       : [41],
    'year'      : 2019,
}

filename,bandname = modisAnnual(verbose=False,**kwargs)
data_MCD15A3H = {}
for f,v in filename.items():
    g = gdal.Open(v)
    if g:
        data_MCD15A3H[f] = g.ReadAsArray()
import numpy as np
# multiply by scale
scale = 0.1
ds = data_MCD15A3H['Lai_500m'] * scale
# mask invalid by setting to 
ds[ds>100 * scale] = np.nan

# load back into data_MCD15A3H
data_MCD15A3H['Lai_500m'] = ds

# plot
im_display(data_MCD15A3H,['Lai_500m'],x_size=16,y_size=8)

png

The dataset, being a measurement, is not spatially complete. This is because of clouds and other factors meaning that the algorithm to produce LAI could not run at that time. It is very common that you will find geospatial datasets have missing values. We will learn later on when and how we might fill these gaps.

Exercise 2: mask invalid data

  • write a function called data_mask that is given:

    • a data dictionary
    • a list of sds strings
    • a list of scale factors
    • a list of upper threshold values
    • a list of lower threshold values and returns the dictionary with the scaled and masked datasets
  • plot both Lai_500m and LaiStdDev_500m side-by-side. You should apply appropriate scaling factors and masking as above.

Burned area dataset MCD64A1

We can use the same codes to access different MODIS datasets. For example the monthly MCD64A1 product:

MCD15A3H

For the Burn Date field, values of -2 to 0 are invalid. We can visualise this with code as above, but first we need to get the day of year for the month we want. Recall from 023 Plotting that we can use datetime for this. We do this by working out the difference between the day of the month in question and the 31st day of December for the previous year.

from datetime import datetime

year = 2019
day = 1
for month in range(1,13):
    doy = (datetime(year,month,day) - datetime(year-1,12,31)).days
    print(f'1st of {month:<2d} is doy {doy}')
1st of 1  is doy 1
1st of 2  is doy 32
1st of 3  is doy 60
1st of 4  is doy 91
1st of 5  is doy 121
1st of 6  is doy 152
1st of 7  is doy 182
1st of 8  is doy 213
1st of 9  is doy 244
1st of 10 is doy 274
1st of 11 is doy 305
1st of 12 is doy 335

Exercise 3

  • write a function called get_doy that is given the year, month and day integer and returns the day of year
  • test it

We now simply need to encode any scale factors or invalidity thresholds, and we can plot the dataset:

from geog0111.modisUtils import modisAnnual
from osgeo import gdal
from geog0111.im_display import im_display
from geog0111.data_mask import data_mask
from geog0111.get_doy import get_doy

year  = 2019
month = 6
doy = get_doy(year,month,1)

# h22v10 is Madagascar
kwargs = {
    'tile'      :    ['h22v10'],
    'product'   :    'MCD64A1',
    'doys'       : [doy],
    'sds'       :    ['Burn Date'],
    'year'      : year,
}

filename,bandname = modisAnnual(verbose=False,**kwargs)
data_MCD64A1 = {}
for f,v in filename.items():
    g = gdal.Open(v)
    if g:
        data_MCD64A1[f] = g.ReadAsArray()
scale   = [1]
uthresh = [200]
lthresh = [0]

data_MCD64A1 = data_mask(data_MCD64A1,kwargs['sds'],scale,uthresh,lthresh)

# have to take sub-area to see anything here:
# define row and column min/max with r=[1500,2000],c=[1000,1500]
im_display(data_MCD64A1,kwargs['sds'],\
           x_size=10,y_size=8,\
           r=[1800,2000],c=[1000,1200])

png

Snow

There are two MODIS daily snow cover datasets: MOD10A1 and MYD10A1 for data from the MODIS Terra and Aqua sensors, respectively. The field NDSI_Snow_Cover has valid values in the range 0 to 100.

from geog0111.modisUtils import modisAnnual
from osgeo import gdal
from geog0111.im_display import im_display
from geog0111.data_mask import data_mask
from geog0111.get_doy import get_doy

year  = 2019
month = 12
doy = get_doy(year,month,1)

kwargs = {
    'tile'      :    ['h19v03'],
    'product'   :    'MOD10A1',
    'sds'       :     ['NDSI_Snow_Cover'],
    'year'      : year,
    'doys'        : [doy]
}

scale   = [1]
uthresh = [101]
lthresh = [-1]
sds     = kwargs['sds']

filename,bandname = modisAnnual(verbose=False,**kwargs)
data_MOD10A1 = {}
for f,v in filename.items():
    g = gdal.Open(v)
    if g:
        data_MOD10A1[f] = g.ReadAsArray()

data_MOD10A1 = data_mask(data_MOD10A1,sds,scale,uthresh,lthresh)
# have to take sub-area to see anything here:
# define row and column min/max with r=[1500,2000],c=[1000,1500]
im_display(data_MOD10A1,kwargs['sds'],\
           x_size=10,y_size=8,colourmap='gray')

png

Land Cover dataset MCD12Q1

One of the MODIS land cover products is MCD12Q1. We briefly examined this product in the previous session, but can take the chance now to explore it further. The datasets are yearly, given for doy 1. Details of the product can be found at the product page (MCD12Q1)[https://lpdaac.usgs.gov/products/mcd12q1v006/].

We previously developed a code get_lc to set up the colourmaps and plot the LC_Type1 for a given year and set of tiles. This is the Land Cover Type 1: Annual International Geosphere-Biosphere Programme (IGBP) classification.

from geog0111.plot_lc import plot_lc
from geog0111.modisUtils import modisAnnual
from osgeo import gdal

# Madagascar

# ensure tiles is a list
kwargs = {
    'product'   :    'MCD12Q1',
    'sds'       :    ['LC_Type1'],
    'year'      :    2019,
    'doys'      : [1],
    'tile'      : ['h22v10']
}

filename,bandname = modisAnnual(verbose=False,**kwargs)

data_MCD12Q1 = {}
for f,v in filename.items():
    g = gdal.Open(v)
    if g:
        data_MCD12Q1[f] = g.ReadAsArray()

# the data we want here
plot_lc(data_MCD12Q1['LC_Type1'])

png

Exercise 5

  • Explore the MODIS datasets in your own time, and get to know the datasets they contain, along with the scaling factors and other interpretation information. Note that if we have not already downloaded particular datasets into the cache, it may take a short time for them to download and load.

Summary

In these notes, we have introduced the characteristics of MODIS data products, and learned how to specify, access, and display them for a number of datasets. You will have accessed a number of products under a number of conditions in the exercises, but you are encouraged to explore this further.

The main item to do with using data products of this sort, that we haven't covered yet, is the interpretation of Quality Assurance (QA) data. This is often packed information into bits, and can be a little tricky at first. However, as with above, once you have a little familiarisation with a few cases, you will be able to apply this more widely.

You should spend some time going through the various links to explore the different datasets, and try out the exercises above for various products. The familiarity you gain from this will help when it comes to building our own codes later on.

Some MODIS datasets


Last update: December 6, 2022