Global Land Cover Characterization
Global Land Cover Characteristics Data Base Version 2.0
PLEASE NOTE: This is the Version 2.0 release of the global land cover characteristics data base. The land cover information has been updated from Version 1.2. Please read section 9.0 for information about the revision process and what changes have been made to the database.
The U.S. Geological Survey's (USGS) National Center for Earth Resources Observation and Science (EROS), the University of Nebraska-Lincoln (UNL) and the Joint Research Centre of the European Commission have generated a 1-km resolution global land cover characteristics data base for use in a wide range of environmental research and modeling applications (Loveland and others, 2000). The land cover characterization effort is part of the National Aeronautics and Space Administration (NASA) Earth Observing System Pathfinder Program and the International Geosphere-Biosphere Programme-Data and Information System focus 1 activity. Funding for the project is provided by the USGS, NASA, U.S. Environmental Protection Agency, National Oceanic and Atmospheric Administration, U.S. Forest Service, and the United Nations Environment Programme.
The data set is derived from 1-km Advanced Very High Resolution Radiometer (AVHRR) data spanning a 12-month period (April 1992-March 1993) and is based on a flexible data base structure and seasonal land cover regions concepts. Seasonal land cover regions provide a framework for presenting the temporal and spatial patterns of vegetation in the database. The regions are composed of relatively homogeneous land cover associations (for example, similar floristic and physiognomic characteristics) which exhibit distinctive phenology (that is, onset, peak, and seasonal duration of greenness), and have common levels of primary production.
Rather than being based on precisely defined mapping units in a predefined land cover classification scheme, the seasonal land cover regions serve as summary units for both descriptive and quantitative attributes. The attributes may be considered as spreadsheets of region characteristics and permit updating, calculating, or transforming the entries into new parameters or classes. This provides the flexibility for using the land cover characteristics data base in a variety of models without extensive modification of model inputs.
The analytical strategy for global land cover characterization has evolved from methods initially tested during the development of a prototype 1-km land cover characteristics data base for the conterminous United States (Loveland and others, 1991, 1995; Brown and others, 1993). In the U.S. study, multitemporal AVHRR data, combined with other ancillary data sets, were used to produce a prototype land cover characteristics data base.
The global land cover characteristics database was developed on a continent-by-continent basis. All continents in the global database share the same map projection (Interrupted Goode Homolosine), have 1-km nominal spatial resolution, and are based on 1-km AVHRR data spanning April 1992 through March 1993. (Please note that while the native projection for the global land cover database is the Interrupted Goode Homolosine, the land cover data are now available in a Geographic projection based on user requirements.) While each continental data base has unique elements based on the salient geographic aspects of the specific continent, there is a common set of derived thematic maps produced through the aggregation of seasonal land cover regions. The thematic maps include:
One-kilometer AVHRR NDVI composites are the core data set used in land cover characterization. In addition, other key geographic data include digital elevation data, ecoregions interpretations, and country or regional-level vegetation and land cover maps. See Brown and others (1993) for a detailed discussion of the role of ancillary data for land cover characterization.
The base data used are the International Geosphere Biosphere Programme (IGBP) 1-km AVHRR 10-day composites for April 1992 through March 1993 (Eidenshink and Faundeen, 1994). Multitemporal AVHRR NDVI data are used to divide the landscape into land cover regions, based on seasonality. While the primary AVHRR data used in the classification is NDVI, the individual channel data sets are used for postclassification characterization of certain landscape properties. A data quality evaluation was conducted and is reported by Zhu and Yang (1996).
DEM data are used to model the ecological factors governing natural vegetation distribution, and are important for identifying land cover types and stratifying seasonal regions representing two or more disparate vegetation types.
Ecological regions data are used to identify regions with disparate land cover types and for stratifying seasonal regions representing two or more disparate vegetation types. Both continental and country level ecoregions data are used in this process.
Maps and atlases of ecoregions, soils, vegetation, land use, and land cover are used in the interpretation phase of the study and serve as reference data to guide class labeling.
The methods used can be described as a multitemporal unsupervised classification of NDVI data with post-classification refinement using multi-source earth science data. Monthly AVHRR NDVI maximum value composites for April, 1992 through March, 1993 are used to define seasonal greenness classes. Past investigations have demonstrated that classes developed from multitemporal NDVI data represent characteristic patterns of seasonality and correspond to relative patterns of productivity (Loveland and others, 1991; Brown and others, 1993).
The translation of the seasonal greenness classes to seasonal land cover regions requires post-classification refinement with the addition of digital elevation, ecoregions data, and a collection of other land cover/vegetation reference data. The interpretation is based on extensive use of computer-assisted image processing tools (Brown and others, 1998); however, the classification process is not automated and more closely resembles a traditional manual image interpretation philosophy. There is a reliance on the skills of the human interpreter to make the final decisions regarding the relationship between spectral classes defined using unsupervised methods and landscape characteristics that are used to make land cover definitions.
The initial step in the process involves the preparation of the AVHRR NDVI data for use in the unsupervised classification. This requires recompositing the 10-day composites into monthly data sets. The use of monthly rather than 10-day composites represents a compromise between temporal frequency and the need for cloud-free data (Moody and Strahler, 1994). It also provides a means to reduce data volume while maintaining annual phenological information. Experience has shown that composites representing a longer period are more suitable for image classification due to the substantial improvement in composite quality (Zhu and Yang, 1996).
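The recompositing step can be illustrated with a minimal sketch, assuming each 10-day composite is held as a flat list of per-pixel NDVI values and that missing or cloud-flagged pixels are marked with None. The function name and the None convention are illustrative assumptions, not part of the production processing system.

```python
from typing import List, Optional, Sequence


def max_value_composite(
    composites: Sequence[Sequence[Optional[float]]],
) -> List[Optional[float]]:
    """Merge several 10-day NDVI composites into one composite.

    For every pixel the highest valid NDVI value is kept; a pixel
    that is missing (None) in every input stays None in the output.
    """
    n_pixels = len(composites[0])
    merged: List[Optional[float]] = []
    for i in range(n_pixels):
        valid = [c[i] for c in composites if c[i] is not None]
        merged.append(max(valid) if valid else None)
    return merged


# Three 10-day composites covering one month, three pixels each.
monthly = max_value_composite([
    [0.10, None, 0.30],
    [0.20, None, 0.10],
    [0.05, None, 0.40],
])
```

Keeping the maximum value favors the least cloud-contaminated observation of each pixel, which is the usual rationale for maximum value compositing.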
Masks representing water bodies, snow and ice, and barren or sparsely vegetated areas are developed to eliminate NDVI data from the composites for those areas where the meaning of the NDVI values is ambiguous. In addition, the masked data set has a reduced overall variance and the classes defined using unsupervised classifications are therefore more representative of landscape patterns. The water mask is developed through the interpretation of single-date AVHRR channel 2 (near-infrared) images supplemented with water body information taken from the Digital Chart of the World (Defense Mapping Agency, 1992). Snow and ice, barren, and sparsely vegetated masks are produced from a 12-month maximum value NDVI composite using threshold values that vary according to continental characteristics.
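As an illustration of the threshold-based masks, the following sketch flags pixels whose 12-month maximum NDVI falls below a cutoff. The threshold of 0.1 used in the example is a hypothetical value chosen only for demonstration; the actual thresholds vary by continent and are not given here.

```python
from typing import List, Sequence


def annual_maximum(monthly: Sequence[Sequence[float]]) -> List[float]:
    """Return the per-pixel maximum over all monthly composites."""
    return [max(pixel_series) for pixel_series in zip(*monthly)]


def low_ndvi_mask(monthly: Sequence[Sequence[float]], threshold: float) -> List[bool]:
    """True where the annual NDVI peak is below the threshold.

    Such pixels would be treated as barren, sparsely vegetated, or
    snow and ice and excluded from the clustering input.
    """
    return [peak < threshold for peak in annual_maximum(monthly)]


# Three months shown for brevity; two pixels per composite.
mask = low_ndvi_mask([[0.05, 0.30], [0.08, 0.60], [0.02, 0.40]], threshold=0.1)
```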
The initial segmentation of the 12-month NDVI composites into seasonal greenness classes is performed using unsupervised clustering. This classification method is often used for studies in which the location and characteristics of specific classes are unknown. Unsupervised classification uses clustering to identify "natural" groupings of pixels with similar NDVI properties. In this case, the clusters correspond to annual sequences of greenup, peak, and senescence. The specific clustering algorithm used is CLUSTER, a variation of the K-Means algorithm that has been optimized for use with large data sets (Kelly and White, 1993). It is an iterative statistical clustering algorithm that defines clusters or groups of NDVI values with similar properties. The clustering is controlled by predetermined parameters for number of iterations and number of resulting clusters. The clusters are defined by channel mean vectors and covariance matrices. The specific number of clusters for each continent was based on an empirical evaluation of several clustering trials.
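The general idea of iterative clustering can be sketched with plain K-Means. This is not the CLUSTER implementation, which is a variation optimized for large data sets; it is a minimal illustration that assumes each pixel is represented by a short list of monthly NDVI values and that initial cluster centers are supplied by the caller.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two NDVI profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    profiles: Sequence[Sequence[float]],
    initial_centers: Sequence[Sequence[float]],
    iterations: int = 10,
) -> Tuple[List[int], List[List[float]]]:
    """Plain K-Means with a fixed number of iterations and clusters."""
    centers = [list(c) for c in initial_centers]
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: attach each profile to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in profiles
        ]
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, label in zip(profiles, labels) if label == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Four pixels with three-month profiles: two low-greenness, two high-greenness.
pixels = [[0.10, 0.20, 0.10], [0.12, 0.22, 0.10], [0.60, 0.80, 0.70], [0.58, 0.82, 0.72]]
labels, centers = kmeans(pixels, [pixels[0], pixels[2]], iterations=5)
```

As in the description above, the two controlling parameters are the number of iterations and the number of clusters, the latter being fixed here by the number of initial centers.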
The purposes of the preliminary labeling step are to provide a general understanding of the characteristics of each cluster (or seasonal greenness class) and to determine which classes have two or more disparate land cover classes represented within their spatial distribution (e.g., a class may include a mixture of both broadleaf deciduous trees/shrubs and cropland). Preliminary labeling involves inspecting the spatial patterns and spectral or multitemporal statistics of each class, comparing each class to reference data, and making decisions concerning land cover types.
The preliminary labeling step includes two primary tasks. The first is the generation of statistics and graphics for each class, describing their relationship to the available ancillary data (for example, graphs profiling the temporal sequence of NDVI, graphs of class elevation ranges, and tabular summaries comparing the seasonal greenness classes to nominal data sets). The second task is the interpretation of the summaries, graphs, and reference data to determine the general land cover type or types associated with each seasonal greenness class and to identify the classes that represent two or more disparate land cover types. Typically, a minimum of three interpreters label each class. Where differences exist, the interpreters compare decisions and consult reference materials in order to arrive at a consensus.
Post-classification stratification is used to separate classes containing two or more disparate land cover types. Experience has shown that at least 70% of the seasonal greenness classes represent multiple land cover types (Brown and others, 1993; Running and others, 1995). Most of these types of problems are the result of spectral similarities between natural and agricultural land cover. These problems can usually be solved by developing criteria based on the relationship between the confused seasonal greenness classes and selected ancillary data sets. Elevation and ecoregions data have proven to be the most useful ancillary variables for post-classification stratification (Brown and others, 1993).
There are two tasks involved in the post-classification stratification step. The first is to determine the ancillary variables and preliminary decision rules that separate the classes identified in the preliminary labeling step as having multiple land cover types. The second task is to implement and refine the decision rules. Generally, this is an iterative process in which the initial criteria are tested, refined, and finally used to permanently modify the original class. This results in a number of new seasonal greenness classes that, through the following step, become the final map units (seasonal land cover regions). A complete history of the processing of each class is maintained.
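A single decision rule of this kind can be sketched as follows, assuming per-pixel lists of class numbers and elevations. The class numbers and the 1000 m elevation break are hypothetical values used only for illustration; the actual rules were developed interactively for each confused class.

```python
from typing import List, Sequence


def split_class_by_elevation(
    class_ids: Sequence[int],
    elevations_m: Sequence[float],
    target_class: int,
    elevation_break_m: float,
    new_class: int,
) -> List[int]:
    """Reassign pixels of one mixed class that lie at or above an elevation break.

    Pixels of target_class below the break keep their class number;
    pixels at or above it are moved to new_class. All other classes
    are passed through unchanged.
    """
    return [
        new_class if (c == target_class and z >= elevation_break_m) else c
        for c, z in zip(class_ids, elevations_m)
    ]


# Hypothetical class 7 mixes lowland cropland with upland forest.
stratified = split_class_by_elevation(
    [7, 7, 3, 7], [200.0, 1500.0, 1800.0, 900.0],
    target_class=7, elevation_break_m=1000.0, new_class=101,
)
```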
Following the generation of the seasonal land cover regions in the post-classification stratification step, the remaining steps in data base development are: (1) generate final attributes; (2) determine the land cover type or types for each class; and (3) derive thematic data sets.
As with the preliminary labeling step, the final land cover characterization involves generating a suite of attributes that describe the characteristics of each seasonal land cover region. Both statistics and contingency tables are created between the final seasonal land cover regions layer and the respective ancillary variables (NDVI, AVHRR channels 1-5, elevation, ecoregions, etc.). The attributes are part of the global land cover characteristics database and, in addition, are used as evidence in determining the final land cover types.
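The two kinds of attributes can be sketched as follows: a contingency table against a nominal layer and a summary statistic against a continuous layer. The sketch assumes co-registered per-pixel lists; the region numbers, ecoregion codes, and function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def contingency_table(
    regions: Sequence[int], nominal: Sequence[Hashable]
) -> Dict[Tuple[int, Hashable], int]:
    """Count pixels for every (region, nominal category) combination."""
    return dict(Counter(zip(regions, nominal)))


def region_means(regions: Sequence[int], values: Sequence[float]) -> Dict[int, float]:
    """Mean of a continuous variable (for example, elevation) per region."""
    grouped: Dict[int, List[float]] = defaultdict(list)
    for region, value in zip(regions, values):
        grouped[region].append(value)
    return {region: sum(v) / len(v) for region, v in grouped.items()}


regions = [1, 1, 2, 2, 2]
table = contingency_table(regions, ["a", "b", "a", "a", "b"])
mean_elevation = region_means(regions, [100.0, 300.0, 900.0, 1000.0, 1100.0])
```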
A convergence of evidence approach is used to determine the land cover type for each seasonal land cover class. All available documentation, including the region attributes, image maps, atlases, other existing land cover/vegetation maps, and any other relevant materials, is consulted and compared to the spatial distribution of each region. As before, at least three interpreters are used to ensure consistency.
The seasonal land cover regions are then translated into the Global Ecosystem framework (Olson, 1994a, 1994b). Olson has defined 94 ecosystem classes that are based on their land cover mosaic, floristic properties, climate, and physiognomy. The Global Ecosystem framework provides a mechanism for tailoring data to the unique landscape conditions of each continent, while still providing a means for summarizing the data at the global level. The Global Ecosystem types have been cross-referenced to land cover classes in the Simple Biosphere Model (SiB), Simple Biosphere 2 Model, the Biosphere Atmosphere Transfer Scheme (BATS), International Geosphere-Biosphere Programme (IGBP), and USGS/Anderson (see Table 1).
Table 1. Example translation table to derived data legends.
The final task associated with this step is the generation of the derived data sets, including land cover and seasonal measures. In this step, the seasonal land cover regions are aggregated (or renumbered) into the appropriate classes of the output classification legends. Urban areas, extracted from the Digital Chart of the World (Defense Mapping Agency, 1992), are added to three of the derived data sets: Global Ecosystems, IGBP Land Cover, and the USGS Land Use/Land Cover system.
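The renumbering and the urban overlay can be sketched as a lookup-table operation. The region numbers, legend class numbers, and urban class value below are hypothetical placeholders, not entries from the actual translation tables.

```python
from typing import Dict, List, Sequence


def derive_legend(
    region_ids: Sequence[int],
    region_to_class: Dict[int, int],
    urban_mask: Sequence[bool],
    urban_class: int,
) -> List[int]:
    """Renumber seasonal land cover regions into an output legend.

    Each region number is replaced by its legend class; pixels that
    are flagged in the urban mask receive the urban class instead.
    """
    return [
        urban_class if is_urban else region_to_class[region]
        for region, is_urban in zip(region_ids, urban_mask)
    ]


# Hypothetical lookup: regions 11 and 12 aggregate to class 1, region 13 to class 4.
lookup = {11: 1, 12: 1, 13: 4}
legend = derive_legend([11, 12, 13, 11], lookup, [False, True, False, False], urban_class=99)
```

Because every derived data set is produced from the same region layer with a different lookup table, the regions only need to be interpreted once.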
The following derived data sets are included in the global land cover data base:
Accuracy statistics for one land cover layer from the version 1.2 Global Land Cover Characteristics Database (GLCC), the International Geosphere Biosphere Programme (IGBP) DISCover data set, were established by researchers at the University of California, Santa Barbara (UCSB). The UCSB validation was funded through the NASA Land Cover Land Use Change Program on behalf of the IGBP Land Cover Working Group (LCWG). While the results of this validation exercise do not provide conclusive evidence of the accuracy of the other land cover classifications included in either version 1.2 or version 2.0 GLCC, they provide a general indication of data quality.
IGBP DISCover accuracy figures were derived using a simple random sample stratified by land cover type (Belward and others, 1999). To determine the true cover type, three interpreters independently interpreted either Landsat TM or SPOT images covering each sample. In order for the AVHRR pixel to be called correct, the majority of the three interpreters (2 of 3) had to agree on the land cover type, as interpreted from Landsat or SPOT, for the sample point. Based on the original IGBP LCWG validation protocol, the overall accuracy figures are (Scepan, 1999):
Sample point accuracy 59.4%
Area-weighted accuracy 66.9%
The area-weighted accuracy weights the importance of each class accuracy based on the land area occupied by that class. These accuracy figures are based on the assumption that if the three people interpreting the Landsat or SPOT data could not reach a consensus on the "true" cover type (meaning there were three different answers), then the AVHRR classification was declared to be incorrect, even though there was no evidence that it actually was wrong. As a result, the LCWG developed a revised set of accuracy statistics. These figures, referred to as "Majority Rule" accuracy, are based on the assumption that if the "true" cover for a sample could not be determined by the interpreters, then the sample should be thrown out. Based on this assumption, the overall accuracy numbers are (Scepan, 1999):
Majority Rule Accuracy 73.5%
Area Weighted Majority Rule 78.7%
It must be noted that there is no statistical validity of these figures because of the reduction of the number of useful validation samples.
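The difference between the four figures above can be illustrated with a short sketch. It assumes each validation sample is a pair of the mapped class and the reference class, with None as the reference when the interpreters reached no consensus; this data layout, the function names, and the sample values are illustrative assumptions and do not reproduce the published statistics.

```python
from typing import Dict, Optional, Sequence, Tuple

Sample = Tuple[int, Optional[int]]  # (mapped class, reference class or None)


def point_accuracy(samples: Sequence[Sample], drop_unresolved: bool = False) -> float:
    """Fraction of samples whose mapped class matches the reference.

    With drop_unresolved=False, samples without a consensus reference
    count as errors (original protocol). With drop_unresolved=True they
    are discarded first ("Majority Rule").
    """
    if drop_unresolved:
        samples = [s for s in samples if s[1] is not None]
    if not samples:
        raise ValueError("no samples to evaluate")
    return sum(1 for mapped, ref in samples if mapped == ref) / len(samples)


def area_weighted_accuracy(
    samples: Sequence[Sample],
    area_fraction: Dict[int, float],
    drop_unresolved: bool = False,
) -> float:
    """Per-class accuracy weighted by the land area each class occupies."""
    weighted_sum = 0.0
    weight_total = 0.0
    for cls, weight in area_fraction.items():
        cls_samples = [s for s in samples if s[0] == cls]
        if drop_unresolved:
            cls_samples = [s for s in cls_samples if s[1] is not None]
        if not cls_samples:
            continue  # class without usable samples contributes nothing
        weighted_sum += weight * point_accuracy(cls_samples)
        weight_total += weight
    return weighted_sum / weight_total


samples = [(1, 1), (1, 2), (1, None), (2, 2), (2, 2)]
areas = {1: 0.25, 2: 0.75}
original = point_accuracy(samples)                            # 3 of 5 correct
majority_rule = point_accuracy(samples, drop_unresolved=True)  # 3 of 4 correct
```

Discarding unresolved samples can only raise or preserve the accuracy figure, which is why the Majority Rule values are higher than those from the original protocol.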
Another perspective on DISCover accuracy is provided by Defries and Los (1999). They investigated the impacts of the accuracy of DISCover for one application, climate modeling. In their study, they aggregated classes into groups corresponding to two key parameters used in climate models: leaf area index (LAI) and surface roughness. Based on this aggregation, the application accuracies of DISCover for estimating those parameters are:
LAI Accuracy 84.5%
LAI Area Weighted Accuracy 90.2%
Surface Roughness Accuracy 82.4%
Surface Roughness Area Weighted Accuracy 87.8%
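The effect of aggregating classes before scoring can be sketched as follows. A sample counts as correct when the mapped and reference classes fall in the same parameter group, even if the classes themselves differ. The class numbers and group labels are hypothetical and do not correspond to the groupings used by Defries and Los (1999).

```python
from typing import Dict, Sequence, Tuple


def grouped_accuracy(
    samples: Sequence[Tuple[int, int]], group_of: Dict[int, str]
) -> float:
    """Accuracy after mapping both mapped and reference classes to groups."""
    correct = sum(1 for mapped, ref in samples if group_of[mapped] == group_of[ref])
    return correct / len(samples)


# Hypothetical grouping: classes 1 and 2 share a parameter range, class 3 does not.
groups = {1: "low", 2: "low", 3: "high"}
pairs = [(1, 2), (1, 3), (3, 3), (2, 1)]
class_level = sum(1 for mapped, ref in pairs if mapped == ref) / len(pairs)
application_level = grouped_accuracy(pairs, groups)
```

Confusion between classes with similar parameter values does not affect the model input, so the application-level figure is at least as high as the class-level figure.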
A comprehensive presentation of all aspects of the validation, including per class accuracy tables, can be found in the September 1999 volume of Photogrammetric Engineering and Remote Sensing: Special Issue on Global Land Cover Validation.
The first version of the global land cover database was completed and released to the public in November 1997. We applied the feedback we received from the users of this database (Brown and others, 1999) and broad lessons learned from the validation exercise of the IGBP DISCover land cover data (Scepan, 1999; Muchoney and others, 1999) to the development of this revised version of the database. Version 2.0 of the Global Land Cover Database contains updated land cover and water classes. The revised version is based on the land cover regions that are found in Version 1.2. Consequently, this version of the database is still based on the 1992-1993 AVHRR time series and therefore represents the land cover patterns for that period. Detailed information about the revised seasonal land cover regions can be found in a set of ASCII tab-delimited text files, one for each continental database:
The global land cover characteristics data are in a flat, headerless raster format. The raster images contain class number values for each pixel that correspond to the appropriate land cover classification scheme legend. Data are distributed as compressed and uncompressed single-band images. The files can be obtained either by anonymous file transfer protocol (ftp) or downloaded from the LP Distributed Active Archive Center (DAAC) World Wide Web site: (http://LPDAAC.usgs.gov/glcc/globe_int.php). The instructions for accessing these files are contained in the following two sections.
The Global Land Cover page contains links to all documentation files and the image files (both compressed and uncompressed). NOTE: World Wide Web browsers can vary in how the files will be downloaded. On PCs, some browsers will allow a user to interactively select the location where the file will be saved and to edit the file name. However, on certain browsers files may be automatically downloaded to a default storage location on the local system.
The land cover characteristics data base is available for each of five continental areas and for the entire globe. The continental land cover characteristics data bases are available in two different map projections: the Interrupted Goode Homolosine and the Lambert Azimuthal Equal Area (see Steinwand, 1994, and Steinwand and others, 1995, for a complete description of these projections). The geometric characteristics for each continent are described in the individual documentation files for each continental data set. The global data are now available in two projections: the Interrupted Goode Homolosine projection and the Geographic (or Plate Carrée) projection.
The data dimensions of the Interrupted Goode Homolosine projection for the global land cover characteristics data set are 17,347 lines (rows) and 40,031 samples (columns) resulting in a data set size of approximately 695 megabytes for 8-bit (byte) images. The following is a summary of the map projection parameters used for the Interrupted Goode Homolosine projection:
Projection Type: Interrupted Goode Homolosine
Pixel Size: 1000 meters
Radius of sphere: 6370997 m.
XY corner coordinates (center of pixel) in projection units (meters):
Upper left: (-20015000, 8673000)
Upper right: (20015000, 8673000)
Lower right: (20015000, -8673000)
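The raster dimensions follow directly from the corner coordinates and pixel size listed above, and the same parameters locate any pixel in the flat file. The sketch below assumes one byte per pixel stored in row-major order with the northernmost line first; that storage order is an assumption of this example, and the file path passed to read_class_value is supplied by the user.

```python
import math
from typing import Tuple

PIXEL_SIZE_M = 1000
UL_X, UL_Y = -20015000, 8673000   # upper-left pixel center (meters)
LR_X, LR_Y = 20015000, -8673000   # lower-right pixel center (meters)

# Corner coordinates are pixel centers, so add one to the center-to-center span.
SAMPLES = (LR_X - UL_X) // PIXEL_SIZE_M + 1   # columns
LINES = (UL_Y - LR_Y) // PIXEL_SIZE_M + 1     # rows


def xy_to_line_sample(x: float, y: float) -> Tuple[int, int]:
    """Convert Goode projection coordinates (meters) to a zero-based (line, sample)."""
    sample = math.floor((x - UL_X) / PIXEL_SIZE_M + 0.5)
    line = math.floor((UL_Y - y) / PIXEL_SIZE_M + 0.5)
    if not (0 <= line < LINES and 0 <= sample < SAMPLES):
        raise ValueError("coordinate falls outside the raster")
    return line, sample


def read_class_value(path: str, line: int, sample: int) -> int:
    """Read one 8-bit class number from a flat, headerless raster file."""
    with open(path, "rb") as f:
        f.seek(line * SAMPLES + sample)
        return f.read(1)[0]
```

With these parameters SAMPLES evaluates to 40,031 and LINES to 17,347, which matches the stated dimensions and gives 694,417,757 bytes for an 8-bit image.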
The data dimensions of the Geographic projection for the global land cover characteristics data set are 21,600 lines (rows) and 43,200 samples (columns) resulting in a data set size of approximately 933 megabytes for 8-bit (byte) images. The following is a summary of the map projection parameters used for the Geographic projection:
Projection Type: Geographic
Pixel Size: 30 arc seconds
Radius of sphere: 6370997 m.
XY corner coordinates (center of pixel) in projection units (arc seconds):
Upper left: (-647985, 323985)
Upper right: (647985, 323985)
Lower right: (647985, -323985)
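For the Geographic projection, a latitude and longitude in decimal degrees can be converted to a raster position in the same way. This sketch assumes the corner coordinates are expressed in arc seconds, which is consistent with the 30 arc-second pixel size, and that lines are counted from the north; the function names are illustrative.

```python
import math
from typing import Tuple

ARCSEC_PER_DEGREE = 3600
PIXEL_ARCSEC = 30
UL_X, UL_Y = -647985, 323985   # upper-left pixel center (arc seconds)
LR_X, LR_Y = 647985, -323985   # lower-right pixel center (arc seconds)

SAMPLES = (LR_X - UL_X) // PIXEL_ARCSEC + 1   # columns
LINES = (UL_Y - LR_Y) // PIXEL_ARCSEC + 1     # rows


def latlon_to_line_sample(lat_deg: float, lon_deg: float) -> Tuple[int, int]:
    """Convert decimal degrees to a zero-based (line, sample) position."""
    x = lon_deg * ARCSEC_PER_DEGREE
    y = lat_deg * ARCSEC_PER_DEGREE
    sample = math.floor((x - UL_X) / PIXEL_ARCSEC + 0.5)
    line = math.floor((UL_Y - y) / PIXEL_ARCSEC + 0.5)
    if not (0 <= line < LINES and 0 <= sample < SAMPLES):
        raise ValueError("coordinate falls outside the raster")
    return line, sample


def line_sample_to_latlon(line: int, sample: int) -> Tuple[float, float]:
    """Return the pixel-center latitude and longitude in decimal degrees."""
    lat = (UL_Y - line * PIXEL_ARCSEC) / ARCSEC_PER_DEGREE
    lon = (UL_X + sample * PIXEL_ARCSEC) / ARCSEC_PER_DEGREE
    return lat, lon
```

The computed dimensions are 43,200 samples by 21,600 lines, in agreement with the stated size of approximately 933 megabytes for an 8-bit image.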
Global Ecosystems Legend
IGBP Land Cover Legend
USGS Land Use/Land Cover System Legend (Modified Level 2)
Simple Biosphere Model Legend
Simple Biosphere 2 Model Legend
Biosphere Atmosphere Transfer Scheme Legend
Vegetation Lifeforms Legend