We have developed a GIS-based statistical technique which empirically predicts changes in the spatial distribution of a plant or animal species over a geographic area (the continental U.S.) which has undergone a scenario of change in specified environmental condition(s). The technique is illustrated with two economically-important tree species: Pinus taeda, loblolly pine, and Acer saccharum, sugar maple.
We use the organism's current distribution to define its Hutchinsonian N-dimensional niche, then alter the environmental conditions and use the niche definition to "paint" the organism's new distribution back onto the map. We statistically define the occupied niche for each tree species in N-dimensional environmental space before any change, then change environmental conditions, and then re-map the new, altered places that are now suitable for the organism. This technique is significant for assessing predicted effects of changes in environmental conditions (e.g., global warming) on the distribution ranges of both plants and animals.
Using Multivariate Geographic Clustering, we have produced several maps of ecoregions across the 48 conterminous United States at a resolution of one square kilometer per cell. Ecologists have long used the concept of the ecoregion, an area within which there are similar ecological conditions, as a tool for understanding large geographic areas (Bailey 1995, 1996; Omernik 1987, 1995). Multivariate Geographic Clustering employs a non-hierarchical grouping of individual cells in a digital map from a Geographic Information System (GIS) for the purpose of classifying the cells into types or categories based on characteristics identified as important to the growth of vegetation. Nine characteristics from three categories--elevation, edaphic (soil) factors, and climatic factors--were identified as important. These national data are represented as over 7.8 million map cells in a 9-dimensional data space (Hargrove and Luxmoore 1998).
Multivariate Geographic Clustering (MGC) is a technique combining multivariate statistics and a Geographic Information System (GIS) which objectively computes the placement of borders between ecoregions, given maps of all environmental conditions that one wishes to be considered. Rather than relying on expertise, MGC uses the standardized values of each environmental condition for each individual raster cell in the map as a set of coordinates which specify a position for that raster cell in an environmental data space having as many dimensions as the number of included environmental characteristics. Two raster cells from anywhere in the map having similar combinations of environmental characteristics (two mountaintops, for example) will be located near each other in data space, and their nearness and relative positions will quantitatively reflect their environmental similarities.
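As an illustration only, the mapping from raster cells to coordinates in environmental data space can be sketched as follows. This is a minimal sketch, not the code used in this work; it assumes the environmental layers are held in a NumPy array with one row per raster cell and one column per environmental characteristic, and the function name `standardize` and all data values are hypothetical.

```python
import numpy as np

def standardize(cells):
    """Return z-scored coordinates: each column gets mean 0 and std 1."""
    mean = cells.mean(axis=0)
    std = cells.std(axis=0)
    return (cells - mean) / std, mean, std

# Four hypothetical raster cells: elevation (m) and annual precipitation (mm).
cells = np.array([
    [2000.0, 400.0],    # mountaintop A
    [2100.0, 420.0],    # mountaintop B: distant on the map, similar environment
    [100.0, 1200.0],    # lowland
    [150.0, 1100.0],    # lowland
])
z, mean, std = standardize(cells)

# Euclidean distance in environmental data space reflects environmental similarity.
d_mountaintops = np.linalg.norm(z[0] - z[1])
d_mountain_lowland = np.linalg.norm(z[0] - z[2])
```

In this toy case the two mountaintops lie much closer to each other in data space than either does to a lowland cell, regardless of where they sit on the map.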
After their disassembly from geographic space, the map cells are re-plotted in environmental data space like stars in a data universe. Because the density of these cells in data space is not uniform, we use an iterative classification procedure to group various nearby ``stars'' into clusters having similar combinations of environmental conditions. This procedure begins with the specification by the user of the desired number of ``galaxies'' or clusters into which the stars are to be grouped. All observations are examined sequentially to find the most widely separated set of stars which will provide this number of initial cluster ``seeds.'' Thus, the number of ecoregions which result from the process is under the user's control.
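The exact seed-finding procedure is not spelled out here. One common way to pick widely separated seeds is a greedy farthest-point traversal, sketched below as an assumption rather than as the procedure actually used; the function name `farthest_point_seeds` and the synthetic data are hypothetical.

```python
import numpy as np

def farthest_point_seeds(points, k):
    """Greedily pick k widely separated rows of `points` as initial seeds."""
    seeds = [0]  # start from the first observation (an arbitrary choice)
    # Distance from every point to its nearest already-chosen seed.
    nearest = np.linalg.norm(points - points[0], axis=1)
    while len(seeds) < k:
        idx = int(np.argmax(nearest))  # the point farthest from all current seeds
        seeds.append(idx)
        nearest = np.minimum(nearest, np.linalg.norm(points - points[idx], axis=1))
    return points[seeds]

rng = np.random.default_rng(0)
# Three hypothetical, well-separated groups of standardized cells in 2-D.
points = np.vstack([
    rng.normal(loc=(0, 0), scale=0.1, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.1, size=(50, 2)),
    rng.normal(loc=(0, 5), scale=0.1, size=(50, 2)),
])
seeds = farthest_point_seeds(points, k=3)
```

With the number of clusters `k` supplied by the user, each seed in this example lands in a different group.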
Each map cell is then compared against all cluster seeds, and the map cell is assigned membership to the cluster whose seed is closest to it in terms of Euclidean distance. After all map cells have been assigned, new cluster centroids are calculated to be the mean of each coordinate over all cells assigned membership to that cluster. Then the iterative assignment procedure repeats. Stars do not move in environmental data space; rather, the centroids of the cluster ``galaxies'' slowly slew until an equilibrium classification is obtained. When fewer than a specified number of map cells change cluster assignment in a particular iteration, the process converges and halts.
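The assign-and-recompute loop just described can be sketched in a few lines. This is a minimal serial sketch under the assumption that all cells fit in memory as a NumPy array; the function name `cluster`, the parameter `min_changes`, and the synthetic data are hypothetical, and a production run on millions of cells would need to process the distance computation in blocks.

```python
import numpy as np

def cluster(points, seeds, min_changes=1, max_iter=100):
    """Iteratively assign cells to the nearest centroid and recompute centroids.

    Stops when fewer than `min_changes` cells change cluster in an iteration.
    """
    centroids = seeds.astype(float).copy()
    labels = np.full(len(points), -1)
    for _ in range(max_iter):
        # Euclidean distance from every cell to every centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        changed = int((new_labels != labels).sum())
        labels = new_labels
        if changed < min_changes:
            break
        # New centroid = mean of each coordinate over the member cells.
        for k in range(len(centroids)):
            members = points[labels == k]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[k] = members.mean(axis=0)
    return labels, centroids

rng = np.random.default_rng(1)
points = np.vstack([
    rng.normal(loc=(0, 0), scale=0.2, size=(30, 2)),
    rng.normal(loc=(4, 4), scale=0.2, size=(30, 2)),
])
labels, centroids = cluster(points, seeds=points[[0, 30]])
```

The cells themselves never move; only `centroids` is updated between iterations.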
We have successfully used Multivariate Geographic Clustering to objectively regionalize at scales from single and multiple states to the entire United States, as well as to characterize the nature of borders between ecoregions. Most recently, MGC has been used through time to map similarities between present-day growing conditions and paleoclimatic conditions present at the Last Glacial Maximum (20,000 - 18,000 YBP). For very large clustering problems, we are experimenting with Multivariate Geographic Clustering within a metacomputing environment composed of multiple supercomputers interconnected via the Internet (Mahinthakumar et al., 1999).
Because Multivariate Geographic Clustering with these large amounts of data is computationally intensive, we employ a parallel computer constructed from low-cost commodity personal computers (Hoffman et al., 1997). Working in unison, each of the standard PCs acts as a node in the composite parallel machine. By dividing the clustering problem into smaller tasks, each node can solve a small part of the overall problem, enabling the machine to quickly and successfully cluster the entire region of interest at very high resolution (Hoffman and Hargrove 1999a, 1999b).
The foundation of the assessment tool for predicting changes in species distributions is a statistical embodiment of Hutchinson's (1957, 1965) definition of an organism's ecological niche as "an N-dimensional hypervolume." We have used detailed maps of the organism's current distribution to define its niche, and then used the niche definition to "paint" the prediction of the full extent of the possible habitable distribution back onto the map. We have statistically defined the occupied part of the niche hypervolume for two economically-important tree species in N-dimensional cluster or environmental space, then re-mapped all places that are suitable for colonization by these tree species.
G.E. Hutchinson (1957, 1965) suggested that an organism's niche could be visualized as a multidimensional space, or hypervolume, formed by the combination of gradients of each single environmental condition to which the organism was exposed. The N environmental exposure conditions form a set of N intersecting axes within which one can define an N-dimensional niche hypervolume unique to each species. The niche hypervolume is comprised of all combinations of the environmental conditions which permit an individual of that species to survive and reproduce indefinitely. Hutchinson distinguished the fundamental niche, defined as the maximum inhabitable hypervolume in the absence of competition, predation, and parasitism, from the realized niche, which is a smaller hypervolume occupied when the species is under biotic constraints. Hutchinson also defined the niche breadth for an organism as the habitable range, between the maximum and the minimum, for each particular environmental variable. Thus, the niche breadth is the projection of the niche hypervolume onto each individual environmental axis. Hutchinson's ideas still form the foundation of niche theory in modern ecology.
We have employed MGC to objectively define the niche for particular organisms by statistically examining combinations of environmental conditions occurring at locations currently inhabited by that organism. This niche hypervolume can then be used to project changes in geographic range under possible scenarios of environmental change. In effect, we have assembled into a niche hypervolume a statistical ``catalog'' of all combinations of environmental conditions under which a particular tree species is known to survive, and then we compare this ``catalog'' to the environmental combinations found under a scenario of altered conditions, in order to predict and map the extent and shape of the new geographic species distribution which would result.
The shape and geometry of the niche hypervolume provide clues to environmental sensitivities for particular species. An analysis of niche breadth to determine the narrowest part of the niche hypervolume indicates the environmental characteristic most severely constraining this species. Identification of the most severely-constraining environmental gradient can be useful for either encouraging threatened species, or eliminating invasive ones.
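A niche breadth analysis of this kind might look like the following minimal sketch. It assumes the occupied cells have already been standardized against the whole map, so that widths are comparable across axes; the axis names, the data values, and the function name `niche_breadth` are hypothetical.

```python
import numpy as np

def niche_breadth(occupied):
    """Per-axis habitable range (min, max) and its width for occupied cells."""
    lo = occupied.min(axis=0)
    hi = occupied.max(axis=0)
    return lo, hi, hi - lo

axes = ["elevation", "soil_water", "mean_temp"]  # hypothetical axis names
# Hypothetical occupied cells in standardized environmental space.
occupied = np.array([
    [-1.5, 0.2, -0.3],
    [0.8, 0.4, 0.9],
    [1.9, 0.1, 0.1],
    [-0.4, 0.5, -1.0],
])
lo, hi, breadth = niche_breadth(occupied)

# The narrowest axis is the candidate for the most constraining gradient.
most_constraining = axes[int(np.argmin(breadth))]
```

Here the species tolerates a wide span of elevation but only a narrow span of the hypothetical soil water axis, so that axis is flagged.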
We begin with the current distribution map for each tree species. We submit all cells where the tree species currently occurs to one of our Multivariate Spatial Clustering analyses. Unlike our previous clustering studies, we do not perform a Principal Components Analysis, but separately retain all of the individual environmental variables for multivariate clustering.
The goal of this step is to characterize and define the occupied (or occupiable) volume of environmental space by statistically cataloging all combinations of environmental conditions which are habitable by this species. Rather than specifying the number of clusters, we use a technique in which we specify the radius of the clusters. The radius of each cluster reflects the within-cluster variance, or the homogeneity of environmental combinations within each cluster. The smaller the radius that we specify, the more clusters will be able to ``pack into'' the niche hypervolume. Specifying a very small radius results in a large number of clusters. These clusters, once they have completely filled the niche hypervolume, serve to precisely define the location and shape of the niche space for this particular tree species.
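To make the effect of the radius concrete, the sketch below fills a set of occupied cells with fixed-radius clusters using a simple sequential "leader" pass. This is a stand-in chosen for brevity, not necessarily the procedure used here; the function name `fixed_radius_centroids` and the synthetic data are hypothetical.

```python
import numpy as np

def fixed_radius_centroids(occupied, radius):
    """Single sequential pass: a cell farther than `radius` from every
    existing centroid becomes a new centroid."""
    centroids = [occupied[0]]
    for cell in occupied[1:]:
        d = np.linalg.norm(np.asarray(centroids) - cell, axis=1)
        if d.min() > radius:
            centroids.append(cell)
    return np.asarray(centroids)

rng = np.random.default_rng(0)
occupied = rng.uniform(-1.0, 1.0, size=(500, 3))  # hypothetical occupied cells

coarse = fixed_radius_centroids(occupied, radius=0.75)
fine = fixed_radius_centroids(occupied, radius=0.25)
```

Every occupied cell ends up within one radius of some centroid, and the smaller radius produces many more clusters (`len(fine)` exceeds `len(coarse)`), which is the "packing" behavior described above.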
To predict a new distribution for the species under altered environmental conditions, we re-cluster using all cells from a map of the area for which the prediction is desired using the same radius and the same cluster centroids as before. Altered map cells from the prediction area which are within the organism's defined niche space will be "captured" into the clusters defining the niche space, but map cells outside the niche will not. When the locations are re-assembled into a prediction map, the altered spatial distribution now habitable by this species under the new environmental conditions will be shown as the cells which were assigned to these clusters.
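The "capture" step can be sketched as a single nearest-centroid pass with a distance cutoff. This is a minimal illustration that assumes a uniform Euclidean radius and ignores the re-standardization between data sets; the function name `capture` and all values are hypothetical.

```python
import numpy as np

def capture(cells, centroids, radius):
    """Per cell: index of the nearest niche centroid within `radius`, else -1."""
    d = np.linalg.norm(cells[:, None, :] - centroids[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    within = d.min(axis=1) <= radius
    return np.where(within, nearest, -1)

# Hypothetical niche defined by two centroids in a 2-D standardized space.
centroids = np.array([[0.0, 0.0], [1.0, 0.0]])
# Hypothetical map cells under the altered conditions.
scenario_cells = np.array([
    [0.1, 0.2],     # inside the first cluster
    [1.3, -0.4],    # inside the second cluster
    [3.0, 3.0],     # outside the niche
])
assigned = capture(scenario_cells, centroids, radius=0.75)
habitable = assigned >= 0  # boolean layer to write back into the map
```

The centroids are never updated in this pass, so the niche definition itself stays fixed while the map changes.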
If the current distribution map is not simply present/absent, but has some surrogate measures of performance or "fitness" of the organism at each location (e.g., abundance, density, biomass), it is possible to assign the same fitness measure to the new distribution map. The predicted distribution map under the altered environment will not be simply binary, but will project how well the tree will do in each location under the new conditions. New areas in the map which are similar to places known to be occupied by this species will be identified and mapped.
Because the environmental gradient dimensions are measured in different units, all dimensions must be standardized to a mean of zero and a unit standard deviation before MGC. However, the scenario data dimensions are likely to have a different mean and standard deviation from those used to generate the niche hypervolume. Therefore, the niche hypervolume must be transformed so that it comes to lie in the proper geometric position with respect to the scenario data.
This geometric transformation of the niche hypervolume involves two manipulations: a simple translation to account for differences in the mean for each dimension, and a stretching or shrinking to account for differences in the standard deviation of each dimension. Because the stretching or shrinking manipulation is unique along each dimension, the radius around each centroid describing the niche hypervolume is no longer uniform in all directions, but must be proportionally modified for each dimension.
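One way to realize this translation and stretching is to undo the original standardization and then apply the scenario's standardization, which also yields one radius per dimension. The sketch below is our reading of the description, not the code used in this work; the function name `transform_niche` and all means, standard deviations, and centroids are hypothetical.

```python
import numpy as np

def transform_niche(centroids, radius, mu_cur, sd_cur, mu_scn, sd_scn):
    """Re-express niche centroids and radius in the scenario's standardized space."""
    raw = mu_cur + sd_cur * centroids          # undo the current standardization
    new_centroids = (raw - mu_scn) / sd_scn    # apply the scenario standardization
    new_radii = radius * sd_cur / sd_scn       # one radius per dimension
    return new_centroids, new_radii

# Hypothetical two-dimensional example: temperature (deg C), precipitation (mm).
mu_cur, sd_cur = np.array([12.0, 900.0]), np.array([5.0, 300.0])
mu_scn, sd_scn = np.array([14.0, 950.0]), np.array([4.0, 400.0])
centroids = np.array([[0.0, 0.0], [1.0, -0.5]])

new_centroids, new_radii = transform_niche(
    centroids, 0.75, mu_cur, sd_cur, mu_scn, sd_scn
)
```

In this example the radius grows along the temperature axis (0.75 becomes 0.9375) and shrinks along the precipitation axis (0.75 becomes 0.5625), so each cluster is no longer spherical.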
Ecological expertise is required to correctly identify and include the environmental gradient layers which are potentially limiting the geographic distribution of the species. We performed a jackknife test in order to determine if the Hutchinsonian niche hypervolumes, as defined, adequately included and represented the environmental factors functionally restricting the geographic distributions of loblolly pine and sugar maple trees.
As a test of the niche hypervolumes, we played the current environmental conditions in all 7.8 million cells in the conterminous United States, as described by the same 25 environmental dimensions, back through them. The species distribution map which is predicted for these current conditions should strongly resemble the current species distribution map with which the process began. If the output map is too geographically extensive, then an environmental gradient limiting the actual distribution of the tree has been left out. If, on the other hand, the output map is not extensive enough, then some environmental condition has been included which does not actually limit the range of the tree. By iterating the inclusion of alternative layers with such a jackknife test until the output map predicted under current environmental conditions is nearly identical to the input range map, one can assure that the niche hypervolume is adequately described in terms of the limiting environmental gradients.
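The comparison between the played-back map and the input range map can be summarized with simple cell counts. The sketch below is an assumption about how such a check might be scored, since no specific statistic is named here; the function name `playback_report` and the toy maps are hypothetical.

```python
import numpy as np

def playback_report(observed, predicted):
    """Compare the observed range with the range predicted under current conditions."""
    observed = np.asarray(observed, dtype=bool)
    predicted = np.asarray(predicted, dtype=bool)
    commission = int((predicted & ~observed).sum())  # predicted but not observed
    omission = int((observed & ~predicted).sum())    # observed but not predicted
    agreement = float((observed == predicted).mean())
    return {"commission": commission, "omission": omission, "agreement": agreement}

# Hypothetical flattened presence/absence maps of eight cells.
observed = np.array([1, 1, 1, 0, 0, 0, 0, 0], dtype=bool)
predicted = np.array([1, 1, 1, 1, 1, 0, 0, 0], dtype=bool)
report = playback_report(observed, predicted)
```

A large commission count corresponds to an output map that is too extensive (a limiting gradient may be missing), while a large omission count corresponds to one that is not extensive enough.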
Finding a set of maximally-separated map cells as the centroids for an unknown number of fixed-radius clusters such that all inhabited map cells are contained within a cluster is computationally intensive. Because this step used to define and locate the niche hypervolume is not amenable to parallelization, the hypervolumes reported here were computed serially on a single machine.
There may, however, be a way to parallelize the niche hypervolume definition process. One would like to implement a ``best-of-the-best'' strategy, where each node independently calculates a constellation of variable numbers of maximally-separated radius-sized clusters such that its portion of map cells are completely contained within those clusters. Then, another node would collect the constellations produced by two initial nodes, and treat these as its map cells, finding the best single constellation representing them both.
However, if the two constellations overlap, the clusters found by different nodes will almost certainly be closer to each other than the radius distance, and will be eliminated. This will produce ``orphan'' map cells which are no longer represented by clusters in the niche hypervolume. If we double the desired radius when using the hypervolume, however, even the worst-case ``orphan'' map cells will be included in clusters. This ``radius-doubling'' idea remains untested, but would permit the rapid definition of niche hypervolumes if it solves the ``orphan'' problem at all higher dimensionalities.
The number of clusters that are used to statistically define the niche volume is unimportant. However, fewer, larger-radius clusters can be used to ``fill in'' gaps between relatively sparse empirical occurrence data. On the other hand, many clusters with small radii can be used to define the niche volume very precisely when high-quality, detailed maps of current species distributions are available as input.
In order to make a predicted species distribution map for a particular change scenario, the combinations of environmental conditions present in each of the cells in the map are clustered in environmental space. The centroids of the clusters defining the niche hypervolume for the species are used as seeds, with the radius set to the same value used when the niche was defined. A single pass is made through the iterative cluster assignment procedure, and cells in the map which are within one radius distance of (i.e., similar enough to) clusters inside the niche hypervolume will be assigned membership to those clusters. Map cells with combinations of conditions not similar enough to niche hypervolume clusters will not be assigned to a cluster. Mapping of all cells back into geographic space reveals the spatial extent and arrangement of areas predicted to be inhabited by the species under the new conditions.
To test or use the niche hypervolume to make a prediction, we employed a parallel algorithm which begins with the set of initial ``seed'' clusters defining the niche hypervolume location in environmental data space. After transforming the hypervolume into the same data space as the map of altered conditions, each cell in map space is sequentially tested against each cluster in the hypervolume, dimension by dimension, to determine if it is within that cluster. The radius is uniquely altered for each dimension, and the coordinate for the map cell is tested to determine if it is within the altered radius of the centroid for that cluster. As soon as a cluster is found which contains that map cell, the map cell is assigned that cluster, and the algorithm considers the next map cell. Thus, the assigned cluster may not necessarily be the closest, but the map cell is assured to be within the niche hypervolume for this species.
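The first-match test can be sketched as below. The sketch reads the description as an axis-by-axis interval test against a per-dimension radius (an ellipsoidal distance test would be another possible reading); the function name `first_containing_cluster` and all values are hypothetical.

```python
import numpy as np

def first_containing_cluster(cell, centroids, radii):
    """Return the index of the first cluster whose per-dimension interval
    contains `cell`, or -1 if the cell lies outside the niche hypervolume."""
    for k, centroid in enumerate(centroids):
        inside = True
        for j in range(len(cell)):  # test dimension by dimension
            if abs(cell[j] - centroid[j]) > radii[j]:
                inside = False
                break  # fails this cluster; try the next one
        if inside:
            return k  # stop at the first match
    return -1

centroids = np.array([[0.0, 0.0], [0.5, 0.0]])  # hypothetical transformed centroids
radii = np.array([0.9, 0.6])                    # hypothetical per-dimension radii
cells = np.array([[0.45, 0.1], [0.5, 0.7], [1.3, -0.5]])
assigned = [first_containing_cluster(c, centroids, radii) for c in cells]
```

The first cell is assigned to cluster 0 even though the centroid of cluster 1 is nearer, which illustrates why the assigned cluster is not necessarily the closest one.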
In the prediction process, a single pass is made through the map cells, and the centroids defining the niche hypervolume are not altered. The prediction code runs in parallel by subdividing all map cells. Each node considers its portion of the map cells, and sequentially tests whether they are within the niche hypervolume. Therefore, using the niche hypervolume to make predictions is computationally fast.
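Because each map cell is tested independently against fixed centroids, the work divides cleanly across workers. The sketch below uses a thread pool purely as a stand-in for the nodes of a parallel machine; it is not the code used in this work, and the function names `in_niche` and `in_niche_parallel` and the synthetic data are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def in_niche(cells, centroids, radii):
    """Boolean mask: is each cell inside at least one cluster's per-dimension interval?"""
    diff = np.abs(cells[:, None, :] - centroids[None, :, :])  # (cells, clusters, dims)
    return (diff <= radii).all(axis=2).any(axis=1)

def in_niche_parallel(cells, centroids, radii, n_workers=4):
    """Split the cells into chunks and test each chunk independently."""
    chunks = np.array_split(cells, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda c: in_niche(c, centroids, radii), chunks))
    return np.concatenate(parts)  # pool.map preserves chunk order

rng = np.random.default_rng(0)
cells = rng.normal(size=(1000, 3))     # hypothetical scenario cells
centroids = rng.normal(size=(20, 3))   # hypothetical niche centroids
radii = np.array([0.5, 0.4, 0.6])      # hypothetical per-dimension radii
mask = in_niche_parallel(cells, centroids, radii)
```

No communication between workers is needed until the per-chunk results are concatenated, which is why the result matches a serial pass over all cells.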
Two surrogate fitness measures were used for each of the two tree species, as obtained from the national STATSGO database published by NRCS. Niche hypervolumes were defined and located for each species at a specified radius of 0.75 within an environmental data space composed of 25 environmental gradients. The niche definition for loblolly pine consisted of 49,322 clusters, while the niche definition for sugar maple contained 45,489 clusters. This figure shows the map of the 49,322 raw clusters, colored randomly, from the entire United States which are within the loblolly niche hypervolume under current environmental conditions.
As a check, the current geographic distributions for each species were played back through their respective hypervolumes to ensure that all map cells where each species is known to occur were within the niche definition. All map cells were included in each niche hypervolume definition. From these output data sets, a mean fitness was calculated and assigned to each cluster centroid from the fitness values at each map cell which was assigned to that centroid. This mean fitness would be projected onto any map cell falling within this cluster when the niche hypervolume is used to make new range predictions.
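The per-cluster mean fitness and its projection onto new cells can be sketched as follows. This is a minimal illustration assuming integer cluster indices per cell, with -1 marking cells outside the niche; the function name `mean_fitness_per_cluster` and all values are hypothetical.

```python
import numpy as np

def mean_fitness_per_cluster(assigned, fitness, n_clusters):
    """Average the fitness of the map cells assigned to each cluster."""
    sums = np.bincount(assigned, weights=fitness, minlength=n_clusters)
    counts = np.bincount(assigned, minlength=n_clusters)
    means = np.full(n_clusters, np.nan)  # clusters with no cells stay NaN
    np.divide(sums, counts, out=means, where=counts > 0)
    return means

assigned = np.array([0, 0, 1, 2, 2, 2])                # cluster index per occupied cell
fitness = np.array([10.0, 14.0, 7.0, 3.0, 4.0, 5.0])   # hypothetical fitness values
cluster_fitness = mean_fitness_per_cluster(assigned, fitness, n_clusters=3)

# Project onto scenario cells captured by the niche (-1 means outside the niche).
scenario_assigned = np.array([2, -1, 0])
projected = np.where(
    scenario_assigned >= 0,
    cluster_fitness[np.clip(scenario_assigned, 0, None)],
    np.nan,
)
```

Cells outside the niche receive no fitness value, so the resulting map is graded inside the predicted range and empty elsewhere.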
When the two surrogate fitness measures, woody production and site index, are applied to each of the raw clusters within the 25-dimensional data space, the output maps predicted under current conditions strongly resemble the input map in each case.
The predicted geographic ranges under current conditions strongly resemble the original geographic ranges in each case, particularly for the heart of the geographic distribution. If the species occurrence maps are viewed as a ``bull's eye,'' it is the outermost rings of the bull's eye where the greatest differences occur. In each case, these outer rings, where the species can survive but its fitness is poor, are greater in extent in the predictions than in the originally-input distributions.
We believe that this effect is due to uneven reporting of species occurrences in the original distribution data sets in places where conditions for the species are sub-optimal. The niche hypervolume technique makes the application of all fitness projections more uniform, which probably results in an improvement in the outermost fringes of the species geographic range.
The niche hypervolumes determined for these two tree species using this technique, in terms of these 25 environmental gradients, appear to be adequate definitions for the Hutchinsonian niche space occupiable by these trees. In the future, we plan to use the niche hypervolumes for loblolly pine and sugar maple to predict changes in the geographic ranges for these two species under the 2XCO2 global change scenarios projected by the Hadley and CCC global change simulations, as provided by the VEMAP2 program.
Most techniques are not well-suited to make use of partial survey and mapping data. Point occurrence survey data, although they may not be useful otherwise, are ideally suitable as input for niche hypervolume definition. Such map data are ordinarily considered ``incomplete'', since users cannot be sure whether a species is not present at a site, or whether that site simply was not surveyed. The non-random spatial distribution of collection sites (e.g., along roadways or transects) is also often problematic.
For the purpose of characterizing the niche hypervolume, however, the spatial distribution of species survey data in geographic space is unimportant. The important thing is that all combinations of environmental conditions are represented in the input data, and that all regions of the N-dimensional environmental space are adequately sampled. This adequacy can occur just as easily with spot surveys as with continuously-mapped species occurrence data.
It may be possible to simplify the dimensionality of a niche hypervolume, once defined, based on niche breadth. Environmental axes with particularly broad niche breadth indicate that the species is tolerant of a wide range of conditions along that environmental gradient; therefore, little predictive power is added by the inclusion of that environmental axis in the niche definition.
The niche breadth in terms of each of the N environmental condition axes can be used to roughly describe and convert the hypervolume, once defined, into terms which can be easily utilized by a standard GIS system for susceptibility prediction. By performing a Boolean intersection of the conditions within each of the multiple ranges (that is, selecting the cells which fall within every range simultaneously), federal land planners can use any GIS to make their own geographic range predictions for particular species over any area for which they have environmental change scenarios.
The series of N environmental ranges obtained by projecting the niche hypervolume onto each of the N environmental axes in turn defines the dimensional measurements of a faceted polyhedron with 2N faces. This solid is the smallest axis-aligned rectangular polyhedron which completely contains the irregular, complex niche hypervolume. Because the hypervolume itself is a subset of this envelope polyhedron, susceptibility maps generated with the polyhedron-based niche breadth ranges will slightly overestimate the area suitable for occupation.
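The relationship between the envelope and the hypervolume it encloses can be illustrated with a small sketch. It assumes spherical fixed-radius clusters and combines the per-axis ranges with a logical AND, so that a cell must fall inside every range; the function names and all values are hypothetical.

```python
import numpy as np

def niche_breadth_ranges(centroids, radius):
    """Per-axis (min, max) ranges of the box enclosing all fixed-radius clusters."""
    return centroids.min(axis=0) - radius, centroids.max(axis=0) + radius

def in_envelope(cells, lo, hi):
    """Logical AND across axes: a cell must fall inside every range."""
    return ((cells >= lo) & (cells <= hi)).all(axis=1)

def in_hypervolume(cells, centroids, radius):
    """Exact test: is each cell within `radius` of at least one centroid?"""
    d = np.linalg.norm(cells[:, None, :] - centroids[None, :, :], axis=2)
    return (d <= radius).any(axis=1)

centroids = np.array([[0.0, 0.0], [2.0, 2.0]])  # hypothetical niche clusters
lo, hi = niche_breadth_ranges(centroids, radius=0.75)
cells = np.array([[0.1, 0.1], [2.0, 0.0], [5.0, 5.0]])

box = in_envelope(cells, lo, hi)
exact = in_hypervolume(cells, centroids, 0.75)
```

The second cell sits in a corner of the envelope that no cluster reaches, so the range-based test accepts it while the exact test does not. The envelope never rejects a cell that the hypervolume accepts.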
Once the niche hypervolume has been defined for a species, a rough prediction of the altered geographic range can be performed using a standard GIS system by specifying the range of conditions within which the tree has been found, parameter by parameter. The Boolean intersection of these ranges essentially re-forms the niche hypervolume. Application of the set of ranges can be accomplished using any GIS to project a new geographic range for the species.
Niche Hypervolume Susceptibility analysis has the potential to become a unified, single-implementation approach to predicting geographic ranges of species under altered environmental conditions which could be easily coordinated and implemented across multiple agencies. Niche hypervolume definitions, either in terms of cluster centroids or niche breadth ranges, could become a medium of exchange and coordination among agencies. Agencies could obtain hypervolume definitions for particular species from a central repository, and either use them directly to generate predicted range maps for their change scenarios of interest, or, if they have detailed distribution maps of currently occupied areas, could use those data to refine the niche hypervolume definitions, returning improved hypervolumes to the central repository. Based on first principles of ecological niche theory, the niche hypervolume technique is general enough to predict spatial distributions under altered conditions for any species.
Bailey, R.G. 1995. Description of the ecoregions of the United States. (2nd ed., 1st ed. 1980). Misc. Publ. No. 1391, Washington, D.C. U.S. Forest Service. 108 pp with separate map at 1:7,500,000.
Bailey, R.G. 1996. Ecosystem Geography. Springer-Verlag. 216 pp.
Hargrove, W.W., and F.M. Hoffman. 1998. National Clustering. URL: http://www.esd.ornl.gov/projects/clustering/
Hargrove, W.W., and R.J. Luxmoore. 1997. A Spatial Clustering Technique for the Identification of Customizable Ecoregions. URL: http://www.esri.com/library/userconf/proc97/PROC97/TO250/PAP226/P226.HTM
Hargrove, W.W., and R.J. Luxmoore. 1998. A New High-Resolution National Map of Vegetation Ecoregions Produced Empirically Using Multivariate Spatial Clustering. URL: http://www.esri.com/library/userconf/proc98/PROCEED/TO350/PAP333/P333.HTM
Hoffman, F.M., W.W. Hargrove, and A.J. Schultz. 1997-1999. The Stone SouperComputer - ORNL's First Beowulf-Style Parallel Computer. URL: http://www.esd.ornl.gov/facilities/beowulf/
Hoffman, F.M., and W.W. Hargrove. 1999a. "Cluster Computing: Linux Taken to the Extreme." Linux Magazine, Vol. 1, No. 1, pp. 56-59.
Hoffman, F.M., and W.W. Hargrove. 1999b. "Multivariate Geographic Clustering Using a Beowulf-style Parallel Computer." In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA '99), Volume III, H. R. Arabnia, Ed. ISBN 1-892512-11-4, CSREA Press, pp. 1292-1298.
Hutchinson, G.E. 1957. Concluding Remarks. Cold Spring Harbor Symp. Quant. Biol. 22:415-427.
Hutchinson, G.E. 1965. The niche: An abstractly inhabited hypervolume. In: The Ecological Theatre and the Evolutionary Play. New Haven, Yale University Press, pp. 26-78.
Mahinthakumar, G., F.M. Hoffman, W.W. Hargrove, and N.T. Karonis. 1999. Multivariate geographic clustering in a metacomputing environment using Globus. Supercomputing '99.
Omernik, J.M. 1987. Ecoregions of the conterminous United States. Map (scale 1:7,500,000). Annals of the Association of American Geographers.
Omernik, J.M. 1995. Ecoregions: a spatial framework for environmental management. pp. 49-62 In: W.S. Davis and T.P. Simon, Biological Assessment and Criteria: Tools for Water Resource Planning and Decision Making. Lewis, Boca Raton.
*Oak Ridge National Laboratory, managed by Lockheed Martin Energy Research Corp. for the U.S. Department of Energy under contract number DE-AC05-96OR22464.
"The submitted manuscript has been authored by a contractor of the U.S. Government under contract No. DE-AC05-96OR22464. Accordingly, the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes."