An Interface between
Computing, Ecology and Biodiversity: Environmental Informatics
Stockwell, David, Peter Arzberger, Tony Fountain,
and John Helly
San Diego Supercomputer Center
National Partnership for Advanced Computational
Infrastructure, USA
Abstract
The
grand challenge for the 21st century is to harness knowledge of the
earth's biological and ecological diversity to understand how they shape global
environmental systems. This insight benefits both science and society.
Biological and ecological data are among the most diverse and complex in the
scientific realm, spanning vast temporal and spatial scales, distant
localities, and multiple disciplines. Environmental informatics is an emerging
discipline applying information science, ecology, and biodiversity to the
understanding and solution of environmental problems. In this paper we give an
overview of the experiences of the San Diego Supercomputer Center (SDSC) with
this new multidisciplinary science, discuss the application of computing resources
to the study of environmental systems, and outline strategic partnership
activities in environmental informatics that are underway. We hope to foster
interactions between ecology, biodiversity, and conservation researchers in
East Asia-Pacific Rim and those at SDSC and the Partnership for Biodiversity
Informatics.
Key Words: biodiversity, systematics,
environmental informatics, ecology, computing, SDSC, PBI
The
Motivation
The grand challenge for the 21st century is
to harness knowledge of the earth's biological and ecological diversity to
understand how they shape global environmental systems. This insight is critical for managing
natural resources, sustaining human health, ensuring economic stability, and
improving the quality of human life. As the conversion of natural systems to
agriculture and urban systems decreases biological and ecological diversity,
the need for this information becomes more urgent. In fact, at the current rate
of environmental degradation and species extinction, the worldwide biological
science community, working with society-at-large, has approximately 50 years
to answer the challenge of declining diversity and its ramifications
(Systematics Agenda 2000 1994; Wilson 1998; Raven and Wilson 1992), leading
Wilson to predict that the coming century will be the century of the
environment (Wilson 1998).
Many universities now offer courses in environmental
informatics, a response to growing recognition among stakeholders in academia and in the existing
environmental regulation and policy management structure that sophisticated
information systems are needed to respond quickly to current
conditions. This recognition is informed by the experiences of several groups in Canada,
the U.S., Australia, and Europe, where the assumption that environmental
science is orchestrated by planners and conciliators on one hand, and civil and
environmental engineers and scientists on the other, has been challenged.
Environmental informatics addresses problems at the forefront of computing and
statistical science, and thus encounters obstacles in the storage, management, and
retrieval of large scientific databases (such as distribution ownership,
heterogeneity, access, security, legacy, and quality assurance); noisy and
uncertain task environments; irregular spatial and temporal data distributions;
knowledge representation; statistical computing; expert systems; visualization;
decision support; and Web-based collaboration.
In Pursuit
To
progress toward the grand challenge, existing knowledge must first be successfully
inventoried. There are several key
interrelated strategies for doing so: (a) developing
a research framework to
understand the underlying processes that will allow the field to move from a
qualitative to a predictive science; (b) establishing
collaborations to tackle the complex, multidisciplinary nature of the
problem; (c) harnessing technology
to develop new tools for conducting science, and to create and then store,
query, and understand the data; and (d) aggressively creating educational opportunities at all levels and across all
strata of society so that political and social practice may be informed by
scientific knowledge. We
elaborate on each of these strategies below.
The role of fundamental research (a) is to
elucidate the states, processes, and interactions of the natural environment
under anthropogenic influences. The study of environmental pattern and process
requires both analysis and synthesis to provide a foundation of information
(NSB 1999) across all scientific, engineering, and mathematics disciplines.
The dynamics of Earth's biodiversity are vastly complex, with multiple, diverse, and dispersed
causes (Jasanoff et al. 1998). Therefore, they cannot be understood, managed, or controlled through scientific
activity organized on single or traditional disciplinary lines. Instead,
biodiversity requires cross-domain,
interdisciplinary approaches (b).
The data and tools (conceptual, physical, computational, etc.) required
to investigate its dynamics, many of which are still under development, are beyond
the scope of any single investigator and
often beyond the mission, infrastructure, or expertise of any single
institution. Therefore, research in this
field requires interdisciplinary, collaborative teams working within and across
institutions.
Ecological and
systematics data are among the most heterogeneous and complex in the scientific
realm. By their very nature they
are multidisciplinary and multiscalar, with origins ranging from geochemistry
to climate patterns with all levels of life sciences in between. In addition,
the data are from many locales, are highly disparate in format, and span
enormous scales of space and time.
Nevertheless, integrated information is essential to a comprehensive
understanding of the patterns and processes of both ecology and systematics as
well as their area of overlap (e.g., biodiversity). The move from data to information to knowledge is only
possible using computational and
information technologies (c).
Education (d) is necessary at all levels, to train and inform both researchers and the
public. Within research, individuals must understand the underlying theories of
the processes, be keen observers, and be able to develop and apply relevant
technologies to address the challenges of integration, synthesis and analysis.
At a societal level, environmental biological diversity affects and is affected
by all humans on Earth, and the Earth's
citizens must be informed and involved so that reasonable policy and social
practices, based on knowledge, can be established.
In this
paper, we present the steps that SDSC and its
partners have taken, and the experiences we have had, while employing the above strategies. For
background, though, it is first useful to answer the question: why is a supercomputing center interested
in environmental issues?
The original mission of SDSC, which is funded by the National
Science Foundation (NSF), was to provide service (i.e.,
access to cycles and corresponding support) to the academic community. But
during the first few years of our existence we learned that the original
service mission should not and could not be separated from development and
research. This realization changed our mission to "providing world leadership in advancing knowledge through the application
and development of advanced computational technologies." In our ongoing efforts to address this mission we actively engage in outreach
to new communities of users.
At the time of SDSC's inception in 1985, the primary
new community targeted was the physical sciences, which had sophisticated
computational models that demanded high-end computing. Since then,
several major technical and sociological changes have altered the
nature of supercomputing and thus the communities with which SDSC is
involved. The Internet and Web
applications have impacted how scientists interact with computers and each
other, the types of services that a center like SDSC can provide, and even how
society-at-large interacts with and accesses information. Data storage and retrieval
technologies have expanded tremendously concurrent with an explosion in the
amount of data generated within many disciplines (biology in particular). The speed of computation has also
increased: personal computers are now as powerful as the machines once
available only at a handful of national laboratories. When SDSC began its work, our "revolutionary" CRAY X-MP/48 ran at 840 million
calculations per second, comparable to the speed of a top-of-the-line PC
available today. Finally, research has become increasingly multidisciplinary
(Jasanoff et al. 1998).
Today, environmental informatics offers SDSC a compelling imperative and
opportunity to apply computing and information technology within yet another
new community, and through key hires and strategic partnerships we have
acquired the expertise to make a contribution. The following examples, which build upon each other,
illustrate how SDSC has strategically pursued its role within the grand
challenge.
1. Computational Ecology: In
1995, as a first step toward defining opportunities at the interface between
computing and ecology, John Helly organized a workshop at SDSC to "gather
together ecologists and computer scientists for the purpose of identifying
those technology issues which impede the progress of ecological research"
(Helly et al. 1996). Workshop
participants identified three key areas of technology need: data
management, modeling, and visualization.
Common to these areas is
the need to develop standards to facilitate the sharing of data, the comparison
of results, and the integration of model components. Unique to the problem of
data sharing is the issue of the proprietary nature of research data and the
lack of institutional incentives for data sharing along with the usual issues
of intellectual property. The use of visualization continues to be inhibited by
the highly specialized knowledge still required to effectively utilize current
visualization packages along with fundamental questions such as how should
statistical error be represented in visual presentations. The modeling
community is challenged by questions such as how multi-scale, multi-resolution
models may be integrated across disciplines, as well as the desire for
collections of standard realizations of model components to minimize the
redundant software development and facilitate the comparison of modeling
analysis. (Helly et al. 1996)
Data,
visualization, and modeling have since formed a framework for our other
activities within environmental informatics, including:
2. Biodiversity Informatics:
In 1997, the NSF awarded SDSC funds to initiate activities in
computational biodiversity and we hired David Stockwell, who had worked in
Australia on a number of environmental issues and who had participated in part
of the Environmental Resource Information Network (ERIN). Stockwell developed the Biodiversity
Species Workshop (http://biodi.sdsc.edu),
a Web-based application for experimental analysis, mapping, and prediction of
worldwide species distribution data (Stockwell and Peters 1999). Researchers can submit their own
latitude and longitude data through a Web browser, conduct preliminary mapping
and manipulation of that data, track changing distributions with climate
change, and visualize the outputs using GIS, animation, and/or virtual reality.
The central function of the Workshop is to develop predictions of the spatial
distribution of species using algorithms that range from simple bioclimatic
limit fitting to an artificial intelligence algorithm (Stockwell 1999).
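To make the simplest of these approaches concrete, the following is a minimal sketch of bioclimatic limit fitting, written for illustration only. It is not the Workshop's code and not its artificial intelligence algorithm. The function names, the climate variable names, and the sample values are all hypothetical. The idea is to record the minimum and maximum of each climate variable at sites where a species has been observed, and then to predict presence wherever every variable falls inside those limits.

```python
from typing import Dict, List, Tuple

# An envelope maps each climate variable to its observed (min, max) limits.
Envelope = Dict[str, Tuple[float, float]]


def fit_envelope(records: List[Dict[str, float]]) -> Envelope:
    """Fit per-variable limits from climate values at known occurrence sites."""
    if not records:
        raise ValueError("at least one occurrence record is required")
    variables = list(records[0].keys())
    return {
        v: (min(r[v] for r in records), max(r[v] for r in records))
        for v in variables
    }


def predict_presence(envelope: Envelope, site: Dict[str, float]) -> bool:
    """Predict presence only if every variable lies within the fitted limits."""
    return all(low <= site[v] <= high for v, (low, high) in envelope.items())


# Hypothetical climate values at three sites where a species was recorded.
occurrences = [
    {"mean_temp_c": 18.0, "annual_rain_mm": 900.0},
    {"mean_temp_c": 22.5, "annual_rain_mm": 1400.0},
    {"mean_temp_c": 20.0, "annual_rain_mm": 1100.0},
]

envelope = fit_envelope(occurrences)
inside = predict_presence(envelope, {"mean_temp_c": 21.0, "annual_rain_mm": 1000.0})
outside = predict_presence(envelope, {"mean_temp_c": 30.0, "annual_rain_mm": 1000.0})
```

Applying `predict_presence` to every cell of a gridded climate layer would yield a predicted distribution map. A rectangular envelope of this kind is sensitive to outlying records, which is one reason more sophisticated algorithms are of interest.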
The
Biodiversity Species Workshop, which uses CGI Perl scripts in a frame-based web
application, has already produced interesting results and holds the potential
for other applications. Recently, a study that made use of the Workshop was
conducted to address the conservatism of ecological niches in
evolutionary time. The study looked at 37 pairs of closely related forest
species, separated by an arid barrier, and concluded that the "geographic
barriers are the major force driving the formation of new species" (Peterson
et al. 1999) (i.e., geographic barriers rather than ecology were the critical factor in
speciation).
The
Peterson study was based on data from over 5 years of painstaking negotiation
and database development with museums around the world. A new tool, the Species Analyst (http://chipotle.nhm.ukans.edu),
developed by David Vieglais, a research associate at the University of Kansas
Natural History Museum, promises to put these data at a scientist's fingertips.
Species Analyst pulls up records from disparate biodiversity databases, many of
which are located at natural history museums across the country, and can even
access databases compiled using incompatible software. "It taps about 1.5 million records so
far, using a computer program written according to a standard protocol (Z39.50)
that libraries have long used to share bibliographic databases. The Kansas
team's tool sends a query to each database, then pools the data it gets back.
Putting the Web site to work is easy: Just tick any of the boxes next to each
of the nine collections now on the Web, type a search term such as species
name, and hit the query button. In seconds the site will produce a world map
showing where the plant or animal has been found, along with a table--available
as an Excel spreadsheet--that lists each specimen in the museums' collections
and the date, collector, geographic coordinates, and so on" (Kaiser 1999).
Both the Species Analyst and the Biodiversity Species Workshop can be used
alone, or linked together to produce a very powerful tool for integrating data
and predicting spatial distribution of species. Other potential uses for this
software suite include prediction of location of species (for determining where
to send an expedition), endangered species preservation design, and managing
deleterious invading species such as the hardwood-chomping Asian longhorn
beetle or the house finch.
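The query-then-pool pattern described for the Species Analyst can be sketched as follows. This is an illustrative outline only, under stated assumptions: each collection is represented by an in-memory stand-in rather than a real Z39.50 connection, and every name here (`make_source`, `federated_search`, the collection names, the species name, and the record fields) is hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
# A source is anything that accepts a species name and returns matching records.
Source = Callable[[str], List[Record]]


def make_source(collection: str, rows: List[Record]) -> Source:
    """Build a stand-in for one remote collection database (assumption:
    a real source would issue the query over a network protocol)."""
    def search(species: str) -> List[Record]:
        # Tag each matching record with the collection it came from.
        return [dict(row, collection=collection)
                for row in rows if row["species"] == species]
    return search


def federated_search(sources: List[Source], species: str) -> List[Record]:
    """Send the same query to every source, then pool the results."""
    pooled: List[Record] = []
    for search in sources:
        pooled.extend(search(species))
    return pooled


# Hypothetical specimen records held by two separate collections.
museum_a = make_source("Museum A", [
    {"species": "Examplea hypothetica", "lat": 19.4, "lon": -99.1},
    {"species": "Otherus speciesus", "lat": 38.9, "lon": -95.2},
])
museum_b = make_source("Museum B", [
    {"species": "Examplea hypothetica", "lat": 32.7, "lon": -117.2},
])

results = federated_search([museum_a, museum_b], "Examplea hypothetica")
coordinates = [(r["lat"], r["lon"]) for r in results]
```

The pooled coordinates are the kind of input that a distribution-prediction tool could consume, which is how the two tools might be linked. Because each source hides its own storage format behind a common query function, collections compiled with incompatible software can still be searched together.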
3. Maintaining and Providing Access to Data: John
Helly, who had been involved in environmental modeling work, was recruited to
SDSC in 1993. Helly has a background in computer science and has played a key
role in data access and publication issues. He has worked with the Ecological
Society of America (ESA) on their Future of Long Term Ecological Data (FLED)
project, been involved in some initial International Long Term Ecological
Research site visits, and plays a key role in the National Partnership for
Advanced Computational Infrastructure.
In 1995 Helly was the key SDSC participant in an interagency project
involving 31 institutions or agencies that have interest in or responsibility
for monitoring data about the San Diego Bay (http://sdbay.sdsc.edu). The project
provided a forum for discussion between agencies that had not previously shared
data, produced a platform and common grid for sharing data, and produced
several hydrological models of the bay within which one can visually navigate
the bay and simulate the impact of potential oil spills.
The San Diego Bay project, combined with insights gained from the FLED
committee report, led to a collaboration between ESA, the National Center for
Ecological Analysis and Synthesis (NCEAS), the Bishop Museum, and the
California State Resources Agency, to develop an ecological data repository to
support the acquisition, management, archival, and sharing of valuable
ecological research data. The group created the Caveat Emptor Ecological Data
(CEED) Web site (http://ceed.sdsc.edu), which provides both a methodology for
and an implementation of controlled publication and sharing of scientific
data of any attribution type.
Currently, CEED contains collections data from the Bishop Museum in
Hawaii and data acquired through the San Diego Bay project.
Partnerships
To
have an impact on the grand challenge and address the funding realities of
research in the U.S., SDSC has also pursued multidisciplinary partnerships with
other like-minded major players (individual investigators and their institutions) within
or related to environmental informatics. Collaborations of these sorts provide
an opportunity to overcome cross-disciplinary barriers, such as those between
ecology and systematics (where the overlap has not been great) and between these
two and computer science. Four of the partnership activities SDSC is involved
with are:
1. LTER Network Office (http://www.sdsc.edu/sdsc-lter): In 1996, the University of New Mexico
(UNM) and SDSC teamed on a successful bid to become the Long Term Ecological
Research (LTER) Network Office. Our two institutions shared a vision that the ecology community, in particular
the LTER sites, would take advantage of SDSC's high-performance computational
resources to confront the scientific challenges underlying LTER. Our goals
include promoting and disseminating new computational technologies relevant
to research throughout the LTER network, providing leadership in data
management and dissemination to the LTER community and other scientific
communities, and enhancing interaction between the LTER network and the larger
environmental and scientific communities. Toward this end, in 1999 SDSC hired
Tony Fountain to be its key liaison with the LTER community.
2. National Partnership for Advanced Computational Infrastructure
(NPACI) (http://www.npaci.edu): NPACI, led by UC San Diego and SDSC, was formed in 1997 when the
NSF replaced the original Supercomputer Centers Program, under which SDSC had
been created in 1985, with the Partnerships for Advanced Computational
Infrastructure (PACI) program. PACI combines computer and computational
scientists to develop and deploy an advanced computational infrastructure for
academic research. To pair application "push" and technology "pull," NPACI
established "thrust areas" in Technologies (Programming Tools and Environments,
Data-Intensive Computing, Metasystems, and Interaction Environments) and
Applications (Earth Systems Science (ESS), Molecular Science, Neuroscience, and
Engineering). In each NPACI project, application need is linked to a critical
computational technology under development. NPACI ESS projects include:
· Multiscale, Multiresolution Modeling - attempting to link global (e.g., atmospheric and ocean), regional (e.g.,
mesoscale climate), and local (e.g., bay and estuary) models together;
· Quantitative Geography for Ground Truth - allowing environmental studies of global modeling for climate, the biosphere,
and the carbon cycle;
· Surface Water Transport and Flow - to model the flow of bays and estuaries;
· Real-Time Coastal Data Acquisition Systems (REINAS) - supporting regional-scale environmental science, monitoring
and forecasting; and
· Biological Scale Process Modeling - quantifying biodiversity and its ecosystem function.
These projects are working closely with the Data-Intensive Computing thrust
area (http://www.npaci.edu/DICE),
which is developing tools (such as the Storage Resource Broker) to share,
manage, and query distributed data, and working on several digital library
technologies.
3. LTER Biological Scale Process Modeling: Since
the landscapes and ecosystems of the LTER sites are representative of larger
physiographic, climatic, and ecological provinces, the findings at LTER sites
are representative of larger geographic domains. NPACI, the LTER Network
Office, and the University of Kansas have been involved in an effort to help
LTER sites develop a regional perspective in their studies. Towards that goal,
John Helly, Stuart Gage, and Bob Waide organized a series of regional modeling
workshops at SDSC and invited members of the LTER, biodiversity, and computing
communities to attend, explore issues of common interest, and forge new
scientific collaborations.
The first workshop, a joint activity of LTER and NPACI, held in December
1998, had a primary goal of identifying ecological models that could make use
of high-performance networks, computers, and data management facilities. A
secondary goal was to define a series of regionalization experiments making use
of these models. Twelve LTER
models were identified, covering a variety of ecosystems, from coastal regions
to deserts to urban centers, and including such key ecosystem processes as
hydrology, atmospheric processes, and bio-geochemistry. Twelve regionalization experiments
making use of these models were defined.
These included plans to link models across domains (e.g., linking
bio-geochemistry and atmosphere models to capture terrestrial and atmospheric
feedback processes; ecology and museum collections to understand the
relationship between biodiversity and ecosystems function). The experiments also called for
simulations at broader temporal and spatial scales and at finer resolutions
than previously performed. Both
of these factors increase the computational burden of the simulations,
necessitating the high-performance computing resources of SDSC. The center's visualization resources are also called on, as they provide an
opportunity to illustrate the impact of LTER regional research to the public.
A second regional modeling workshop is scheduled for November 10-17, 1999,
and is being organized by Tony Fountain, Stuart Gage and Bob Waide. The
workshop's goals are to report results in the
development of regionalized versions of models selected in the first workshop,
and to define procedures for the validation of the regional models, including
data and computation. A third regional modeling
workshop is planned in conjunction with the LTER All-Scientists Meeting in
Snowbird, Utah, in August 2000. At
that point the results of the modeling experiments will be presented to the
larger LTER community. Through this process we hope to encourage members of
the ecology community to take full advantage of the computational resources
available to researchers, to demonstrate the value of looking at
regionalization, and to strengthen the ties between the ecological and museum
collections communities.
4. Knowledge and Distributed Intelligence (KDI):
SDSC is involved with two KDI projects, both of which are aimed at producing an
infrastructure for distributed data handling at the interface of ecology and
systematics. The first, Knowledge Networking of Biodiversity Information
(NWBI), led by the University of Kansas, will develop high-performance network
access to biological collection and classification databases through the
implementation of international information retrieval protocols and metadata
profiles, partly in collaboration with the U.S. Integrated Taxonomic
Information System (ITIS). The project will also integrate museum databases
with other earth systems sciences data, such as terrain and climate
information, for predictive modeling and visualization of species distributions
and related biodiversity phenomena. Finally, NWBI will initiate testbed
research projects to assess the ecosystem function of biodiversity and the
pattern and processes governing decline in amphibian populations worldwide.
The
other KDI—A Knowledge Network for Biocomplexity: Building and Evaluating a
Metadata-based Framework for Integrating Heterogeneous Scientific Data (led by
the National Center for Ecological Analysis and Synthesis (NCEAS, http://www.nceas.ucsb.edu) and the LTER
Network Office)—will integrate the distributed and heterogeneous information
sources necessary to develop and test theories in ecology and its sister
fields into a standards-based, open-architecture knowledge network. Drawing
on recent advances in metadata representation, the network will provide
conceptually sophisticated access to integrated data products acquired from
distributed, autonomous data repositories. It will also include advanced tools
for exploring complex data sets from which hypotheses can be tested.
Partnership for Biodiversity Informatics:
Given a
shared vision of the grand challenge of environmental informatics, NCEAS, the
LTER Network Office, the University of Kansas Natural History Museum and
Biodiversity Research Center (KUNHM), and SDSC have formed a Partnership for
Biodiversity Informatics (PBI). The mission of PBI is to conduct informatics research (i.e., the study and application of
information technology) to advance knowledge discovery in systematics, ecology,
and biodiversity. To accomplish this mission, the PBI will:
· Enable research at the interface between ecology and systematics;
· Enhance knowledge discovery through the management, sharing, and integration of data and information;
· Demonstrate the value of collaborative research in a shared data environment; and
· Expand access to information and knowledge for research, resource management, policy decision making, and
education.
While
this Partnership has just been formed, we see great potential in bringing together
the four founding organizations' expertise to begin taking positive steps
toward addressing the grand challenge of the 21st century.
Summary
Harnessing
knowledge of earth's biological and ecological diversity and how it shapes
global environmental systems is a compelling problem. To address it, SDSC has
developed tools that are now in use and has established key partnerships to
conduct research, develop infrastructure, and train individuals through a
multidisciplinary approach. We
invite comments on our method, and would be interested in pursuing additional
collaborations on tool and infrastructure development, and research using our
tools. Possible pursuits involving environmental informatics include mapping
and monitoring the environment, products and productivity from the environment,
and avoidance and mitigation of natural disasters.
Acknowledgements
We
wish to thank members of the Partnership for Biodiversity Informatics for their
insight, in particular Jim Reichman, Leonard Krishtalka, Jim Beach, and Bob
Waide. In addition, we wish to thank Amy Finley for her editing which led to an
improved paper. We have been supported in part by NSF ACI 9619020 (PA and JH),
NSF DBI 9873021 and NSF DBI 9808739 (DS), and NSF DEB 9634135.
References
Finley, A. Biocomplexity and the Art of Planetary Management, Envision, Jan - March, 1999.
Interim Report: Environmental Science and Engineering for the 21st Century: The Role of the National Science Foundation (NSB 99-133). July 29, 1999.
J. Helly, T. Elvins, D. Sutton, D. Martinez, S. Miller, S. Pickett, A. M. Ellison, Controlled Publication of Digital Scientific Data. (1999).
J. Helly, T. Elvins, D. Sutton and D. Martinez, A Method for Interoperable Digital Libraries and Data Repositories, Future Generation Computer Systems, Elsevier, in press (1999).
J. Helly, S. Levin, W. Michener, F. Davis, T. Case, The state of Computational Ecology, SDSC, 1996.
Jasanoff et al. (19 December) 1998. Conversations with the Community: AAAS at the Millenium. Science 278 2066.
Kaiser, J. Searching Museums from your Desktop. Science 284: 888 (July, 99)
K. Gross, E. Allen, C. Bledsoe, R. Colwell, P. Dayton, M. Dethier, J. Helly, R. Holt, N. Morin, W. Michener, S.T. Picket and S. Stafford, Report of the Committee on the Future of Long-term Ecological Data (FLED), Ecological Society of America.
Michener W.K. 1996 Environmental Informatics: Methodology and Applications of Environmental Information Processing, Ecology (in book reviews), 77:4 p1309(2)
Peterson A.T., J. Soberón, V. Sánchez-Cordero. Conservatism of Ecological Niches in Evolutionary Time. Science 285: 1265-1267 (Aug 20, 99).
President's Committee of Advisors on Science and Technology (PCAST). Teaming with Life: Investing in Science to Understand and Use America's Living Capital, March, 1998.
Raven, P, and Wilson, E.O. (13 November) 1992. A Fifty-year Plan for Biodiversity Surveys. Science 258, 1099.
Stockwell, D.R.B. 1999 Genetic Algorithms II. Pp 123-144 in A.H. Fielding (ed) Machine Learning Methods for Ecological Applications. Kluwer Academic Publishers, Boston.
Stockwell D.R.B. and D. Peters 1999. The GARP Modeling System: problems and solutions to automated spatial prediction. International Journal of Geographical Information Science 13:2 143-158.
Systematics Agenda 2000, 1994. Charting the Biosphere. ("Technical Report"). Published by SA2000, Department of Ornithology, American Museum of Natural History, New York, NY 10024. 33pp
Wilson, E.O. (27 March) 1998. Integrated Science and the Coming Century of the Environment. Science 279, 2048.