Carleton Author

Gross, Deborah S.; Musicant, David R.

Department

Chemistry; Computer Science

Journal Title

International Journal of Intelligent Systems

Publication Date

2005

Volume

20

Issue

7

First Page

759

Publisher

Wiley-Blackwell

Last Page

787

File Name

058_Gross-Deborah_TheEDAMProject.pdf

Keywords

Atmospheric aerosols, ATOFMS, association rules, Database Systems, Data Mining, query optimization, classification, clustering, frequent itemsets, mass spectra, multi-step mining, regression, subset mining, support vector machines

Abstract

Data mining has been a very active area of research in the database, machine learning, and mathematical programming communities in recent years. EDAM (Exploratory Data Analysis and Management) is a joint project between researchers in Atmospheric Chemistry and Computer Science at Carleton College and the University of Wisconsin-Madison that aims to develop data mining techniques for advancing the state of the art in analyzing atmospheric aerosol datasets. There is a great need to better understand the sources, dynamics, and compositions of atmospheric aerosols. The traditional approach for particle measurement, which is the collection of bulk samples of particulates on filters, is not adequate for studying particle dynamics and real-time correlations. This has led to the development of a new generation of real-time instruments that provide continuous or semi-continuous streams of data about certain aerosol properties. However, these instruments have added a significant level of complexity to atmospheric aerosol data, and dramatically increased the amounts of data to be collected, managed, and analyzed. Our ability to integrate the data from all of these new and complex instruments now lags far behind our data-collection capabilities, and severely limits our ability to understand the data and act upon it in a timely manner. In this paper, we present an overview of the EDAM project. The goal of the project, which is in its early stages, is to develop novel data mining algorithms and approaches to managing and monitoring multiple complex data streams. An important objective is data quality assurance, and real-time data mining offers great potential. The approach that we take should also provide good techniques to deal with gas-phase and semi-volatile data.
While atmospheric aerosol analysis is an important and challenging domain that motivates us with real problems and serves as a concrete test of our results, our objective is to develop techniques that have broader applicability, and to explore some fundamental challenges in data mining that are not specific to any given application domain.

Rights Management

Carleton College does not own the copyright to this work and the work is available through the Carleton College Library following the original publisher's policies regarding self-archiving. For more information on the copyright status of this work, refer to the current copyright holder.

RoMEO Color

Yellow

Preprint Archiving

Yes

Postprint Archiving

Yes (with publisher permission, 0-24 month embargo depending on journal)

Publisher PDF Archiving

No

Contributing Organization

Carleton College

Type

Article

Format

application/pdf

Language

English
