Chemistry; Computer Science
International Journal of Intelligent Systems
Atmospheric aerosols, ATOFMS, association rules, Database Systems, Data Mining, query optimization, classification, clustering, frequent itemsets, mass spectra, multi-step mining, regression, subset mining, support vector machines
Data mining has been a very active area of research in the database, machine learning, and mathematical programming communities in recent years. EDAM (Exploratory Data Analysis and Management) is a joint project between researchers in Atmospheric Chemistry and Computer Science at Carleton College and the University of Wisconsin-Madison that aims to develop data mining techniques for advancing the state of the art in analyzing atmospheric aerosol datasets. There is a great need to better understand the sources, dynamics, and compositions of atmospheric aerosols. The traditional approach for particle measurement, which is the collection of bulk samples of particulates on filters, is not adequate for studying particle dynamics and real-time correlations. This has led to the development of a new generation of real-time instruments that provide continuous or semi-continuous streams of data about certain aerosol properties. However, these instruments have added a significant level of complexity to atmospheric aerosol data, and dramatically increased the amounts of data to be collected, managed, and analyzed. Our ability to integrate the data from all of these new and complex instruments now lags far behind our data-collection capabilities, and severely limits our ability to understand the data and act upon it in a timely manner. In this paper, we present an overview of the EDAM project. The goal of the project, which is in its early stages, is to develop novel data mining algorithms and approaches to managing and monitoring multiple complex data streams. An important objective is data quality assurance, and real-time data mining offers great potential. The approach that we take should also provide good techniques to deal with gas-phase and semi-volatile data.
While atmospheric aerosol analysis is an important and challenging domain that motivates us with real problems and serves as a concrete test of our results, our objective is to develop techniques that have broader applicability, and to explore some fundamental challenges in data mining that are not specific to any given application domain.
Carleton College does not own the copyright to this work and the work is available through the Carleton College Library following the original publisher's policies regarding self-archiving. For more information on the copyright status of this work, refer to the current copyright holder.
Yes (with publisher permission, 0-24 month embargo depending on journal)
Ramakrishnan, Raghu, James J. Schauer, Lei Chen, Zheng Huang, Martin M. Shafer, Deborah S. Gross, and David R. Musicant. "The EDAM Project: Mining Atmospheric Aerosol Datasets". International Journal of Intelligent Systems. 2005, 20(7), 759-787. Accessed via Faculty Work. Chemistry. Carleton Digital Commons. https://digitalcommons.carleton.edu/chem_faculty/5.