The objective of this project is to develop a system for knowledge discovery and integration of local models in large spatial and spatial-temporal databases, without exchange of confidential and proprietary information.
To achieve this objective, we have developed and tested novel exploratory data analysis methods for spatial data; developed machine learning algorithms for building and selectively applying multiple expert modules; tested model development and prediction capabilities using multiple non-centralized data sets; and prototyped a software package for knowledge discovery from spatial data.
A procedure for evaluating spatial sampling techniques in terms of sampling cost and interpolated feature accuracy was developed in our lab and applied to modify grid sampling so as to achieve similar expected accuracy with half as much data (vucetic00j).
Exploratory data analysis
A spatial data partitioning procedure was developed for training and testing spatial regression methods (vucetic99). For heterogeneous spatial databases with unstable driving attributes (typical in earth sciences), an adaptive and spatial attribute boosting algorithm is proposed as an effective technique for increasing modeling accuracy by manipulating training data distributions (lazarevic00b).
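To illustrate why spatial data need a dedicated train/test partitioning, the following minimal sketch assigns whole contiguous blocks of a field, rather than individual points, to either the training or the test set, so that test points are not immediate neighbors of training points. It is an assumption-laden illustration and not necessarily the procedure of (vucetic99); the function name `spatial_block_split` and the parameters `block_size` and `test_fraction` are hypothetical.

```python
import random


def spatial_block_split(points, block_size, test_fraction=0.25, seed=0):
    """Split point indices into train/test sets by whole spatial blocks.

    Hypothetical helper for illustration only: points is a list of (x, y)
    coordinates, block_size is the side length of a square block.
    """
    # Group point indices by the square grid block that contains them.
    blocks = {}
    for i, (x, y) in enumerate(points):
        key = (int(x // block_size), int(y // block_size))
        blocks.setdefault(key, []).append(i)

    # Shuffle the blocks reproducibly and hold out a fraction of them.
    keys = sorted(blocks)
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(test_fraction * len(keys)))
    test_keys = set(keys[:n_test])

    train = [i for k in keys if k not in test_keys for i in blocks[k]]
    test = [i for k in test_keys for i in blocks[k]]
    return sorted(train), sorted(test)


# A 20 x 20 regular sampling grid split into 5 x 5 blocks (16 blocks).
points = [(x, y) for x in range(20) for y in range(20)]
train_idx, test_idx = spatial_block_split(points, block_size=5)
```

With these settings four of the sixteen blocks are held out, so the test set contains 100 points and the training set 300.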
To identify more homogeneous sub-fields and design corresponding expert models, we have developed data partitioning methods based on spatial clustering (lazarevic99), sequential development of local regressors and the corresponding data distribution models (pokrajac99), and iterative data partitioning using spatial error analysis (vucetic00c). All of the multiple-expert approaches resulted in better prediction than a single global model when tested on real-life agricultural data. In addition, the data partitioning and local regression algorithms were successfully adapted to a distributed environment where data mining is restricted to the exchange of local models and essential statistics, without raw data communication (lazarevic00c).
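The idea of exchanging local models and essential statistics instead of raw data can be sketched as follows. Each site fits a one-variable least-squares regressor on its own data and shares only the fitted coefficients and its sample count; a combined prediction is then a sample-weighted average of the local predictions. This is a simplified illustration under assumed names (`fit_local_model`, `combined_predict`) and an assumed combination rule, not the algorithm of (lazarevic00c).

```python
def fit_local_model(xs, ys):
    """Fit y = slope * x + intercept on one site's private data.

    Only the returned summary (two coefficients and a sample count)
    would leave the site; the raw xs and ys stay local.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {"slope": slope, "intercept": intercept, "n": n}


def combined_predict(models, x):
    """Average the local predictions, weighted by each site's sample count."""
    total = sum(m["n"] for m in models)
    return sum(m["n"] * (m["slope"] * x + m["intercept"]) for m in models) / total


# Two hypothetical sites with made-up data; only the model summaries are shared.
site_a = fit_local_model([0, 1, 2, 3], [1, 3, 5, 7])
site_b = fit_local_model([0, 2, 4, 6], [2, 6, 10, 14])
shared_models = [site_a, site_b]
prediction = combined_predict(shared_models, 5.0)
```

Weighting by sample count is only one possible choice; a site's validation error or its spatial proximity to the query point could serve as weights instead.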
To fully characterize our knowledge discovery algorithms for a large distributed system, we have developed a spatial data simulator that generates feature layers statistically similar to real spatial data and computes a target layer according to previously observed rules and expert knowledge (pokrajac00ja). The simulator is used to analyze the influence of sensor error, unexplained variance, sampling density, and data distribution on spatial data prediction quality in precision agriculture (pokrajac00).
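A toy version of such a simulator can be sketched as below. It creates spatially correlated feature layers by smoothing white noise with a moving-average window and derives a target layer from a fixed rule plus Gaussian noise that stands in for unexplained variance. The layer names (`nitrogen`, `moisture`, `yield`), the rule, and all parameters are hypothetical and chosen only for illustration; the actual simulator (pokrajac00ja) is not reproduced here.

```python
import random


def smooth_layer(size, radius, rng):
    """Return a size x size layer of spatially correlated values.

    White Gaussian noise is averaged over a square window so that
    neighboring cells receive similar values.
    """
    raw = [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]
    layer = []
    for r in range(size):
        row = []
        for c in range(size):
            window = [raw[i][j]
                      for i in range(max(0, r - radius), min(size, r + radius + 1))
                      for j in range(max(0, c - radius), min(size, c + radius + 1))]
            row.append(sum(window) / len(window))
        layer.append(row)
    return layer


def simulate_field(size=30, radius=2, noise_sd=0.1, seed=0):
    """Generate two feature layers and a rule-based target layer."""
    rng = random.Random(seed)
    nitrogen = smooth_layer(size, radius, rng)
    moisture = smooth_layer(size, radius, rng)
    # Assumed rule: the target rises with nitrogen and falls with squared
    # moisture; noise_sd controls the unexplained variance.
    target = [[2.0 * nitrogen[r][c] - 0.5 * moisture[r][c] ** 2
               + rng.gauss(0.0, noise_sd)
               for c in range(size)]
              for r in range(size)]
    return {"nitrogen": nitrogen, "moisture": moisture, "yield": target}


field = simulate_field(size=10, noise_sd=0.0)
```

Varying `noise_sd`, or subsampling the generated grid, gives a controlled way to study how unexplained variance and sampling density affect prediction quality.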
Currently, we are developing a data mining software package that integrates our algorithms for spatial and distributed data inspection, preprocessing, and partitioning into an easy-to-use toolbox (lazarevic00).