Abstract
The Multiple Instance Regression
(MIR) problem arises when a data set is a collection of bags, where each bag
contains multiple instances that share the same real-valued label. The goal is
to train a regression model that can accurately predict the label of an
unlabeled bag. Many remote sensing applications can be studied within this
setting. We propose a novel probabilistic framework for MIR that represents bag
labels with a mixture model. It is based on the assumption that each bag contains
a prime instance that is responsible for the bag label. An
expectation-maximization algorithm is proposed to maximize the likelihood of the
mixture model. The mixture model MIR framework is flexible, and several
existing MIR algorithms can be described as its special cases. The proposed
algorithms were evaluated on synthetic data and remote sensing data for aerosol
retrieval and crop yield prediction. The results show that the proposed MIR
algorithms achieve higher accuracy than previous
state-of-the-art methods.
The datasets used in the experiments are
available for download.
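
To make the prime-instance idea concrete, the following is a minimal sketch of one way such a mixture model could be fitted with expectation-maximization. It is not taken from the paper: it assumes a linear instance-level predictor, Gaussian noise with a single shared variance, and a uniform prior over which instance in a bag is the prime one. The names `fit_prime_instance_em` and `predict_bag`, and the parameters `n_iter` and `ridge`, are hypothetical.

```python
import numpy as np

def fit_prime_instance_em(bags, labels, n_iter=50, ridge=1e-8):
    """Illustrative EM for a linear prime-instance mixture (assumed model).

    bags   : list of arrays, each of shape (n_instances_in_bag, d)
    labels : one real-valued label per bag
    """
    X = np.vstack(bags)                                   # all instances, (N, d)
    sizes = [len(b) for b in bags]
    bag_idx = np.repeat(np.arange(len(bags)), sizes)      # bag id of each instance
    y = np.asarray(labels, dtype=float)[bag_idx]          # bag label copied to instances
    n_bags, d = len(bags), X.shape[1]

    # Start with every instance equally likely to be the prime instance.
    gamma = 1.0 / np.bincount(bag_idx, minlength=n_bags)[bag_idx]

    for _ in range(n_iter):
        # M-step: weighted least squares with responsibilities as weights.
        Xw = X * gamma[:, None]
        w = np.linalg.solve(X.T @ Xw + ridge * np.eye(d), Xw.T @ y)
        resid = y - X @ w
        # Responsibilities sum to 1 per bag, so the total weight is n_bags.
        sigma2 = max(np.sum(gamma * resid**2) / n_bags, 1e-12)

        # E-step: posterior probability that each instance is the prime one.
        # The uniform prior and the Gaussian constant cancel within a bag.
        log_lik = -0.5 * resid**2 / sigma2
        bag_max = np.full(n_bags, -np.inf)
        np.maximum.at(bag_max, bag_idx, log_lik)          # per-bag max for stability
        unnorm = np.exp(log_lik - bag_max[bag_idx])
        gamma = unnorm / np.bincount(bag_idx, weights=unnorm, minlength=n_bags)[bag_idx]

    return w, sigma2

def predict_bag(bag, w):
    """Predict a bag label as the prior-weighted (here: plain) mean of instance predictions."""
    return float(np.mean(np.asarray(bag) @ w))
```

Under these assumptions, fixing the responsibilities at their uniform initial values reduces the procedure to ordinary least squares on instances that inherit their bag label, which is one example of how a simpler MIR baseline can appear as a special case of a mixture formulation.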