An Adaptive Index Structure for High-Dimensional
Similarity Search
P. Wu, B. S. Manjunath and S.
Chandrasekaran
Department of Electrical and Computer Engineering
University of California, Santa Barbara, CA 93106-9560
{peng, manj, shiv}@ece.ucsb.edu
Abstract
A practical method for
creating a high dimensional index structure that adapts
to the data distribution and scales well with the database size, is
presented. Typical
media descriptors, such as texture features, are high dimensional and are
not uniformly distributed in the feature space. The performance of many
existing
methods degrade if the data is not uniformly distributed. The proposed
method
offers an efficient solution to this problem. First, the data¡¯s marginal
distribution
along each dimension is characterized using a Gaussian mixture model.
The parameters of this model are estimated using the well known
Expectation-Maximization
(EM) method. These model parameters can also be estimated
sequentially for on-line updating. Using the marginal distribution
information,
each of the data dimensions can be partitioned such that each bin contains
approximately an equal number of objects. Experimental results on a real
image
texture data set are presented. Comparisons with existing techniques, such
as
the well known VA-File, demonstrate a significant overall improvement.
