Black Box Approach to Image Feature Manipulation used by Visual Information Retrieval Engines
Kshitij Shah
Amit Sheth
Srilekha Mudumbai
Large Scale Distributed Information Systems (LSDIS) Lab
Department of Computer Science
University of Georgia
415 Graduate Studies Research Center
Athens, GA, USA 30602-7404
Phone: (706) 542-2310


The Zebra image access system of the VisualHarness platform for managing heterogeneous data supports three types of access to distributed image repositories: keyword based, attribute based, and image content based. The Image Based Access Component (IBAC) of Zebra supports the last type of access, which relies on computable image properties such as those derived from spatial-domain, frequency-domain, or statistical and structural analysis. IBAC uses a novel black box approach: a Visual Information Retrieval (VIR) engine is used to compute the corresponding metadata, which is then independently managed in a relational database to support query processing involving image features and information correlation. That is, we overcome the difficulties of using feature vectors that are proprietary to a VIR engine, since we require no knowledge of the internal representation or format of the image features used by the engine. IBAC also gives the user the option of combining any of the image properties. Moreover, a user can assign different weights (relative importance) to each of the image properties, so that query results are ranked according to the weights assigned to the different properties. Tests comparing the quality of the results obtained using the black box approach against the results obtained directly from the underlying VIR engine are encouraging. These tests are briefly presented here and are also accessible through the Web-accessible prototype system.


Copyright 1997 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. 


Multimedia computing has emerged in the last few years as a major area of research. Digital imagery is becoming an important component of computer and telecommunication usage. Image data plays an important role in many diverse fields such as health care, advertising, and meteorology. With the increased demand, use, and availability of image information, many diverse systems have become available for managing and retrieving image data.

In 1951, Calvin Mooers observed that "Information Retrieval embraces the intellectual aspects of the description of information and its specification for search, and also whatever systems, techniques, or machines that are employed to carry out the operation" [18]. He was referring to textual information retrieval. The same notion has now been extended to non-textual, visual information [9]. Traditional search systems employ information retrieval techniques for searching only over textual data. With the increased demand, use, and availability of image information, search systems are expected to support searching over image data as well. Moreover, most users prefer to search for images by what they contain rather than by a keyword description associated with the image. Searching for images by keywords alone limits the querying capabilities. In a traditional DBMS, an image is treated as a file name or a binary large object (BLOB), which is useful for display purposes but not for describing the image. That an image is more expressive than a thousand words also implies that there are many features associated with an image. Hence it is natural to expect different people to describe the same image in different ways, and there is a need to retrieve images by their content features.

The human visual system can recognize an infinite number of shapes, colors, patterns, textures, objects and backgrounds. It is still not understood how the human visual mechanism breaks up an image into a set of features, such as color, edges, contours, textures, and objects. Together these individual features are called visual information. The technology used to retrieve images based on visual information is called Visual Information Retrieval (VIR).

The features extracted from an image can be expressed in as little as 1 KB or 2 KB, regardless of the original image size [10]. Extraction of features from an image facilitates a content-based searching approach, which lets users express similarity queries that literally mean 'Give me all pictures that look like this'. Typical image features involve the spatial content of an image, knowledge of which is especially important in medical and geographic applications. Several approaches have been taken to retrieve images by content (Section 7 discusses related work in this area). Query by content transforms the user query into image primitives; retrieval then returns images whose primitives lie within the allowable range of the primitives of the input image. By primitives we mean the interior properties of the image. Many VIR engines are available for extracting the content of an image, such as Virage [24], QBIC [29], and Photobook [23]. Each differs in the set of features it extracts from an image for calculating similarities. They also differ in the representation (unknown to the outside world) that they use internally for storing the features.

We propose a black-box approach (BBA) to use these feature representations without knowing the internals of the VIR engine or the format in which the features are generated, stored, or manipulated by the engine. Many engines provide access to routines that give a similarity measure between any two images, but there is no way to access the individual features extracted from an image. This means that, for similarity queries, all images in the repository would need to be compared with the user's query image at run time for ranking. Such an approach is not scalable and is computationally very expensive unless a cost-effective alternative is found, which is hard to do without knowing the features extracted from the images, especially for proprietary systems.

Since the image features for individual image properties involve different computations, they belong to different topological spaces, and a distance metric is defined for each image feature in its topological space. Hence, finding similarity with respect to any image property requires ranking the distances between the query image features and the other image features in that same space. Furthermore, we can assign weights to each property so that the individual distance metrics can be combined into a composite metric for determining similarity.

Our BBA can now be explained as follows. Because a VIR engine computes differences between an image and other images based on its features in different topological spaces, the BBA involves comparing each image with a null image, say an entirely black or an entirely white image, and storing its basic property value in that space under the appropriate metadata attribute. This is done at a pre-processing stage rather than as a run-time computation. These values are then normalized according to the weights assigned to each image property to give proper matches for a given image. We are basically pre-computing the "distance" of each image from a null image for all the image properties we may be interested in. At run time we compute the feature distances between the query image and an appropriate null image and retrieve images with close corresponding values. This is very efficient, since all the pre-computed values can be stored in a database and the run-time image query is simply translated into a database query. One of the interesting research issues we address in this paper is: what is a (good) null image?
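As an illustration, the pre-processing and run-time stages of the BBA can be sketched as follows. The VIR engine is modeled here as an opaque routine vir_distance that only reports pairwise dissimilarity; the function name, the property names, and the feature table behind it are hypothetical stand-ins for the proprietary engine, not an actual VIR interface.

```python
# Sketch of the black-box approach (BBA): pre-compute each image's
# distance from a null image per property, then answer similarity
# queries from those stored values alone.

def vir_distance(image_a, image_b, prop):
    # Stand-in for the opaque VIR engine call. Here we fake it with a
    # per-image feature table so the sketch is self-contained; a real
    # engine would compute this from pixel data internally.
    FEATURES = {
        "null":   {"color": 0.0, "texture": 0.0},
        "sunset": {"color": 0.8, "texture": 0.3},
        "flag":   {"color": 0.6, "texture": 0.1},
        "bricks": {"color": 0.2, "texture": 0.9},
    }
    return abs(FEATURES[image_a][prop] - FEATURES[image_b][prop])

PROPERTIES = ["color", "texture"]

def precompute_metadata(images, null_image="null"):
    """Pre-processing stage: store each image's distance from the
    null image, per property, as metadata."""
    return {img: {p: vir_distance(img, null_image, p) for p in PROPERTIES}
            for img in images}

def query(metabase, query_image, weights, null_image="null"):
    """Run time: compare the query image with the null image once,
    then rank repository images by weighted closeness of the
    pre-computed values. No per-image VIR calls are needed."""
    q = {p: vir_distance(query_image, null_image, p) for p in PROPERTIES}
    scores = {img: sum(weights[p] * abs(q[p] - vals[p]) for p in PROPERTIES)
              for img, vals in metabase.items() if img != query_image}
    return sorted(scores, key=scores.get)

metabase = precompute_metadata(["sunset", "flag", "bricks"])
print(query(metabase, "sunset", {"color": 1.0, "texture": 1.0}))
```

Note that only one engine call per property is made at query time (query image versus null image); everything else is a database lookup over the pre-computed values.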
With BBA we can employ various weighting strategies to combine the distance metrics. We can also try and combine features computed using different engines since we are using normalized distances.

Existing approaches to image retrieval have limited query capabilities. This limitation can be overcome by choosing proper metadata for querying the images. Metadata can be defined at the pixel level, the semantic level, the knowledge level, or the domain level for image content, and can then be used to retrieve images in response to user queries. With the above definition of metadata, the Image Based Access Component (IBAC) supports retrieval of images by content using the BBA.

IBAC complements the keyword-based and attribute-based access in the Zebra system [19]. Zebra is the first extension to InfoHarness (IH) in evolving it into what we now call the VisualHarness system. IH is an integration system, platform, and toolset that incorporates the above components to facilitate searching for heterogeneous information in distributed repositories without restructuring, reformatting, or relocating the original information. Future work involves development of a second visual component to support digital video.

IH uses a relational database management system. IBAC has been implemented as part of the search system for image data to demonstrate the metadata-based approach for retrieving images by content through the BBA discussed in this paper. The search system uses image extractors to extract the content-independent and content-dependent metadata on which queries are processed. The metadata extracted from the objects includes, among other things, the location and type of each object. Data about the relationships between metadata objects is created and stored along with the metadata itself. These objects are stored in the metabase used by IH to manage the metadata of heterogeneous data stored in distributed repositories. Web browsers serve as clients for IH: users can traverse the structure of the system through the Web interface, view the underlying data objects, and so on.

Issues related to metadata classification and storage are discussed in Section 2. Issues concerning the VisualHarness system and its components are discussed in Section 3. Section 4 concentrates on image analysis and comparison and discusses the BBA. Query processing and image data management are discussed in Section 5. Section 6 presents the results obtained using the BBA. Related work on content-based retrieval is discussed in Section 7. A working prototype of the BBA has been implemented and is available for demonstration [cf.]. We are currently experimenting with the results in terms of quality and efficiency.


In this section we classify the different kinds of metadata applicable to image data. Metadata denotes data about data and hence supports a higher level of abstraction over the raw data or the data managed in a database or repository. The concept of metadata is useful in extracting the content of raw information irrespective of its media type, including multimedia. Metadata can be classified based on whether it depends on the data extent or the information extent of the documents [15].

The same information can be stored in different formats and in different physical locations. Metadata should be capable of handling this heterogeneous information by differentiating between the data type and the media type. We present a metadata classification in Section 2.1; issues of metadata extraction and storage are discussed in Section 2.2.


In this section, we discuss the different types of metadata used by researchers working with image data. For more information on the general classification applicable to different media types, refer to Section 3.1 in [15]. The classification depends on whether we refer to just the data or to the information content of different media documents or objects. The generalized classification of metadata is:

Visual Information Retrieval (VIR) engines generally use three representation layers for image data: image properties are computed in the spatial domain, which refers to the color arrangement; in the frequency domain, which refers to sharp spectral peaks; and by statistical analysis, which refers to random texture. These properties can be computed globally for the entire image or locally for portions of an image. For each image property, such as color, texture, composition, and shape [cf. Section 4.1.2], a number of features are computed. These image features constitute the extracted content of the image.

Media data can be heterogeneous. Even though images have different structures and formats, they all have similar content that can be extracted. The extracted metadata captures the user's features of interest from the raw image.

In a Visual Information Retrieval (VIR) engine [10], images must first be preprocessed to improve smoothness and contrast so that different extraction routines can be run on them, depending on the primitives of interest. A data set is computed by each of these extraction routines for the primitives of interest. After the data sets for the different extraction routines have been computed, a vector of the computed primitive data is stored in a proprietary data structure. Since this proprietary data structure is not available or accessible in its original form to the outside world, we use the BBA [cf. Section 4.2.3] to extract the required values from a VIR and store them as metadata.

Figure 2.1 shows how we store the extracted metadata in the metabase, a database of metadata. Image properties required for all domains in the distributed repositories are stored in the metabase. The metadata values are pre-computed using extractors [cf. Section 4.2.3]. For example, if color, composition, texture, and structure are the image properties to be computed [cf. Section 4.1.2], the extractor compares each image with a null image [cf. Section 4], computes the values for the above properties, and stores them under the appropriate metadata in the metabase table, managed by a relational DBMS. At run time, the user query is translated into a database query with the appropriate metadata values. Storage in a database is efficient, since we store only a few supporting values for an image instead of the image itself. This helps meet the needs imposed by the exponential growth of image data.
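The metabase storage just described might look roughly like the following relational sketch. The table layout, column names, and values are illustrative assumptions, not the actual VisualHarness schema.

```python
import sqlite3

# Minimal sketch of a metabase table: one row per image, one column
# per pre-computed distance-from-null-image property value. The image
# itself is NOT stored, only its location and a few supporting values.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE metabase (
    oid INTEGER PRIMARY KEY,
    location TEXT,          -- where IH fetches the image at display time
    color REAL, composition REAL, texture REAL, structure REAL)""")

rows = [
    (1, "repo1/sunset.jpg", 0.81, 0.40, 0.22, 0.10),
    (2, "repo2/flag.gif",   0.62, 0.75, 0.05, 0.55),
]
conn.executemany("INSERT INTO metabase VALUES (?, ?, ?, ?, ?, ?)", rows)

# A run-time image query then becomes a simple range query on the
# stored values, here on the color property alone.
hits = conn.execute(
    "SELECT location FROM metabase WHERE ABS(color - ?) <= ?",
    (0.80, 0.05)).fetchall()
print(hits)
```

Because each image contributes only a handful of REAL columns rather than pixel data, the storage overhead per image is tiny regardless of image size.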

IH stores each image as an object (IHO) with an OID (object identifier), the property values extracted for that image, and so on (see Figure 2.2). IH stores all objects that are logically related as an IH collection. There may be different collections for different domains, or collections within a collection for the same domain. IH also stores the relationships among the collections, such as parent-child relationships, as part of the metadata.


This section describes an architecture for image data management in distributed repositories.

The VisualHarness System

The exponential growth of available digital information makes it difficult to know about its existence, location, and means of retrieval. The VisualHarness system is aimed at providing rapid access to huge amounts of heterogeneous image data available over the World Wide Web. It uses the components of the IH platform that support textual data, with extensions to deal with visual data, such as the Zebra system for image data. The metabase consists of indices (e.g., a full-text index for textual data, a shape-based index for image data), attribute-based metadata (to which we limit our focus in this paper), and, in the future, run-time metadata extractors. The VisualHarness system maintains metadata about the information space without restructuring, reformatting, or relocating the original information, enabling access to information by logical units of interest. An object-oriented layer (using IHOs) supports logical structuring of the metadata objects and thus allows arbitrary relationships among the represented information artifacts. In this paper, we also discuss using different VIR engines and/or combining their search strategies. VisualHarness is built using the IH integration platform. The VisualHarness system architecture is open and extensible, and provides hooks through which third-party indexing engines for textual data and third-party VIR engines for image content-based access can be plugged in.

Query processing in the VisualHarness system is performed as follows (see Figure 3.1). The IH server accepts a user query as a client request from a browser. Query Engine module of the Query Processing Unit (QPU) creates subrequests for the relevant search components. The search components use metadata (whether precomputed and stored in metabase or computed at run-time) to determine references to the relevant data and provide them to the result composition module of the QPU, which performs normalization, rescaling and formatting of the result. The result is then displayed to the user by the IH server. When the user selects one or more data objects to be displayed, the IH server accesses the appropriate repositories directly to retrieve data. Figure 3.1 shows the component level architecture of the VisualHarness system.  

Let us now focus on access to image data using IBAC. IBAC is designed to provide a management system for image data as envisioned in the metadata classification given in Section 2.1. IBAC uses the VIR engine as a black box to extract image features in the form of distance metrics. A distance metric is computed for each property of an image, as shown in Figure 2.1, by comparing each image in the different repositories against a null image [cf. Section 4] in that topological space.

Query Processing Unit (QPU)

The basic function of the query processing unit is to formulate user queries for retrieving image data. It transforms the user query into a database query that can be used to locate the data in the database system. The query processing unit is responsible for computing the results using the property weights the end user provides, normalizing the results, and scaling them before sending them to the server. For example, the result R is a combination of all the image properties with their individual weights:

R = i1*P1 + i2*P2 + ... + in*Pn

where i1, i2, ..., in are the weights assigned to the image properties P1, P2, ..., Pn.

A generic query can be

The generic query involves all image properties. This query is directed to IBAC, which scales and normalizes its individual results according to the user-assigned weights. A detailed description of query processing and of the results, which conveys the workings of IBAC and its importance, appears in Sections 5 and 6.
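A minimal sketch of this weighted result composition follows, assuming the per-property scores have already been normalized to the range [0, 1]; the function and property names are illustrative, not part of the actual QPU interface.

```python
# Sketch of the QPU's result-composition step: combine per-property
# similarity scores as R = i1*P1 + ... + in*Pn, after normalizing the
# user-assigned weights so that they sum to 1.

def compose(property_scores, weights):
    """Return the composite score R for one image.

    property_scores: per-property similarity in [0, 1]
    weights: user-assigned relative importance per property
    """
    total = sum(weights.values())
    norm = {p: w / total for p, w in weights.items()}  # weights sum to 1
    return sum(norm[p] * property_scores[p] for p in property_scores)

# A user who cares twice as much about color as about texture:
r = compose({"color": 0.9, "texture": 0.5}, {"color": 2, "texture": 1})
print(round(r, 4))
```

Normalizing the weights first keeps R in [0, 1] whatever scale the user types the weights on, which makes scores comparable across queries.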

Image Based Access Component (IBAC)

The image based component depends on the computable properties of an image. One of the fundamental challenges in image retrieval is choosing an appropriate set of features; for content-based retrieval, a set of features that concisely represents the entire image is needed. Several VIR engines are available for supporting content-based retrieval. Each differs in the set of features it extracts from an image for computing similarities. They also differ in the representation they use internally for storing the features. Many applications need to use these feature representations for their own purposes. The proposed black-box approach uses these feature representations without knowing the internals of the VIR engine or the format in which the features are generated.

Many engines provide access to routines that give a similarity measure between any two images, but there may be no way to access or interpret the individual features extracted from an image. Since the engine computes differences between an image and other images based on its features in different topological spaces, the BBA involves comparing each image with a null image [see Figure 2.1], a featureless image in a topological space, such as an entirely black or an entirely white image with respect to the color property. We then store the property value in the database under the appropriate metadata. This is done as a pre-processing stage as opposed to a run-time computation.
Because the space in which the features of an image are defined is multidimensional, features can be computed along different dimensions. This requires combining the individual distance metrics into a composite metric using a method of weighted contributions [cf. Section 5.1.4].

IBAC is responsible for normalizing and scaling the output according to the weights the end user assigns to each property of an image. The property weights assigned by the user indicate the extent to which a given property should be matched for similarity retrieval. These results are scaled by the QPU according to the weight assigned for the image-based search. The run-time query is a simple database query, as defined under the generic query in the query processing discussion.

With the BBA [cf. Section 4.2.3], we can hook any VIR engine into the search system, and we can also analyze combinations of different VIRs to obtain better-quality results for the end user.


While image analysis plays a major part in finding the feature vectors of an image to facilitate content-based search, comparing images and managing the extracted content are equally important for information retrieval. This section deals with computing the feature values required for image data, comparing images based on the computed values, and efficiently storing these computed values.


During the analysis phase, an image undergoes substantial preprocessing, such as smoothing and contrast enhancement, to ready it for the different extraction routines. The extraction routines differ depending on the primitives they extract. Each extraction routine takes the input image and computes a data set called the feature vector of the image. The feature vectors are proprietary to the VIR engine, and hence the number of image properties supported depends on the VIR engine used by the search system. The search system is responsible for managing and retrieving images based on similarities among these property values. We do not perform any image processing ourselves (at least for the features adequately supported by the VIRs we use); instead, we use the information returned by the image processing routines that make up the VIR engines.


Most VIR engines support matching images according to color, shape, texture, and relative position. For matching based on color, a typical choice is color histograms; for shape matching, the turning angle [12], the moments [7], and the pattern spectrum [14] are among the choices; for texture, directionality, granularity, and contrast [5] are a few of the perceptual properties considered; and for relative position, the 2-D strings method and its variants have been used [4, 20]. Among the several systems available, one that supports fast searching with the ability to retrieve accurate results is preferred. In this paper we discuss our use of one of the VIR engines (see Section 7.1) and how we incorporate it into our search system using the BBA (see Section 4.2.3).


The computable image properties depend on the domain knowledge of the images. We currently use four properties (color, color composition, texture, and structure) that are supported by the Virage VIR engine we use. An excellent discussion of these properties appears in [9]. Briefly, the color property refers to the global distribution of colors and allows answering questions such as 'show me all the pictures that have dominant orange, red, and yellow colors, as in a sunset'. The color composition property refers to the local distribution of colors and allows answering questions such as 'show me all pictures that have red at the top and bottom with white at the center, such as flags'. The texture property refers to the recognition of patterns based on spectral peaks, etc., and allows answering queries such as 'show me all images with brick patterns'. The structure property refers to analyzing the shapes in the image based on segmented regions of the same color and allows answering queries such as 'show me all pictures with two red circles close to the center on either side of the image'.


This section discusses how images are compared by VIRs, how the weights assigned by the user help in ranking the desired results, and how we apply the BBA on top of the VIR to facilitate information retrieval without knowing the internals of the VIR.


Several applications involve image data; a typical content query for medical image data would be 'find X-rays that contain objects with textures similar to the texture of a tumor'. A distance function between two objects, such as their Euclidean distance in the feature space (the sum of squared differences), is generally used to determine their similarity.

Similarity of images can be based on a whole-pattern match or other matches. A whole-pattern match query would be: given a set of objects O1, O2, O3, ..., On and a query object Q, find those data objects that are within distance e of Q.

Pattern matching is done by the VIR engines, which adopt different matching strategies. An ideal pattern matching scheme should satisfy the following requirements:

For similarity measures, a distance function is defined that computes the distance between objects in the feature space. A VIR engine computes the image features associated with the image objects and maps them into a feature space; using the distance function, we can then find the similarity between images for a given threshold e, as shown in Figure 4.1.


Again, the retrieval techniques differ among VIR engines. For example, retrieval may involve the use of Spatial Access Methods (SAMs) to achieve faster retrieval than sequential access [21].

The access method mentioned above applies to any Visual Information Retrieval (VIR) engine, assuming knowledge about and access to the feature vectors of the image objects in the database. For systems that do not know the internals of the VIR engine, this access method may not be applicable. An alternative, in cases where we either do not have access to the actual feature vectors or have no way of interpreting them, is the BBA, in which we compare objects based on their differences from a null image [cf. Figures 2.1, 4.1] rather than comparing the objects directly.
Feature vectors of an image refer to the features extracted in different topological spaces, as shown in Figure 4.1. Distances between the objects and the input query object are required in order to rank the objects by similarity to the given query object. The BBA uses a null image, e.g., an entirely black or an entirely white image, which has no specific feature and hence no properties of its own. If N is the null image and the objects in the database are O1, O2, ..., On, then the feature distance is

D(O1, O2) = | D(O1, N) - D(O2, N) |

i.e., the distance between any two objects O1 and O2 in the feature space is taken to be the absolute value of the difference between each object's distance from the null image for a particular property. The emphasis is on analyzing what makes a good null image, so as to obtain accurate results comparable to those obtained from VIR engines without involving a null image. This approach is very scalable, since information retrieval is not limited to a particular VIR engine; any engine can be hooked into our search system. Run-time computation is not expensive, because we pre-compute the distance between each object and the null image for each of its properties and store it in a database. Run-time computation basically involves retrieving the appropriate results from the database by converting the user's query image Q into a database query on D(Q, N). Without this approach, we would have to compute the distance between the query image and each image object in the databases sequentially at run time, which would be computationally very expensive. With this approach we can also employ different weighting strategies to combine the distances obtained by comparing each object with the null image in its topological space. We can also try to combine features computed using different engines, since we are using normalized distances. We discuss weighting strategies further in Section 5.
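For a single scalar (one-dimensional) feature with a null image whose feature value is zero, the identity above holds exactly, as the small check below illustrates; the feature values are invented. For multi-dimensional feature vectors, the right-hand side is in general only a lower bound on the true distance (by the triangle inequality), which is one reason the choice of null image matters.

```python
# Check of the BBA distance identity for a scalar feature: with a null
# image N whose feature value is 0 and d(a, b) = |f(a) - f(b)|, we have
# D(O1, O2) = |D(O1, N) - D(O2, N)| exactly for nonnegative feature values.

def d(fa, fb):
    """Distance between two feature values in a 1-D feature space."""
    return abs(fa - fb)

f_null = 0.0                                     # null image's feature value
features = {"O1": 0.30, "O2": 0.55, "O3": 0.90}  # invented feature values

for a in features:
    for b in features:
        direct = d(features[a], features[b])
        via_null = abs(d(features[a], f_null) - d(features[b], f_null))
        assert abs(direct - via_null) < 1e-12    # identity holds exactly
print("identity holds for all pairs")
```

This is why the per-property values D(Oi, N) stored in the metabase suffice to rank repository images against a query image without calling the VIR engine per pair.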



IBAC supports image queries using the BBA for image data retrieval from distributed repositories. The basic function of the query processing unit lies in transforming the user query into a database query to locate the data using the database system. While transforming the user query, the QPU is responsible for applying the weighting strategies assigned to the different properties involved in the search.

Image management itself demands significant research in optimizing the data available in distributed repositories, assigning weighting strategies to the retrievals and ranking them, and performing efficient run-time computation. Images are not stored in the database; at run time, IH knows where the images are and how to fetch them for the user.
In this section, we provide details for data interactions, scaling and normalizing of the results as a part of query processing, and data optimization and computing scores and weights as a part of image management.

Data interactions mainly deal with the insertion and retrieval operations of the metadata extracted from the image data for different image properties. Data optimization, scoring and weighting strategies, and normalizing and scaling based on the scores and weights are also discussed in this section as part of data interactions.

The primary function of the insertion module is to facilitate the acquisition of the data model (the properties that represent an image) from the image data. These image properties are managed as metadata within the VisualHarness system, as explained earlier. The VisualHarness extractor plays a critical role in determining the metadata attributes that will be utilized for browsing, searching, and retrieval. The space overhead associated with inserting the values of this content-dependent metadata into the metabase is quite small, since the data itself is not managed within the system. All metadata values, i.e., image features, are pre-computed [cf. Section 4.2.3] and stored in the metabase. This eliminates the need for run-time image processing or sequential scanning of the encapsulated image objects. It also eliminates the need for any external indexing that might otherwise be required to keep track of the image property values.
Extractor: The extractor is a three-step process involving three major blocks namely the Null Image Classifier, Image Feature Extractor and Image Classifier (see Figure 5.1).

Null Image Classifier: The Null Image Classifier unit is responsible for selecting a suitable null image. The null image is used for computing the distance metrics based on the image features associated with an image; this distance (from the null image) is subsequently treated and managed as the object metadata of the image being inserted. A distance metric expresses the degree of dissimilarity between two objects in a given topological space. Currently, the null image classifier uses an entirely black or an entirely white image as the null image; such an image has no properties of its own. Domain-specific information can also be used to determine an appropriate null image for a specific domain. Thus 'what is a good null image?' remains an open research problem.
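The two null-image candidates currently used can be sketched as raw grayscale pixel buffers; the dimensions and byte representation here are illustrative assumptions, not what the classifier actually produces.

```python
# Sketch of the Null Image Classifier's current candidates: an entirely
# black and an entirely white image, as raw 8-bit grayscale buffers.

WIDTH, HEIGHT = 64, 64   # illustrative dimensions

def make_null_image(shade):
    """shade=0 gives an all-black image, shade=255 an all-white one."""
    return bytes([shade]) * (WIDTH * HEIGHT)

black = make_null_image(0)
white = make_null_image(255)

# Such images are featureless: every pixel is identical, so there are
# no edges, texture, or color variation for a VIR engine to extract.
assert len(set(black)) == 1 and len(set(white)) == 1
print(len(black))   # 4096 pixels
```

A domain-specific null image (say, a plain-background X-ray plate for a medical repository) could be substituted here without changing anything downstream, since only the distances from it are stored.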

Image Feature Extractor: This unit extracts image features from the image data. The features would be the image properties supported by the integrated VIR engine(s). Depending on the nature of the VIR engine, the features extracted may vary. In our approach, we depend on the VIR engine for retrieving all the features using its own proprietary image processing algorithms. Hence the feature extraction is automatic.  

Image Classifier: As discussed earlier, our approach scales well with respect to attaching and using different VIR engines, and different VIRs may extract different image features for different domains. This unit classifies the image features extracted by the Image Feature Extractor unit across the different VIRs and selects the features best suited to a given application. It takes the distance metrics between a given object and the null object supplied by the Null Image Classifier, obtained by comparing each property of the image with the corresponding property of the null image, and stores them in the database.
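The three extractor blocks can be sketched as follows. Here `vir_distance` is a hypothetical stand-in for the VIR engine's proprietary comparison call (which the BBA deliberately treats as a black box); the placeholder arithmetic inside it exists only to keep the sketch runnable, and the property names match the four features discussed later.

```python
# Sketch of the extractor: reduce an image to one distance-from-null
# value per property. Names are illustrative, not the actual API.

PROPERTIES = ["color", "composition", "texture", "structure"]

def vir_distance(image, other, prop):
    """Hypothetical stand-in for the VIR engine's black-box comparison:
    returns a dissimilarity value for one property (0..100 in Virage).
    The arithmetic below is a deterministic placeholder, not a real metric."""
    return sum(map(ord, image + other + prop)) % 101

def extract_metadata(image, null_image="white"):
    """Null Image Classifier + Feature Extractor + Image Classifier:
    compare the image with the null image for each supported property."""
    return {prop: vir_distance(image, null_image, prop) for prop in PROPERTIES}

meta = extract_metadata("sunset.jpg")
print(meta)  # one distance-from-null value per supported property
```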


Storing extracted image content, i.e., feature vectors, in a file system is not very efficient, since indexing and updating become difficult. A good optimization strategy should be used so that retrievals are fast while the quality of the results is maintained. In our proposed BBA, we compute the distance metric between an image and the null image for each of the properties considered [cf. Figure 2.1] and store it in the database, with the property as the metadata attribute and the distance metric as its value. In this way storage is optimized, and retrieval is fast because client requests are passed on as database queries. For a given input image, its property values are retrieved from the database, and a database query is constructed with these metadata values to retrieve the images whose values lie within a tolerance e of those of the input image.
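A minimal sketch of this metabase idea using Python's sqlite3 module; the single table with (obj, prop, dist) columns is an assumed schema for illustration, not the actual VisualHarness layout.

```python
# Each image is stored as (object, property, distance-from-null) rows,
# so a content query becomes an ordinary SQL range query.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE metabase (obj TEXT, prop TEXT, dist REAL)")

def insert_image(obj, distances):
    """Insert one pre-computed distance row per property."""
    db.executemany("INSERT INTO metabase VALUES (?, ?, ?)",
                   [(obj, p, d) for p, d in distances.items()])

insert_image("img1", {"color": 10.0, "texture": 40.0})
insert_image("img2", {"color": 12.0, "texture": 80.0})

# Retrieve objects whose 'color' distance lies within tolerance e
# of the query image's 'color' distance.
q_color, e = 10.0, 5.0
rows = db.execute(
    "SELECT obj FROM metabase WHERE prop = 'color' AND ABS(dist - ?) <= ?",
    (q_color, e)).fetchall()
print([r[0] for r in rows])  # -> ['img1', 'img2']
```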


Weighting strategies are employed to provide a scalable approach. By scalable approach, we mean that a user can assign different weights to the different properties on which similarity is based; information retrieval from the database is then restricted according to the user-assigned weights. For example, if the VIR engine supports three properties, say P1, P2 and P3, the user can assign weights i1, i2 and i3 to these properties so that retrieval is based on the weighted combination i1*P1 + i2*P2 + i3*P3.

The resulting values are normalized and scaled in order to give a ranking to each of the objects retrieved from the database.


Property weights are the weights a user assigns to the different properties of the image via the user interface; each property weight lies between 0 and 1.0. If O1, O2, .., On are the objects in the image database, P1, P2, P3 and P4 are the properties supported for an object Oi, and Q is the input query object, then the score S obtained for each retrieved object Oi under the user-assigned property weights i1, i2, i3 and i4 is

S = i1*z1 + i2*z2 + i3*z3 + i4*z4

where z1 = abs(P1 value of Oi - P1 value of Q), and similarly for z2, z3 and z4. The property weights are scaled by multiplying each weight by the corresponding difference in property values z1, z2, z3 and z4. Normalization is performed by giving the object with the highest score the highest ranking, with value 1.0; every other retrieved object is normalized by dividing its score by the score of the highest-ranked object. This yields the overall ranking of the objects retrieved from the image database.
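The scoring and normalization steps above can be sketched as follows; the property values and weights are made up for illustration.

```python
# S(Oi) = sum_k i_k * |P_k value of Oi - P_k value of Q|, then scores are
# normalized so the highest-scoring object gets rank 1.0.

def score(obj_props, query_props, weights):
    # z_k = |P_k value of Oi - P_k value of Q|, scaled by weight i_k
    return sum(w * abs(obj_props[p] - query_props[p])
               for p, w in weights.items())

query   = {"color": 10, "texture": 20}
weights = {"color": 1.0, "texture": 0.5}
objects = {"img1": {"color": 12, "texture": 30},
           "img2": {"color": 40, "texture": 20}}

scores = {o: score(p, query, weights) for o, p in objects.items()}
top = max(scores.values())
ranks = {o: s / top for o, s in scores.items()}  # highest score -> 1.0
print(ranks)
```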



As discussed earlier under Section 4, runtime computation would be significantly more expensive if we had to perform a sequential scan, computing the distance of each object in the database against the input query object at run time. For this reason, we pre-compute the distance metrics for each image property and store them in the metabase. For an input query object Q, we retrieve the property values of Q, say Q(P1), Q(P2) and Q(P3), from the metabase and build a database query according to the weights assigned by the user and within the tolerance e.

A sample database query constructed in this way would be

where w1, w2 and w3 are the user-assigned weights for properties P1, P2 and P3, and O1, O2, .., On are the objects stored in the database. Runtime computation is quite efficient since the retrieval reduces to a SQL query.
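Since the exact query text is not shown here, the following is only a hedged illustration of how such a SQL string could be assembled; the table name, the one-column-per-property schema, and the use of the weighted sum for ordering are all assumptions made for the sketch.

```python
# Build a SQL range query: one tolerance predicate per weighted property,
# ordered by the weighted sum of absolute differences.

def build_query(q_vals, weights, tol):
    active = [(p, w) for p, w in weights.items() if w > 0]
    preds = [f"ABS({p} - {q_vals[p]}) <= {tol}" for p, _ in active]
    rank = " + ".join(f"{w} * ABS({p} - {q_vals[p]})" for p, w in active)
    return (f"SELECT obj FROM metabase WHERE {' AND '.join(preds)} "
            f"ORDER BY {rank}")

# q_vals are the query image's pre-computed property values Q(Pk).
sql = build_query({"P1": 10.0, "P2": 55.0}, {"P1": 1.0, "P2": 0.5}, 5.0)
print(sql)
```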


The first image in each sequence is the input query image. Results have been tested on four image properties: color, composition, texture and structure. So far the results have been promising, and additional work with larger image repositories is in progress. The null image used for the following results is an entirely white image. As mentioned earlier, identifying a better null image, so that the BBA can achieve results more competitive with those of an ordinary VIR engine used without the black box, remains an open research question.

To compare our results with the VIR, we calculate a hit ratio, the proportion of overlap between the result sets returned by VisualHarness and by the VIR for the top n hits. If A is the set of top n hits returned by VisualHarness and B is the set of top n hits returned by the VIR (Virage in this case), then HR = |A ∩ B| / n.
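The hit ratio can be computed directly from the two top-n result lists; the image identifiers below are made up for illustration.

```python
# HR = |A ∩ B| / n for two top-n result lists.

def hit_ratio(a, b):
    n = len(a)
    return len(set(a) & set(b)) / n

vh_hits  = ["i1", "i2", "i3", "i4"]   # example top-4 from VisualHarness
vir_hits = ["i1", "i3", "i5", "i2"]   # example top-4 from the VIR engine
print(hit_ratio(vh_hits, vir_hits))   # -> 0.75
```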

Figure 7.1: color (HR = 91.7%)
Figure 7.2: color (HR = 75%)
Figure 7.3: color (HR = 100%)
Figure 7.4: color (HR = 91.7%)
Figure 7.5: color (HR = 91.7%)
Figure 7.6: color (HR = 83.3%)

In this section we discuss related work conducted by different research groups in this area and how our approach differs from theirs. We give a perspective on related work in the context of content-based retrieval, followed by a general comparison between Web-based search engines and the VisualHarness system.

With respect to content-based retrieval, we discuss Virage's VIR, the MIT Media Lab's Photobook, QBIC, and work by other research groups. The VisualHarness system reported in this paper uses Virage's VIR technology for computing the image properties.


The VIR technology of Virage [10] supports a rich feature set for image retrieval with a high level of accuracy. It currently supports image properties such as color, texture, structure, and composition for retrieval over image collections based on visual similarity.

Images are dealt with in four layers: the Image Representation layer, the Image Object layer, the Domain Object layer and the Domain Event layer; the first three form the basic content of an image. The internal properties (primitives) of the image are computed from the feature set. VIR also computes the distance metric between objects in the feature space from their feature sets, as a floating point number between 0 and 100. Image similarity may be recomputed according to the property weights a user assigns to the properties supported by VIR.

The core module of the Virage technology is the Virage Engine, which handles image analysis, comparison and management. It comprises a base Engine and an extensible VIR Image Engine. The base Engine provides the fixed set of properties (color, texture, structure and composition) we have discussed earlier. The extensible VIR Engine improves the scalability of the system: it extends the engine so that properties can be added beyond the fixed set without major changes to the existing system.
Virage provides a Command Line Interface (VCLI) that aids in creating applications that can work with other applications or systems. VCLI offers many functions: it allows the user to analyze images, adjust weights, and perform searches, and to create a database that can be saved and loaded into memory when required. It supports both an interactive and a non-interactive mode for loading and querying the database. VCLI is invoked at the local site listening on a specific port and is made available over a network; client requests from different sites are forwarded as VCLI calls to the server running the software. Because VCLI loads the images in the database into memory at run time, image retrieval is comparatively fast.

The Media Laboratory

The MIT Media Lab's Photobook system [23] contains a set of interactive tools for browsing and searching images and image sequences. The novelty of their approach is that direct search on image content is made possible by semantics-preserving image compression: images are reduced to a small set of perceptually significant coefficients. The system has three descriptors that support searching based on appearance, on 2-D shape, and on texture properties; these descriptions can be used together in combination for browsing and searching.

The Media Lab's model uses semantics-preserving coefficients that allow reconstruction of the image. Users can browse large image databases quickly and efficiently using both the text annotations associated with the images and the image content [23, 8, 21]. With this system, a user can pose flexible queries involving semantics, such as "get me all images that have text annotations similar to the given image but shot on the west coast", or "get me all images whose visual appearance is similar to this image".

Photobook supports different types of image descriptions for different applications. For example, an Appearance Photobook is applicable to an image database whose content is characterized by appearance, such as faces; a Texture Photobook is useful for databases of texture patterns; and a Shape Photobook applies to hand-tool, animal, or clip-art databases with dominant shapes. Photobook also has provisions to combine these three descriptions for a variety of applications.

With the above features, Photobook provides users with a sophisticated and efficient utility for database search based on image content.  

The Query By Image Content (QBIC) System

QBIC (Query By Image Content) [29] is a prototype software system for image retrieval developed at the IBM Almaden Research Center. It supports user queries over an image collection based on different features (properties) of image content, such as the color, texture, shape, location and layout of images and image objects. Graphical queries are supported: QBIC's rich user interface allows an end user to query for objects of a specific color by selecting that color from a color wheel, to query on a texture pattern chosen from a selective set of patterns, to draw shapes on a blackboard to query for objects of a specific shape, and so on. Based on the feature vectors computed from an image, a similarity-based approach is used; the queries can also include standard SQL and text/keyword predicates. The graphical user interface lets a user build queries, display multiple results, re-query based on returned images, and modify and resubmit queries.

Several other approaches have been proposed, such as VIMS [3], which retrieves similar images by relaxing feature values based on the standard deviation of the features. Many commercial products for content-based retrieval, such as Aphelion, ImagePro Plus, HLImage++ and WiT [2, 13, 27, 26], are also available on different platforms. These products vary in the similarity strategies they apply for image comparison.

The retrieval techniques mentioned above focus on a specific domain or a few limited domains and hence have limited query capabilities. The VisualHarness system is not domain specific and can access data from distributed repositories as long as those repositories are registered with the IH server. Different image features may be needed to support multiple domains; since InfoHarness facilitates hooking up different VIR engines, the BBA can extract any required feature from any VIR supporting that feature and store it in the database. The SQL query is then based on the features supported for the specific domain the user is querying at run time.


Many Web-based search engines are available, such as AltaVista, Excite, WebCrawler and Yahoo [1, 6, 25, 28]. Most of these engines support only keyword queries over textual data; a few support keyword queries semantically (also referred to as concept-based search). Lack of precision with these search engines is a well-known problem. Two engines, Electric Library [16] and one from Magnifi [17], search for image data based on textual queries; these keyword queries are usually run against annotations of the image (e.g., a text file related to the image). Currently, no search engine supports content-based search over Web-accessible images, and even though some VIRs support content-based search on a local (not Web-accessible) repository, their application is still limited to specific domains. The VisualHarness system provides the capability to access images in distributed repositories belonging to different domains using the BBA. We generally view the limited querying capabilities of current systems as a consequence of their lack of rich metadata; the VisualHarness system provides the option of extending the metadata to whatever extent is needed to support different domains and different digital media information.


The black box approach (BBA) is a new strategy for image-based searching and normalized retrieval that requires no knowledge of the internals of a VIR engine; it is implemented as part of the VisualHarness system. The BBA extracts content-dependent metadata by comparing each image with a null image for each image feature supported by a VIR engine. Currently, an entirely black or an entirely white image is used as the null image; "What is a good null image?" remains a challenging research issue for the future.

The BBA allows different VIRs to be plugged into the VisualHarness system. It enables efficient storage and retrieval of metadata, since it permits the image content to be managed, searched and retrieved using a relational DBMS. At run time, image feature queries are translated into SQL queries over the BBA-calculated distance metrics (metadata) to retrieve the encapsulated image objects that lie within a given tolerance of the input query image object for the specified features, scaled by user-given weights. The BBA also facilitates metadata correlation (metadata for different domains can be stored together in a database table) and provides access to distributed data repositories.

Our initial experiments with the BBA show that the proposed strategy is feasible. The results presented here indicate that the strategy is comparable to direct VIR retrieval and that it offers an effective and efficient solution for image management and retrieval by exploiting the capabilities of various VIRs.


[1] AltaVista.

[2] Aphelion.

[3] J. R. Bach, S. Paul, and R. Jain. A Visual Information Management System for the
Interactive Retrieval of Faces. In IEEE Transactions on Knowledge and Data Engineering,
October 1993.

[4] S. Chang, Q. Shi, and C. Yan. Iconic indexing by 2-d strings. In IEEE Transactions on
Pattern Analysis and Machine Intelligence, volume 9, pages 413-428, May 1987.

[5] W. Equitz. Retrieving images from a database using texture algorithms from the QBIC system.
Technical report, IBM Almaden Research Center, San Jose, CA, 1993.

[6] Excite.

[7] C. Faloutsos and S. Christodoulakis. Optimal signature extraction and information loss. In
ACM TODS, volume 12, pages 395-428, September 1987.

[8] P. Gast. Integrating eigenpicture analysis with an image database. M.I.T. Bachelor's Thesis,
Computer Science and Electrical Engineering Department, Advisor: Alex Pentland, 1993.

[9] A. Gupta and R. Jain. Visual information retrieval. Communications of the ACM, 40(5),
May 1997.

[10] A. Gupta. Visual information retrieval: A virage perspective. Technical report, Virage, San
Mateo, CA, 1995.

[11] Rafael C. Gonzalez and Richard E. Woods. Digital Image Processing. Addison-Wesley,

[12] B. Horn. Robot Vision. MIT Press, Cambridge, Mass., 1986.

[13] ImagePro.

[14] F. Korn, N. Sidiropoulos, C. Faloutsos, E. Siegel, and Z. Protopapas. Fast nearest-neighbor
search in medical image databases. In Conf. on Very Large Data Bases (VLDB), September

[15] V. Kashyap, K. Shah, and A. Sheth. Metadata for building the multimedia patch quilt.
Technical report, University of Georgia, Athens, GA, 1995.

[16] Electric Library.

[17] Magnifi, Inc.

[18] C.N. Moores. Datacoding applied to mechanical organization of knowledge. In Am. Doc. 2,

[19] S. Mudumbai. Zebra Image Access System: Customizable, Extensible Metadata-based
Access to Federated Image Repositories. M.S. Thesis, LSDIS, CS Dept., Univ. of Georgia,
May 1997.

[20] E. G. M. Petrakis and C. Faloutsos. Similarity searching in medical image databases. In
IEEE Trans. on Knowledge and Data Engineering (TKDE), 1996.

[21] R. W. Picard and T. Kabir. Finding similar patterns in large image databases. In Proc.
ICASSP, pages 161-164, Minneapolis, MN, 1993.

[22] A. Pentland, R. Picard, G. Davenport, and R. Welsh. The bt/mit project on advanced image
tools for telecommunications: An overview. In ImageCom 2nd International Conference on
Image Communications, Bordeaux, France, March 1993.

[23] A. Pentland, R.W. Picard, and S. Sclaroff. Photobook: Content Based Manipulation of Image
Databases, chapter 2, pages 43-75. Kluwer, Academic publishers, 1996.

[24] Virage.

[25] WebCrawler.

[26] WiT.

[27] WVision.

[28] Yahoo.

[29] J. Ashley, M. Flickner, J. Hafner, D. Lee, W. Niblack, and D. Petkovic. The Query By Image
Content (QBIC) System. Proc. of the ACM SIGMOD Intl. Conf. on Management of Data,
San Jose, CA, May 1995.



This project is funded in part by the Massive Digital Data Systems program, with a software donation from Virage, Inc.