Web Exclusive

Coverage of ASIS 1997 Annual Meeting

The State-of-the-Art of Search Engines [and Their Potential for Evolution] as Intelligent Agents


ASIS Annual Meeting Technical Session sponsored by SIG IAE, November 4, 1997

Speakers:
Bette Brunelle, OVID Technologies Inc.
J. Michael Schultz, Infonautics Corporation
Dr. Tim Finin, University of Maryland, Baltimore County: Software Agents for Information Retrieval
Moderator and Reporter:
Deanna Morrow Hall, Corporate Information Resources

Session Abstract: The evolution of search software is entering a third iteration in which search software is characterized by 1. natural-language queries; 2. additional functions based on advanced natural-language processing, e.g. relevance evaluation and customized presentation of results; 3. a dissolving of the direct relationship between search software and specified databases. This session will present commercial vendors, with special consideration of the extent to which third-generation features may augment or replace human interaction. (From the Final Program)

Session Report: Bette Brunelle, OVID Technologies Inc.
Brunelle described OVID, the experimental natural-language search engine that OVID Technologies has developed over the past 10 years to search its full-text journal databases. OVID, which is based on the BRS software, is intended to improve on traditional Boolean search capabilities. It will be beta-tested in 1998.

Their initial approach was to develop relevance ranking based on both the overall statistical occurrence of the search terms and the location of those terms in the document, supplemented by a fractional weighting system that includes documents in the results even if they do not contain all the search terms. Further development efforts have been aimed at matching the terminology of the search request to that of the targeted database. Key to their approach is the use of semantic explosions, in which the terms in the query are automatically expanded by:

This is particularly applicable to full-text files in a specific discourse domain, working from a domain-specific thesaurus. Initial tests of the search engine show results similar to those attainable by fairly sophisticated Boolean techniques.
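The ranking idea reported above (term occurrence plus term location, with fractional credit for documents that match only some of the query terms) can be illustrated with a minimal sketch. It is not OVID's algorithm, which the session report does not specify; the field names, the `FIELD_WEIGHTS` values, and the `score` and `rank` functions are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical location weights: a hit in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}


def score(query_terms, doc):
    """Score one document; documents missing some query terms still score."""
    terms = [t.lower() for t in query_terms]
    if not terms:
        return 0.0
    total = 0.0
    matched = 0
    for term in terms:
        term_score = 0.0
        for field, weight in FIELD_WEIGHTS.items():
            # Naive whitespace tokenization, enough for a sketch.
            counts = Counter(doc.get(field, "").lower().split())
            term_score += weight * counts[term]
        if term_score > 0:
            matched += 1
        total += term_score
    # Fractional weighting: scale by the share of query terms found
    # instead of discarding documents that lack a term.
    return total * matched / len(terms)


def rank(query_terms, docs):
    """Return document ids with a non-zero score, best first."""
    scored = [(score(query_terms, d), d["id"]) for d in docs]
    return [doc_id for s, doc_id in sorted(scored, key=lambda p: -p[0]) if s > 0]
```

A strict Boolean AND would drop any document lacking one of the terms; here such a document is kept but ranked lower, because its score is multiplied by the fraction of query terms it contains.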

The software is especially valuable when:

Other features of the software:

Limitations of the software:

J. Michael Schultz, Infonautics Corporation
Infonautics produces the Electronic Library, which is an amalgam of thousands of data sources without domain-specific or controlled vocabularies, e.g., National Public Radio transcripts, Literary Times and ethnic newspaper articles. The search engine is based on Excalibur (natural language processing) software, which includes the following elements:

Searches are processed in two stages. An initial query yields a list of "recurring themes" representing concepts (people, places, company, and products) from retrieved documents. Then the user clicks on the most pertinent themes, and results are displayed and re-ordered on the basis of the themes selected. The perspective gained when the recurring-theme listing is examined and used can enable a user to find an answer to a specific question without examining large sets of documents.
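The two-stage flow described above can be sketched as follows. This is only an illustration under stated assumptions, not the Electronic Library's implementation: it assumes each retrieved document already carries a list of extracted themes, and the `recurring_themes` and `reorder_by_themes` names and the `themes` field are hypothetical.

```python
from collections import Counter


def recurring_themes(results, top_n=5):
    """Stage 1: list the themes that recur most often across retrieved documents."""
    counts = Counter(theme for doc in results for theme in doc["themes"])
    return [theme for theme, _ in counts.most_common(top_n)]


def reorder_by_themes(results, selected):
    """Stage 2: move documents that share more of the selected themes to the front."""
    chosen = set(selected)
    # sorted() is stable, so documents with equal overlap keep their original order.
    return sorted(results, key=lambda doc: -len(chosen & set(doc["themes"])))
```

Because the sort is stable, selecting a theme only promotes the matching documents; the initial relevance order is preserved within each group.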

Results are presented graphically, with relevance rankings. There is clustering based on lists of key terms appearing in documents. Weaknesses of the clustering system include: arbitrariness of numbers of clusters presented; lack of "cluster type" identification; and the generation of keywords from each document in isolation.

Value is added from supplemental sources, e.g., the "people" theme in a search for "Who shot Abraham Lincoln?" resulted in a picture of John Wilkes Booth being imported from a secondary database for the search results. The user can click on a listed theme to re-order the list of results. Homonyms can cause problems at this stage too, e.g., "Flyers" on a list of themes could link to documents related to the Philadelphia Flyers hockey team, or to airline travel.

Natural language queries are supplemental to traditional Boolean queries, whose power is still valuable.

Dr. Tim Finin, University of Maryland, Baltimore County
"Software Agents for Information Retrieval"

Most information on the World Wide Web may go unread and unused because search systems provide a limited keyword-based interface that is often slow and unselective, and that does not support heterogeneity. Consequently, there is frustration over how slowly the new systems that have been envisioned are being implemented. In response, the development of an agent-based information retrieval (ABIR) architecture is proposed to address four key concerns:

The emerging paradigm for ABIR system-building centers on the ideal characteristics of software agents: autonomy, adaptation, and cooperation. In addition, software agent technology allows the development of user-interfacing agents, personal expert assistants, mobile software technology, and cooperating software agents. The possibility of achieving collaborative filtering, in which agents gather semantic information from interactions with users and then rank, retain and share these findings, will lead to a new era in building large-scale distributed ABIR software systems.

See http://umbc.edu/~finin/asis97 for more details on the presentation.