Entry Vocabulary -- a Technology to Enhance Digital Search

Fredric Gey, Michael Buckland, Aitao Chen, and Ray Larson

This paper describes a search technology that improves search across diverse genres of digital objects: documents, patents, numeric data, and images, as well as cross-language retrieval. The technology leverages human indexing of objects in specialized domains to make those objects more accessible to non-expert searchers. Our approach is to reverse-engineer text categorization: we construct maximum likelihood mappings between ordinary-language words and phrases and the specialist vocabulary of classification schemes. The resulting mappings form the training data, or 'entry vocabulary'; user queries are subsequently matched against the entry vocabulary to expand the search universe. The technology has been applied to searching patent databases, numeric economic statistics, and foreign-language document collections.
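
The mapping step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the simplest possible estimator, the relative frequency P(code | word) computed from human-indexed records, and it ranks codes for a query by summing those estimates over the query words. The function names, the toy records, and the category labels (EXHAUST, COOLING) are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def build_entry_vocabulary(records):
    """Build word -> {code: P(code | word)} from (text, codes) pairs.

    `records` stands in for human-indexed objects: free text plus the
    classification codes an indexer assigned (hypothetical format).
    """
    word_counts = Counter()
    pair_counts = defaultdict(Counter)
    for text, codes in records:
        # Count each word once per record (document frequency).
        for word in set(text.lower().split()):
            word_counts[word] += 1
            for code in codes:
                pair_counts[word][code] += 1
    # Relative frequency is the maximum likelihood estimate of P(code | word).
    return {
        word: {code: n / word_counts[word] for code, n in code_counts.items()}
        for word, code_counts in pair_counts.items()
    }


def suggest_codes(query, vocab, top_n=3):
    """Rank classification codes for an ordinary-language query."""
    scores = Counter()
    for word in query.lower().split():
        for code, p in vocab.get(word, {}).items():
            scores[code] += p
    return [code for code, _ in scores.most_common(top_n)]


# Toy, made-up training records: (text, assigned codes).
records = [
    ("automobile engine exhaust", ["EXHAUST"]),
    ("car engine cooling", ["COOLING"]),
    ("car exhaust muffler", ["EXHAUST"]),
]
vocab = build_entry_vocabulary(records)
suggestions = suggest_codes("car exhaust", vocab)
```

In this toy case the query "car exhaust" ranks the EXHAUST category first, because "exhaust" co-occurs only with that code while "car" is split evenly between the two. A production system would likely use phrases as well as single words and a more robust association measure than raw relative frequency.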

