Query Expansion Using UMLS Tools for Health Information Retrieval

Kun Lu and Xiangming Mu

Four new automatic query expansion strategies based on the UMLS Metathesaurus are proposed to improve the effectiveness of health information retrieval: String index with Concept expansion (SC), String index with Term expansion (ST), Word index with Concept expansion (WC), and Word index with Term expansion (WT). Results from a comparative evaluation study using the MedlinePlus dataset indicated that, under maximum recall, query expansion with the string index achieves better average precision (10.7% and 6.9% improvement for concept level and term level expansion, respectively) and better recall (2.4% and 26.2% improvement, respectively). In particular, we found that recall for term level expansion is much higher than for concept level expansion: the term expansion strategy enhances average recall by 35.5%. In terms of precision, the string index approach enhances average precision by 8.8%. We also found that query expansion did not work for long queries (queries of more than three phrases). These results will help us better understand the effectiveness of different automatic query expansion strategies using the UMLS Metathesaurus and further inform the design of future healthcare IR systems.
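The four strategies combine two choices: how a query phrase is looked up (string index or word index) and what it is expanded with (concept level or term level). The abstract does not define either step, so the sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that a string index lookup matches the whole normalized phrase exactly, that a word index lookup matches any entry sharing at least one word with the query, that concept expansion adds every string sharing a concept identifier (synonyms), and that term expansion adds only strings sharing a term identifier (lexical variants), loosely modeled on the UMLS concept/term distinction. The lexicon, its identifiers, and the names `Entry`, `LEXICON`, `expand`, and `STRATEGIES` are hypothetical toy data, not UMLS content. Because these definitions are assumed, the sketch should not be read as explaining the recall differences the study reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    string: str      # surface string
    term_id: str     # hypothetical term identifier (groups lexical variants)
    concept_id: str  # hypothetical concept identifier (groups synonyms)


# Toy lexicon: made-up entries and identifiers, NOT real UMLS data.
LEXICON = [
    Entry("heart attack", "T1", "C1"),
    Entry("heart attacks", "T1", "C1"),
    Entry("myocardial infarction", "T2", "C1"),
    Entry("myocardial infarctions", "T2", "C1"),
    Entry("high blood pressure", "T3", "C2"),
    Entry("hypertension", "T4", "C2"),
]

# Assumed mapping of the four strategy names to (index, level) choices.
STRATEGIES = {
    "SC": ("string", "concept"),
    "ST": ("string", "term"),
    "WC": ("word", "concept"),
    "WT": ("word", "term"),
}


def normalize(text):
    """Lowercase and collapse whitespace (a crude stand-in for UMLS normalization)."""
    return " ".join(text.lower().split())


def string_index_lookup(phrase):
    """Assumed string index: the whole normalized phrase must match an entry exactly."""
    p = normalize(phrase)
    return [e for e in LEXICON if e.string == p]


def word_index_lookup(phrase):
    """Assumed word index: match any entry sharing at least one word with the phrase."""
    words = set(normalize(phrase).split())
    return [e for e in LEXICON if words & set(e.string.split())]


def expand(phrase, index="string", level="concept"):
    """Return the original phrase followed by its expansion strings (no duplicates)."""
    if index not in ("string", "word") or level not in ("concept", "term"):
        raise ValueError("index must be 'string'/'word', level 'concept'/'term'")
    lookup = string_index_lookup if index == "string" else word_index_lookup
    matches = lookup(phrase)
    # Group by concept identifier (synonyms) or term identifier (variants).
    key = (lambda e: e.concept_id) if level == "concept" else (lambda e: e.term_id)
    keys = {key(e) for e in matches}
    related = sorted({e.string for e in LEXICON if key(e) in keys})
    # dict.fromkeys removes duplicates while keeping the original phrase first.
    return list(dict.fromkeys([normalize(phrase)] + related))


if __name__ == "__main__":
    for name, (index, level) in STRATEGIES.items():
        print(name, expand("blood pressure", index, level))
```

With this toy lexicon, the query "blood pressure" has no exact string match, so SC and ST leave it unexpanded, while WC adds "high blood pressure" and "hypertension" and WT adds only "high blood pressure". This shows, under the assumptions above, why the choice of index can matter as much as the expansion level.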

