Information retrieval in medicine: The electronic medical record as a new domain

Catherine Arnott Smith, PhD

ASIS&T Annual Meeting - 2006 (ASIS&T 2006)
Austin, Texas, November 3-9, 2006


“The medical record is a material form of public memory,” Berg (1996) writes, “a structured distributing and collecting device, where all tasks concerning a patient’s trajectory must begin and end…” [Italics original; p. 510]. Structured distributing and collecting devices are the natural interest of information science. Unfortunately, of the 130 articles about medicine published in almost 36 years of JASIST, 70 (54%) deal with information retrieval, communication and the work processes behind them, but only 2 (1.5% of the 130) have focused on the medical record.

The most fundamental function of the medical record, whether paper or electronic, has been to document both the knowledge domains of clinical practice and the work processes and practices that support and maintain the operation of these domains. The content of electronic health records (EHRs) reflects this multiplicity of needs and audiences. It is a mix of highly structured numeric data and highly unstructured, idiosyncratic narrative text; increasingly, images are included as well. In fact, any information relevant to clinical decision making can be part of the medical record. These data make their way into the record via voice transcription, data feeds from machines, or conversion from paper. The body of existing information retrieval work most relevant to the medical record as a base for experiment is the work called “passage retrieval,” defined as “the task of identifying and extracting fragments from large, or short but heterogeneous full text documents” (Melucci, 1998, p. 44).
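As an illustration only, the passage retrieval task quoted above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a method from this paper or from Melucci (1998): the sample note, its section labels, the blank-line passage boundary, and the term-count scoring are all hypothetical choices made for the example.

```python
import re
from collections import Counter


def split_passages(note: str) -> list[str]:
    """Split a narrative note into passages at blank lines (an assumed boundary)."""
    return [p.strip() for p in re.split(r"\n\s*\n", note) if p.strip()]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, passage: str) -> int:
    """Count how often the distinct query terms occur in the passage."""
    counts = Counter(tokenize(passage))
    return sum(counts[term] for term in set(tokenize(query)))


def retrieve(query: str, note: str, k: int = 1) -> list[str]:
    """Return up to k passages with a non-zero score, best first."""
    ranked = sorted(split_passages(note), key=lambda p: score(query, p), reverse=True)
    return [p for p in ranked[:k] if score(query, p) > 0]


# Hypothetical note text, invented for this example.
NOTE = """HISTORY: Patient reports chest pain on exertion for two weeks.

MEDICATIONS: Aspirin 81 mg daily. Metoprolol 25 mg twice daily.

PLAN: Order stress test. Follow up in one week."""

top = retrieve("chest pain", NOTE)
```

Here `retrieve("chest pain", NOTE)` returns the fragment labeled HISTORY rather than the whole note, which is the sense in which a passage, not a document, is the unit of retrieval. A realistic system would need weighting and boundary detection well beyond this term count.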

This paper presents a document-centered approach to the EHR as an information retrieval problem. Passage retrieval researchers in information science have clearly seen the same value in document passages that researchers in medical informatics have. Without either literature acknowledging the other, workers in both camps have identified the same potential in document structure, labels, specificity and explicit hierarchies of knowledge for signaling relevance to the reader. The National Health Information Infrastructure Initiative identifies academics and researchers as natural stakeholders, like clinicians and caregivers, in enabling better healthcare through better information sharing (National Committee on Vital and Health Statistics, 2003). Information science has much to contribute to the health information technology arena and to electronic health records in particular: their development, their maintenance, and most importantly their improvement to serve the needs of diverse users.

