A decade ago, most archivists thought about electronic records issues much the way that librarians do today – as a problem of documenting and preserving data files in specialized repositories. Since then, networked computing has transformed the mechanisms of business communications. Archivists have increasingly adopted the view that the fundamental issues in capturing and retaining records, whether in paper or electronic form, are identifying them, classifying them by provenance and retaining them in their context of use so that they can be understood. Only when these challenges have been met do questions of how or where to keep records, or how to provide access to them, arise. Thus the “archives” as files in need of retention and the “archives” as repository become issues only after what are currently the most difficult challenges of day-to-day recordkeeping have been met.
Librarians and the preservation community still focus their attention on electronic objects prepared and published as coherent entities to reside in repositories. Thus they generally ignore the very real problem of acquiring coherent records from disparate business information systems not designed to keep records and rife with undocumented software and hardware dependencies. Nor do they usually deal with objects whose content is frequently splattered with proprietary, personal, private and legally troublesome non-public data. So when archivists go to meetings of librarians and preservationists focused on keeping electronic “archives” they generally find the discussion overlooks the front end of the issue, where records “happen.” Librarians and preservationists, meanwhile, find it hard to understand how archivists can seemingly shrug off the back-end, long-term retention issues as not terribly interesting and dependent on technology developments very much out of the hands of either community.
In May 1997, a working meeting of international researchers and practitioners of the archival approach to electronic recordkeeping was organized in Pittsburgh by Archives & Museum Informatics. This meeting focused primarily on the issues at the “front end,” before records can be brought together to become the problem of any repository. The following summary, however, is directed to the larger community. Issues of electronic record creation and capture are shared by all those who have become dependent upon technological systems to support their business processes.
The meeting confirmed the degree to which common ground has been reached in the past several years. However, much research has focused on particular portions of the problem: many solutions which appear independent are actually interdependent. Tensions are emerging between practitioners who want to “just get on with it,” and researchers who seem to be “peeling an onion.” These tensions reveal a critical juncture in the development of solutions for electronic records management. After a long period of developing models, agreeing on terminology and defining problems, we seem ready to begin serious testing of proposed solutions. Much research remains, though. The following themes were explored in presentations and breakout group discussions:
Research into the definition of records has been concentrated in two major groups of researchers, at the University of Pittsburgh and the University of British Columbia (UBC). Both groups were asked to summarize their findings about what makes a record a record. Presentations by Luciana Duranti, Maria Guercio, Richard Cox and Wendy Duff focused on the source of authority for, and universality of, records metadata requirements. Driven by pragmatism, the University of Pittsburgh team looked for “warrant” in the sources considered authoritative by the practitioners of ancillary professions on whom archivists rely – lawyers, auditors, IT personnel, etc. (See Duff, Wendy M., “Compiling Warrant in Support of Functional Requirements,” Bulletin of the American Society for Information Science, June/July 1997, pp. 12-13.) In the European tradition, the UBC team examined the authority of diplomatics, a discipline grounded in the juridical systems of early modern Europe. To many, their differences on sources of authority (a more philosophical issue about the nature of truth) were overshadowed by their apparent agreement on basic characteristics and most concrete metadata requirements of electronic records.
Subsequent discussions demonstrated that neither definition is adequate for those responsible for managing electronic records or provides necessary algorithmic specificity for systems to recognize records when they are created by business events. The definitions put forward need to be synthesized, and the common core elements of an electronic record must be identified in a high-level definition useful across systems and communities. Variable sets of metadata drawn from the warrant of different juridical, business, organizational and procedural contexts could supplement this core. In combination with an architecture to express content, context and structure, a shared definition would provide a model that maps the differing concepts and languages of the research projects. This common semantic would enable collaboration across the discipline and would provide a means of communication with record creators, users and researchers in other disciplines.
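One way to picture the proposal is as a small data structure: a fixed core that every record must carry, plus optional metadata sets keyed by the warrant they derive from. The sketch below is illustrative only. The three core element names are taken from the "content, context and structure" architecture mentioned above, while the class name, the function names and the sample values are assumptions and not anything agreed at the meeting.

```python
from dataclasses import dataclass, field

# Assumed core: the meeting called for a common core but did not fix its elements.
CORE_ELEMENTS = ("content", "context", "structure")


@dataclass
class ElectronicRecord:
    # Core elements shared across systems and communities.
    core: dict
    # Supplementary metadata sets, keyed by the warrant they come from
    # (juridical, business, organizational, procedural).
    supplements: dict = field(default_factory=dict)


def missing_core(record):
    """Return the core elements that are absent or empty."""
    return [name for name in CORE_ELEMENTS if not record.core.get(name)]


def is_record(record):
    """An object qualifies only when every core element is present."""
    return not missing_core(record)


# Hypothetical examples.
memo = ElectronicRecord(
    core={
        "content": "Approval of batch 17",
        "context": "QA release procedure",
        "structure": "memo",
    },
    supplements={"juridical": {"retention_basis": "hypothetical regulation"}},
)
draft = ElectronicRecord(core={"content": "untitled note"})
```

Here `is_record(memo)` holds, while `missing_core(draft)` reports that context and structure are absent. A test of this kind gives systems something checkable, but it leaves open the harder question of which elements belong in the core.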
A tension was inherent in the discussion of definitions of electronic records. While a more generalized framework was seen as necessary to bridge the philosophical differences of the researchers, it would not serve the needs of those who are building systems. There, concrete expressions of both the semantics and the syntax of electronic records and their associated metadata are required urgently. The utility of the definition is the basic issue.
The presenters agreed that broad frameworks directing people and organizations to keep electronic records need to be accompanied by specific performance standards, monitoring/reporting mechanisms, rewards and penalties. The presentations reinforced the view that records result from business processes and are the responsibility of process managers. Policy is a strategic, and not fundamentally a technological, issue. But as yet we know little about the acceptance of or adherence to policies, the costs of implementing (or even developing) them or the appropriate level of granularity in implementation.
Discussion focused on the feasibility of implementing electronic records management policies. If much of the responsibility for the creation and retention of records is shifted to the desktop of individuals, how do we maintain the quality of records? What are viable strategies in terms of hardware and software implementation? Can we develop a generic set of specifications? What role can professional “best practices” play, and how do we train people to meet these new requirements?
Changes in policy require changes in accountability structures as well. Can policies be enforced? Which mechanisms work? How can project managers, whose output is measured in other business terms, be held accountable for records management? Some organizations respond more readily to policy changes than others. What kinds of organizations respond best to policy? Which respond best to design, which to implementation and which to standards? What strategies are available as alternatives in less formal working environments? Are there identifiable and measurable differences between industries and between the public and private sectors?
Groups presenting in this session included Artificial Intelligence Atlanta (AIA), a team engaged in research with the Department of Defense (DoD), and ASTRA (a Swedish pharmaceutical firm) and the Swedish National Archives, jointly involved in research to develop methods for electronic recordkeeping in the pharmaceutical industry. That industry was well represented at this meeting because its firms are both heavily regulated and carry huge long-term liabilities that can be defended only with their scientific records, which are now largely electronic. Both teams are attempting to find methods to identify a record-creating event, or a business transaction, that requires a record to be created. How can a system recognize a “trigger” event? The ASTRA team used STEP (4) to model the business process and identify such events (which they have termed causa), while the DoD team tried to develop a set-based logic to identify events and provide “automated decision support for classification” to a human records classifier. Both acknowledged that models of types of actions don’t necessarily conform to actions as conducted; matching the process model to real events has proven difficult. Unfortunately, the archival rules to which the business model would relate, if it were a success, are also not as formal as they need to be. Expressions in set theory proposed by AIA look highly algorithmic, but in fact are too vague in operation.
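To make the idea of a set-based test for trigger events concrete, the following minimal sketch treats each rule as a set of attributes that an event must carry. It is not the AIA or ASTRA logic, which is not reproduced in this report; the rule names, attribute names and sample event are all hypothetical.

```python
# Hypothetical trigger rules: each lists the attributes an event must
# carry before it counts as a record-creating transaction.
TRIGGER_RULES = {
    "contract_approval": {"approver", "contract_id", "decision"},
    "batch_release": {"batch_id", "qa_signature"},
}


def matching_triggers(event_attributes):
    """Return the names of rules whose required attributes are all present."""
    present = set(event_attributes)
    return sorted(
        name for name, required in TRIGGER_RULES.items() if required <= present
    )


# A sample event as a system might log it (illustrative values only).
event = {"batch_id": "B-17", "qa_signature": "jd", "logged": "1997-01-15"}
suggested = matching_triggers(event)
```

For this event only the `batch_release` rule matches. Even so simple a test shows the difficulty reported by both teams: the rule is only as good as the match between the modeled attributes and what real events actually carry.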
Research questions focus on distinguishing creator vs. organizational requirements. A tension was recognized between the creation of functional and efficient business systems and the implementation of full electronic records capture functionality. For those in the group who felt that one of the primary characteristics of an electronic record was that it was “set aside,” classification became a key moment in the process (5). Much work has focused on how to classify documents consistently. Work-flow systems that position the creation of a record within a function and link that function to a pre-defined classification were seen as promising. Another tack would be to identify functions assigned to personnel classifying a record in order to narrow the possibilities available to them and improve accuracy. Both of these approaches suggest the creation of a structured electronic workspace where work is done within functional areas as an aid in the record capture process. Such a space enables system implementation methodologies that can test for rigorous adherence.
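The second approach, narrowing a classifier's choices by the functions assigned to them, can be sketched as a simple lookup. The classification scheme, function names and class codes below are invented for illustration and do not come from any of the projects discussed.

```python
# Hypothetical classification scheme: business function -> classes
# a person working in that function may assign to a record.
CLASSES_BY_FUNCTION = {
    "procurement": ["PRC-01 Purchase orders", "PRC-02 Supplier contracts"],
    "personnel": ["PER-01 Appointments", "PER-02 Leave"],
    "finance": ["FIN-01 Invoices"],
}


def candidate_classes(assigned_functions):
    """Return only the classes tied to the functions a person performs."""
    candidates = []
    for function in assigned_functions:
        # Unknown functions contribute nothing rather than raising an error.
        candidates.extend(CLASSES_BY_FUNCTION.get(function, []))
    return candidates
```

A buyer assigned only to `"procurement"` would be offered two classes instead of five. The reduction is modest here, but in a scheme with hundreds of classes it is the kind of narrowing that might improve consistency.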
A reliance upon an understanding of the business processes carried out by organizations raised questions regarding modeling of workflow itself. What data is required about the function being performed and how is its location in workflow related to a captured record? Clear models for functional requirements specification are needed. But what is the role of the archivist within an interdisciplinary team that is creating new systems to support electronic work? Communicating recordkeeping requirements to systems designers and implementors is a major challenge that would be aided by a consistent and unambiguous model of events and activities. The model should establish a synthesis between the various models proposed and the business processes and functions identified.
McDonald reported on a vision developed at the National Archives of Canada where recordkeeping is transparent, incorporated into an overall IT strategy and integrated into tools and technology. But what does “transparent” mean, and what does this world look like? How do we articulate the relationships between programs, work processes and activities within an organization? What is required in order to specify built-in capture and retention rules (to enable automated disposition)? How can systems be designed that support the relatively unstructured environment in the modern office, where work processes are complex, ad hoc and dynamic? Can recordkeeping be made invisible? Or should those responsible for record creation be made aware of their actions?
Even if systems could be designed and implemented to automate the capture of electronic records, research is still needed into the required metadata. How do we model recordkeeping systems that enable records and their metadata to remain meaningful over time? How can we ensure the integrity of a record through time? Will metadata have to be “registered”? What metadata is required to support future re-use? How does metadata required for electronic records map to that for other functions – information discovery for example?
If an encapsulated object approach is taken, what are the characteristics of a good envelope? Are there existing technologies or standards that can be adapted or implemented? Are there standard syntaxes that are “good enough” for some situations? Can we assign value metrics around the capture, management, retention and migration of electronic records? What are the costs vs. the benefits of various strategies?
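As a rough illustration of what an envelope might involve, the sketch below wraps a record's content together with context and structure metadata and a digest that lets a later reader detect alteration. The choice of JSON, Base64 and SHA-256, and all field names, are assumptions made for the example. The meeting did not endorse any particular syntax.

```python
import base64
import hashlib
import json


def wrap(content: bytes, context: dict, structure: dict) -> str:
    """Serialize content and its metadata as one self-describing envelope."""
    envelope = {
        "context": context,
        "structure": structure,
        # Base64 keeps arbitrary bytes safe inside a text syntax.
        "content": base64.b64encode(content).decode("ascii"),
        # The digest is computed over the original bytes.
        "sha256": hashlib.sha256(content).hexdigest(),
    }
    return json.dumps(envelope, sort_keys=True)


def unwrap(serialized: str) -> bytes:
    """Return the content, refusing envelopes whose content was altered."""
    envelope = json.loads(serialized)
    content = base64.b64decode(envelope["content"])
    if hashlib.sha256(content).hexdigest() != envelope["sha256"]:
        raise ValueError("content does not match the recorded digest")
    return content
```

A digest stored inside the envelope only detects accidental or careless change, since anyone able to rewrite the content can rewrite the digest too. Whether a "good" envelope also needs external registration or signatures is exactly the kind of question raised above.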
Test-bed projects are needed to benchmark and cost various approaches to the capture and retention of electronic records and their associated metadata. The semantics and syntax of a generic attribute set need to be designed and tested against the functionality required. The effectiveness of metadata in reducing software dependencies must be evaluated and tested in a variety of circumstances.
Researchers in this session included Margaret Hedstrom of the University of Michigan, Anne Marie Makerenko from Babson College Archives, and Alan Murdock, representing a team from Pfizer Ltd., a British pharmaceutical company. Their practical research questions focused on the costs and mechanics of maintaining electronic archives. How can we model event-driven records retention scheduling? What are migration cost elements? What risks arise from what loss under what circumstances? And can models be developed and/or partners be found in highly regulated industries where long-term retention of electronic records is a legislated mandate?
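Event-driven retention scheduling differs from fixed scheduling in that the retention clock starts at a later business event rather than at creation. A minimal sketch follows, in which the record types, trigger events and retention periods are invented and do not reflect any actual regulation.

```python
from datetime import date

# Hypothetical rules: record type -> (triggering event, years to retain).
RETENTION_RULES = {
    "contract": ("contract_expired", 7),
    "clinical_trial": ("product_withdrawn", 30),
}


def add_years(start, years):
    """Add whole years to a date, mapping 29 February to 28 February."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # 29 February does not exist in a non-leap target year.
        return start.replace(year=start.year + years, day=28)


def disposal_date(record_type, events):
    """Return the earliest disposal date, or None if the trigger has not occurred."""
    trigger, years = RETENTION_RULES[record_type]
    if trigger not in events:
        return None  # retention period has not started yet
    return add_years(events[trigger], years)
```

Until the triggering event is recorded, `disposal_date` returns `None`, so the record cannot be scheduled at all. This is why such scheduling depends on the recordkeeping system being told about events that happen long after capture.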
It is evident to the researchers that much remains to be determined before scalable solutions are available. Though practitioners keep asking for “core” definitions and implementable procedures, it is not yet clear that “cores” are workable. The last mile is proving hard to travel because frameworks aren’t good at the detailed semantics, because functional requirements are far from specifications and because the real costs of migrations depend on so many local variables. Concrete implementations are necessary to build our understanding of these factors, but comparative analysis and detailed reporting on choices made and the rationales for them will be critical to building shared strategies.
As Margaret Hedstrom observed, we need to improve our knowledge of alternatives to exact replication. What strategies are appropriate to different types of records and different preservation goals? How much functionality must be maintained in an archival electronic record? What is acceptable information loss? Could we consider the preservation of surrogates? Can we reconstruct context and structure? We need criteria for the creation and evaluation of surrogates as preservation tools.
Again, implementation became a major theme. How can we devise migration programs without a detailed understanding of the costs and benefits of particular approaches to migration? How do we assess the risks involved in information loss? We are unable to ensure that particular methods will work in all situations; how do we support local decision-making to enable the best conclusions for a particular situation? What are the project management and quality assurance techniques that will be most effective throughout the process?
Besides the need to maintain more explicit contextual metadata, it remains unclear whether or how the requirements for long-term preservation of records are fundamentally different from the requirements for the preservation of other types of digital information. If they are, then how are they different? Where can we collaborate with the broader community, and where must specific archival solutions be developed?
For the near term, the most promising areas for research seem to require greater specificity and granularity in their focus. In the definition of records, we need concrete risks associated with different definitions in different circumstances and an executable specification of recordness. In policy, we need to define the concrete costs and benefits of specific policies and their implementation through organizational, national and international mechanisms. To understand record creation, we need testable models of the kinds of records created by different business processes. In the arena of capturing records, we need tests of registry mechanisms for software and hardware dependency metadata and for business context metadata, and we need to test proposed structures for the inviolable storage of metadata and records’ content. For the maintenance of records over time, we need comparative migration data, equivalent measures of the effectiveness of different systems architectures and strategic solutions for the universal retention of records (obviating the need for each institution to invest in its own migration of dependencies). Finally, we need very detailed and granular research into the needs of users and how they are articulated so that metadata on the content and context of records will support the research process.
None of these problems is going to be easy to solve. The research agenda meeting in the spring of 1997 articulated a full set of open questions which will provide grist for researchers and practitioners for a long time to come. The archivists participating looked forward to the interdisciplinary collaboration necessary to move beyond open questions to workable solutions.