Web Exclusive

Coverage of ASIS 1997 Annual Meeting

Uniform Resource Identifiers, Metadata, and What They Mean for Access to Networked Digital Resources


ASIS Annual Meeting Technical Session sponsored by SIGs CR/LAN/PUB and the ASIS Standards Committee, November 5, 1997

Speakers:
Where Do We Stand on Uniform Resource Identifiers?
Clifford Lynch, Executive Director, Coalition for Networked Information
21 Dupont Circle, Washington, DC 20036
202-296-5098 Fax 202-872-0884
clifford@cni.org   www.cni.org
Longer paper: www.arl.org/newsltr/194/identifier.html

URNs and URCs: Representation, Operation, and Status.
Michael Mealling, Network Solutions,
505 Huntmar Park Dr.
Herndon, VA 22070
703-742-0400 Fax 703-742-9552
michaelm@internic.net

Ron Daniel, Jr., Advanced Computing Lab., MS B287 Los Alamos National Lab., Los Alamos, NM 87545
505-665-0597 Fax 505-665-4939
rdaniel@lanl.gov

Metadata, MARC, and the Dublin Core
Rebecca Guenther, MARC Standards Specialist, Network Development and MARC Standards Office, Library of Congress, Washington, DC 20540-4102
202-707-5092 Fax 202-707-0115
(rps@andromeda.rutgers.edu)
Reporter:
Dagobert Soergel, University of Maryland

Session Abstract: One standard that supports the Web is the Uniform Resource Locator (URL), a standard way of addressing networked resources. URLs have serious limitations, including expired links, confusion between names and addresses, and difficulty in distinguishing between various versions of a resource. Unlike the world of online catalogs, the web does not offer an infrastructure for bibliographic control. To deal with these inadequacies, the Internet Engineering Task Force (IETF) established the Uniform Resource Identifier (URI) Working Group to discuss and develop standards for naming, describing and addressing Internet resources. One intent of the Working Group is to create an all encompassing concept and associated syntax that will include and coordinate all forms of UR*s that might be needed. Two forms of URIs have been proposed, the Uniform Resource Names (URNs) and the Uniform Resource Characteristics (URCs). The URN is intended to deal with the issue of unique identifiers for networked resources. The URC is intended to contain metadata about a URN. In other words, the URC will supply a "bibliographic" description to an Internet resource to facilitate discovery of networked digital resources and collections. The session will mainly focus on the URC. Various proposals for their implementation and other metadata standards, such as the Dublin Core, will be outlined and presented.

Outline of the Session:

C. Lynch

Uniform Resource Names (URN) (a type of Uniform Resource Identifier (URI))

Roles of names for objects: Why do we need names?

Identification

Control over access

Underpinning for citation, the basic building block of scholarly discourse

Communication across boundaries (names are less important inside communities than across communities)

Hook for other metadata

General metadata, such as author, subject descriptor

Rights management

Rating systems

In the network environment, names and citations become "actionable": they can be clicked on to get to the actual object.


Because citing (that is, including pointers via names) has become more consequential, citation is no longer taken for granted, as it is in the print world, but may become a matter of dispute. Legal cases:


Microsoft pointer to TicketMaster

Total News pointers to other news services, where clicking results in the item from the other service being shown in a frame with the Total News logo.


The system can "action back": Track what users are looking at, charge for access

Naming systems and resolution


A naming system is a structure for assigning names to information objects following a set of policies. A naming system is concerned with the structure and the properties of names.

Resolution is concerned with getting from the name to an instance of the named object. How to communicate about names

Naming systems are more basic. Resolution depends on technology and could be provided by a third party.

Assigning a name makes a statement about whether two things are the same or different. This depends on the user. For example, for publishers and booksellers, the hardback and paperback editions of a book are different, therefore they get different ISBNs (International Standard Book Number). For scholarly use, the two objects are the same if hardback and paperback have the same pagination, as is usually the case for scholarly books. Different editions of a work (such as a Shakespeare play) are considered the same for some purposes and different for others. With rare books, each individual copy needs to be distinguished.

From one perspective, an information object could be considered simply as a sequence of bits from which a name could be computed algorithmically. From another perspective, several sequences of bits may all correspond to the same abstract work. But different file formats of the same work may need to be distinguished because there may be an integrity loss in format conversion (loss of image quality in image compression, loss of formatting information in converting from a word processor file to a plain text file).

Granularity


Where do names leave off and navigation within an object begin? (DS: Navigation within an object requires internal names!)

Can one tell from an object what its name is? Computable names, such as SICI codes for journal articles or hash codes/signatures
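
A minimal sketch of a computable name of the hash kind mentioned above. It assumes SHA-256 as the digest and a hypothetical `hash:` prefix; neither comes from the talk. It shows that identical bit sequences yield the same name, while a format conversion of the same work yields a different one.

```python
import hashlib

def computed_name(data: bytes) -> str:
    """Derive a name from the bit sequence alone (the 'hash:' prefix is made up)."""
    return "hash:" + hashlib.sha256(data).hexdigest()

# Stand-in bytes for two formats of the same abstract work.
word_processor_file = b"\x00DOC\x00To be, or not to be"
plain_text_file = b"To be, or not to be"

# Same bits give the same name; a converted format gives a different name,
# even though both files carry the same work.
print(computed_name(plain_text_file) == computed_name(b"To be, or not to be"))
print(computed_name(plain_text_file) == computed_name(word_processor_file))
```

A name computed this way answers "can one tell from an object what its name is?" only at the level of bits; it cannot express that two formats are the same work.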

Privacy as a concern in the resolution database. Should it be private or public?

Who gets to use what names? Rights to names. Big issue in domain names.

What is wrong with URLs?


URLs are location-dependent, vary over time

URLs are protocol-dependent, but protocols are tied to access methods that change with technology.

There are no well-defined policies for assigning URLs.

Role of IETF (Internet Engineering Task Force)


Standards for Internet infrastructure

IETF has not defined a naming system but rather a framework for other organizations to establish naming systems. Some naming systems will be extended from the print world, for example ISBN and ISSN (International Standard Serial Number).

Structure of Uniform Resource Names (URN)

Presentation: www.rwhois.net/michael
Look for presentations and other materials on www.acl.lanl.gov and www.acl.lanl.gov/~rdaniel
Documents:

RFC 2141 URN syntax
RFC 2168 Resolution of URIs using DNS (Domain Name System)
IETF (Internet Engineering Task Force) www.ietf.org
W3 metadata overview and RDF working group home page: www.w3.org/Metadata/RDF

Both speakers (Mealling and Daniel) are members of the IETF.

Work of IETF

Michael Mealling

Structure of URNs

At the top level, URNs are divided into name spaces, such as inet, cid, http, handle, ILN, ISBN. Each name space has its own rules for names. Thus the structure for a URN is


<Name space ID>:<opaque string>
The opaque string may contain a delimiter (such as @) that separates a domain or subspace ID.
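
The structure above can be sketched as a small parser. This is an illustration only: the function name `split_urn` and the sample names are hypothetical, and the optional leading `urn:` follows the written form used in RFC 2141.

```python
from typing import NamedTuple, Optional

class ParsedURN(NamedTuple):
    nid: str                 # name space ID
    opaque: str              # opaque string, interpreted by the name space
    subspace: Optional[str]  # part after the delimiter, if one is present

def split_urn(urn: str, delimiter: str = "@") -> ParsedURN:
    # RFC 2141 writes URNs with a leading "urn:"; accept it but do not require it.
    if urn[:4].lower() == "urn:":
        urn = urn[4:]
    nid, sep, opaque = urn.partition(":")
    if not sep or not nid or not opaque:
        raise ValueError(f"not of the form <NID>:<opaque string>: {urn!r}")
    # Whether a delimiter is meaningful is decided by each name space's rules.
    _, found, subspace = opaque.partition(delimiter)
    return ParsedURN(nid.lower(), opaque, subspace if found else None)

print(split_urn("urn:isbn:0-000-00000-0"))
print(split_urn("inet:report42@example.org"))
```

Only the name space ID is interpreted here; the rest stays opaque until the rules of that name space are applied.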

To resolve a URN (find metadata about the resource, incl. pointers to one or more copies of it), follow these steps:

  1. Look for the NID in the NID registry and find the naming rule used in this name space.

  2. Apply the rule to the opaque string. If necessary, find subspace and rule used for it in the subspace registry for the name space.
    If necessary, repeat 2 at the next lower level until you are told where to get information about the resource identified by the name.
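
The steps above can be sketched with in-memory registries. Everything in this example (registry contents, rule functions, host names) is invented for illustration; real resolution would consult network services rather than Python dictionaries.

```python
# Toy registries; every name, rule, and address here is hypothetical.
NID_REGISTRY = {
    # name space ID -> rule that maps the opaque string to the next step
    "inet": lambda opaque: ("subspace", opaque.partition("@")[2]),
    "isbn": lambda opaque: ("resolver", "isbn-resolver.example"),
}
SUBSPACE_REGISTRY = {
    "inet": {"example.org": lambda opaque: ("resolver", "names.example.org")},
}

def find_resolver(urn: str) -> str:
    nid, _, opaque = urn.partition(":")
    nid = nid.lower()
    # Step 1: look up the NID and get the naming rule for this name space.
    rule = NID_REGISTRY[nid]
    # Step 2: apply the rule to the opaque string.
    kind, value = rule(opaque)
    # Repeat at the next lower level until a resolver is named.
    while kind == "subspace":
        kind, value = SUBSPACE_REGISTRY[nid][value](opaque)
    return value  # where to ask for information about the resource

print(find_resolver("inet:report42@example.org"))
print(find_resolver("isbn:0-000-00000-0"))
```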

URN standards are determined by IETF. IETF maintains mailing lists for people who want to participate in the formulation of specific standards. Join by signing up. Mailing lists are the final authority.

Progress: URN name spaces are being defined. There is work on a specification for a URN client (a program to resolve URNs). VRML (Virtual Reality Modeling Language) uses URNs.

Ron Daniel, Jr.

Whatever happened to URCs (Uniform Resource Characteristics)?

URCs were designed to link a URN with the appropriate URL(s)

URC history: IETF URI working group (URI-WG), 1992. The URC originated as the data string to bind a URN to a set of URLs.

The demise of URCs: The URI-WG defined URL schemes, defined URN and URC. Contemplated URN agents, but ended work. A URN-WG was established in Fall 1996, but a proposed URC-WG was not approved. The URN-WG adopted a very loose definition of URC.

Meanwhile, another organization, W3C, worked on PICS (Platform for Internet Content Selection). PICS started out as a system to connect third-party ratings with URLs; it had a three-part architecture: labels, rating system, rules. Labels were numeric only. PICS-NG (Next Generation), defined in January 1997, dealt with strings and incorporated other metadata, such as the Dublin Core elements and digital signatures. It evolved into the Resource Description Framework (RDF), a broadly applicable approach to metadata on the Web. The data model is a directed graph consisting of nodes and arcs. Arcs are labeled with the type of relationship; these labels are part of the name space. RDF defines a system for establishing link types and specifies some broadly useful primitives. The RDF syntax is mapped to an XML format. The generic structure of RDF allows implementation of the basic ideas of the Warwick framework.
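
The data model described above, a directed graph with labeled arcs, can be illustrated in a few lines. This is a sketch of the idea only, not RDF syntax; the URLs, node names, and the `dc:` and `ex:` label prefixes are assumptions made for the example.

```python
# Each arc is (source node, arc label, target node). All URLs and label
# names are illustrative; "dc:" stands for a name space of Dublin Core labels.
arcs = [
    ("http://example.org/report", "dc:Creator", "person-1"),
    ("http://example.org/report", "dc:Title", "Annual Report"),
    ("person-1", "ex:Name", "A. Author"),
]

def outgoing(node, label=None):
    """Targets of arcs leaving `node`, optionally restricted to one label."""
    return [t for s, l, t in arcs if s == node and (label is None or l == label)]

# Follow two arcs: from the document to its creator, then to the creator's name.
creator = outgoing("http://example.org/report", "dc:Creator")[0]
print(outgoing(creator, "ex:Name"))
```

Because the arc labels live in a name space, a community can introduce its own relationship types without changing the graph structure.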

Status of RDF: First draft of model and syntax Oct. 2, 1997. The syntax for types will be enhanced. Abbreviation syntax is being defined. Schemas and rules still to come.

Why is RDF likely to be important? Major browser vendors are heavily involved. RDF defines syntax and structure, allowing user communities to concentrate on semantics. RDF is the basis for using and reusing descriptive schemas.

Summary:

RDF does what URC was intended for
Broadly supported for Web metadata
Can incorporate the Dublin Core and rights management

Rebecca Guenther

Metadata, MARC, and the Dublin Core

  1. MARC as a metadata standard
  2. Integrate metadata from several data standards in digital libraries
  3. Dublin Core
  4. MARC and the Dublin Core. Other MARC mappings
  5. Action needed

1. MARC as a metadata standard

MARC follows Z39.2 and ISO 2709 in its structure. Field content is governed by other standards (AACR2r: Anglo-American Cataloging Rules, 2nd ed., 1988 revision).

Use of MARC for cataloging Internet resources. New field 856 for URL and other identifiers. InterCat database (OCLC). Digital cataloging guidelines at LC.

Why MARC for Internet resources? Allows for incorporation in library catalogs. In many cases, a record for the print version exists and can be amended with a pointer to the digital version. Since MARC cataloging requires effort, it is applied only to high-level resources.

Using metadata for navigating digital collections; constructing finding aids.

LC National Digital Library. Framework; access aids; includes digital images which form part of multimedia objects. NDL uses logical names for objects. The repository system links a logical name to a physical location and to metadata about the object.

2. Integrate metadata from several data standards in digital libraries

For unified access, access aids using different metadata structures (MARC, SGML metadata record, HTML header) must be integrated. Example: LC Civil War photographs

3. Dublin Core (DC)

Warwick Framework: Conceptual framework for the coexistence of many varieties of metadata. Web a strategic application (HTML/XML syntax), but not the only one.

Recent developments

CNI/OCLC workshop on the Dublin Core and metadata for images, Sep. 1996.

DC-4, Canberra, Australia, March 1997. Discussion: simplicity vs. flexibility; minimalist camp vs. structuralist camp (which advocates sub-elements and qualifiers). Result: Canberra qualifiers.

DC-5, Helsinki, Finland, Oct. 1997. Refinements and implementation strategies and standards. Agreed on formal data model expressed in RDF (Resource Description Framework, see Daniel above). Minimalist DC frozen with semantics for additional substructure/refinements. Closer DC - W3C collaboration

30+ DC implementation projects. Consensus on DC as lingua franca. Applications to non-electronic resources being considered

4. MARC and the Dublin Core. Other MARC mappings

MARC has been amended to cover all DC elements. Conversion from DC to MARC (simple DC to MARC, complex DC to MARC) results in a skeletal MARC record, which can then be enhanced. Several sample projects are underway. Conversion from MARC to DC is easier and results in complete DC records (but loses information).
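
A minimal sketch of the simple DC-to-MARC direction. Only a handful of elements are mapped, and the tag and subfield choices are assumptions made for illustration (except field 856 for the URL, which is named earlier in the talk); an actual conversion should follow the published crosswalk.

```python
# Illustrative crosswalk: DC element -> (MARC tag, subfield code).
# These choices are assumptions for this sketch, not an official mapping.
DC_TO_MARC = {
    "Title": ("245", "a"),
    "Subject": ("653", "a"),
    "Description": ("520", "a"),
    "Publisher": ("260", "b"),
    "Date": ("260", "c"),
    "Identifier": ("856", "u"),
}

def dc_to_skeletal_marc(dc_record: dict) -> list:
    """Return (tag, subfield, value) triples ordered by tag.

    Elements without an entry in this small table are skipped.
    """
    fields = []
    for element, values in dc_record.items():
        if element not in DC_TO_MARC:
            continue
        tag, code = DC_TO_MARC[element]
        for value in values:  # DC elements are repeatable
            fields.append((tag, code, value))
    return sorted(fields, key=lambda field: field[0])

record = {
    "Title": ["Civil War Photographs"],
    "Identifier": ["http://example.org/cw"],
    "Subject": ["Photographs", "History"],
}
for field in dc_to_skeletal_marc(record):
    print(field)
```

The result is skeletal in the sense used above: it carries no indicators, fixed fields, or authority-controlled headings, which a cataloger would add afterwards.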

MARC mappings to other standards
GILS (geospatial data)
EAD (Encoded Archival Description)

5. Action needed

Guidelines and registry for qualifiers to DC
Integrate metadata in the global information infrastructure: Exploiting metadata in Web search services; library Web pages with DC metadata.

References:
Dublin Core: purl.org/metadata/dublin_core
MARC: www.loc.gov/marc
Intercat database: www.oclc.org:6990
Warwick framework: www.oclc.org:5046/~weibel/html-meta.html
EAD: www.loc.gov

Discussion

The IETF standardization process is less formal than that of other standards bodies. Drafts are put out for comment, and everybody is free to comment.

Who is expected to create DC records? Both authors of documents (some elements could be filled in automatically by authoring software) and catalogers, but catalogers are more likely to produce MARC records (which are more complete).

Is there a movement to embed DC elements into non-text documents? The DC is intended to be applicable to non-text documents. The problem is how to link the metadata to images. One possibility is to create a separate HTML page with metatags not commonly found in the TIF header. Metadata can be (1) part of a document (in-line), (2) in separate records, or (3) created as a byproduct of retrieval operations.
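
The separate-HTML-page possibility can be sketched as follows. The `DC.` prefix on the `name` attribute is a common convention for embedding Dublin Core in HTML and is treated here as an assumption; the function name and the sample record are hypothetical.

```python
from html import escape

def dc_meta_tags(record: dict) -> str:
    """Render DC elements as HTML <meta> tags, one tag per value."""
    lines = []
    for element, values in record.items():
        for value in values:
            lines.append(
                f'<meta name="DC.{escape(element)}" content="{escape(value)}">'
            )
    return "\n".join(lines)

# Hypothetical description page for an image file.
print(dc_meta_tags({
    "title": ["Portrait, 1863"],
    "type": ["image"],
    "identifier": ["http://example.org/images/0001.tif"],
}))
```

The `identifier` value is what ties the description page back to the image, which is the linking problem raised in the answer above.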

Metadata for non-electronic resources, for example museum objects? Need to provide for an element that stores the type of the object. Also problem: How to relate the description of an object to the description of a digital image of the object?

Does DC require authority control, esp. of subjects? No requirement, could be agreed upon in special domains, otherwise choice of keywords is up to the author. There is a controlled list of resource types.

Do software producers (such as producers of SGML tools) introduce their own subject schemes as part of authoring tools? The only example would be the use of the Art and Architecture Thesaurus (AAT) in tools specifically designed for the art world. A DTD (Document Type Definition) may specify authority control, but authoring tools are not bound to a specific DTD.

Benefits of search systems based on DC vs. those based on MARC vs. combined systems? DC is intended for the Web; Web documents are unlikely to get full MARC records. More generally: MARC and other systems give more information than DC and therefore allow for more powerful searching; but for the same reason they require knowledge to produce and use. A user who has limited knowledge or wants to search multiple databases using different metadata standards should use DC, provided a search-time mapping from DC to the other standards is available.