Editor's note: Mr. Arora's paper placed second in the 2001 SIG/III International Paper competition. The paper has been condensed for publication in the Bulletin.

Network-Enabled Digitized Collection at the Central Library, IIT Delhi

by Jagdish Arora

Jagdish Arora is head of computer applications in the Central Library at the Indian Institute of Technology, Hauz Khas, New Delhi - 110 016; telephone: 91-11-6591452, 91-11-6591467; fax: 91-11-6862037, 6855227; e-mail: jarora42@hotmail.com.

The emergence of the Internet, particularly the World Wide Web (WWW), as a new medium of information delivery, coupled with the availability of powerful hardware, software and networking technology, has triggered large-scale commercial and non-commercial digitization programs the world over. An increasing number of publishers are using the Internet as a global way to offer their publications to the international community of scientists and technologists, resulting in the large-scale appearance of STM (scientific, technical, medical) electronic journals on the Web. The number of electronic journals has grown dramatically, from fewer than 10 in 1989 to more than 8500 in April 2000. The 37th edition of Ulrich's International Periodicals Directory (1999) reports that of 157,000 serials listed in the directory 10,332 were available exclusively online or in addition to a paper counterpart.

Internet and Web technology together provide an unparalleled medium for delivery of information with great speed and economy. Moreover, Web-based electronic information products not only eliminate paper, physical storage and transportation costs, they also offer a host of other possibilities for incorporating multimedia and hyperlink features into electronic documents hitherto impossible on paper media.

Web-based electronic information products are exerting ever-increasing pressure on traditional libraries, which, in turn, are committing larger portions of their budgets for either procuring or accessing Web-based online or full-text search services, CD-ROM products, online databases, multimedia products, etc. The libraries and information centers, as consumers of electronic journals and online databases, are benefiting greatly from this technology-driven revolution. The information products of the technological revolution, in turn, have triggered a major shift in the traditional practices and policies of buying, storing and accessing journals.

During the past decade great progress has been made in both theoretical and practical research in digital libraries. Besides acquiring and buying access to digital collections, academic and research libraries are making efforts to initiate digital library projects in their respective institutions to build their own digital collections. The increasing commitment to Web-based digitized collections at the Central Library, Indian Institute of Technology (IIT) Delhi coincides with the installation of a fiber optics-based Campus LAN connected to a 2 Mbps radio link with VSNL, enabling faster Internet access for the academic community of the Institute. The availability of this high-speed Internet connection has led to a number of sponsored and unsponsored projects for building network-based digitized collections within the framework of traditional library and information services at the Central Library, IIT Delhi. This article outlines the various constituents of its digital library program.

The Campus LAN & the Internet Connection at IIT Delhi

The Campus LAN at IIT Delhi consists of a state-of-the-art switched and routed network with a fiber-optics backbone and enhanced CAT-5 UTP cabling. The LAN consists of more than 1400 switched network access points, which are configured into 35 virtual LANs to cover each department, center, central facility and administration. Three routers have been configured in a hot standby mode to interconnect these virtual LANs and to create a DMZ (secure) LAN and a non-routed administration LAN. The old Institute-bridged LAN, consisting of more than 15 Thick Ethernet backbone segments, has also been connected to the new LAN through one of the switched network access points.

The Campus LAN is connected to the Internet through a PIX Firewall, a DMZ LAN, an external router, and the 2 Mbps radio link with VSNL mentioned above. The firewall protects the Campus LAN from unauthorized user access from the Internet, performs network address translation (NAT) from private, internal IP network numbers (a Class A network) to legal IP numbers (Class C networks), and provides controlled access to the IIT Delhi WWW, mail and DNS services as virtual resources on the DMZ LAN. Access servers provide free PPP dial-up access to the Campus LAN and the Internet through 32 modems (33.6 Kbps) and 32 internal lines of the EPABX to all the faculty resident on the campus. Twenty-one switches are interconnected in a tree topology, with Fast Ethernet trunking, providing 200 Mbps full duplex communication paths. In anticipation of an increase in Internet and cross-country traffic, a memorandum of understanding has been signed with ERNET Society for an additional 2 Mbps terrestrial link, which will become operational soon.
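The address translation performed by the firewall can be illustrated with a toy model. The sketch below is not the PIX configuration used at IIT Delhi; it only shows the idea of mapping internal hosts to addresses drawn from a public pool. The address ranges are assumptions, since the article does not give the real ones: 10.0.0.0/8 stands in for the private Class A network and the documentation range 192.0.2.0/24 stands in for a Class C network. The class and method names are hypothetical.

```python
import ipaddress

# Assumed address ranges; the article does not state the real ones.
PRIVATE_NET = ipaddress.ip_network("10.0.0.0/8")     # private Class A block
PUBLIC_POOL = ipaddress.ip_network("192.0.2.0/24")   # documentation range standing in for a Class C network


class NatTable:
    """Toy dynamic NAT: gives each internal host one address from a public pool."""

    def __init__(self, private_net, public_pool):
        self.private_net = private_net
        self._free = iter(public_pool.hosts())  # usable public addresses, in order
        self._map = {}                          # internal address -> public address

    def translate(self, internal):
        """Return the public address assigned to an internal address."""
        addr = ipaddress.ip_address(internal)
        if addr not in self.private_net:
            raise ValueError(f"{internal} is not an internal address")
        if addr not in self._map:
            try:
                self._map[addr] = next(self._free)
            except StopIteration:
                raise RuntimeError("public address pool exhausted") from None
        return str(self._map[addr])


nat = NatTable(PRIVATE_NET, PUBLIC_POOL)
first = nat.translate("10.1.2.3")
second = nat.translate("10.1.2.4")
again = nat.translate("10.1.2.3")   # an already-seen host keeps its mapping
```

A real firewall also tracks ports and connection state and releases mappings after a timeout; none of that is modeled here.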

Cyber Cafés are also operational in each of eight Institute hostels with 19 or 20 access points in each café. The Institute has provided 5 to 10 PCs for each Cyber Café, all of which are connected to the Institute backbone over fiber links, with 200 Mbps full duplex communication paths. Eventually the network will be extended to each hostel room requiring an additional 3200 access points. The work has already been started. In the hospital, 33 network access points are provided, out of which 11 will provide Internet access to doctors and the remaining 22 will be used for computerization of hospital activities.

Building the Digital Collection at the Central Library, IIT Delhi

The Central Library at the IIT Delhi is using a multi-pronged approach to build up network-enabled digitized collections. The Library, by policy, acquires material in electronic form in preference to print form wherever possible. Besides acquiring and buying access to digital collections, efforts have been made to initiate in-house digitization of documents. A number of digitization projects are in various stages of execution. Major network-enabled digitized collections at the Central Library are described below:

Buying Access to Web-based Full-text Digitized Collections in the Library. The Library has been providing Web-based full-text access to several electronic journals since 1998. License agreements were signed with electronic publishers after negotiations to get maximum benefits for the users. The license agreement signed with Elsevier Science Publishers provides access to all 1100 journals on the ScienceDirect site, with download options and without any restrictions. In all, the full text of about 1,450 electronic journals can be accessed. All IP addresses used in the Institute are authorized and enabled to access the above-mentioned electronic collections.
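IP-based authorization of this kind amounts to a range check on the requesting address. A minimal sketch follows, assuming the publisher keeps a list of licensed address ranges for each subscriber. The ranges shown are documentation addresses used as placeholders, not the Institute's real ones, and the function name is hypothetical.

```python
import ipaddress

# Placeholder ranges; the article does not list the Institute's registered addresses.
AUTHORIZED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_authorized(client_ip, ranges=AUTHORIZED_RANGES):
    """Return True if client_ip falls inside any licensed address range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ranges)


on_campus = is_authorized("192.0.2.57")     # inside the first range
off_campus = is_authorized("203.0.113.9")   # outside both ranges
```

Because the check depends only on the source address, users need no individual passwords, but access is limited to machines whose traffic leaves through the registered addresses.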

Building a Digital Collection In-house: Converting Datasets That Are "Born Digital." Most libraries and institutions implementing digital libraries invariably have datasets that were originally created in digital format. Doctoral dissertations submitted to universities and research institutions are highly valuable documents that qualify as an important component of any digital library implementation. In addition, the Institute has annual reports, prospectuses, courses of studies, technical reports and other datasets that might be included in the digital collection. The items listed above are invariably composed in a word processing program or desktop publishing package. Such documents can be converted into HTML, PostScript and PDF using tools like Acrobat 5.0 or Acrobat Exchange. Online converters are also available through Adobe's site.

Some publications, namely the Prospectus, the Course of Studies (Undergraduate and Postgraduate) and IIT Delhi at a Glance have already been converted into PDF from their native format in PageMaker. The content pages of each of these publications are linked to their respective descriptions using Acrobat Catalogue. These four publications are given to visiting dignitaries on CDs with a Web-based interface.

Initiatives have also been taken for electronic submission of theses and dissertations. Under this program old Ph.D. theses and dissertations would be scanned and made accessible on the Web as part of the Networked Digital Library of Theses and Dissertations (NDLTD) initiative. Projects sponsored by the Department of Biotechnology (DBT) and the Ministry of Human Resource Development (MHRD) provide funds for scanning of Ph.D. theses submitted to IIT Delhi. Figure 1 illustrates the process involved in digitization of Ph.D. theses and dissertations at IIT Delhi.

Building a Digital Collection In-House: Conversion of Existing Print Media into Digital Format. Several digital library projects are concerned with providing digital access to materials that already exist within traditional libraries as print media. Scanned page images are the only reasonable solution for institutions such as libraries to convert existing paper collections (legacy documents) without having access to the original data in computer-processible formats convertible into HTML/SGML or in other structured or unstructured text. There are several large projects using page images as their primary storage format, including project JSTOR (www.jstor.org) at Princeton University funded by the Mellon Foundation.

Capturing page images is comparatively easy and inexpensive. A page image is also a faithful reproduction of the original, maintaining page integrity and originality. Scanned textual images, however, are not searchable unless they are processed with OCR, which is a highly error-prone process, especially when it involves scientific texts. The facility set up in the Central Library, IIT Delhi, for scanning deteriorating and fragile old volumes of journals consists of:

  • Two HP Scanjet Flat-bed Scanners (6100C and 6300C)
  • OmniDoc 1.1
  • Pentium III Workstations (500 MHz, 128 MB RAM, 20 GB HDD Windows 98)

The old, fragile and bound volumes of journals are first scanned using OmniDoc 1.1. Scanned images of articles from an individual issue are then exported as TIFF (ver. 5), while the indexing information is exported as a plain text file. While the TIFF files are preserved for archival purposes, a PDF is derived from each TIFF file using Acrobat Exchange version 3.0. The text files, consisting of the author, title and location information of an article, are pulled together in a content page, which is coded in HTML and hyperlinked to the article images in PDF format. The images of the articles in PDF format, along with the associated Web interface, are put up on the Campus intranet, where users can access them through the Library's home page. The project was sponsored by the All India Council for Technical Education. The process is shown in Figure 2.
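The step that turns the exported index text into a hyperlinked content page can be sketched as follows. This is an illustration only, not the Library's actual tooling: the pipe-delimited "author|title|PDF file" layout, the file names and the function names are all assumptions, since the article does not describe the format of the text file exported by OmniDoc.

```python
import html


def parse_index(text):
    """Parse 'author|title|pdf_file' lines (an assumed layout) into dicts."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        author, title, location = (part.strip() for part in line.split("|"))
        records.append({"author": author, "title": title, "location": location})
    return records


def build_contents_page(issue_label, records):
    """Return an HTML content page linking each article title to its PDF."""
    items = []
    for rec in records:
        items.append(
            '<li><a href="{href}">{title}</a>, {author}</li>'.format(
                href=html.escape(rec["location"], quote=True),
                title=html.escape(rec["title"]),
                author=html.escape(rec["author"]),
            )
        )
    return (
        "<html><head><title>{0}</title></head>\n"
        "<body><h1>{0}</h1>\n<ul>\n{1}\n</ul>\n</body></html>"
    ).format(html.escape(issue_label), "\n".join(items))


# Made-up sample entries for one issue.
sample_index = """
Example Author A|Sample article one|vol01/art001.pdf
Example Author B|Sample article on alloys & steels|vol01/art002.pdf
"""
page = build_contents_page("Volume 1, Issue 1", parse_index(sample_index))
```

Escaping the titles matters in practice because scientific titles often contain characters such as "&" and "<" that would otherwise break the generated page.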

Subject Portal at the Central Library Website. The home page of the Central Library serves as a structured and organized guide to the electronic resources available on the Internet. The portal site is updated regularly. The home page provides more than 2500 links to electronic resources on the Web. It can be accessed both on the Internet and through the IIT Intranet at the following sites:

Other Digital Collections. The Central Library has acquired European Patent Office information, the Indian Standards database and many bibliographic databases on CD-ROM. In addition, its OPAC is a major resource. The Libsys package, bought in June 1998, has been fully implemented for computerization of all activities in the library including acquisition, cataloguing, circulation and serials control. All faculty, staff, researchers and postgraduate students are already enrolled for the computerized circulation system. The undergraduate students are being enrolled in the last phase, which will mark a complete transition from the manual to the computerized circulation system.

The library's online public access catalogue (OPAC) is operational both on Intranet and Internet. It can be accessed online to search more than 130,000 bibliographic records, available in the library database through the Web-based search interface or with the Libsys Windows client.

CD ROM-based Search Services through a CD NET System. With the advent of CD-ROM technology in the mid-1980s several bibliographic databases, which were earlier available only through online vendors, started appearing on CDs at an affordable price. CD-ROM-based search services were established at the Central Library, IIT Delhi in 1991. The library acquired three CD-ROM workstations and four important bibliographic databases on CDs, namely COMPENDEX Plus (1985+), INSPEC (1990+), METADEX (1990+) and World Research Database. The Advisory Committee for the Library made a conscious decision to discontinue the print version of indexing and abstracting services in favor of their CD-ROM counterparts, if available.

The search for a suitable CD-ROM networking system was started in 1994 with the receipt of special grants to the library from the Ministry of Human Resource Development (MHRD) for developing CD-ROM search services. One requirement was that the selected CD-ROM networking system could be hooked to the then existing 10-base-T Ethernet-based campus LAN. A Web-based CD networking system was finally procured from Meridian Data, Inc., (USA) after a series of technical presentations and negotiations. This system enabled campus-wide access to the CD-ROM databases to which the library subscribed. These databases are mostly bibliographic.

Silver Platter's Electronic Reference Library (ERL). The CD-ROM networking solution procured in 1998 had several limitations: slow access, repeated failure of CD-ROM drives, the need to configure each client and download the CD-sharing application onto it, and a limit on the number of databases that could be made available online. The Library decided to replace this system with one that allowed the contents of a CD-ROM disc to be transferred onto the hard disc of a server. As a solution, IIT Delhi has recently adopted Silver Platter's ERL technology. Once the ERL server is fully implemented, the CD-ROM networking solution mentioned above will be used only for databases that are not ERL-compliant.

Web-Based Access to the Materials Science Collection. The IIT Delhi Library has Web-based access to a group of databases called the Materials Science Collection (including Metadex) through M/s Cambridge Scientific Abstracts. The Materials Science Collection is made available under a consortium subscription in which the National Aerospace Laboratories (NAL) acts as the leader of the consortium and M/s Informatics India executes the orders on its behalf. Each consortium member has gained substantially, both in subscription savings and in the number of databases made accessible under the Materials Science Collection. The Materials Science Collection is accessible at http://www.csa.com/. All IP addresses used by the Institute are enabled for access to the databases under this subscription.

Web-based Access to the Databases Developed In-House on Micro CDS/ISIS. The Central Library has developed a number of databases in-house using UNESCO's Micro CDS/ISIS package for specialized collections, aimed at handling activities that cannot be handled easily using Libsys. These databases have now been ported to the WWW/ISIS interface to facilitate simultaneous access by users on the Internet and the Intranet. The databases, accessible at http://www.iitd.ac.in/library/isis/inhousedatabases.html, are

  • Database of Book Bank and Text Book Collection. The database contains 6000 records of books available in the Text Book and Book Bank collections. Besides allowing searches on the specialized collection, the database allows reservations and overnight circulation of Text Books. The circulation of Book Bank books, which are issued for the entire semester, is also handled by this database developed on CDS/ISIS.
  • Database of Ph.D. Theses Submitted to IIT Delhi. The database contains 2700 records of Ph.D. theses submitted to the IIT Delhi. Currently abstracts are being added to each record.
  • Database of Serials on Subscription in IIT Delhi. The database contains around 1600 records of serials including 850 currently received and 750 discontinued since 1990. The database is used for a variety of purposes including ordering of serials and generating reports.
  • Database on Research Articles Published by the Faculty and Researchers of the Institute. The database contains bibliographic records of more than 3500 research articles published by the faculty and researchers of the Institute, downloaded from various CD-ROM databases to which the library subscribes. These records were ported to the CDS/ISIS database using IsisAscii Import Utility 1.0 of UNESCO. Records from different databases were merged into one using "Reformatting FST." Records are still being added to the database on a regular basis.
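Merging records downloaded from different source databases requires mapping each source's field labels onto one common layout before the records are combined. The sketch below illustrates that idea in Python. It is not CDS/ISIS field select table (FST) syntax, and the source names, field labels and sample records are made up for illustration.

```python
# Hypothetical field labels for two source databases; real CD-ROM export
# layouts differ and are not described in the article.
FIELD_MAPS = {
    "source_a": {"AU": "author", "TI": "title", "SO": "source", "PY": "year"},
    "source_b": {"Author": "author", "Title": "title", "Journal": "source", "Year": "year"},
}


def reformat(record, source):
    """Rename a record's fields to the common layout, dropping unmapped fields."""
    mapping = FIELD_MAPS[source]
    out = {target: record[field] for field, target in mapping.items() if field in record}
    out["origin"] = source  # remember which database the record came from
    return out


def merge(batches):
    """Combine records from several sources into one list in the common layout."""
    combined = []
    for source, records in batches.items():
        combined.extend(reformat(rec, source) for rec in records)
    return combined


# Made-up sample records.
batches = {
    "source_a": [
        {"AU": "Example Author A", "TI": "Sample title one",
         "SO": "Sample Journal", "PY": "1997", "XX": "unmapped field"},
    ],
    "source_b": [
        {"Author": "Example Author B", "Title": "Sample title two",
         "Journal": "Another Journal", "Year": "1999"},
    ],
}
merged = merge(batches)
```

Keeping the origin of each record makes it possible to trace a merged entry back to the database it was downloaded from when duplicates or errors are found later.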

Directory of Online Interactive Courseware in IT

A portal site on "Web-based Online Interactive Courseware in Information Technology" has been launched and is available at http://www.iitd.ac.in/courses/ under a project sponsored by the Ministry of Information Technology. The site has 4000 courseware packages including 375 in the public domain. As a mirror site for the Central Institute of Technology, New Zealand, the site includes around 25 courseware packages published by them. The site was successfully demonstrated at the ELITEX'2000, ELITEX'2001 and Swadeshi Mela held at the IIT Delhi. Registered with major search engines, the site has been visited 5000 times. Targeted to the students, IT professionals and general public, the site provides the following functionalities through its intuitive Web interface:

  • About: Provides a brief introduction to the project site, its objectives and target audiences with links to the Ministry of Information Technology site.
  • Courses: Provides a browsing interface to the courseware listed in the Directory under 11 major categories. Selecting one of the categories provides a tabular list of courses available in the Online Directory with details on the name of the course, its duration and the developer's name. While the developer's name is a link to the site where the courseware is actually available, the name of the course is a link to the database that is hosted on the project server. Clicking on the name of a course provides further details on the courseware, such as Link to Courseware, Course Developer's Name, Address, Contact Person, Format, Keywords, Duration, Language, Learning Level, Course Code, and Hardware and Software requirements for accessing the course.
  • Links: As a resource site for online courseware in information technology, one of the mandates of the project is to provide resources and links to other sites that may be of assistance to users in IT education. The site, therefore, provides links to (1) other online courseware directories, (2) public domain courseware hosted locally, (3) sites for courseware developers and (4) Indian educational sites.
  • Search: The site provides an intuitive and sophisticated interface to search courseware available in the online courseware directory. Each surrogate record in the courseware directory is assigned two to six standardized keywords to facilitate retrieval of records with higher relevance and precision. The search interface provides for the following search options: Keywords; Institutions; Subject Categories; Learning levels; Languages and Programs.
  • Frequently Asked Questions (FAQs)
  • Feedback: Users are encouraged to visit the website and provide feedback for further improvements. An interface that facilitates posting comments online is available through the project website.
  • Add Courses: The site provides an interface for courseware developers to register new courseware with the directory.
  • Technical Architecture of the Portal Site. The portal site on online courseware consists of these components:
      - Database of Online Courseware: The database is currently designed in Microsoft Access. Efforts are being made to export the database to a more robust RDBMS like Oracle or MySQL.

      - ODBC Driver: The ODBC (Open Database Connectivity) drivers for most of the important databases are built into the operating system. ODBC + ASP was preferred to CGI + PERL.

      - Browsing and Search Software designed in ASP: The directory provides a user-friendly browsing and search interface, designed using ASP, that displays courses for broad subject categories, deriving the data from the back-end database above.

      - Site Administration and Maintenance: Suitable interfaces have been developed to facilitate site administration, maintenance and update of the database and the website. A Web-based interface was developed and is being used for data entry of surrogate records from multiple locations. Administrative interfaces are available to edit records. An interface has been developed to generate administrative reports and statistics in various formats.
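The browse and search functions described above can be sketched against a small relational table. The portal itself uses Microsoft Access, ODBC and ASP; the sketch below swaps in Python's built-in sqlite3 module purely so that it is self-contained. The table layout, column names, sample rows and function names are assumptions, not the project's actual schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory courseware table with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(
        "CREATE TABLE courseware ("
        " id INTEGER PRIMARY KEY, name TEXT, developer TEXT,"
        " category TEXT, level TEXT, language TEXT, keywords TEXT)"
    )
    rows = [
        ("Intro to Databases", "Example Univ A", "Databases", "Beginner", "English", "sql;relational model"),
        ("Networking Basics", "Example Univ B", "Networks", "Beginner", "English", "tcp/ip;ethernet"),
        ("Advanced SQL", "Example Univ A", "Databases", "Advanced", "English", "sql;query optimization"),
    ]
    conn.executemany(
        "INSERT INTO courseware (name, developer, category, level, language, keywords)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        rows,
    )
    return conn


def browse(conn, category):
    """List (course name, developer) pairs for one subject category."""
    cur = conn.execute(
        "SELECT name, developer FROM courseware WHERE category = ? ORDER BY name",
        (category,),
    )
    return [(r["name"], r["developer"]) for r in cur]


def search(conn, keyword=None, level=None, language=None):
    """Return course names matching every filter that is supplied."""
    clauses, params = [], []
    if keyword:
        # Keywords are stored as a ';'-separated list; match whole keywords only.
        clauses.append("(';' || keywords || ';') LIKE ?")
        params.append(f"%;{keyword.lower()};%")
    if level:
        clauses.append("level = ?")
        params.append(level)
    if language:
        clauses.append("language = ?")
        params.append(language)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    sql = f"SELECT name FROM courseware WHERE {where} ORDER BY name"
    return [r["name"] for r in conn.execute(sql, params)]


conn = build_demo_db()
database_courses = browse(conn, "Databases")
advanced_sql = search(conn, keyword="sql", level="Advanced")
```

Passing user input as query parameters rather than pasting it into the SQL text keeps the search form safe from malformed input, whichever database and scripting layer is used.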

Access Infrastructure for Digital Collections at the IIT Delhi

An effective and efficient access mechanism that allows a user to browse, search and navigate digital resources becomes necessary as the electronic resources of a collection grow in number and complexity. The access infrastructure for digital resources at IIT Delhi thus consists of the following components as reviewed above: the Libsys OPAC/WebPAC, the websites for special collections, such as those developed for the Directory of Online Interactive Courseware in Information Technology and the scanned journals, linkages between bibliographic citations and full-text of journal articles, online access to many journals, and the Central Library's home page subject gateway.


The Central Library, IIT Delhi has intensified its computerization and Web-based activities and services with the availability of faster Internet connections and the willingness of the authorities to provide additional funds for computerization of the library and for developing digital resources. The development has attracted appreciation and compliments from the users, who are actively helping to develop the subject portal. Several other initiatives are underway to further intensify the build-up of digital resources at IIT Delhi. Tools, techniques and protocols are now available for this purpose. Libraries need to identify the collections that should be digitized. It is important to join hands with sister organizations to begin collaborative digitization programs.

