digital bibliography & library project


Welcome to dblp

dblp offers a full-text search over its bibliography. The search syntax is:

  • case-insensitive prefix search (default): e.g., sig matches "SIGIR" as well as "signal"
  • exact word search: append a dollar sign ($) to the word, e.g., graph$ matches "graph" but not "graphics"
  • boolean AND: separate words by a space, e.g., codd model
  • boolean OR: connect words with a pipe symbol (|), e.g., graph|network

Update May 7, 2017: Please note that we had to disable the phrase search operator (.) and the boolean NOT operator (-) due to technical problems. For the time being, phrase search queries will yield regular prefix search results, and search terms preceded by a minus will be interpreted as regular (positive) search terms.
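For illustration, the documented query semantics above can be modeled in a few lines of Python. This is a toy sketch of the rules as described, not dblp's actual search engine:

```python
# Toy model of dblp's documented query semantics: case-insensitive prefix
# search by default, "$" for exact word match, space = AND, "|" = OR.
# Illustration only -- not dblp's real search implementation.

def term_matches(term: str, word: str) -> bool:
    """Match one search term against one word, case-insensitively."""
    word = word.lower()
    if term.endswith("$"):                    # exact word search
        return word == term[:-1].lower()
    return word.startswith(term.lower())      # default: prefix search

def query_matches(query: str, text: str) -> bool:
    """Space-separated clauses are ANDed; '|' within a clause means OR."""
    words = text.split()
    for clause in query.split():
        alternatives = clause.split("|")
        if not any(term_matches(t, w) for t in alternatives for w in words):
            return False
    return True
```

For example, `query_matches("sig", "SIGIR 2020")` is true because "sig" is a prefix of "SIGIR", while `query_matches("graph$", "computer graphics")` is false because "$" demands an exact word match.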


(updated 2023-06-28) A few days ago, we discussed the new dataset publications in dblp. In preparation for more and more detailed datasets, we have slightly modified the DTD that defines the structure of our XML data export. A quick reminder: you can download the dblp dataset as a single XML […]
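As a side note on working with the XML export mentioned in this post: the dump is large, so it is best processed with a streaming parser rather than loaded whole. The sketch below uses Python's standard library on an invented two-record sample; the element names (`article`, `author`, `title`, `year`) follow dblp's published DTD, though a real run would also need the accompanying DTD file to resolve character entities.

```python
# Stream-parse dblp-style XML record by record. Element names follow
# dblp's DTD (article/author/title/year); the sample records are invented.
import io
import xml.etree.ElementTree as ET

SAMPLE = b"""<dblp>
  <article key="journals/example/Doe21">
    <author>Jane Doe</author>
    <title>An Example Article</title>
    <year>2021</year>
  </article>
  <article key="journals/example/Roe19">
    <author>Richard Roe</author>
    <title>Another Example</title>
    <year>2019</year>
  </article>
</dblp>"""

def iter_records(stream):
    """Yield (key, title, year) per publication, freeing memory as we go."""
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "article":
            yield (elem.get("key"),
                   elem.findtext("title"),
                   int(elem.findtext("year")))
            elem.clear()  # matters for the multi-gigabyte real dump

records = list(iter_records(io.BytesIO(SAMPLE)))
```

Clearing each element after it is consumed keeps memory flat regardless of file size, which is the point of `iterparse` over `parse` here.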

Datasets and other research artifacts have been a major topic in the scientific community in recent years. Many ongoing projects focus on improving the standardization, publication, and citation of these artifacts. Currently, the dblp team is involved in three of them: NFDI4DataScience, NFDIxCS, and Unknown Data. As part of these […]

On November 4, 2022, the Joint Science Conference (GWK) selected Schloss Dagstuhl – Leibniz Center for Informatics and the consortium NFDIxCS for federal and state funding within the German National Research Data Infrastructure (NFDI). The consortium will be funded in the double-digit millions of Euros and over a duration of five […]

In the six months since the release of the dblp RDF dump and its persistent snapshot releases, the RDF dump has been downloaded a total of about a thousand times. We are pleased to see that the community is interested in using our semantic data in their research and beyond. […]

more blog posts

The dblp computer science bibliography provides open bibliographic information on major computer science journals and proceedings. Originally created at the University of Trier in 1993, dblp is now operated and further developed by Schloss Dagstuhl. For more information, check out our F.A.Q.

dblp statistics

  • # of publications: 7,196,520
  • # of authors: 3,488,962
  • # of conferences: 6,602
  • # of journals: 1,871

publications by year

more statistics


Related resources

  • ACM Digital Library
  • IEEE Xplore | CSDL
  • Semantic Scholar

more external links


retrieved on 2024-04-17 15:14 CEST from data curated by the dblp team

license: CC0 1.0

dblp was originally created in 1993 at:

University of Trier

since 2018, dblp has been operated and maintained by:

Schloss Dagstuhl - Leibniz Center for Informatics



Core: Leadership, Infrastructure, Futures

Fundamentals of Digital Library Projects

Description: This 6-week online course introduces students to the breadth of considerations, standards, and skills needed to successfully launch and manage a digital library program. The course provides opportunities for hands-on activities that develop critical thinking and decision-making skills within the context of a digital library.

2024 Sessions

Click on a date range to register for that session.

  • Session 1: January 22 – March 1
  • Session 2: March 11 – April 19
  • Session 3: April 29 – June 7
  • Session 4: July 22 – August 30
  • Session 5: September 23 – November 1

Format: Students receive login instructions one week prior to the course start date. Students have 24/7 access to the course site for the six-week period, and aside from assignment and quiz deadlines, the course may be completed at their own pace. Instructors provide guidance and feedback as students work their way through the course material. Weekly, instructor-moderated chat sessions are the only live course events that students are asked to attend.

Weekly Chat Schedule: The following times are tentative and may change according to instructor availability.

  • Week 1: Thursday at 1:00 pm CT
  • Week 2: Thursday at 1:00 pm CT
  • Week 3: Thursday at 1:00 pm CT
  • Week 4: Thursday at 1:00 pm CT
  • Week 5: Thursday at 1:00 pm CT
  • Week 6: Thursday at 1:00 pm CT

Learning Outcomes

Participants of this course will: 

  • Gain an understanding of the types of expertise and skills needed to successfully manage a digital library, such as digitization and types of digital objects, metadata, indexing/search/retrieval, storage/architecture, user interface & interaction, and preservation;
  • Learn about the common platforms used by libraries to manage digital objects and make them discoverable;
  • Discover the role of planning, documentation, and assessment.

Who Should Attend

Library professionals interested in developing a broad understanding of all aspects of digital library projects, from the solo practitioner responsible for all aspects of a digital library to a member of a team implementing or managing a digital library.


Instructors

  • Sally Benny is the Digital Archivist at Tufts University in Medford, MA.
  • Gretchen Gueguen is an archivist and librarian specializing in digital libraries and technology, and an adjunct professor at Clarion University.
  • Jane Monson is the Digital Collections Librarian at the Oregon Historical Society.
  • Jennifer Roper is the Director of Digital Strategies and Scholarly Communications at the University of Virginia Library in Charlottesville, VA.
  • Melde Rutledge is the Digital Collections Librarian at Wake Forest University's Z. Smith Reynolds Library.


Cost

  • $196.71 Core members
  • $224.10 ALA members
  • $249 non-members

How to Register

Register here

Register by mail using the print registration form. Tip: If you're unable to open this "register by mail" link, right-click the link and save the form to your computer.

For registration related questions, call 1-800-545-2433 to speak to our customer service representatives.

Registration Deadline

Registration for each course is limited to 25 people. For courses that are not sold out, online and fax registration ends at 12 noon CT on the Monday before the course begins. Mailed registration forms must be postmarked by two Mondays prior to the course start date.

Core Code of Conduct

Please review the Statement of Conduct before registering.

Tech Requirements

The course is delivered on a Moodle site composed of self-paced modules with facilitated interaction led by the instructors. There are predetermined start and end dates and a suggested pace, which includes interaction with the instructors and your classmates. Students regularly use the forum and chat room functions to facilitate their class participation. Section quizzes are offered and feedback is given; however, there is no final class grade.

The course website will be open for 1 week prior to the start date for students to have access to Moodle instructions and set their browser correctly. It will remain open 1 week after the end date for students to complete any sections and submit the course evaluation survey.

Contact Hours: 24

Core defines contact hours in line with the IACET standards on Continuing Education Units.

Certificates of completion are sent upon successful completion (passing score of 70% or higher) of the course.

For questions about registration, contact ALA Registration by calling 1-800-545-2433 or email [email protected].

For all other questions or comments related to online courses, please contact [email protected] .

Purdue University

Computer Science: Bibliographies and Search Engines

Scholarly Search Engines

  • Google Scholar: Works just like Google except it indexes scholarly materials such as conference proceedings and journal articles. Very broad coverage.
  • DBLP: The Digital Bibliography and Library Project (DBLP) indexes over 1.3 million conference proceedings, articles, series, and books in Computer Science.
  • CiteSeerX: Indexes 1.5 million documents and over 30 million citations with a focus on computer and information science.
  • Odysci Academic Search for Computer Science and Electronics: A search engine covering scholarship in CS and ECE from sources such as ACM, IEEE, DBLP, and CiteSeerX, with some unique functionality for limiting searches and controlling results (e.g., by university, or only papers that received awards). Beta version.


  • The Collection of Computer Science Bibliographies: Contains more than 3 million references, mostly to journal articles, conference papers, and technical reports, clustered in 1,500 bibliographies.
  • Human-Computer Interaction Bibliography: Over 63,000 entries on HCI, hosted by ACM SIGCHI.
  • Last Updated: Mar 13, 2024 9:14 AM

Digital Library Programs for Libraries and Archives: Developing, Managing, and Sustaining Unique Digital Collections


  • Description
  • Table of Contents
  • About the author

This book is available in e-book format for libraries and individuals through aggregators and other distributors; ask your current vendor or contact us for more information. Examination copies are available for instructors who are interested in adopting this title for course use.

Planning and managing a self-contained digitization project is one thing, but how do you transition to a digital library program? Or better yet, how do you start a program from scratch? In this book Purcell, a well-respected expert in both archives and digital libraries, combines theory and best practices with practical application, showing how to approach digital projects as an ongoing effort. He not only guides librarians and archivists in transitioning from project-level initiatives to a sustainable program but also provides clear step-by-step instructions for building a digital library program from the bottom up, even for organizations with limited staff. Approachable and easy to follow, this book

  • traces the historical growth of digital libraries and the importance of those digital foundations;
  • summarizes current technological challenges that affect the planning of digital libraries, and how librarians and archivists are adapting to the changing information landscape;
  • uses examples to lay out the core priorities of leading successful digital programs;
  • covers the essentials of getting started, from vision and mission building to identifying resources and partnerships;
  • emphasizes the importance of digitizing original unique materials found in library and archives collections, and suggests approaches to the selection process;
  • addresses metadata and key technical standards;
  • discusses management and daily operations, including assessment, enhancement, sustainability, and long-term preservation planning;
  • provides guidance for marketing, promotion, and outreach, plus how to take into account such considerations as access points, intended audiences, and educational and instructional components; and
  • includes exercises designed to help readers define their own digital projects and create a real-world digital program plan.

Equally valuable for LIS students just learning about the digital landscape, information professionals taking their first steps to create digital content, and organizations that already have well-established digital credentials, Purcell's book outlines methods applicable and scalable to many different types and sizes of libraries and archives.

List of Figures
Preface
Acknowledgments
Introduction

Part I    The Theory and Reality of Digital Libraries

Chapter 1    Growth of Digital Libraries

  • Brief History of Digital Libraries
  • Perspectives from Related Professions
  • Challenges of Technology
  • Original and Unique Digital Content

Chapter 2    Context of Today's Libraries and Digital Libraries

  • Changing Roles for Libraries
  • Fewer Resources, Greater Expectations
  • Library Spaces
  • Assessing the Changes
  • Scholarly Communication and Open Access
  • Management, Storage, and Curation of Data
  • Digital Collections

Chapter 3    Digitization and Digital Libraries

  • Stages of Digitization
  • Why Digitize
  • What to Digitize
  • Whom to Include
  • When and Where to Digitize
  • How to Digitize

Part II    Building Digital Library Programs: A Step-by-Step Process

Chapter 4    Vision and Mission Building

  • The Mission Statement
  • Vision Building
  • Sustaining and Adapting the Vision

Chapter 5    Identifying Resources and Partnerships

  • Who You Are
  • Whom You Know and Want to Know
  • What You Have and What You Need
  • Grants and External Funding Opportunities

Chapter 6    Evaluating, Selecting, and Building Digital Collections

  • Evaluating Digital Collections
  • The Power of Primary Sources
  • Types of Unique Collections for Selection
  • Selection of Materials
  • Copyright and Other Rights

Chapter 7    Technical Standards

  • Technical Workflows and Documentation
  • The Value of Metadata
  • Technical Elements of Digitization

Chapter 8    Management of Digital Projects

  • Librarians as Managers
  • Managing Budgets
  • Outsourcing and Vendors
  • Planning the Work

Chapter 9    Outreach and Instruction

  • The Principle and Reality of Access
  • Reaching Audiences
  • Educational Components

Chapter 10    Promotion, Assessment, and Sustainability

  • Generating Interest
  • Assessing Effectiveness
  • Enhancing and Sustaining the Effort

Chapter 11    Planning Digital Library Programs

  • Transition from Project to Program
  • Strategies for Building Digital Library Programs

Part III    Digital Library Planning Exercises

Exercise 1    Vision Building
Exercise 2    Resource List
Exercise 3    Collections List
Exercise 4    Technical Strengths
Exercise 5    Plan of Work
Exercise 6    Education Plan
Exercise 7    Marketing Plan
Exercise 8    Project Plan

Bibliography
Index

Aaron D. Purcell

Aaron D. Purcell is professor and director of special collections at Virginia Tech. He earned his Ph.D. in history from the University of Tennessee, his master's degree in library science from the University of Maryland, College Park, and his master's degree in history from the University of Louisville. Purcell has also worked at the National Archives and Records Administration, the National Library of Medicine, and the University of Tennessee. Purcell is an active scholar, writing in the fields of history and archives. The University of Tennessee Press published his first academic book, White Collar Radicals: TVA's Knoxville Fifteen, the New Deal, and the McCarthy Era, in 2009. Purcell is completing an edited book on New Deal and Great Depression historiography for Kent State University Press; he is the editor of The Journal of East Tennessee History; and he is finishing a book on Arthur E. Morgan, the first chairman of the Tennessee Valley Authority. He has written articles on archival topics for the American Archivist, Archival Outlook, IMJ, and the Journal of Archival Organization. Purcell is an active member of the Society of American Archivists, the Mid-Atlantic Regional Archives Conference, and the Southern Historical Association.

"Purcell's knowledge of digital library programs and experience in the field comes through in the depth of information provided and in the organization of the book. The large practical component makes this book especially valuable for new project managers." — VOYA

"Purcell approaches the digitization of collections as an ongoing effort, and provides a framework for librarians to properly do so … A tremendously helpful resource for those individuals who are digitizing their collection for the first time, or for those who are adding to their current digital collection." — ARBA

"An experience-based, detailed overview of the digitization process from soup to nuts, for creation of digital library projects and conversion to sustainable digital library programs. It will read familiar to experienced professionals and provide a workable blueprint for neophytes." — Catholic Library World

"Thoroughly 'reader friendly' in tone, commentary, organization and presentation." — Library Bookwatch


Open Access

Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web

* E-mail: [email protected]

Affiliations School of Chemistry, The University of Manchester, Manchester, United Kingdom, The Manchester Interdisciplinary Biocentre, The University of Manchester, Manchester, United Kingdom

Affiliations The Manchester Interdisciplinary Biocentre, The University of Manchester, Manchester, United Kingdom, School of Computer Science, The University of Manchester, Manchester, United Kingdom

  • Duncan Hull, 
  • Steve R. Pettifer, 
  • Douglas B. Kell


Published: October 31, 2008


Table 1

Many scientists now manage the bulk of their bibliographic information electronically, thereby organizing their publications and citation material from digital libraries. However, a library has been described as “thought in cold storage,” and unfortunately many digital libraries can be cold, impersonal, isolated, and inaccessible places. In this Review, we discuss the current chilly state of digital libraries for the computational biologist, including PubMed, IEEE Xplore, the ACM digital library, ISI Web of Knowledge, Scopus, Citeseer, arXiv, DBLP, and Google Scholar. We illustrate the current process of using these libraries with a typical workflow, and highlight problems with managing data and metadata using URIs. We then examine a range of new applications such as Zotero, Mendeley, Mekentosj Papers, MyNCBI, CiteULike, Connotea, and HubMed that exploit the Web to make these digital libraries more personal, sociable, integrated, and accessible places. We conclude with how these applications may begin to help achieve a digital defrost, and discuss some of the issues that will help or hinder this in terms of making libraries on the Web warmer places in the future, becoming resources that are considerably more useful to both humans and machines.

Citation: Hull D, Pettifer SR, Kell DB (2008) Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web. PLoS Comput Biol 4(10): e1000204.

Editor: Johanna McEntyre, National Center for Biotechnology Information (NCBI), United States of America

Copyright: © 2008 Hull et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: Biotechnology and Biological Sciences Research Council (BBSRC): grant code BB/E004431/1.

Competing interests: The authors have declared that no competing interests exist.

“The apathy of the academic, scientific, and information communities coupled with the indifference or even active hostility…of many publishers renders literature-data-driven science still inaccessible.” – Peter Murray-Rust [1]


The term digital library [2]–[4] denotes a collection of literature and its attendant metadata (data about data) stored electronically. According to Herbert Samuel, a library is “thought in cold storage” [5], and unfortunately digital libraries can be cold, isolated, impersonal places that are inaccessible to both machines and people. Many scientists now organize their knowledge of the literature using some kind of computerized reference management system (BibTeX, EndNote, Reference Manager, RefWorks, etc.), and store their own digital libraries of full publications as PDF files. However, getting hold of both the data (the actual publication) and the metadata for any given publication can be problematic because they are often frozen in the isolated and icy deposits of scientific publishing. Because each library and publisher has different ways of identifying and describing their metadata, using digital libraries (either manually or automatically) is much more complicated than it needs to be [6], and with papers in the life sciences alone (at Medline) being published at the rate of approximately two per minute [7], only computerized analyses can hope to be reasonably comprehensive. What, then, are these digital libraries, and what services do they provide?

As far as computational biologists are concerned, and for the purposes of this Review, we shall define a digital library more broadly as a database of scientific and technical articles, conference publications, and books that can be searched and browsed using a Web browser. As of early 2008, there is a wide range of these digital libraries, but no single source covering all information (in part because of the cost, given that there are some 25,000 peer-reviewed journals publishing some 2.5 million articles per year [8]). Each library is isolated, balkanized, and has only partial coverage of the entire literature. This contrasts with the historically pre-eminent library of Alexandria, whose great strength was that it brought together all the useful literature then available in a single location. Like Alexandria, most digital libraries are currently read-only, allowing users to search and browse information, but not to write new information nor add personal knowledge. Other digital libraries are in danger of becoming write-only “data-tombs” [9], where data are deposited but will probably never be accessed again. Indeed, the literature itself is now so vast that most scientists choose to access only a fraction of it [10], at potentially considerable intellectual loss [11] (see also [12]).

Digital libraries provide electronic access to documents, sometimes just to their abstracts and sometimes to the full text of the publication. Presently, the number of abstracts considerably exceeds the number of full-text papers, but with the emergence of Open Access initiatives (e.g., [13] – [16] ), Institutional Repositories (e.g., [17] – [20] ), and the like, this is set to change considerably. This is very important, as much additional information exists in full papers that is not seen in abstracts, and, in addition, full papers that are available electronically are likely to be much more widely read and cited [21] – [23] . The format of the full text of such documents can vary significantly among publishers. Such formats can be described using a Document Type Definition (DTD), e.g., that provided by the (U.S.) National Library of Medicine [16] , [24] , and, since not all publishers (especially those of non-biomedical material) conform to the NLM DTD, this can considerably affect the types of analysis that can be done on such documents.

In a similar vein, there is not yet a recognized (universal) standard for describing the metadata (see Table 1), although some (discussed below), such as the Dublin Core, are becoming widely used.
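As a concrete illustration, a minimal Dublin Core record can be serialized as XML with Python's standard library. The element names (`title`, `creator`, `date`, `identifier`) come from the DCMI element set; the field values below are filled in from this Review's own citation purely for illustration.

```python
# Serialize a minimal Dublin Core record as XML. The dc: element names
# come from the DCMI element set (http://purl.org/dc/elements/1.1/);
# the field values are illustrative.
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

def dc_record(fields: dict) -> str:
    """Wrap simple name/value pairs in namespaced Dublin Core elements."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for name, value in fields.items():
        el = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        el.text = value
    return ET.tostring(root, encoding="unicode")

record = dc_record({
    "title": "Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web",
    "creator": "Hull, Duncan",
    "date": "2008-10-31",
    "identifier": "PLoS Comput Biol 4(10): e1000204",
})
```

Because Dublin Core is just a flat set of named elements, even this tiny sketch produces a record that generic harvesters can understand, which is much of the reason for its wide adoption.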



Since all of these libraries are available on the Web, increasing numbers of tools for managing digital libraries are also Web-based. They rely on Uniform Resource Identifiers (URIs [25], or “links”) to identify, name, and locate resources such as publications and their authors. By using simple URIs, standard Web browser technology, and the emerging methods of the next generation Web or “Web 2.0” [26], it has become possible for digital libraries to become not just read-only or write-only, but read–write. These applications allow users to add personal metadata, notes, and keywords (simple labels or “tags” [27], [28]) to help manage, navigate, and share their personal collections. This small but significant change is helping to improve digital libraries in three main ways: personalization, socialization, and integration.

The focus of this Review is largely about searching and organizing literature data together with their metadata. For reasons of space, we do not consider in any detail issues surrounding Open Access (e.g., [13] , [29] ), nor structured digital abstracts [30] , [31] (note the recent initiative in FEBS Letters [32] – [34] and the RSC's Project Prospect for whole papers [35] – [38] ). Neither do we discuss the many sophisticated tools for text mining and natural language processing (e.g., [39] – [42] ), for joining disparate concepts [43] , [44] , for literature-based discovery (e.g., [45] – [49] , and for studies of bibliometrics [50] , [51] , literature dynamics [52] , knowledge domains [53] , detecting republication [54] , and so on, all of which become considerably easier to implement only when all the necessary data are digitized and linked together with their relevant metadata.

This Review is structured as follows (see also Figure 1 ): the section Digital Libraries, DOIs, and URIs starts by looking at the range of information in digital libraries, and how resources are identified using URIs on the Web. In the section Problems with Digital Libraries, we consider a fairly standard workflow that serves to highlight some problems with using these libraries. The following section, Some Tools for Defrosting Libraries, examines what Web-based tools are currently available to defrost the digital library and how they are making libraries more personal, sociable, and integrated places. Finally, the section A Future with Warmer Libraries looks at the obstacles to future progress, recommends some best practices for digital publishing, and draws conclusions.


Digital Libraries, DOIs, and URIs

Because computational biology is an interdisciplinary science, it draws on many different sources of data, information, and knowledge. Consequently, there exists a range of digital libraries on the Web identified by URIs [25] and/or DOIs [55] , [56] that a typical user requires, each with its own speciality, classification, and culture, from computer science through to biomedical science. DOIs are a specific type of URI and similar to the International Standard Book Numbers (ISBN), allowing persistent and unique identification of a publication (or indeed part of a publication), independently of its location. The range of libraries currently available on the Web is described below, starting with those that focus on specific disciplines (such as ACM, IEEE, and PubMed) through to libraries covering a broader range of scientific disciplines, such as ISI WOK and Google Scholar. For each library, we describe the size, coverage, and style of metadata used (summarized in Table 1 and Figure 2 ). Where available, DOIs can be used to retrieve metadata for a given publication using a DOI resolver such as CrossRef [57] , a linking system developed by a consortium of publishers. We illustrate with specific examples how URIs and DOIs are used by each library to identify, name, and locate resources, particularly individual publications and their author(s). We often take URIs for granted, but these humble strings are fundamental to the way the Web works [58] and how libraries can exploit it, so they are a crucial part of the cyberinfrastructure [59] required for e-science on the Web. It is easy to underestimate the value of simple URIs, which can be cited in publications, bookmarked, cut-and-pasted, e-mailed, posted in blogs, added to Web pages and wikis [60] – [62] , and indexed by search engines. Simple URIs are a key part of the current Web (version 1.0) and one of the reasons for the Web's phenomenal success since appearing in 1990 [63] . 
As we shall demonstrate with examples, each digital library has its own style of URI for being linked to (inbound links) and alternative styles of URI for linking out (outbound links) to publisher sites. Some of these links are simple, others more complex, and this has important consequences for both human and programmatic access to the resources these URIs identify.
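DOI syntax itself is simple enough to parse in a few lines: a prefix beginning with `10.` identifying the registrant, a slash, and a registrant-assigned suffix; prepending the public doi.org resolver turns the identifier into an actionable URI. A minimal sketch (the helper names `parse_doi` and `doi_uri` are our own):

```python
# Split a DOI into its registrant prefix and suffix, and build a
# resolvable URI. DOI syntax is "10.<registrant>/<suffix>";
# https://doi.org/ is the public DOI resolver.

def parse_doi(doi: str) -> tuple[str, str]:
    """Return (prefix, suffix), accepting an optional 'doi:'/'DOI:' label."""
    doi = doi.removeprefix("doi:").removeprefix("DOI:")
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix

def doi_uri(doi: str) -> str:
    """Turn a bare DOI into a location-independent, resolvable URI."""
    prefix, suffix = parse_doi(doi)
    return f"https://doi.org/{prefix}/{suffix}"
```

For example, `doi_uri("DOI:10.1145/1242572.1242705")` yields `https://doi.org/10.1145/1242572.1242705`, a single stable link regardless of where the publisher currently hosts the article.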


Of all the libraries described, Google Scholar probably has the widest coverage. However, it is currently not clear exactly how much information Google indexes, what the criteria are for inclusion in the index, or whether it subsumes the other digital libraries in the way shown in the figure. Note: the areas of the sets (circles) in this diagram are not proportional to the sizes of the libraries, and DBLP, Scopus, and arXiv are shown as a single set for clarity rather than correctness.

The ACM Digital Library.

The Association for Computing Machinery (ACM), probably best known for the Turing award, makes its digital library available on the Web [64] . The library currently contains more than 54,000 articles from 30 journals and 900 conference proceedings dating back to 1947, focusing primarily on computer science. Like many other large publishers, the ACM uses Digital Object Identifiers (DOIs) to identify publications. So, for example, a publication on scientific workflows [65] from the 16th International World Wide Web Conference (WWW2007) is identified by the Digital Object Identifier DOI:10.1145/1242572.1242705, and the last part of this DOI can also be used in ACM-style URIs. Metadata for publications in the ACM digital library are available from such URIs in EndNote [66] and BibTeX formats; the latter is used by the LaTeX document preparation system [67] .

IEEE Xplore.

The Institute of Electrical and Electronics Engineers (IEEE) provides access to its technical literature in electrical engineering, computer science, and electronics through a service called Xplore [68] . The exact size of the Xplore archive is not currently described anywhere on the IEEE Web site. Xplore identifies publications using Digital Object Identifiers, supplemented by a proprietary IEEE scheme for identifying publications. So, for example, a publication on text-mining [69] in IEEE/ACM Transactions on Computational Biology and Bioinformatics is identified by both the Digital Object Identifier DOI:10.1109/TBME.2007.906494 and an internal IEEE identifier, 1416852; each of these identifiers can be used in IEEE-style URIs. Metadata for publications in IEEE Xplore are available from such URIs in EndNote, ProCite, and RefMan formats. Alternatively, publication metadata are available by using a DOI resolver such as CrossRef. Currently, the IEEE offers limited facilities for its registered members to build a personal library and to share it with other users.

The Digital Bibliography and Library Project (DBLP).

DBLP [70] , [71] , created by Michael Ley, provides an index of peer-reviewed publications in computer science. Recently, DBLP has started to index many popular journals with significant computational biology content, such as Bioinformatics and Nucleic Acids Research , and currently indexes about 900,000 articles, with links out to full text labeled EE for electronic edition. Thus, an article by Russ Altman on building biological databases [72] is identified by its own simple DBLP URI. Metadata for publications in DBLP are available in BibTeX format only. Unlike some libraries that we describe later, DBLP is built largely by hand [71] , rather than by bots and crawlers indexing Web pages without human intervention. One consequence of this is that authors are disambiguated more accurately [73] , e.g., where an author's middle initial(s) are not used or alternative first names appear in metadata. This kind of author disambiguation is particularly relevant to the naming conventions of some countries [74] .

PubMed and PubMed Central.

PubMed [75] is a service provided by the National Center for Biotechnology Information (NCBI). The PubMed database includes more than 17 million citations from more than 19,600 life science journals [76] , [77] . The primary mechanism for identifying publications in PubMed is the PubMed identifier (PMID); so, for example, an article describing NCBI resources [77] is identified by a URI containing its PMID. Publication metadata for articles in PubMed are available in a wide variety of formats, including the MEDLINE flat-file format and XML conforming to the NCBI Document Type Definition [77] , a template for creating XML documents. PubMed can be personalized using the MyNCBI application, described later in the section Some Tools for Defrosting Libraries. PubMed Central [78] , a subset of PubMed, provides free full text of articles, but has lower coverage, as shown in Figure 2 . Related sites are also emerging in other countries, such as that in the UK [79] ; the NCBI resources article [77] has its own URI in the US PubMed Central. Metadata are available from URIs in PubMed Central as XML, Dublin Core, and/or RDF [80] by using the Open Archives Initiative (OAI) [81] Protocol for Metadata Harvesting (PMH), a standard protocol for harvesting metadata. For example, embedded in the page for that article there are Dublin Core terms such as DC.Contributor, DC.Date, and DC.title, which are standard predefined terms for describing publication metadata. In addition to such standard metadata, PubMed papers are tagged or indexed according to their manually curated MeSH (Medical Subject Heading) terms.
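As a rough illustration of what consuming the MEDLINE flat-file format involves, the following minimal sketch handles only tagged lines and indented continuation lines; the sample record is abbreviated and illustrative rather than a verbatim PubMed export:

```python
def parse_medline(text: str) -> dict:
    """Parse one record in MEDLINE flat-file style: 'TAG - value' lines,
    with indented continuation lines. Repeated tags (e.g. AU) accumulate."""
    record, tag = {}, None
    for line in text.splitlines():
        if line[:1].isspace() and tag:          # continuation of previous field
            record[tag][-1] += " " + line.strip()
        elif "- " in line:
            tag, value = line.split("- ", 1)
            tag = tag.strip()
            record.setdefault(tag, []).append(value.strip())
    return record

# Abbreviated, illustrative record (not a complete PubMed export).
sample = """PMID- 18160945
TI  - Database resources of the National Center for
      Biotechnology Information.
AU  - Wheeler DL
AU  - Barrett T"""

rec = parse_medline(sample)
print(rec["PMID"], rec["AU"])
# ['18160945'] ['Wheeler DL', 'Barrett T']
```

A real parser must cope with many more field tags and formatting quirks, but the tag-plus-continuation structure shown here is the essence of the format.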

ISI Web of Knowledge (WoK).

ISI WoK [82] is The Institute for Scientific Information's Web of Knowledge, a service provided by The Thomson Reuters Corporation, covering a broad range of scientific disciplines (not just computer science or biomedical science). The size of the library is somewhere in the region of 15,000,000 “objects”, according to the footer displayed in pages of search results. Unfortunately, ISI WoK does not currently provide short, simple links to its content; the URI for an NCBI publication [77] in ISI WoK, for example, is hidden behind a CGI script interface [83] , and the same script URI is usually displayed in the address bar of a Web browser regardless of which publication is being viewed. It is possible to extract individual URIs for publications, but regrettably they are usually long and complicated and contain “session identifiers”, which make them expire after a set period of time (usually 24 hours). Temporary and long URIs of this kind cannot easily be used by humans, and they prevent inbound links to the content. ISI WoK also provides various citation tracking and analytical features, such as Journal Citation Reports, which measures the impact factor [84] , [85] of individual journals [86] . Metadata for publications in ISI WoK are provided in BibTeX, ProCite, RefMan, and EndNote formats. WoK provides citation tracking features, particularly calculation of the H-index [87] for a given author, as well as “citation alerts” that can automatically send e-mail when a given paper is newly cited.
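The H-index mentioned above is simple to compute from a list of per-paper citation counts; a minimal sketch:

```python
def h_index(citations: list) -> int:
    """h-index: the largest h such that h papers have at least h citations each."""
    cites = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(cites, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4 -- four papers have at least 4 citations each
```

The hard part in practice is not this arithmetic but assembling a correct, disambiguated list of an author's papers and their citation counts, which is precisely where services like WoK differ.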

Scopus.

Scopus [88] is a service provided by Reed Elsevier and seems to be the digital library with individually the most comprehensive coverage, claiming (June 2008) more than 33,000,000 records (leaving aside Web pages). As far as linking is concerned, Scopus allows links to its content using OpenURL [89] , which provides a standard syntax for creating URIs; for example, an OpenURL-style URI identifies a publication [90] from the Semantic Web conference, with the ISSN, volume, and page as part of the URI. Such links are the simplest kind that can exist; many get much more complicated as more information is included in the URI, doubling its length. The longer and more complicated URIs become, the less likely they are to be useful to humans. Scopus also links out to content using OpenURL and provides citation tracking. Metadata can be exported in RefWorks [91] , RIS (EndNote, ProCite, RefMan), plain text, and other formats.
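To illustrate the OpenURL idea, the following sketch builds the query portion of such a link from bibliographic fields. The resolver base URL and the field values here are hypothetical, and real OpenURL links typically carry many more parameters:

```python
from urllib.parse import urlencode

def openurl_query(issn: str, volume: str, spage: str, genre: str = "article") -> str:
    """Build an OpenURL-style query string from bibliographic fields;
    a resolver base URL (hypothetical below) is prepended by the linking service."""
    params = {"genre": genre, "issn": issn, "volume": volume, "spage": spage}
    return urlencode(params)

# Illustrative values only -- not a real Scopus link.
print("https://resolver.example.org/openurl?" + openurl_query("1570-8268", "5", "51"))
```

The key point is that the citation data travel in the URI itself, so any resolver that understands the syntax can redirect the user to an appropriate copy.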

Citeseer.

Citeseer [92] is a service currently funded by Microsoft Research, NASA, and the National Science Foundation (NSF), covering a broad range of scientific disciplines and, according to Citeseer itself, more than 760,000 documents. Citeseer, too, uses simple URIs; one such URI identifies a paper about UniProt [93] . Publication metadata are available from Citeseer in BibTeX format, and citation tracking is performed annually through the Most Cited Authors feature [94] .

Google Scholar.

Google Scholar [95] (e.g., [96] – [99] ) is a service provided by Google (see also [100] ), which indexes traditional scientific literature as well as preprints and “grey” self-archived publications [19] from selected institutional Web sites. A typical page from Google Scholar is shown in Figure 3 . The size and coverage of Google Scholar do not seem to have been published, and the exact method for finding and ranking citations has not yet been made completely public [101] .


Google Scholar links out to external content using a number of methods including OpenURL [89] , shown here by the “Find it via JRUL” (JRUL is a local library) links. Unlike, e.g., WoK, it is relatively easy to create inbound links to individual authors and publications in Google Scholar; see text for details.

In contrast to some other digital libraries, Google Scholar provides simple URIs that link to different resources; for example, one such URI identifies citations of a publication [102] by Tom Oinn.

At the time of writing, Google Scholar does not offer any specific facilities for creating a personal collection of documents or sharing these collections with other users, other than via simple links such as the one above. Publication metadata can be obtained from Google Scholar where OpenURL links are found in its search results; otherwise, metadata can be obtained by clicking through the links to their original sources.

arXiv.

arXiv [103] provides open access to more than 44,000 e-prints in physics, mathematics, computer science, quantitative biology, and statistics, and was created by Paul Ginsparg [104] . It is a leading example of what can be done, although it is presently little used by biologists. The arXiv has a different publishing model from the other digital libraries described in this paper, because publications are peer-reviewed after publication in the arXiv rather than before. (A related but non-identical strategy is pursued by PLoS ONE, where papers are peer reviewed before being made accessible, but do not appear if they fail peer review.) The arXiv is owned, operated, and funded by Cornell University and is also partially funded by the National Science Foundation. arXiv uses simple URIs, incorporating the arXiv identifier, to identify publications. Because arXiv acts as a preprint server, some of its content eventually becomes available elsewhere in more traditional peer-reviewed journals; an article on social networks published in Science [105] , for example, is also available from the arXiv. Metadata for publications in arXiv are available in BibTeX format, with various citation-tracking features provided by the experimental citebase project [106] , [107] . This alternative to manual citation counting works by calculating the number of times an individual paper has been downloaded, as with the Highly-accessed feature of BioMedCentral journals.

…and the rest.

In a short review such as this one, it is not possible to describe every single library a computational biologist might use, because there are so many. Also, it is surprisingly hard to define exactly what a specific digital library is, because the distinction between publishers, libraries, and professional societies is not always a clean one. Thus, we have not described the digital libraries provided by Highwire [108] , WorldCat [109] , JSTOR, the British Library, the Association for the Advancement of Artificial Intelligence (AAAI), the Physical Review Online Archive (PROLA), and the American Chemical Society (ACS) (e.g., SciFinder). Neither do we discuss commercial publisher-only sites such as SpringerLink, Oxford University Press, ScienceDirect, Wiley-Blackwell, Academic Press, and so on, since most of their content is accessible, typically via abstracts, through the other libraries and databases described in this section, which link out to the publishers' sites.

Summary of libraries.

Although they differ in size and coverage, all of these digital libraries provide similar basic facilities for searching and browsing publications. These features are well-documented elsewhere, so we will not describe them in detail here. With the exception of arXiv and PubMed Central, which provide full free access to entire articles, all other libraries described here provide free access to metadata (author, year, title, journal, abstract, etc.) and link to data (the full-text of a given article), which the user may or may not be licensed to view. The approximate relationship between the different libraries, as far as coverage is concerned, is shown in Figure 2 .

Where these libraries differ is in the subscription, personalization, and citation-tracking features. So, for example, ISI WoK is a subscription-only service, not freely accessible, but which offers more extensive citation tracking features (such as ranking papers by citation counts, the impact factor [85] , [86] , and h-index [87] ) than other libraries. Other services, such as the NCBI, are available freely, and provide additional features using custom tools to freely registered users. Other services such as Google Scholar and Citeseer are free, but currently offer no personalized view. Both ISI and Google Scholar provide services for counting and tracking citations of a given paper, which are not provided by most other libraries.

These libraries also differ considerably in the nature and power of the indexing by which users can search them on specific topics or fields of the metadata. Most permit Boolean searches on the basis of authors, keywords, words in a title or abstract, and so on, though none does this in real time, and comparatively few allow sophisticated combinations.

All of this reflects the fact that these libraries and the means of searching them evolved independently and largely in isolation. Consequently, it is generally difficult for a user to build their own personalized view of all the digital libraries combined into one place, although tools described in the section Some Tools for Defrosting Libraries are now beginning to make this more feasible. Before we describe these further, we shall look at some of the current issues with using these digital libraries, as it is exactly these kinds of problems that have motivated the development of new tools. These tools, and the digital libraries they are built on, have to manage two inescapable facts: 1) redundancy: any given publication or author can be identified by many different URIs; 2) representing metadata: there are many different ways of identifying and describing metadata (and see Table 1 ). We describe some of the consequences of this in the next section.

Problems Using Digital Libraries

The digital libraries outlined in the previous section all differ in their coverage, access, and features, but the abstract process of using them is more standard. Figure 4 shows an abstract workflow for using any given digital library. We do not propose this as a universal model, which every user will follow, but provide it to illustrate some of the problems with managing data and metadata in the libraries described in the previous section on digital libraries.


Tasks represented by white nodes are normally performed exclusively by humans, while tasks shown in blue nodes can be performed wholly or partly by machines of some kind. The main problematic tasks that make digital libraries difficult to use for both machines and humans are “GET” (publication) and “GET METADATA”. These are shown in bold and discussed further in the Identity Crisis section of this paper.

To begin with, a user selects a paper, which will have come proximately from one of four sources: 1) searching some digital library (“SEARCH” in Figure 4 ); 2) browsing some digital library (“BROWSE”); 3) a personal recommendation, such as word-of-mouth from a colleague (“RECOMMEND”); or 4) being referred to it while reading another paper, in whose reference list it is cited (“READ”). Once a paper of interest is selected, the user: 1) retrieves the abstract and then the paper itself as a file (“GET” in Figure 4 ); 2) saves the paper, for example by bookmarking it, storing it on a hard drive, or printing it off (“SAVE”); saving often involves getting the metadata too (“GET METADATA”), by which we again mean the basic metadata about a publication, such as the author, date, journal, volume, page number, and publisher; in practice, this means any information typically found in an EndNote or BibTeX entry; 3) reads the paper (“READ” in Figure 4 ); 4) may annotate the paper (“ANNOTATE”); and 5) finally, may cite the paper (“CITE”), which requires retrieving the metadata if these have not been retrieved already.

This abstract workflow is idealized, but highlights some problems with using current digital libraries, for both humans and machines. In particular, see the following list.

  • Identity Crisis. There is no universal method to retrieve a given paper, because there is no single way of identifying publications across all digital libraries on the Web. Although various identification schemes such as the PubMed identifier (PMID), Digital Object Identifier (DOI), ISBN, and many others, exist, there is not yet one identity system to “rule them all.”
  • Get Metadata. Publication metadata often get “divorced” from the data they describe, and this forces users to manage each independently, a cumbersome and error-prone process. Most PDF files, for example, do not contain embedded metadata that can be easily extracted [110] . Likewise, for publications on the Web there is no universal method to retrieve metadata: for any given publication, it is not possible for a machine or human to retrieve metadata using a standard method. Instead, there are many inadequate options to choose from, which add unnecessary complexity to obtaining accurate metadata.
  • Which metadata? There is no single way of representing metadata, and without adherence to common standards (which largely already exist, but in a plurality) there never will be. EndNote (RIS) and BibTeX are common, but again, neither format is used universally across all libraries.

We describe each of these issues more fully in the following sections.

Identity crisis.

We are suffering from an acute identity crisis in the life sciences [111] . Just as sequence databases have trouble managing the multiple identities of sequences [112] , digital libraries also suffer from being unable to identify individual publications and their authors [113] . These are essential pieces of information that make libraries easy to use, and they also help to track citations, but in the present implementation they create considerable barriers for users and machines. Any single publication or author is identified by numerous different URIs, so an important task in managing these disparate collections involves reconciling and normalizing the different identity schemes, that is, calculating whether two different URIs identify the same resource or not. A human can fairly easily determine (by following the links) that several differently styled URIs all identify the same publication, but writing a generic program to automate this for arbitrary URIs is more challenging.
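A program attempting this kind of reconciliation typically tries to reduce each URI to a canonical identifier such as a DOI or PMID. The sketch below uses a few illustrative regular-expression patterns; real library URIs are far more varied than this, which is exactly why the problem is hard:

```python
import re
from typing import Optional

# Illustrative patterns only: real library URIs vary far more than this.
PATTERNS = [
    re.compile(r"doi(?:\.org)?/(10\.\d{4,}/\S+)", re.I),   # .../doi.org/10.xxxx/...
    re.compile(r"[?&]doi=(10\.\d{4,}/[^&\s]+)", re.I),     # ...?doi=10.xxxx/...
    re.compile(r"pubmed/(\d+)"),                           # .../pubmed/1234567
    re.compile(r"[?&](?:pmid|list_uids)=(\d+)"),           # ...?pmid=1234567
]

def canonical_id(uri: str) -> Optional[str]:
    """Try to reduce a publication URI to a DOI or PMID; None if unrecognized."""
    for pat in PATTERNS:
        m = pat.search(uri)
        if m:
            return m.group(1)
    return None

print(canonical_id("https://doi.org/10.1145/1242572.1242705"))   # 10.1145/1242572.1242705
print(canonical_id("https://www.ncbi.nlm.nih.gov/pubmed/18160945"))  # 18160945
```

Two URIs can then be declared equivalent when they reduce to the same identifier; URIs that match no pattern must still be resolved and compared by content, which is far more expensive.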

Where DOIs exist, they are supposed to be the definitive URI. This kind of automated disambiguation, of publications and authors, is a common requirement for building better digital libraries. Unlike the traditional paper library, machines play a much more important role in managing information. They come in many forms, typically search-engine bots and spiders such as Googlebot [114] , but also screen-scrapers [115] , feed-readers [116] , [117] , workflows [102] , [118] , programs, Web services [90] , [119] – [122] , and ad hoc scripts, as well as semantic Web agents and reasoners [123] . They are obviously of great importance for text-mining [39] – [41] , [124] – [126] , where computer algorithms plus immense computing power can outperform human intelligence on at least some tasks [127] . Publication metadata are essential for machines and humans in many tasks, not just the disambiguation described above. Despite their importance, metadata can be frustratingly difficult to obtain.

Metadata: You can't always GET what you want.

As well as the problem of extracting metadata from PDFs [110] , getting metadata for any given URI that identifies a publication is also problematic. Although the semantic Web has been proposed as a general solution to this [128] – [132] , it is currently a largely unrealised vision of the future [133] , [134] . The Open Archives Initiative mentioned previously provides a solution to this problem, though it has not been adopted by all publishers. So, given an arbitrary URI, there are only two guaranteed options for getting any metadata associated with it. Using http [135] , it is possible for a human (or machine) to do the following.

  • http GET the URI. Getting any of the URIs described in the previous section Digital Libraries, URIs, and DOIs will usually return the entire HTML representation of the resource. This then has to be scraped or parsed for metadata, which could appear anywhere in the file and in any format. The technique works, but it is not particularly robust or scalable, because every time the style of a particular Web site changes, the screen-scraper will probably break as well [136] . Some Web sites, such as PubMed Central, make this easier by clearly identifying metadata in files, so they can easily be parsed by tools and machines.
  • http HEAD the URI. This returns metadata only, not the whole resource. These metadata will not include the author, journal, title, or date of the publication, but only basic information such as the MIME type, which indicates what kind of resource it is (text, image, video, etc. [137] ), the Last-Modified date [135] , and so on.
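The first option — scraping the returned HTML for metadata — can be sketched as follows for pages that embed Dublin Core terms in meta tags; the HTML fragment and its values are illustrative, not taken from a real PubMed Central page:

```python
from html.parser import HTMLParser

class DublinCoreParser(HTMLParser):
    """Collect Dublin Core <meta> tags (names beginning 'DC.') from an HTML page."""
    def __init__(self):
        super().__init__()
        self.terms = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            name = a.get("name", "")
            if name.lower().startswith("dc."):
                self.terms.setdefault(name, []).append(a.get("content", ""))

# Illustrative page fragment with embedded Dublin Core terms.
page = """<html><head>
<meta name="DC.Title" content="An Example Publication" />
<meta name="DC.Contributor" content="A. Author" />
<meta name="DC.Contributor" content="B. Author" />
</head><body>...</body></html>"""

p = DublinCoreParser()
p.feed(page)
print(p.terms)  # DC.Title and both DC.Contributor values, keyed by term name
```

Where such terms are present, scraping is reliable; where they are absent, the scraper is reduced to guessing at whatever markup the site happens to use, which is the fragility noted above.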

The lack of an adequate method for retrieving metadata has led to proposals such as the Life Sciences Identifier (LSID) [138] , [139] and BioGUID [140] (Biological Globally Unique IDentifier). These may be useful in the future if they become more widely adopted, but do not change the current state of the digital library. As it stands, it is not possible to perform mundane and seemingly simple tasks such as, “get me all publications that fulfill some criteria and for which I have licensed access as PDF” to save locally, or “get me a specific publication and all those it immediately references”.

Which metadata?

Even if there were a standard way to retrieve metadata for publications, there is still the problem of how to represent and describe them. In addition to EndNote (RIS) and BibTeX, there are also various XML schemas such as the U.S. Library of Congress Metadata Object Description Schema (MODS) format [141] and RDF vocabularies, such as the Dublin Core mentioned earlier. Having all these different metadata standards would not be a problem if they could easily be converted to and from each other, a process known as “round-tripping”. However, some conversions gain or lose information along the way. Lossy and irreversible conversions create dead-ends for metadata, and many of these mappings are non-trivial, e.g., XML to RDF and back again [123] . In addition to basic metadata found in EndNote and BibTeX, there are also more complex metadata such as the inbound and outbound citations, related articles, and “supplementary” information.
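A toy conversion makes the lossiness concrete: the sketch below maps a few BibTeX-style fields onto RIS tags, and any field without a counterpart in the mapping is silently dropped. The field mapping is deliberately minimal and illustrative, not a complete BibTeX-to-RIS converter:

```python
# Minimal, lossy mapping from BibTeX-style fields to RIS tags; fields with
# no counterpart here (e.g. "note") are silently dropped on conversion.
BIB_TO_RIS = {"author": "AU", "title": "TI", "journal": "JO",
              "year": "PY", "volume": "VL", "pages": "SP"}

def bibtex_to_ris(entry: dict) -> str:
    lines = ["TY  - JOUR"]                 # assume a journal article
    for field, value in entry.items():
        tag = BIB_TO_RIS.get(field)
        if tag:
            lines.append(f"{tag}  - {value}")
    lines.append("ER  - ")
    return "\n".join(lines)

# Illustrative entry; the "note" field has no RIS tag in the mapping above.
entry = {"author": "Smith, J.", "title": "An example article",
         "year": "2008", "note": "this field is lost in conversion"}
print(bibtex_to_ris(entry))
```

Converting the RIS output back to BibTeX cannot recover the dropped field, which is precisely why such one-way mappings create dead-ends for metadata.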

The identity crisis, inability to get metadata easily, and proliferation of metadata standards are three of the main reasons that libraries are particularly difficult to use and search as automatically as one would wish. These are challenging problems to overcome, and the tools we describe in the next section tackle these problems in different ways.

Some Tools for Defrosting Libraries

Although libraries can be cold, the tools described in this section could potentially make them much warmer. They do this in two main ways. Personalization allows users to say: this is my library, these are the sources I am interested in, this is my collection of references, and this is the literature I have authored or co-authored. Socialization allows users to share their personal collections and to see who else is reading the same publications, including added information such as related papers with the same keyword (or “tag”) and the notes other people have written about a given publication. The ability to share data and metadata in this way is becoming increasingly important as more and more science is done by larger and more distributed teams [142] rather than by individuals. Such social bookmarking is already available on the Web sites of publications such as the Proceedings of the National Academy of Sciences and the journals published by Oxford University Press.

The result of personalization and socialization is integration of a kind that cannot be achieved by machines alone. First, we look at personalization-only tools; then we examine tools that also allow socialization of the library through sharing.

Zotero and Mendeley.

Zotero [143] is an extension for the Firefox browser that enables users to manage references directly from the Web browser. As with most Web-based tools, Zotero can recognise and extract data and metadata from a range of different digital libraries. Users can bookmark publications and then add their own personal tags and notes. Currently, Zotero does not allow users to share their tags in the way that more “sociable” tools such as CiteULike and Connotea do (see below), although enhancements to the current 1.0 version of Zotero may include this feature. Zotero bookmarks cannot be identified using URIs, so it is not possible to link in from external sources to these personal collections. Mendeley [144] is a similar application that helps users manage and share research papers; as well as a Web-based browser version, it offers a more powerful desktop-based client that automatically extracts metadata from PDF files, although it can only do so where the metadata are available in an amenable format [110] .

MyNCBI.

MyNCBI [77] allows users to save PubMed searches and to customize search results. It also features an option to update and e-mail search results automatically from saved searches. MyNCBI includes extra features for highlighting search terms, filtering search results, and setting LinkOut [145] , document delivery, and external tool preferences. Like Zotero, MyNCBI currently allows personalization only, with no socialization features. It is also limited to publications in PubMed; as we have previously seen, computational biologists frequently require access to many publications outside PubMed, so they cannot capture their entire library in MyNCBI alone. As with Zotero, it is currently not possible to link to personal collections created in MyNCBI.

Mekentosj Papers.

Papers [146] , [147] is an application for managing electronic publications, originally designed by Alexander Griekspoor and Tom Groothuis. Although it is not a typical browser-based Web application, it can be closely integrated with several services on the Web, such as Google Scholar, PubMed, ISI Web of Knowledge, and Scopus, mentioned in the Digital Libraries section of this paper. The Papers application demonstrates how large collections of PDF files can be managed more easily. Papers provides a simple and intuitive interface, shown in Figure 5 , to a collection of PDF files stored on a personal hard drive. It looks and behaves much like Apple's iTunes, an application for managing music files, because the user does not have to know where the data (the PDF file) are stored on their hard drive [110] . Unfortunately, Papers is only available for the Apple Macintosh, and there is no version for Windows, which limits its uptake by scientists.


It looks and feels much like the popular iTunes application, allowing users to manage their digital libraries by categories shown at the top. It is presently available only under Mac OS/X.

The personalization of libraries is nothing especially new or groundbreaking, and scientists have been creating personal libraries for years, for example by having their own EndNote library or BibTeX file. Tools such as Zotero, MyNCBI, and Papers just make the process of personalization simpler. However, socialization of digital libraries is relatively new, in particular the ability of multiple users to associate arbitrary tags [27] , [28] , [148] with URIs that represent scientific publications. This is what CiteULike, Connotea, and HubMed (see below) all allow, thereby capturing some of the supposed “wisdom of crowds” [149] in classifying information.

CiteULike.

CiteULike [150] is a free online service for organizing academic publications, now run by Oversity. It has been on the Web since October 2004, when its originator was attached to the University of Manchester, and was the first Web-based social bookmarking tool designed specifically for the needs of scientists and scholars. In the style of other popular social bookmarking sites [151] , [152] , it allows users to bookmark or “tag” URIs with personal metadata using a Web browser, and these bookmarks can then be shared using simple links. The number of articles bookmarked in CiteULike is approaching 2 million, as indicated by the roughly incremental numbering used. While the CiteULike software is not open source, part of the dataset it collects is currently in the public domain [153] . Publication URIs in CiteULike are simple.

CiteULike normalizes bookmarks before adding them to its database, which means it calculates whether each bookmarked URI identifies a publication that has already been added by another user under an equivalent URI. This is important for social tagging applications, because part of their value is the ability to see how many people (and who) have bookmarked a given publication. CiteULike also captures another important bibliometric, viz., how many users have potentially read a publication, not just cited it. It seems likely that the number of readers considerably exceeds the number of citers [84] , [150] , and this can be valuable information. Time lags matter, too, particularly with Open Access: the “most-accessed” Journal of Biology paper of 2007 [154] had by June 2008 been accessed more than 12,000 times, but had been cited just nine times (note that early access statistics can provide good predictors of later citations [155] ). CiteULike provides metadata for all publications in RIS (EndNote) and BibTeX formats, providing a solution to the “Get Metadata” problem described in the previous section Metadata: You Can't Always GET What You Want, because every CiteULike URI for a publication has metadata associated with it in exactly the same way.

Connotea.

Connotea [156] is run by Nature Publishing Group and provides a similar set of features to CiteULike, with some differences. It has been available on the Web since November 2004. Connotea uses MD5 hashes [157] to store URIs that users bookmark, and normalizes them after adding them to its database rather than before. This post-normalization means Connotea does not always recognize when different URIs (such as the examples in the section Identity Crisis) identify the same publication, a bug known as “buggotea” [158] , which also affects CiteULike to a lesser extent. Like CiteULike, Connotea uses simple URIs; a publication about Connotea [156] , for example, is identified by its own Connotea URI. Metadata are available from Connotea in a wider variety of formats than from CiteULike, including RIS, BibTeX, MODS, Word 2007 bibliography, and RDF, but these have to be downloaded in bulk, rather than individually per publication URI. The source code for Connotea [159] is available, and there is an API that allows software engineers to build extra functionality around Connotea, for example the Entity Describer [160] .
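The “buggotea” problem is easy to reproduce. If bookmarks are keyed by a hash of the raw URI string (as sketched below; this is an assumption about the storage scheme, not Connotea's actual code), then two URIs for the same publication get unrelated keys unless they are normalized first:

```python
import hashlib

def bookmark_key(uri: str) -> str:
    """Key a bookmark by the MD5 hash of the raw URI string: without prior
    normalization, equivalent URIs yield unrelated keys."""
    return hashlib.md5(uri.encode("utf-8")).hexdigest()

# Two URIs for the same DOI, differing only in resolver host.
a = bookmark_key("http://dx.doi.org/10.1145/1242572.1242705")
b = bookmark_key("http://doi.acm.org/10.1145/1242572.1242705")
print(a == b)  # False: same publication, different keys -- "buggotea"
```

Normalizing URIs before hashing (CiteULike's approach) collapses such duplicates at the point of entry; normalizing afterwards leaves them to be reconciled later, which is the source of the bug.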

HubMed [161] is a “rewired” version of PubMed, and provides an alternative interface with extra features, such as standard metadata and Web feeds [116] , [117] , which can be subscribed to using a feed reader. This allows users to subscribe to a particular journal and receive updates when new content (e.g., a new issue) becomes available. An example URI for a publication on HubMed [161] is . Like CiteULike, HubMed also solves the “Get Metadata” problem because metadata are available from each HubMed URI in a wide variety of formats not offered by NCBI. This is one of HubMed's most useful features. At the time of writing, HubMed provides metadata in RIS (for EndNote), BibTeX, RDF, and MODS style XML. Users can also log in to HubMed to use various personalized features such as tagging.
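To illustrate what per-URI metadata in one of these formats looks like, the fragment below serializes a flat metadata record as a BibTeX entry. The record contents and the helper `to_bibtex` are illustrative assumptions, not actual HubMed output.

```python
def to_bibtex(key: str, meta: dict) -> str:
    """Render a flat metadata record as a BibTeX @article entry."""
    fields = ",\n".join(f"  {k} = {{{v}}}" for k, v in sorted(meta.items()))
    return f"@article{{{key},\n{fields}\n}}"

entry = to_bibtex("hull2008", {
    "author": "Hull, D. and Pettifer, S. R. and Kell, D. B.",
    "title": "Defrosting the digital library",
    "year": "2008",
})
```

Offering several such serializations (RIS, BibTeX, MODS, RDF) from one URI is what makes the “Get Metadata” problem tractable for reference managers.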

Advantages of using CiteULike and Connotea.

Both CiteULike and Connotea require users to invest time and effort learning how to use them, and importing or entering bibliographic information. Why should they bother? Managing bibliographic metadata using these tools has several advantages over the common scenario of storing un-indexed PDF files locally on a personal computer. Both CiteULike and Connotea provide a single place (a Web server) where data (PDFs) and metadata can both be shared and more tightly coupled; this has the following benefits.

Easier and more sophisticated searching is possible. By contrast, given a collection of PDFs on a hard drive, it is typically difficult (or impossible) to pose even simple queries such as “retrieve all papers by [a given author]”.

When authoring manuscripts, managing references in a Web-based repository can save some of the pain of re-typing metadata (e.g., author names) for a given publication. Provided the publication has a URI that is recognized by these tools, metadata are automatically harvested on behalf of the user, saving them time.

Tags are just keywords, but they allow both personalization and socialization of bibliographic data (see [162] for the papers cited in this Review as an example). Tagging of papers by other users allows non-experts to explore related papers in ways that may not be possible through traditional reference lists, since exploring an unfamiliar area of research is made easier by following links added by other, potentially more expert, users.


Hosting a bibliography on a Web server means that, if and when the user changes computers, the library remains accessible. However, keeping local and remote versions in step requires appropriate synchronization, which can be problematic.


Science is full of serendipitous discoveries [163] and intellectual linkages that can be uncovered via co-occurrences (e.g., [43], [49], [164]–[167]), and browsing the links created by social tagging can assist such discoveries.
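The searching and tag-browsing benefits above can be sketched with a toy in-memory library. The records, tags, and helper functions here are invented for illustration; neither CiteULike nor Connotea exposes exactly this interface.

```python
# Toy "library": each bookmark carries metadata plus user-assigned tags.
library = [
    {"title": "Defrosting the digital library",
     "authors": ["Hull", "Pettifer", "Kell"],
     "year": 2008, "tags": {"metadata", "digital-libraries"}},
    {"title": "Social bookmarking tools (II)",
     "authors": ["Lund", "Hammond"],
     "year": 2005, "tags": {"tagging", "connotea"}},
]

def by_author(lib, name):
    # "Retrieve all papers by a given author" - hard with loose PDFs,
    # trivial once metadata are structured and indexed.
    return [b["title"] for b in lib if name in b["authors"]]

def by_tag(lib, tag):
    # Follow a tag to related papers, the social-browsing benefit.
    return [b["title"] for b in lib if tag in b["tags"]]
```

Once bookmarks live in such a structure on a shared server, the same queries work across every user's library, not just one person's hard drive.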

Future tools.

The tools described here are the first wave of Web 2.0, Library 2.0 [168] , or even Science 2.0 [169] style tools that are helping to defrost the digital library. There will certainly be plenty more in the future; for example, the Research Information Centre [170] from the British Library is investigating innovative new tools in this area, backed by Microsoft. Some are calling it “Web 3.0” [171] , but, whatever the name, it seems likely that we will see many digital library applications that will exploit the novel social features of platforms such as Facebook [172] , [173] and OpenSocial [174] . Here they can exploit the identity mechanisms already built into those systems.

Personalization and socialization of information will increasingly blur the distinction between databases and journals [175], especially in computational biology, where contributions are predominantly digital in nature. Scientific contributions to digital knowledge on the Web often do not fit traditional scientific publishing models [31], usually because they are either too “small” or too “big” for journals. Web logs or “blogs” are beginning to fill the “too small” gap (see “microattribution” [176]) and can be used for communicating preliminary results, discussion, opinion, supplementary material, and short technical reports [177]–[179] in the style of a traditional laboratory notebook. Biological databases, such as those listed in the annual NAR database review [180], have long filled the “too big” gap in scientific publishing; they are clearly more significant than their publications alone. As biology moves its focus from hypothesis-driven to data-driven science [1], [181], [182], it is increasingly recognized that databases, software models, and instrumentation are themselves the scientific output, rather than the conventional and more discursive descriptions of experiments and their results.

In the digital library, these size differences are becoming increasingly meaningless as data, information, and knowledge become more integrated, socialized, personalized, and accessible. Take Postgenomic [183] , for example, which aggregates scientific blog posts from a wide variety of sources. These posts can contain commentary on peer-reviewed literature and links into primary database sources. Ultimately, this means that the boundaries between the different types of information and knowledge are continually blurring, and future tools seem likely to continue this trend.

A Future with Warmer Libraries

The software tools described in the section Some Tools for Defrosting Libraries are a promising start to improving the digital library: they make data and metadata more integrated, more personal, and sometimes more sociable. They face considerable obstacles to further success, however.

Obstacles to warmer libraries.

We suggest that the main obstacles to warmer libraries are primarily social [184] rather than technical in nature [185] . Identity, trust, and privacy are all potential stumbling blocks to better libraries in the future.

One identity to rule them all?

The basic ability to identify publications and their authors uniquely is currently a huge barrier to making digital libraries more personal, sociable, and integrated. The identity of people is a twofold problem, because applications need to identify people both as users of a system and as authors of publications. This lack of identity currently prevents answering even very simple questions such as “show me all publications by person x”, unless the authors concerned are lucky enough to have unique names. Both the NCBI and CrossRef have initiatives to identify authors uniquely in digital libraries, but these have yet to be implemented successfully. The use of Single Sign-On (SSO) schemes such as Shibboleth [186] and OpenID [187] (the latter used in projects such as [188] and Connotea) could have a huge impact, enabling identity and personalization without the need for hundreds of different username and password combinations, although it remains to be seen what their impact on the scientific literature will be. Technically, there are also tough challenges in creating unique author identifiers [74], [113], such as synonymy, name changes, and variable use of initials and first names, all of which are ongoing legacy issues.
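Why name-based matching is not enough can be shown with a deliberately naive author key. This is a sketch only: real author-disambiguation systems use much richer evidence (co-authors, affiliations, topics), and the names below are illustrative.

```python
def name_key(full_name: str) -> str:
    """Naive author key: family name plus first initial."""
    parts = full_name.replace(",", " ").split()
    family = parts[0]
    given = parts[1] if len(parts) > 1 else ""
    return f"{family.lower()}_{given[:1].lower()}"

# Variants of one person's name collapse to the same key...
assert name_key("Hull, Duncan") == name_key("Hull, D.") == "hull_d"
# ...but so does a different person entirely: the key cannot tell them apart.
assert name_key("Hull, David") == "hull_d"
```

The first assertion shows the synonymy/initials problem being handled; the second shows the homonymy problem that defeats any purely name-based scheme, hence the need for unique author identifiers.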

Who can scientists trust?

Passing valuable data and metadata to a third party requires users to trust the organization providing the service. For a large publisher such as Nature Publishing Group, responsible for Connotea, this is not necessarily a problem. That said, many users are liable to distrust commercial publishers whose business models may lead them to change their data model unilaterally, making existing tools for accessing their data backwards-incompatible, a common occurrence in bioinformatics. Smaller startup companies, often responsible for the most innovative new tools, may struggle to gain the trust of larger institutions and libraries. Most of the software described in the section Some Tools for Defrosting Libraries requires a considerable initial investment from users to import their libraries into the system; users have to trust service providers that this investment has a good chance of paying off in the longer term.

Scientists also have to decide how far to trust and rely on commercial, for-profit companies to build and maintain the cyberinfrastructure they require for managing digital libraries. Commercial companies do not always provide the best value for money, and scientific publishing is a case in point: Paul Ginsparg, for example, has estimated that arXiv operates at a cost 100 to 1,000 times lower than a conventional peer-reviewed publishing system [189]. If the market will not provide scientists with the services they require, at a price they are willing to pay, they will need to build and fund those services themselves. The danger is that too much electronic infrastructure will be owned and run by private companies, and science will then be no better served than it was with paper-based publishing.

What data do scientists want to share?

Although the practice of sharing raw data immediately, as with Open Notebook Science [190] , is gaining ground, many users are understandably cautious about sharing information online before peer-reviewed publication. Scientists can be highly secretive and reticent at times [191] , selfishly not wanting to share their data and metadata freely with everyone and anyone, for fear of being “scooped” or copied without proper credit and attribution. Some tools provide security features, e.g., both CiteULike and Connotea allow users to hide references. However, this requires users to trust external providers to respect and protect their privacy, since the information is on a public server, and out of users' control.


Warmer digital libraries cannot be achieved by software tools alone. The digital libraries themselves can take simple steps to make their data and metadata more amenable to human and automated use, making their content more useful and usable. Only with proper and better access to linked data and metadata can the tools that computational biologists require be built. We make the following recommendations to achieve this goal.

Simple URIs.

URIs intended for human use should be as simple as possible, to allow easy linking to individual publications and their authors. Short URIs are much more likely to be used and cited [192] than longer, more complicated ones.

Persistent URIs.

It has been noted many times before [193], [194], but it is worth restating: persistent URIs make the digital library a much more useful and usable place. Although URIs will inevitably decay [195], [196], many (but not all) will be preserved by the Internet Archive [197], [198], and every effort should be made to keep them persistent where possible.

Exposing metadata.

Publication metadata, in whatever style (EndNote, BibTeX, XML, RDF, etc.), should be transparently exposed and readily available, programmatically and manually, from URIs, HTML [199] , and PDF files of publications.
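One existing convention for exposing metadata in HTML is OpenURL COinS [199], which embeds an OpenURL ContextObject as key/value pairs in the `title` attribute of a span with class `Z3988`. A consumer can recover the metadata with a few lines of Python; the HTML snippet and field values below are an invented example.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs

class CoinsParser(HTMLParser):
    """Collect OpenURL ContextObjects from COinS <span> elements."""
    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "span" and "Z3988" in (a.get("class") or ""):
            # The title attribute holds URL-encoded key/value metadata.
            self.records.append(parse_qs(a.get("title") or ""))

html = ('<p>See <span class="Z3988" title="ctx_ver=Z39.88-2004&amp;'
        'rft.atitle=Defrosting+the+digital+library&amp;'
        'rft.au=Hull%2C+D.&amp;rft.date=2008"></span> for details.</p>')

p = CoinsParser()
p.feed(html)
meta = p.records[0]
```

Because the metadata travel inside the page itself, any tool that can parse HTML can harvest them without a separate metadata request.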

Identifying publications.

URNs (such as Digital Object Identifiers) should be used to identify publications wherever possible. Most large publishers already do this, although there are still many confounding exceptions.

Identifying people.

This problem is twofold: people need to be identified as users of a system and as authors of publications. To tackle the first issue, tools and libraries should use Single Sign-On (SSO) schemes, such as OpenID [187], to provide access to personalized features where possible, as this prevents the endless and frustrating proliferation of username/password combinations needed to identify users in Web applications. The second issue requires unique author identification, an ongoing and as yet unsolved problem for digital libraries.

By following these recommendations, publishers, scientists, and libraries of all kinds can add significant value to the information they manage for the digital library.


The future of digital libraries and the scientific publications they contain is uncertain. Rumours of the death of printed books [200] and the death of the journal [201] have (so far) been greatly exaggerated. In scientific publishing, we are beginning to see books and electronic journals becoming more integrated with databases, blogs, and other digital media on the Web. These and other changes could lead to a resurgence in the role of nonprofit professional societies and institutional libraries in the scientific enterprise [104] as the cost of publishing falls. But the outcome is still far from certain.

What is certain is that we can look forward to a digital library that is more integrated, sociable, personalized, and accessible, although it may never be completely “frost-free”. Ultimately, better libraries will be of massive benefit to science. The current breed of Web-based tools we have described is facilitating this change, and future tools look set to continue the trend, as data and metadata become less isolated and rigid and move more fluidly between applications on the Web. There are still issues with trust, privacy, and identity that may hinder the next generation of Web-based digital libraries, and these social problems will need addressing.

It has frequently been observed that scientists lag behind other communities in their use of the Web to communicate research [202] , and that this is ironic given that the Web was invented in a scientific laboratory for use primarily by scientists [63] . Most scientists are painfully familiar with the shortcomings of the databases and software described in this Review, because these tools are at the very heart of science. Digital libraries are, and always will be, fundamental components of e-science, and of the “cyber-infrastructure” [59] , [203] – [205] , necessary for both computational and experimental biology in the 21st century.

Box 1. Glossary and Abbreviations

The following terms and abbreviations are used throughout this paper.

API Application Programming Interface. An API allows software engineers to re-use other people's software with standard programmatic “hooks.”

Blog WebLog, a suite of technologies for rapid publishing on the Web [177] – [179] , [208] , [209] .

DOI Digital Object Identifier, a persistent and unique identifier for objects, usually publications [55], [56]; a specific type of URN (see below).

DTD Document Type Definition, a template or schema for describing the structure of XML documents. The most prominent of these is that set down by the National Library of Medicine, although each publisher tends to have its own.

Dublin Core A standard for describing metadata across many different domains.

HTTP Hypertext Transfer Protocol, a communications protocol used to transfer information on the Web [135] .

IETF Internet Engineering Task Force, which develops and promotes Internet standards such as HTTP and URIs.

MeSH Medical Subject Headings, a controlled vocabulary used by the National Library of Medicine.

Metadata Metadata are data about data, e.g., publication metadata include author, date, publisher, etc.

MODS Metadata Object Description Schema, a proposed standard for metadata emanating from the Library of Congress.

OpenURL Standard syntax for URLs that link to scholarly publications, requiring an OpenURL resolver [89] to make use of them.

OWL Web Ontology Language, a W3C semantic Web standard for creating ontologies that makes extensive use of logical reasoners; see, e.g., [123] , [210] .

RDF Resource Description Framework, a W3C semantic Web standard for describing meta/data as graphs [123] .

SSO Single Sign-On, a method for authenticating human users that allows one username/password to provide access to many different resources.

URI Uniform Resource Identifier, a URI can be further classified as a locator (URL), a name (URN), or both [25] .

URL Uniform Resource Locator, the subset of URIs that, in addition to naming a resource, provide a means of locating it.

URN Uniform Resource Name, an identifier usually required to remain globally unique and persistent. Unlike URLs, URNs provide a mechanism for naming resources without specifying where they are located; for example, urn:isbn:0387484361 is a URN for a book that says nothing about where the book can be located.

W3C The World Wide Web Consortium, an international standards body responsible for standards such as HTML, XML, RDF, and OWL, led by Tim Berners-Lee.

Web 1.0 The original Web, the first version created in 1990 [63] .

Web 2.0 The Web in 2004, a phrase coined by Tim O'Reilly [26] to describe changes since 1990, such as “social software.”

Web 3.0 Used to refer to future versions of the Web that do not yet exist [171] ; for instance, (largely) the Semantic Web.

Web feed Web feeds allow users to subscribe to content that changes, and to be notified when it does, using either RSS or ATOM [116] . This can save time visiting Web sites manually to check for updates. Many journals now make Tables of Contents available in this way.

XML eXtensible Markup Language, a W3C standard for describing meta/data as “trees.”


Duncan Hull would like to thank Timo Hannay and Tim O'Reilly for an invitation to Science Foo Camp [206] 2007, where some of the issues described in this publication were discussed; Kevin Emamy, Richard Cameron, Martin Flack, and Ian Mulvany for answering questions on the CiteULike and Connotea mailing lists; and Greg Tyrelle for ongoing discussion about metadata and the semantic Web at .

  • 1. Murray-Rust P (2007) Data-driven science—A scientist's view. NSF/JISC Repositories Workshop. Available: . Accessed 12 September 2008.
  • 2. Arms WY (2000) Digital libraries. Boston: MIT Press.
  • 3. Soergel D (2002) A framework for digital library research. D-lib magazine 8. Available: . Accessed 12 September 2008.
  • 4. Lesk M (2005) Understanding digital libraries, 2nd ed. San Francisco: Elsevier.
  • 5. Samuel H (1963) The Concise Oxford Dictionary of Quotations. Available: . Accessed 12 September 2008.
  • 7. Anon (2003) MEDLINE Citation Counts by Year of Publication. Available: . Accessed 12 September 2008.
  • 12. Anderson CM (2006) The long tail: How endless choice is creating unlimited demand. London: Random House.
  • 25. Berners-Lee T, Fielding RT, Masinter L (2005) RFC 3986 Uniform Resource Identifier (URI): Generic Syntax. Technical report. Available: . Accessed 12 September 2008.
  • 26. O'Reilly T (2005) What Is Web 2.0? Technical report. Available: . Accessed 12 September 2008.
  • 28. Furnas GW, Fake C, von Ahn L, Schachter J, Golder SA, et al. (2006) Why do tagging systems work? In: Olson GM, Jeffries R, editors. ACM. pp. 36–39.
  • 35. Anon (2008) Royal Society of Chemistry (RSC) Prospect Project. Available: . Accessed 12 September 2008.
  • 39. Ananiadou S, McNaught J, editors. (2006) Text mining in biology and biomedicine. London: Artech House.
  • 42. Anon (2008) National Centre for Text Mining (NaCTeM) Software tools. Available: . Accessed 12 September 2008.
  • 44. Anon (2007) The Arrowsmith Project Homepage. Available: . Accessed 12 September 2008.
  • 57. Anon (2008) dois for research content. Available: . Accessed 12 September 2008.
  • 58. Jacobs I, Walsh N (2004) Architecture of the World Wide Web, Vol 1. Available: . Accessed 12 September 2008.
  • 64. Anon (2008) The Association for Computing Machinery (ACM) Portal. Available: . Accessed 12 September 2008.
  • 65. Goodman DJ (2007) Introduction and evaluation of martlet: A scientific workflow language for abstracted parallelisation. Edinburgh: ACM. pp. 983–992. doi:10.1145/1242572.1242705.
  • 66. Anon (2008) EndNote—Bibliographies Made Easy. Available: . Accessed 12 September 2008.
  • 67. Kopka H, Daly PW (1999) A guide to LaTeX. New York: Addison-Wesley.
  • 68. Anon (2008) Institute of Electrical and Electronics Engineers (IEEE) Xplore. Available: . Accessed 12 September 2008.
  • 70. Ley M (2008) The DBLP Computer Science Bibliography. Available: . Accessed 12 September 2008.
  • 75. Anon (2008) NCBI A service of the U.S. National Library of Medicine and the National Institutes of Health. Available: . Accessed 12 September 2008.
  • 76. Trawick BW, McEntyre J (2004) Chap 1, Bibliographic databases. In: Sansom C, Horton R, editors. The Internet for Molecular Biologists: A Practical Approach. Oxford: Oxford University Press. pp. 1–16.
  • 78. Anon (2008) PubMed Central (PMC) is the U.S. National Institutes of Health (NIH) free digital archive of biomedical and life sciences journal literature. Available: . Accessed 12 September 2008.
  • 79. Anon (2008) UK PubMed Central: Free archive of Life Science Journals. Available: . Accessed 12 September 2008.
  • 81. Anon (2008) Open Archives Initiative. Available: . Accessed 12 September 2008.
  • 82. Anon (2008) Institute for Scientific Information (ISI) Web of Knowledge (WoK). Available: . Accessed 12 September 2008.
  • 83. Stein L (1998) Official guide to programming with New York: Wiley.
  • 89. Apps A, Macintyre R (2006) Why OpenURL? D-Lib Magazine 12. Available: . Accessed 12 September 2008.
  • 92. Anon (2008) CiteSeer. IST Scientific Literature Digital Library. Available: . Accessed 12 September 2008.
  • 94. Anon (2006) Most cited authors in Computer Science with article citation counts normalized by publication year. Available: . Accessed 12 September 2008.
  • 95. Anon (2008) Google Scholar. Available: . Accessed 12 September 2008.
  • 103. Anon (2008) e-Print archive: Open Access to e-prints in Physics, Mathematics, Computer Science, Quantitative Biology and Statistics. Available: . Accessed 12 September 2008.
  • 107. Anon (2008) Citebase Search. Available: . Accessed 18 September 2008.
  • 109. Anon (2008) WorldCat, a global catalog of library collections. Available: . Accessed 12 September 2008.
  • 110. Howison J, Goodrum A (2004) Why can't I manage academic papers like MP3s? The evolution and intent of metadata standards. Proc 2004 Colleges, Code and Intellectual Property Conference. Available: . Accessed 12 September 2008.
  • 114. Anon (2008) Google 101: How Google crawls, indexes, and serves the web. Available: . Accessed 12 September 2008.
  • 115. Schrenk M (2007) Webbots, Spiders and Screenscapers: A guide to developing Internet agents with PHP/CURL. San Francisco: No Starch Press.
  • 116. Hammersley B (2005) Developing feeds with RSS and ATOM. Sebastopol (California): O'Reilly & Associates.
  • 122. Richardson L, Ruby S (2007) RESTful web services. Sebastopol (California): O'Reilly.
  • 127. Arms WY (2000) Automated digital libraries: How effectively can computers be used for the skilled tasks of professional librarianship? D-lib magazine 6. Available: . Accessed 12 September 2008.
  • 130. Davies J, Studer R, Warren PV (2006) Semantic web technologies: Trends and research in ontology-based systems. Chichester: Wiley.
  • 131. Baker CJO, Cheung K-H, editors. (2007) Semantic web: Revolutionizing knowledge discovery in the life sciences. New York: Springer.
  • 135. Fielding RT, Gettys J, Mogul J, Frystyk H, Masinter L, et al. (1999) RFC 2616 Hypertext Transfer Protocol—HTTP/1.1. Internet Engineering Task Force (IETF). Available: . Accessed 12 September 2008.
  • 140. Page R (2008) bioGUID: Bootstrapping the biodiversity semantic web. Available: . Accessed 12 September 2008.
  • 141. Anon (2008) Library of Congress Metadata Object Description Schema: MODS. Available: . Accessed 12 September 2008.
  • 143. Anon (2008) Zotero: The Next-Generation Research Tool. Available: . Accessed 12 September 2008.
  • 144. Anon (2008) Mendeley—Manage and Share Research Papers—Discover Research Data. Available: . Accessed 12 September 2008.
  • 145. Anon (2008) LinkOut: A configurable hyperlinking system. Available: . Accessed 12 September 2008.
  • 146. Griekspoor A, Groothius T (2008) mekentosj papers: Your personal library of science. Available: . Accessed 12 September 2008.
  • 149. Surowiecki J (2004) The wisdom of crowds: Why the many are smarter than the few. London: Abacus.
  • 150. Emamy K, Cameron RG (2007) Citeulike: A researcher's social bookmarking service. Ariadne 51. Available: . Accessed 12 September 2008.
  • 151. Anon (2008) Delicious: A social bookmarks manager. Available: . Accessed 18 September 2008.
  • 152. Bao S, Xue G, Wu X, Yu Y, Fei B, et al. (2007) Optimizing web search using social annotations. Proc 16th Int Conf on World Wide Web. ACM. pp. 501–510.
  • 156. Lund B, Hammond T, Flack M, Hannay T (2005) Social bookmarking tools (II): A case study—Connotea. D-Lib Magazine 11. Available: . Accessed 12 September 2008.
  • 157. Rivest R (1992) RFC 1321 The MD5 Message-Digest Algorithm. Technical report. Available: . Accessed 16 September 2008.
  • 158. Hull D (2006) Buggotea: Redundant links in Connotea. Available: . Accessed 16 September 2008.
  • 159. Anon (2008) Connotea Code. Available: . Accessed 12 September 2008.
  • 162. Anon (2008) All articles tagged defrost in CiteULike. Available: . Accessed 16 September 2008.
  • 163. Roberts RM (1989) Serendipity: Accidental discoveries in science. New York: Wiley.
  • 169. Waldrop M (2008) Science 2.0—Is Open Access Science the Future?: Scientific American. Available: . Accessed 18 September 2008.
  • 170. Barga RS, Andrews S, Parastatidis S (2007) The British Library Research Information Centre (RIC). In: Cox SJ, editor. Proc UK e-Science All Hands Meeting 2007: National e-Science Centre. pp. 454–461.
  • 172. Golbeck J (2007) The dynamics of web-based social networks: Membership, relationships, and change. First Monday 12. Available: . Accessed 12 September 2008.
  • 174. Anon (2007) OpenSocial—Google Code: The web is better when it's social. Available: . Accessed 12 September 2008.
  • 183. Adie E (2008) Postgenomic. Available: . Accessed 12 September 2008.
  • 187. Recordon D, Reed D (2006) OpenID 2.0: A platform for user-centric identity management. In: Juels A, Winslett M, Goto A, editors. Digital Identity Management: ACM. pp. 11–16.
  • 188. De Roure D, Goble C (2007) myExperiment—A Web 2.0 Virtual Research Environment. Proc International Workshop on Virtual Research Environments and Collaborative Work Environments; May 2007; Edinburgh, United Kingdom. Available: . Accessed 18 September 2008.
  • 189. Ginsparg P (2001) Creating a global knowledge network. Proc 2nd ICSU Press–UNESCO Conference on Electronic Publishing in Science. Available: . Accessed 12 September 2008.
  • 191. Giordano R (2007) The scientist: Secretive, selfish or reticent? A social network analysis. In E-Social Science 2007. Available: . Accessed 12 September 2008.
  • 195. Berners-Lee T (1998) Hypertext Style: Cool URIs don't change. Available: . Accessed 12 September 2008.
  • 196. Bar-Yossef Z, Broder AZ, Kumar R, Tomkins A (2004) Sic transit gloria telae: Towards an understanding of the web's decay. Proc WWW '04, 13th Int Conf on World Wide Web. ACM Press. pp. 328–337.
  • 197. Anon (2008) The Internet Archive. Available: . Accessed 12 September 2008.
  • 198. Kahle B, Prelinger R, Jackson ME (2001) Public access to digital material. D-Lib Magazine 7. Available: . Accessed 12 September 2008.
  • 199. Hellman E (2005) OpenURL COinS: A Convention to Embed Bibliographic Metadata in HTML. Available: . Accessed 12 September 2008.
  • 200. Gomez J (2008) Print is dead: Books in our digital age. London: Macmillan.
  • 201. Barry T, Richardson J (1997) Death of the journal: Will it be replaced by document delivery? Available: . Accessed 16 September 2008.
  • 206. Anon (2006) Science Foo Camp (scifoo). Available: . Accessed 12 September 2008.
  • 207. Buzan T (2002) How to mind map. London: Thorsons.
  • 210. Lacy LW (2005) OWL: Representing information using the web ontology language. Crewe: Trafford Publishing.

Integrating Bibliographical Data of Computer Science Publications from Online Digital Libraries


  • Tin Huynh, Hiep Luong & Kiem Hoang

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7198)

Included in the following conference series:

  • Asian Conference on Intelligent Information and Database Systems

In this paper, we propose and develop a system that integrates bibliographical data for publications in the computer science domain from various online sources into a unified database, based on a focused-crawling approach. The system is built in two phases. The first phase imports bibliographic data from DBLP (Digital Bibliography and Library Project) into our database. In the second phase, the system automatically crawls new publications from online digital libraries such as Microsoft Academic Search, ACM, IEEE Xplore, and CiteSeer, and extracts bibliographical information (one kind of publication metadata) to update and enrich the database built in the first phase. This system supports services related to academic activities, such as searching the literature; ranking publications, experts, conferences, and journals; reviewing articles; identifying research trends; mining the links between articles; surveying the state of the art in a specified research domain; and other work based on these bibliographical data.



Author information

Authors and Affiliations

University of Information Technology, Vietnam

Tin Huynh & Kiem Hoang

University of Arkansas, U.S.A.


Editor information

Editors and Affiliations

Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, No. 415, Chien-Kung Road, 80778, Kaohsiung, Taiwan

Jeng-Shyang Pan

Graduate Institute of Educational Measurement and Statistics, National Taichung University of Education, No. 140, Min-Shen Road, 40306, Taichung, Taiwan

Shyi-Ming Chen

Wrocław University of Technology, Wyb. Wyspiańskiego 27, 50-370, Wrocław, Poland

Ngoc Thanh Nguyen


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Huynh, T., Luong, H., Hoang, K. (2012). Integrating Bibliographical Data of Computer Science Publications from Online Digital Libraries. In: Pan, J.S., Chen, S.M., Nguyen, N.T. (eds) Intelligent Information and Database Systems. ACIIDS 2012. Lecture Notes in Computer Science, vol. 7198. Springer, Berlin, Heidelberg.

Publisher Name: Springer, Berlin, Heidelberg

Print ISBN: 978-3-642-28492-2

Online ISBN: 978-3-642-28493-9

eBook Packages: Computer Science (R0)


Promoting Useful Knowledge through Digital Bibliography

Jeffery Appelhans


More than two years ago, postdoctoral fellows at the Library & Museum of the American Philosophical Society began what figures to be a signature product: the Members Bibliography and Biography Project, which now continues as part of a National Endowment for the Humanities CARES grant. This public-facing digital project seeks to create a database that documents many of the most generative works and authors in the Atlantic world—including reprints and editions in any language—produced by Members elected between 1743 and 1865. This is no small task: APS Members were the who’s who of thinkers during the rise of natural philosophy and science, medical inquiry, and ethnography; most on this side of the Atlantic had deep connections to the American Revolution and the early republic. To date, the project has interrogated more than 3,200 editions of more than 1,100 texts produced by 343 Members. Some 1,250 Members remain.


We begin by gathering information about these publications. To do this, we draw together records from the public catalog of major historical libraries to see what our Member authors published. By drawing together records from the British Library’s English Short Title Catalog, the American Antiquarian Society, and the Library Company of Philadelphia, with APS holdings—and the vast foreign-language holdings cataloged in WorldCat—we aspire to comprehensive coverage. 

But getting a single book from these records into our database required some 200 mouse clicks and 50 copy-paste operations. This was not sustainable. As the early modern print revolution spun up, the profusion of editions alone would have ground the project to a halt; consider the many printings of Thomas Paine’s runaway hit Common Sense and his serialized The American Crisis, for example. No more. Although this process remains labor-intensive, the construction and continued refinement of digital automation has totally transformed this project over the last nine months.

We found an unlikely solution: leveraging business software toward humanistic aims. We now fuse these records together using Microsoft Power BI, a kind of next-gen version of the venerable Excel, to generate a rough bibliography. This project seems to be among the first to use Power BI for this kind of work: although built and marketed as Microsoft’s “business intelligence” solution for the corporate world, this free download is actually a kind of Swiss army knife for digital history. Scholars can use an Office-like visual interface to scrape spreadsheets, text files, webpages, and even PDFs; generate any variety of charts; or produce visualizations and heatmaps with built-in ArcGIS.


For our purposes, I wrote a series of queries to transform, format, and combine the records into our layout. Elements of style, such as punctuation—which vary from catalog to catalog—become uniform. Duplicates disappear. But for all this software-fueled automation we continue to stand on the shoulders of the bibliographers of early America: details we privilege, such as references to gold-standards like Sabin, Evans, and Shaw-Shoemaker, coalesce for easier verification.
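The transform-and-deduplicate logic those queries perform can be sketched outside Power BI as well. Here is a minimal Python illustration with hypothetical catalog rows; the field names and the punctuation rules are assumptions for the example, not the project's actual transforms.

```python
import re

# Hypothetical rows drawn from different library catalogs: punctuation and
# capitalization vary from catalog to catalog, and one row duplicates another.
rows = [
    {"author": "Cullen, William.", "title": "First Lines of the Practice of Physic,", "year": "1777"},
    {"author": "Cullen, William",  "title": "First lines of the practice of physic",  "year": "1777"},
    {"author": "Paine, Thomas",    "title": "Common Sense;",                          "year": "1776"},
]

def normalize(field):
    """Uniform style: strip trailing punctuation and collapse whitespace."""
    return re.sub(r"\s+", " ", field.strip().rstrip(".,;:")).strip()

def dedupe(rows):
    """Drop rows that match an earlier row on a case-insensitive key."""
    seen, out = set(), []
    for r in rows:
        key = (normalize(r["author"]).lower(), normalize(r["title"]).lower(), r["year"])
        if key not in seen:
            seen.add(key)
            out.append({k: normalize(v) for k, v in r.items()})
    return out

clean = dedupe(rows)
```

Power BI's query editor (Power Query) expresses the same steps declaratively; the Python above just makes the logic explicit.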


Necessity is the mother of invention, and one of our early development cases was the bibliography of Dr. William Cullen (APS 1768), the Scottish physician who trained a central corps of early Philadelphia’s medical community, including APS Members (and the founders of Philadelphia’s medical college) John Morgan, William Shippen Jr., Adam Kuhn, and Benjamin Rush. No wonder they flocked to him, for Cullen wrote the book on doctoring: First lines of the practice of physic, for the use of students in the University of Edinburgh (1777).


Cullen’s textbook was not only wildly popular and widely reprinted; it also exemplifies the roles of international exchange and constant iteration in early American science. Through Cullen and his students, a new generation of American-educated doctors and scientists elevated Philadelphia and New York ever closer to the prominence of London, Paris, and Edinburgh.

In my next post, I’ll detail further some of the promise—and potential pitfalls—of this kind of ambitious digital bibliography, by looking at one of APS’s most famous Members of the Class of 1775–85. Who will it be!?


Any views, findings, conclusions, or recommendations expressed in this blog do not necessarily represent those of the National Endowment for the Humanities.


Library digitization projects, issues and guidelines: a survey of the literature

Library Hi Tech

ISSN : 0737-8831

Article publication date: 1 April 2006

To provide a selective bibliography of literature which explores issues and provides guidelines on library digitization projects.


Literature published from 2000‐2005 on library digitization projects was examined. Issues involving digitization projects are presented, as well as case studies and resources for digitization projects. The paper has the following sections: project management, funding digital projects, selection of materials, legal issues, metadata creation, interoperability, and preservation issues.

Libraries are undertaking digitization projects to provide wider access to and to preserve materials. The literature survey presents an overview of digitization activities and discussions of issues concerning library digital projects. The authors of the case studies detail how libraries dealt with various components of the projects, such as planning, cataloging, and handling copyright issues. Many aspects of digitization projects will be changing over time, with further research and advances in technology, and the literature on the subject bears watching in coming years.

Practical implications

The articles and resource guides in the literature survey can assist librarians in carrying out digitization projects in their institutions.


The paper explains how important issues in library digitization projects are being encountered and resolved, and it provides many practical guidelines and resources for librarians undertaking such projects.

  • Digital libraries
  • Research libraries
  • Collections management

Lopatin, L. (2006), "Library digitization projects, issues and guidelines: A survey of the literature", Library Hi Tech, Vol. 24 No. 2, pp. 273-289.

Emerald Group Publishing Limited

Copyright © 2006, Emerald Group Publishing Limited


Digital project supports ‘bibliographic turn’ in Black literary studies

A selection of books by Black authors

(Photo by Tubyez Cropper)

Yale’s Jacqueline Goldsby and Meredith McGill of Rutgers University recently received a $1.7 million grant from The Mellon Foundation to support the development of The Black Bibliography Project (BBP), an initiative that aims to revive and transform descriptive bibliography for African American and Black Diaspora literary studies.

Using “Linked Data” — a web-based technique for recording meaningful relationships between and among data drawn from disparate sources — and in conjunction with Wikibase, Goldsby and McGill are building an electronic database that will offer new pathways for understanding the history of Black-published books, magazines, and newspapers, and will help reveal the social formations and aesthetic practices that are specific to Black print culture, in the U.S. and across the Black Diaspora.
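The Linked Data idea behind the database can be illustrated with a toy example: facts are stored as subject-predicate-object triples, so records from disparate sources link up whenever they share an identifier. The identifiers and relation names below are invented for the illustration, not the BBP's actual schema (the fact itself, that Alfred A. Knopf in New York published Langston Hughes's The Weary Blues, is real).

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple.
# Identifiers are hypothetical, not the BBP's actual vocabulary.
triples = [
    ("person:LangstonHughes", "rel:wrote", "work:TheWearyBlues"),
    ("work:TheWearyBlues", "rel:publishedBy", "org:AlfredAKnopf"),
    ("work:TheWearyBlues", "rel:publishedIn", "place:NewYorkCity"),
]

def objects(subject, predicate, graph):
    """Follow one labeled edge out of a node."""
    return [o for s, p, o in graph if s == subject and p == predicate]

def publishers_of_author(author, graph):
    """Two-hop query: author -> works -> publishers."""
    return [pub
            for work in objects(author, "rel:wrote", graph)
            for pub in objects(work, "rel:publishedBy", graph)]
```

A system like Wikibase stores the same shape of data at scale and answers such path queries with SPARQL rather than list comprehensions.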

Meredith McGill and Jacqueline Goldsby

Designed by a collaborative team of metadata librarians, graduates, and scholars from Yale and Rutgers, the database will also connect Black authors with the organizations and people who published their writing, the places where these books were created and sold, and the people who owned or otherwise interacted with these objects as they moved through time and space.

“The Mellon Foundation’s support for the BBP is bending the curve in what we have called ‘the bibliographic turn’ in African American literary studies,” said Goldsby, the Thomas E. Donnelley Professor of African American Studies and of English and professor of American studies at Yale. “This grant is a game-changing opportunity for the fields of bibliographic criticism and Black print culture studies.”

The BBP will establish partnerships with library and archival repositories across the nation, whose holdings in African American books, pamphlets, periodicals, and newspapers will populate the database.

During the pilot phase of the project, which also was funded by The Mellon Foundation, the team engaged in key preparatory work with the support of Yale’s Beinecke Rare Book & Manuscript Library and Rutgers’ Dean of Arts and Sciences.

“The pilot phase of this project taught us how much there was to learn about the history of Black publishing, and how many clues about this history can be found in the books themselves,” said McGill, professor and department chair of English at Rutgers.

The new Mellon grant will allow Goldsby and McGill to take the BBP to scale.

“This new grant, which will support the BBP’s activities for 3 1/2 years, will draw students and faculty from Yale and Rutgers together with experts in rare books and metadata librarianship and with specialists in information design to build cutting-edge knowledge infrastructure for African American studies,” McGill said.

For the implementation stage, Goldsby and McGill will direct a large team composed of newly hired staff librarians and graduate student fellows at both Yale and Rutgers, fostering cross-institutional collaborations on multiple levels.

Goldsby is also excited about the valuable capacity-building the project will provide at the two universities and beyond, including:

  • hiring new staff at Yale’s and Rutgers’ libraries;
  • training graduate students in Black book history and bibliographic criticism;
  • recruiting undergraduate data science students into the project; and
  • building bridges between Black Studies librarians and curators at university and public repositories across the country.

“The interdisciplinary explorations ahead will be thrilling,” said Goldsby.

For more on the Black Bibliography Project, visit the website.


The Beckett Digital Manuscript Project is a collaboration between the Centre for Manuscript Genetics (University of Antwerp), the Beckett International Foundation (University of Reading), the Oxford Centre for Textual Editing and Theory (University of Oxford), and the Harry Ransom Humanities Research Center (University of Texas at Austin), with the kind permission of the Estate of Samuel Beckett.

The purpose of the Beckett Digital Manuscript Project is to reunite the manuscripts of Samuel Beckett's works in a digital way, and to facilitate genetic research: the project brings together digital facsimiles of documents that are now preserved in different holding libraries, and adds transcriptions of Beckett's manuscripts, tools for bilingual and genetic version comparison, a search engine, and an analysis of the textual genesis of his works.
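Version comparison of the kind the project provides can be illustrated with Python's standard difflib. The two draft sentences below are invented for the example, not Beckett's text; the alignment shows what a later draft added.

```python
import difflib

# Two hypothetical draft states of one sentence (invented, not Beckett's text);
# a genetic edition aligns versions like these to show what the author added.
version_1 = "the old man sat by the window".split()
version_2 = "the old man sat alone by the dark window".split()

matcher = difflib.SequenceMatcher(a=version_1, b=version_2)
# Keep only the non-identical spans: (operation, old text, new text).
changes = [(op, " ".join(version_1[i1:i2]), " ".join(version_2[j1:j2]))
           for op, i1, i2, j1, j2 in matcher.get_opcodes()
           if op != "equal"]
```

Here `changes` captures the two insertions ("alone" and "dark"); real manuscript comparison must of course also handle punctuation, layout, and bilingual variants, which the project's tools address explicitly.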

Not I / Pas moi, That Time / Cette fois and Footfalls / Pas

We are delighted to announce the publication of our tenth genetic edition: Not I / Pas moi, That Time / Cette fois, and Footfalls / Pas. The accompanying monograph on its genesis is available through Bloomsbury Academic.

Practical Guide

If you are interested in making a digital edition, here is a practical guide to digital genetic editing.

MLA Prize for a Bibliography, Archive, or Digital Project


Read more on our News & Updates page.

Samuel Beckett: A Bibliography

We are honoured to announce the publication of the first part of Breon Mitchell's Samuel Beckett: A Bibliography. Part I: The Early Years: 1929-1950.

Beckett Digital Library (BDL)

We are delighted to announce the publication of the Beckett Digital Library. The accompanying monograph Samuel Beckett's Library, written by Dirk Van Hulle and Mark Nixon, is available through Cambridge University Press.

© 2021 Samuel Beckett Digital Manuscript Project Directors: Dirk Van Hulle and Mark Nixon | Technical realisation: Vincent Neyt

Under the auspices of the Centre for Manuscript Genetics (University of Antwerp), the Beckett International Foundation (University of Reading), the Harry Ransom Humanities Research Center (Austin, Texas) and the Estate of Samuel Beckett.

The research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement n° 313609.

Stirrings Still / Soubresauts and Comment dire / what is the word : © Samuel Beckett 1988, 1989 and the Estate of Samuel Beckett. The right of Samuel Beckett to be identified as the author of this work has been asserted in accordance with Section 77 of the Copyright, Designs and Patents Act 1988. L'Innommable / The Unnamable © Samuel Beckett 1953 and the Estate of Samuel Beckett. Krapp's Last Tape / La Dernière Bande © Samuel Beckett 1958 and the Estate of Samuel Beckett. Molloy © Samuel Beckett 1951 and the Estate of Samuel Beckett. Malone meurt / Malone Dies © Samuel Beckett 1951 and the Estate of Samuel Beckett. En attendant Godot / Waiting for Godot © Samuel Beckett 1952 and the Estate of Samuel Beckett. Fin de partie / Endgame © Samuel Beckett 1957 and the Estate of Samuel Beckett. Not I / Pas moi © Samuel Beckett 1973 and the Estate of Samuel Beckett. That Time / Cette fois © Samuel Beckett 1976 and the Estate of Samuel Beckett. Footfalls / Pas © Samuel Beckett 1976 and the Estate of Samuel Beckett. No part of this publication may be reproduced in any form by any electronic or mechanical means without permission in writing from the publisher and from the Estate of Samuel Beckett.


Moscow (Architecture Skyline)



  • Moscow State University: The tallest of Moscow's Seven Sisters, it has housed the State University since 1953 and, at 240 m, is the tallest educational building in the world.
  • Zuev Workers' Club: Designed by Ilya Golosov as a recreational center for factory workers, it was completed in 1929 and remains a landmark of Constructivist architecture.
  • Spasskaya Tower: Overlooking Red Square, this clock tower on the Kremlin's walls was built by Milanese architect Pietro Antonio Solari in 1491 and was once the Kremlin's main entrance.
  • State History Museum: Open since 1872, the State History Museum complex houses many artifacts, ranging from prehistoric relics to artworks acquired by the old royalty.
  • Mercury City Tower: The 5th tallest building in Russia and in Europe overall, this 338 m skyscraper in the International Business Center stands out for its copper glass façade and spiky shape.
  • Bolshoi Theater: First opened in 1825, it is home to an internationally renowned classical ballet company and has premiered works by composers such as Tchaikovsky and Shostakovich.




For the first time, Rosatom Fuel Division has supplied fresh nuclear fuel to the world’s only floating nuclear cogeneration plant in the Arctic

The fuel was supplied to the northernmost town of Russia along the Northern Sea Route.


The first refueling in the history of the power plant, that is, the replacement of spent nuclear fuel with fresh fuel, is planned to begin before 2024. The manufacturer of nuclear fuel for all Russian nuclear icebreakers, as well as for the Akademik Lomonosov FNPP, is Machinery Manufacturing Plant, Joint-Stock Company (MSZ JSC), a company of Rosatom Fuel Company TVEL based in Elektrostal, Moscow Region.

The FNPP includes two KLT-40S reactors of the icebreaking type. Unlike conventional large ground-based reactors, which require partial replacement of fuel rods once every 12-18 months, these reactors are refueled only once every few years, and refueling involves unloading the entire reactor core and loading fresh fuel into the reactor.

The cores of KLT-40 reactors of the Akademik Lomonosov floating power unit have a number of advantages compared to the reference ones: a cassette core was used for the first time in the history of the unit, which made it possible to increase the fuel energy resource to 3-3.5 years between refuelings, and also reduce the fuel component of the electricity cost by one and a half times. The FNPP operating experience formed the basis for the designs of reactors for nuclear icebreakers of the newest series 22220. Three such icebreakers have been launched by now.

The power units of the Akademik Lomonosov floating nuclear power plant were first connected to the grid in December 2019 and put into commercial operation in May 2020. The supply of nuclear fuel from Elektrostal to Pevek and its loading into the second reactor is planned for 2024.

The total power of the Akademik Lomonosov FNPP supplied to the coastal grid of Pevek, without thermal energy consumption on shore, is about 76 MW, and about 44 MW in the maximum thermal power supply mode. The FNPP generated 194 million kWh in 2023. The population of Pevek is just over 4,000, while the FNPP has the potential to supply electricity to a city with a population of up to 100,000 people.

The commissioning of the FNPP achieved two goals. First, it replaced the retiring capacities of the Bilibino NPP, which has been operating since 1974, and the Chaunskaya TPP, which has been operating for more than 70 years. Second, it supplies energy to the main mining companies in western Chukotka in the Chaun-Bilibino energy hub, a large ore and metal cluster that includes gold mining companies and projects related to the development of the Baimsk ore zone. In September 2023, a 110-kilovolt power transmission line 490 kilometers long was put into operation, connecting the towns of Pevek and Bilibino. The line increased the reliability of energy supply from the FNPP both to Bilibino consumers and to mining companies, the largest of which is the Baimsky GOK.

The comprehensive development of the Russian Arctic is a national strategic priority. Increasing traffic on the Northern Sea Route is of paramount importance for accomplishing the tasks set in the field of cargo shipping. This logistics corridor is being developed through regular freight voyages, the construction of new nuclear-powered icebreakers, and the modernization of the relevant infrastructure. Rosatom companies are actively involved in this work.

Rosatom Fuel Company TVEL (Rosatom Fuel Division) includes companies fabricating nuclear fuel, converting and enriching uranium, manufacturing gas centrifuges, and conducting research and design work. As the only nuclear fuel supplier to Russian NPPs, TVEL supplies fuel for a total of 75 power reactors in 15 countries, for research reactors in nine countries, and for the propulsion reactors of the Russian nuclear fleet. Every sixth power reactor in the world runs on TVEL fuel. Rosatom Fuel Division is the world’s largest producer of enriched uranium and the leader on the global stable isotope market. The Fuel Division is actively developing new businesses in chemistry, metallurgy, energy storage technologies, 3D printing, digital products, and the decommissioning of nuclear facilities. TVEL also includes Rosatom’s integrators for additive technologies and electricity storage systems.

Rosenergoatom, Joint-Stock Company, is part of Rosatom Electric Power Division and one of the largest companies in the industry, acting as an operator of nuclear power plants. Its branches include 11 operating NPPs (including the FNPP), the Scientific and Technical Center for Emergency Operations at NPPs, and design, engineering, and technological companies. In total, 37 power units with a total installed capacity of over 29.5 GW are in operation at 11 nuclear power plants in Russia.

Machinery Manufacturing Plant, Joint-Stock Company (MSZ JSC, Elektrostal) is one of the world’s largest manufacturers of fuel for nuclear power plants. The company produces fuel assemblies for VVER-440, VVER-1000, RBMK-1000, BN-600/800, VK-50, and EGP-6 reactors, as well as powders and fuel pellets intended for supply to foreign customers. It also produces nuclear fuel for research reactors. The plant belongs to the TVEL Fuel Company of Rosatom.


Rosatom obtained a license for the first land-based SMR in Russia

On April 21, Rosenergoatom obtained a license issued by Rostekhnadzor to construct the Yakutsk land-based SMR in the Ust-Yansky District of the Republic of Sakha (Yakutia).


ROSATOM and FEDC agree to cooperate in the construction of Russia's first onshore SNPP

ROSATOM and FEDC have signed a cooperation agreement to build Russia's first onshore SNPP in Yakutia.


Rosatom develops nuclear fuel for modernized floating power units

Rosatom has completed the development of nuclear fuel for the RITM-200S small modular reactor designed for the upgraded floating power units.


Tvel completes development of new fuel for Paks nuclear plant


VVER-440 fuel fabrication (Credit: Tvel)

The full package of documents has been handed over to the Hungarian customer, MVM Paks Ltd, for further licensing of the new fuel by the national nuclear power regulator, Tvel said.

The first fuel assemblies have also passed acceptance testing at Tvel's Elemash Machine-building plant in Elektrostal, Moscow region.

The new modification of VVER-440 second generation fuel increases the efficiency of fuel usage and advances the economic performance of the power plant operation, Tvel said.  

The engineering contract for development of the new VVER-440 fuel was signed in late 2017. The development and validation work involved a number of Russian enterprises, including OKB Gidropress (a part of Rosatom's machine-building division Atomenergomash), Bochvar Institute (the material science research facility of TVEL Fuel Company), Elemash Machine-building plant, and the Kurchatov Institute national research center. At OKB Gidropress's research and experiment facility, the new fuel passed a range of hydraulic, longevity, and vibration tests.

The first consignment of the modified fuel will be delivered to the Paks nuclear power plant later this year. The four-unit Paks plant, which entered operation between 1982 and 1987, currently operates on a 15-month fuel cycle and supplies around 50% of Hungary's electricity.

Photo: Fabrication of new VVER-440 fuel for Paks nuclear plant (Credit: Tvel)



  1. dblp: computer science bibliography

    Datasets and other research artifacts have been a major topic in the scientific community in recent years. Many ongoing projects focus on improving the standardization, publication and citation of these artifacts. Currently, the dblp team is involved in three of them: NFDI4DataScience, NFDIxCS, and Unknown Data. As part of these […]

  2. DBLP

    DBLP is a computer science bibliography website. Starting in 1993 at Universität Trier in Germany, it grew from a small collection of HTML files into an organization hosting a database and logic programming bibliography site. Since November 2018, DBLP has been a branch of Schloss Dagstuhl - Leibniz-Zentrum für Informatik (LZI). DBLP listed more than 5.4 million journal articles, conference ...

  3. Digital Library Project

    Other digital library projects at the Library of Congress include: Standards: The Library of Congress is the maintenance agency for several key standards used in the information community, including US Machine-Readable Cataloging (MARC) formats, the Z39.50 information retrieval protocol, the Encoded Archival Description (EAD) Document Type Definition (DTD) for Standard Generalized Markup ...

  4. Fundamentals of Digital Library Projects

    Description: This 6-week online course introduces students to the breadth of considerations, standards, and skills needed to successfully launch and manage a digital library program. The course will provide opportunity for hands-on activities to develop critical thinking and decision-making skills within the context of a digital library. 2024 Sessions Click on a date range to register for that ...

  5. Computer Science

    The Digital Bibliography and Library Project (DBLP) indexes over 1.3 million conference proceedings, articles, series, and books in Computer Science. CiteSeerX. CiteSeerX indexes 1.5 million documents and over 30 million citations with a focus on computer and information science.

  6. Digital Library Programs for Libraries and Archives: Developing

    Exercise 8 Project Plan Bibliography Index. Aaron D. Purcell. ... "An experience-based, detailed overview of the digitization process from soup to nuts, for creation of digital library projects and conversion to sustainable digital library programs. It will read familiar to experienced professionals and provide a workable blueprint for ...

  7. The DBLP Computer Science Bibliography: Evolution ...

    The Digital Bibliography and Library Project (DBLP), with the new title of "The DBLP Computer Science Bibliography" (Ley, 2002), is a famous bibliography website centering on CS. It has been ...

  8. Defrosting the Digital Library: Bibliographic Tools for the Next ...

    The Digital Bibliography and Library Project (DBLP), created by Michael Ley, provides an index of peer-reviewed publications in computer science. Recently, DBLP has started to index many popular journals with significant computational biology content such as Bioinformatics and Nucleic Acids Research, and currently indexes about 900,000 ...

  9. An Analysis and Visualization Tool for DBLP Data

    The Digital Bibliography and Library Project (DBLP) is a popular computer science bibliography website hosted at the University of Trier in Germany. It currently contains 2,722,212 computer science publications with additional information about the authors and conferences, journals, or books in which these are published. Although the database covers the majority of papers published in this ...

  10. DBLP

    DBLP provides bibliographic information on major computer science journals and proceedings.
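    That bibliographic data is also reachable programmatically. As a minimal sketch, the snippet below builds a request URL for dblp's public publication-search endpoint (`https://dblp.org/search/publ/api`, which accepts `q`, `format`, and `h` parameters); the specific query string shown is just an illustration using the search operators described above.

    ```python
    from urllib.parse import urlencode

    # dblp's public publication-search endpoint (JSON output).
    DBLP_SEARCH = "https://dblp.org/search/publ/api"

    def build_query(terms, hits=10):
        """Build a dblp search URL. Space-separated terms are ANDed,
        '|' gives boolean OR, and a trailing '$' forces an exact-word
        match (see the search help above)."""
        return DBLP_SEARCH + "?" + urlencode(
            {"q": terms, "format": "json", "h": hits}
        )

    url = build_query("graph$ network|model", hits=5)
    print(url)
    ```

    Fetching the resulting URL (e.g. with `urllib.request`) returns a JSON document whose hits carry titles, authors, venues, and years.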

  11. Integrating Bibliographical Data of Computer Science ...

    The first phase deals with importing bibliographic data from DBLP (Digital Bibliography and Library Project) into our database. In the second phase, the system will automatically crawl new publications from online digital libraries such as Microsoft Academic Search, ACM, IEEE Xplore, and CiteSeer, and extract bibliographical information (one kind of ...
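    An import phase like the first one typically starts from dblp's XML export (the single-file dump mentioned above, whose structure is fixed by a DTD). As a minimal sketch, the record below is a hand-written, abridged example in the style of that export (the key, authors, and title are invented), parsed with the standard library:

    ```python
    import xml.etree.ElementTree as ET

    # Abridged, hypothetical record in the style of dblp's XML export;
    # real records carry more elements (e.g. <ee>, <pages>, <volume>).
    RECORD = """\
    <dblp>
      <article key="journals/example/Doe24" mdate="2024-01-15">
        <author>Jane Doe</author>
        <author>John Roe</author>
        <title>An Example Article.</title>
        <journal>Example Journal</journal>
        <year>2024</year>
      </article>
    </dblp>"""

    def parse_records(xml_text):
        """Yield (key, title, authors, year) for each publication element."""
        root = ET.fromstring(xml_text)
        for pub in root:
            yield (
                pub.get("key"),
                pub.findtext("title"),
                [a.text for a in pub.findall("author")],
                pub.findtext("year"),
            )

    for key, title, authors, year in parse_records(RECORD):
        print(key, title, authors, year)
    ```

    For the full multi-gigabyte dump, a streaming parser (`ET.iterparse`) would be used instead of loading the whole tree, but the per-record logic stays the same.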

  12. ZoteroBib: Fast, free bibliography generator

    Adding a bibliography entry. Simply find what you're looking for in another browser tab and copy the page URL to the ZoteroBib search bar. ZoteroBib can automatically pull in data from newspaper and magazine articles, library catalogs, journal articles, sites like Amazon and Google Books, and much more.

  13. Promoting Useful Knowledge through Digital Bibliography

    More than two years ago, postdoctoral fellows at the Library & Museum of the American Philosophical Society began what figures to be a signature product: the Members Bibliography and Biography Project, which now continues as part of a National Endowment for the Humanities: CARES grant. This public-facing digital project seeks to create a database that documents many of the most generative ...

  14. Full article: The digital humanities: Implications for librarians

    She created a TEI database project on the topic of roses from a print annotated bibliography. ... In Creating Digital Knowledge: Library as Open Access Digital Publisher, ... around a 3D model of historic Venetian buildings and a biodiversity community wiki project in Project Management for Digital Projects with Collaborators beyond the Library.

  15. Library digitization projects, issues and guidelines: A survey of the

    To provide a selective bibliography of literature which explores issues and provides guidelines on library digitization projects. Literature published from 2000-2005 on library digitization projects was examined. Issues involving digitization projects are presented, as well as case studies and resources for digitization projects.

  16. Information retrieval from digital libraries in SQL

    The terms are stored in a SQL Server 2005 relational database management system (RDBMS). The authors use about 18,000 document abstracts from the publicly available Digital Bibliography and Library Project (DBLP) and ACM Digital Library abstracts. Their approach on storage, term weighting, and retrieval is presented in a concise manner.
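    The term-weighting step described above can be sketched independently of the RDBMS. The toy corpus below is invented (not drawn from DBLP or the ACM Digital Library), and the weighting uses the classic tf-idf formula with idf = log(N / df) as one plausible choice, not necessarily the authors' exact scheme:

    ```python
    import math
    from collections import Counter

    # Hypothetical stand-ins for document abstracts.
    DOCS = {
        1: "digital library retrieval",
        2: "digital bibliography project",
        3: "library digitization project survey",
    }

    def tfidf_index(docs):
        """Build an inverted index term -> {doc_id: tf-idf weight},
        using raw term frequency and idf = log(N / df)."""
        n = len(docs)
        tokenized = {d: text.split() for d, text in docs.items()}
        df = Counter(t for terms in tokenized.values() for t in set(terms))
        index = {}
        for d, terms in tokenized.items():
            tf = Counter(terms)
            for t, f in tf.items():
                index.setdefault(t, {})[d] = f * math.log(n / df[t])
        return index

    index = tfidf_index(DOCS)
    # "retrieval" occurs in one document only, so it discriminates
    # strongly; "digital" occurs in two, so its weight is lower.
    print(index["retrieval"])
    ```

    In the SQL setting of the article, the same `term -> (doc_id, weight)` pairs would simply live in a relational table, with retrieval expressed as a join-and-sum over query terms.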

  17. Digital project supports 'bibliographic turn' in Black ...

    The Black Bibliography Project aims to revive and transform descriptive bibliography for African American and Black Diaspora literary studies. Yale's Jacqueline Goldsby and Meredith McGill of Rutgers University recently received a $1.7 million grant from The Mellon Foundation to support the development of The Black Bibliography Project (BBP ...

  18. D-Lib, Digital Library Research

    The NSF/DARPA/NASA Digital Library Initiative (DLI). In the Digital Libraries Initiative Phase 1, there were six federally funded projects in digital library research, with partnerships led by universities. The individual projects are listed below. University of California, Berkeley: An Electronic Environmental Library Project.

  19. Samuel Beckett: Digital Manuscript Project

    Welcome. The Beckett Digital Manuscript Project is a collaboration between the Centre for Manuscript Genetics (University of Antwerp), the Beckett International Foundation (University of Reading), the Oxford Centre for Textual Editing and Theory (University of Oxford), and the Harry Ransom Humanities Research Center (University of Texas at Austin), with the kind permission of the Estate of ...

