Boston College Libraries Faculty Newsletter



Google Partners with Libraries in Major Digitization Initiative

The Project


In December 2004, Google announced a major new initiative. The company will work with the libraries of Stanford, Harvard, and Oxford Universities, the University of Michigan, and the New York Public Library to digitally scan books from their collections, add the content to the Google index, and then allow users worldwide to search this content in Google. Different arrangements are being made with each of the five institutions. All seven million volumes in Michigan’s library will be scanned, a task that will take about six years. Stanford has agreed to a phased pilot project, though all of its eight million books will be scanned. Oxford’s Bodleian Library will contribute an unspecified, though large, number of its pre-1900 public domain works. NYPL will initially contribute only a subset of its non-copyrighted material. About 40,000 of Harvard’s fifteen million volumes will be digitized in its pilot project. The pilot will then be evaluated and a decision made about digitizing far larger numbers of Harvard’s volumes. Though it is currently unclear how many volumes Google will eventually digitize from the five libraries, the final figure might easily be as high as thirty million.


As Google’s press release stated, “Users searching with Google will see links in their search results page when there are books relevant to their query. Clicking on a title delivers a Google Print page where users can browse the full text of public domain works and brief excerpts and/or bibliographic data of copyrighted material. Library content will be displayed in keeping with copyright law.” The new project is an expansion of the Google Print program, which assists publishers in making their books searchable online. At present, Google places the books found by a Google Print search at the top of the results page, marked by an icon of books to the left. Under its new initiative, Google does not plan to have a separate search engine specifically devoted to searching the scanned monographs. This has drawn the criticism that the digitized books, though numbering in the millions, may be swamped by the hundreds of millions of other web pages covered by a Google search.


Other Full-Text Digitization Projects


Google’s initiative is, of course, not the first book digitization project engaged in by libraries and others. Large monographic digitization programs include Michigan’s own Digital Library Text Collection, Oxford’s Text Archive, the Alex Catalogue of Electronic Texts, the Electronic Text Center at the University of Virginia, Project Bartleby Archive, Project Gutenberg, Berkeley’s Literature@SunSITE, and the Internet Archive Million Book Project. Will such projects, small in scale when contrasted with Google’s undertaking, survive? It is not yet clear. While most of the books digitized in these projects are freely available, there are other large, sophisticated digitization projects that must be purchased, for example Early English Books Online (EEBO), Eighteenth Century Collections Online, and Evans Digital Edition/Early American Imprints, all of which are owned by BC Libraries. Will libraries continue to expend often considerable funds on full-text collections like these? I think that the answer is yes, at least for the next several years. First of all, they currently exist and scholars need them now. They are also discrete, uniform collections that allow complex searching. It is unlikely that Google will permit users to select a distinct body of works, such as those that make up Evans Early American Imprints, out of all its millions of digitized materials and to carry out advanced searches of this sub-group. Nevertheless, I believe that Google is raising the bar for future digitization projects. The survival of such projects will surely depend on what value they add to mere digitized text, for example scholarly essays, biographical materials, annotations, sound, and video, to create more attractive packages.


Reactions to Google’s Initiative


Many are criticizing Google’s new initiative. One influential library author argues that it will be disastrous for Google users to have access to the full text of only pre-1923 monographs, that is, works in the public domain, the implication being that users will confine their searches to this material and fail to seek out later works. This seems particularly ironic, as for years librarians and others have been critical of students’ tendency to limit their reading to electronic material, much of which is of recent vintage. Michael Gorman, Dean of Library Services at Cal State, Fresno and President-elect of the American Library Association, is also quite critical of the Google initiative. As he argued in an op-ed piece in the Los Angeles Times (17 Dec. 2004): “books in great libraries are much more than the sum of their parts. They are designed to be read sequentially and cumulatively, so that the reader gains knowledge in the reading.” He considers that the results of a Google search of these millions of electronic volumes will be an array of disconnected, frequently meaningless parts of books. Still, many are applauding the new venture. University of Michigan President Mary Sue Coleman observed: “This project signals an era when the printed record of civilization is accessible to every person in the world with Internet access.” A statement from Harvard University Library, looking ahead to Harvard’s greater future involvement in the project, declared: “For users outside of Harvard, the larger project would make accessible the full text of a large number of public-domain books. It would also make the copyrighted portion of the Harvard collection searchable. Including works from the vast Harvard library collection in an information location tool available on the Internet would greatly expand the scope and quality of information available to a worldwide audience of knowledge-seekers.”


Many librarians and faculty have for years been critical of the increasingly pervasive Google culture and the widely varying quality of the web content to which Google’s search engine points. Some contend that though the internet makes so much information readily available, far too many students are still ignorant of the enormous amounts of knowledge not available on the web and are indeed unwilling to seek such material. However, the inclusion of millions of books in this new digitization project should result in Google searches retrieving more quality hits. A potentially wonderful benefit of this new undertaking is that it will alert huge numbers of people across the globe to the existence, as well as the full-text content, of many of the world’s books. Some might say that the goal of Google is grandiose, i.e. “to organize the world’s information and make it universally accessible and useful. Since a lot of the world’s information isn’t yet online, we’re helping to get it there. Google Print puts the content of books where you can find it most easily – right in Google search results.” Nevertheless, to the extent that Google makes some, indeed a great deal, of this offline knowledge accessible on the web, it is indisputably a great boon. It may not be hyperbolic to predict that Google’s initiative will create the world’s first great virtual library.




Though a wonderful virtual library is nigh, let me underscore my conviction that this new Google initiative will not herald the imminent demise of the research library as we know it. Rather, I believe that it will help the research library develop into the hub of a pervasive electronic community where diverse information technologies will become ever more integral to the university’s mission of teaching, research and learning. The library is evolving and assuming new roles; it is far from becoming obsolete. Nor do I believe that librarians are emulating the Irish elk by becoming extinct! Initiatives like Google’s, which are introducing such positive changes to the range and availability of information while at the same time making the assessment of information so challenging, will ensure that librarians continue to play their established role in providing library instruction and in helping students critically evaluate the worth of information that is now so accessible.


Brendan Rapple

Collection Development Librarian

