Google Partners with Libraries in Major Digitization Initiative
The Project
In December 2004, Google announced a major new
initiative. The company will work with the libraries of
Stanford, Harvard, and Oxford Universities, the University
of Michigan and the New York Public Library to digitally scan
books from their collections, add the content to the
Google index and then allow users worldwide to search this
content in Google. Different arrangements are being made with
the five institutions. All seven million volumes in Michigan’s
library will be scanned, a task that will take about six years.
Stanford has agreed to a phased pilot project, though all its
eight million books will be scanned. Oxford’s Bodleian
Library will contribute an unspecified, though large, number
of its pre-1900 public domain works. NYPL will initially contribute
only a subset of its non-copyrighted material. About 40,000
of Harvard’s fifteen million volumes will be digitized
in its pilot project. The pilot will then be evaluated and
a decision made about digitizing far larger numbers of Harvard’s
volumes. Though it is currently unclear how many volumes Google
will eventually digitize from the five libraries, the final
figure might easily be as high as thirty million.
As Google’s press release stated, “Users searching
with Google will see links in their search results page when
there are books relevant to their query. Clicking on a title
delivers a Google Print page where users can browse the full
text of public domain works and brief excerpts and/or bibliographic
data of copyrighted material. Library content will be displayed
in keeping with copyright law.” The new project is an expansion
of the Google Print program, which assists publishers in making
their books searchable online. At present, Google places the books
found by a Google Print search at the top of the results page, marked
by an icon of books to the left. Under its new initiative, Google
does not plan to have a separate search engine specifically devoted
to searching the scanned monographs. This has resulted in the
criticism that the digitized books, though numbering in millions,
may be swamped by the hundreds of millions of other web pages
searched during a Google search.
Other Full-Text Digitization Projects
Google’s initiative is, of course, not the first book digitization project
engaged in by libraries and others. Large monographic digitization
programs include Michigan’s own Digital Library Text Collection, Oxford’s
Text Archive, the Alex Catalogue of Electronic Texts, the Electronic Text
Center at the University of Virginia, Project Bartleby Archive, Project
Gutenberg, Berkeley’s Literature@SunSITE, and the Internet Archive Million
Book Project. Will such projects, small in
scale when contrasted with Google’s undertaking, survive?
It is not yet clear. While most of the books digitized in these
projects are freely available, there are other large, sophisticated
digitized collections that are available for purchase, for example Early
English Books Online (EEBO), Eighteenth Century Collections Online,
and Evans Digital Edition/Early American Imprints, all of which are
owned by BC Libraries. Will libraries continue to expend often
considerable funds on full-text collections like these? I think
that the answer is yes, at least for the next several years.
First of all, they currently exist and scholars need them now.
They are also discrete uniform collections that allow complex
searching. It is unlikely that Google will permit users to select
a distinct body of works, such as those that make up Evans Early
American Imprints, out of all its millions of digitized
materials, or that it will facilitate advanced searching of such a subgroup.
Nevertheless, I believe that Google is raising the bar for future
digitization projects. Their survival will surely
depend on what value they add to mere digitized text (for example
scholarly essays, biographical materials, annotations, sound, and video)
to create more attractive packages.
Reactions to Google’s Initiative
Many are criticizing Google’s new initiative. One influential
library author argues that it will be disastrous for Google users
to have access to the full text of only pre-1923 monographs,
that is, works in the public domain, the implication being that
users will confine their searches to this material and fail to
seek out later works. This seems particularly ironic, as for years
librarians and others have been critical of students’ tendency
to limit their reading to electronic material, much of which is
of recent vintage. Michael Gorman, Dean of Library Services at
Cal State, Fresno and President-elect of the American Library
Association, is also quite critical of the Google initiative.
As he argued in an op-ed piece in the Los Angeles Times
(17 Dec. 2004): “books in great libraries are much more
than the sum of their parts. They are designed to be read sequentially
and cumulatively, so that the reader gains knowledge in the reading.” He
considers that the results of a Google search of these millions
of electronic volumes will be an array of disconnected, frequently
meaningless parts of books. Still, many are applauding the new
venture. University of Michigan President Mary Sue Coleman observed: “This
project signals an era when the printed record of civilization
is accessible to every person in the world with Internet access.” As
a statement from Harvard University Library declared, looking
ahead to Harvard’s possible greater involvement in the project, “For
users outside of Harvard, the larger project would make accessible
the full text of a large number of public-domain books. It would
also make the copyrighted portion of the Harvard collection searchable.
Including works from the vast Harvard library collection in an
information location tool available on the Internet would greatly
expand the scope and quality of information available to a worldwide
audience of knowledge-seekers.”
Many librarians and faculty have for years been critical of
the increasingly pervasive Google culture and the great range
of quality of web content to which Google’s search engine
points. Some contend that though the internet makes readily available
so much information, far too many students are still ignorant
of enormous amounts of knowledge not available on the web and
indeed are unwilling to seek such material. However, the inclusion
of millions of books in this new digitization project should
result in Google searches retrieving more quality hits. A potentially
wonderful benefit of this new undertaking is that it will alert
huge numbers throughout the globe to the existence, as well as
the full-text content, of many of the world’s books. Some
might say that the goal of Google is grandiose, namely “to
organize the world's information and make it universally accessible
and useful. Since a lot of the world's information isn't yet
online, we're helping to get it there. Google Print puts the
content of books where you can find it most easily – right
in Google search results.” Nevertheless, to the extent
that Google makes some, indeed a great deal, of the knowledge not yet online
accessible on the web, it is indisputably a great boon. It may
not be hyperbolic to predict that Google’s initiative will
create the world’s first great virtual library.
Conclusion
Though a wonderful virtual library is nigh, let me underscore
my conviction that this new Google initiative will not herald
the imminent demise of the research library as we know it. Rather,
I believe that it will assist the latter in developing into the
hub of a pervasive electronic community where diverse information
technologies will become ever more integral to the university's
mission of teaching, research and learning. The library is evolving
and assuming new roles; it is far from becoming obsolete. Nor
do I believe that librarians are emulating the Irish elk by becoming
extinct! Initiatives like that of Google and others that are
introducing such positive changes to the range and availability
of the informational world, while at the same time rendering
information assessment so challenging, will ensure that librarians
continue to play their established role in providing library
instruction and in helping students critically evaluate the worth
of information that is now so accessible.
Brendan Rapple
Collection Development Librarian