FALL 2015

HathiTrust's Copyright Review Project

Many familiar with HathiTrust may think its open access book collection includes only those titles published up to and including 1922. That's because, according to US copyright law, any book published before 1923 is considered out-of-copyright and therefore freely available for reading online and, for HathiTrust's institutional partners (BC is one), for downloading as a full-text PDF. However, a group of university libraries collaborating with HathiTrust in a copyright review project has thus far identified over 270,000 post-1922 books as being in the public domain. Begun in 2008, this project has helped increase the public domain portion of HathiTrust to 39% of its total collection.

The group began with four members (University of Michigan, University of Wisconsin, University of Minnesota, and Indiana University) but has grown to nineteen and now includes Columbia, Duke, Dartmouth, and UCLA, among others. Their project involves two review systems, one for US publications (Copyright Review Management System or CRMS-US) and another for non-US publications (CRMS-World). In its current phase, the project is limited to books in the English language. During 2008-2014, the reviewers found that 58% of the books they examined were out-of-copyright while 16% were in copyright. For the remaining 26%, copyright status was deemed too complex to determine, for reasons such as multiple authorship (too time-consuming to research) and the use of image reproductions requiring additional and possibly more difficult copyright review.

CRMS-US focuses on US books published during 1923-1963 and attempts to determine whether a book published during this period has had its copyright renewed. CRMS-World focuses on books published in the UK, Canada, or Australia and tries to determine whether the authors' death dates fall between the 1870s and the 1940s. In both cases, there are two independent reviews of each volume, with a third review in case of disagreement. Final determinations are used to update the HathiTrust rights database. From the start, the project's goal has been to achieve legally reliable results by identifying a practical scope of inquiry (only books, not serials), comprehensively researching the law, using proven and auditable methods, standardizing training for the reviewers, and, last but certainly not least, expecting complications and uncertainty.
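For readers who think in code, the double-review rule described above can be sketched in a few lines of Python. This is only an illustration, not the CRMS software itself: the status labels, the function names, the volume identifiers, and the dictionary standing in for the rights database are all hypothetical, and the sketch assumes that the third review simply settles any disagreement.

```python
from typing import Dict, Optional

# Hypothetical status labels, chosen for illustration only.
PUBLIC_DOMAIN = "public domain"
IN_COPYRIGHT = "in copyright"
UNDETERMINED = "undetermined"


def final_determination(first: str, second: str,
                        third: Optional[str] = None) -> Optional[str]:
    """Combine two independent reviews into one determination.

    If the two reviews agree, that result stands. If they disagree,
    the third review (assumed here to be decisive) is used. None is
    returned while a needed third review is still missing.
    """
    if first == second:
        return first
    return third


def record_review(rights_db: Dict[str, str], volume_id: str,
                  first: str, second: str,
                  third: Optional[str] = None) -> Optional[str]:
    """Store a final determination in a stand-in rights database.

    Nothing is written until a final determination exists.
    """
    decision = final_determination(first, second, third)
    if decision is not None:
        rights_db[volume_id] = decision
    return decision


# Example with made-up volume identifiers.
rights_db: Dict[str, str] = {}
record_review(rights_db, "vol-001", PUBLIC_DOMAIN, PUBLIC_DOMAIN)
record_review(rights_db, "vol-002", PUBLIC_DOMAIN, IN_COPYRIGHT)
record_review(rights_db, "vol-003", PUBLIC_DOMAIN, UNDETERMINED, UNDETERMINED)
```

In this sketch, "vol-002" stays out of the stand-in database because its two reviews disagree and no third review has been supplied yet.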

The two basic tools used by the review teams were Stanford University's Copyright Renewal Database and the Virtual International Authority File, or VIAF. The latter was used for finding authors' death dates. Other resources included national biographies and obituary databases. It's interesting to note that Wikipedia was one of the tools used, a choice supported by the Library of Congress Research Division. The success of this project is evident in the usage data. Last November, it was reported that of the top 500 views of open access books in HathiTrust, 30% were of books unlocked by the copyright review.

Although new books are being added to HathiTrust all the time, the CRMS project is steadily reducing the percentage of items needing review. Its next big challenges will be to decide how to tackle the 26% of undetermined books and how to go about reviewing books in other languages. But it is clear that its work has already succeeded in opening up a large corpus of material for future scholarship and readers around the world.

NOTE: Much of the information for this article was drawn from "Finding the Public Domain: 19 Institutions Making an Impact," a web presentation given in November 2014 by Kristina Eden, Melissa Levine, and Suzanne Traxler. Information about the speakers and a link to their webinar slides are available on the EDUCAUSE website.

Jonas Barciauskas
Head of Collection Development