HathiTrust Full Text Downloading Now Available

In 2004, Google made a stunning announcement: it would work with five major research libraries to digitize 15 million books within ten years. What once seemed like the stuff of futuristic dreams would become a reality, thanks to a high-tech company synonymous with internet searching. But for some, Google's involvement was problematic. The print holdings of some of the world's greatest libraries were being digitized by a for-profit corporation. There was no guarantee that Google would provide the same level of stewardship of the digital record that libraries have provided for their print and manuscript collections for centuries.

To ensure the proper management of this new and growing body of digital texts, a group of research libraries agreed in 2008 to form HathiTrust. Now with a membership of over 70 research institutions, HathiTrust has a mission "to contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge." According to Jeremy York, HathiTrust Project Librarian, the organization's name was inspired by hathi (hah-tee), the Hindi word for elephant. It is a wonderfully apt choice because, as York explains, the animal symbolizes great size, strength, wisdom, trustworthiness, and a reliable memory.

Boston College joined HathiTrust in the fall of 2011, but it was only recently that the final piece, a type of login technology required by HathiTrust, was put in place. Members of the Boston College community can now perform two actions hitherto not possible: download any HathiTrust text in the public domain and create collections of HathiTrust items on the HathiTrust site. To log in, BC faculty, students, and staff can now go to the HathiTrust home page, click on the Log-In button, select Boston College in the dropdown menu, click on Continue, enter BC username and password, and click on Sign In.


As of this writing, the HathiTrust Digital Library contains 10.7 million volumes, of which 3.37 million, or 31%, are in the public domain. Most of the content comes from the member libraries' books and serials digitized in the Google and Internet Archive digitization projects, but it also includes items digitized by the HathiTrust partners themselves. Our HathiTrust membership has allowed the Boston College Libraries to contribute 2,179 volumes so far, with more to come. Over four hundred languages are represented, as are books from all periods of publishing history. Examples range from incunabula (books printed before 1501), such as the Calendarium (Venice, 1476) by the German mathematician and astronomer Johannes Regiomontanus, to an early 1885 edition of Mark Twain's Adventures of Huckleberry Finn, complete with 174 illustrations and an illustrated cover.

Users can find books in HathiTrust by using its search interface or Holmes, the Boston College Libraries' discovery system. If using HathiTrust's search interface, you can search just its catalog or its full texts (together with catalog information). Keep in mind that you will only be able to download works that are in the public domain. Full-text search results for works still in copyright will include the page numbers where your search terms occur, but downloading will not be an option (with one exception - see the next paragraph). If using Holmes, begin with a search using the basic or advanced modes and then, in the Refine My Results column under Local Collections, select HathiTrust Digital Library. Note that in Holmes you will be searching only HathiTrust's out-of-copyright texts: the 1.5 million HathiTrust records in Holmes are for public domain works only. If you want to search HathiTrust's in-copyright texts, you will need to use the HathiTrust interface.

Although in-copyright works cannot be downloaded, it is possible for print-disabled users to gain full-text access to all texts. The HathiTrust website explains how eligible users can obtain permission to view the full text of copyrighted works and how to make use of its PageTurner application, which facilitates web page navigation.

The other newly available feature, the ability to create collections of HathiTrust texts on its website, offers researchers a way of enhancing access to the material they regard as the most important for their work. It also gives them a way of laying the groundwork for textual analysis of a selected set of works. To facilitate textual analysis and text mining, the HathiTrust Research Center was launched in 2011 by Indiana University and the University of Illinois with the purpose of providing "a secure computational and data environment for scholars to perform research using the HathiTrust Digital Library." As an exercise, I created a collection of nineteen volumes of the Monumenta Historica Societatis Iesu and called it Jesuitica; it is now available to the public for searching. The HathiTrust website has more information about how to get involved in this particular development of its Digital Library. It is safe to say that the creation of this enormous digital collection - truly a cultural "elephant" of information - is the beginning of a vital new chapter in the development of academic scholarship.

Jonas Barciauskas
Head of Collection Development, O'Neill Library