FALL 2014

Digital Preservation and Boston College Libraries

The Library of Congress recently held its annual Digital Preservation conference in Washington, DC. This two-day event was attended by over 300 people. Many of the participants are stewards of digital collections, i.e., they are responsible for ensuring that their institution's digital content remains accessible to patrons for the foreseeable future. The content of interest to this community is quite diverse and includes digitized special collections, scholarly works, newspapers, multimedia, websites, and social media, to name just a few. The papers and posters at this conference were, in effect, a progress report on how well we are all doing at implementing systems that can be trusted to preserve our digital content.

An excellent solution for digital preservation (DP) would need to meet many requirements. The main goals are to make sure that every item of lasting value can be easily found, its authenticity can be verified, its history can be ascertained, and it can still be used (viewed, listened to, etc.) in the near and distant future. Digital collections will need to survive technology failures, obsolescence of software and hardware, natural disasters, security breaches, and even a simple lack of adequate descriptive information, which makes items difficult to discover. The conference is an opportunity for DP practitioners to co-develop best practices and standards in order to address DP challenges in an affordable and scalable manner. It is also an opportunity to collaborate in the adoption and refinement of open source "free" DP tools and infrastructure, rather than relying on vendors who over time could make DP too expensive.

As part of a "break-out session", I gave a talk on "Developing a Born-Digital Preservation Workflow" based on work that Jack Kearney (Irish Music Center) and I have accomplished this year. A hard drive had been donated to Boston College by Mary O'Hara, a world-famous Irish harpist and soprano. Jack and I evaluated the electronic records on this hard drive and in the process developed a methodology that can be used going forward for any other "born-digital" content received by the Burns Library. We looked for (and found) virus-infected files, took an inventory of the contents of the hard drive (21,988 files amounting to 104.3 gigabytes), discovered and tabulated a plethora of file formats (many of them proprietary), located some files containing personally identifiable information (raising privacy concerns), and identified a great many duplicate files (the same files, but in different folders). At the outset we computed an alphanumeric checksum for every file, a value that stays the same as long as the bits in the file remain exactly the same; this value will be re-computed in the future to verify a file's integrity and authenticity. Boston College's archivists will soon build upon our work, making many of these electronic records accessible as part of the Mary O'Hara collection and connecting them (via links) to the online finding aid that researchers often use as the entry point to collections in the Burns Library.
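
For readers curious about what the checksum, inventory, and duplicate-detection steps look like in practice, here is a minimal sketch in Python. It is an illustration only, not the tooling we actually used: the choice of SHA-256 and the function names (file_checksum, build_inventory, find_duplicates, verify) are assumptions made for this example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to limit memory use.

    SHA-256 is an assumption for this sketch; the article does not name
    the algorithm that was used.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    inventory = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            inventory[path.relative_to(root).as_posix()] = file_checksum(path)
    return inventory


def find_duplicates(inventory):
    """Group paths that share a checksum, i.e. files with identical bits."""
    groups = defaultdict(list)
    for relative_path, checksum in inventory.items():
        groups[checksum].append(relative_path)
    return {c: paths for c, paths in groups.items() if len(paths) > 1}


def verify(root, inventory):
    """Return the paths that are missing or whose checksum has changed."""
    root = Path(root)
    changed = []
    for relative_path, expected in inventory.items():
        path = root / relative_path
        if not path.is_file() or file_checksum(path) != expected:
            changed.append(relative_path)
    return changed
```

Running build_inventory once at the outset and saving the result gives a baseline; calling verify against that baseline at a later date flags any file whose bits have changed or that has gone missing.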

In the past seven years Boston College has gradually developed a robust system for preserving our digital collections. We have been able to do this in part by forming partnerships with other major institutions facing the same challenges. For example, as one of the 50 members of the MetaArchive Cooperative, we preserve several of BC's major collections: its dissertations and theses (since 2008), the Hanvey photographic collection, the Brooker Collection of American Legal and Land Use Documents, the Becker collection of Civil War era drawings, and others. Along with some fellow MetaArchive members, we have participated in two major grants (one about newspaper collections, the other about theses and dissertations). Being an active contributor to and participant in the larger DP community has allowed us to ride the wave of progress being made at the Library of Congress and elsewhere rather than going it alone.

Of course, as the content we must preserve grows increasingly complex (e.g., websites, blogs, and multimedia), we need to sustain our efforts to find even better tools that help us automate and document the actions taken over time to ensure that the cultural legacy of this era is available to future generations. We, the current stewards of BC's collections and institutional records, must see well beyond the quotidian concerns of our jobs and organizations and think proactively about what we need to do today so that our colleagues of the future will understand the strategies and tactics that we have devised and be able to improve upon the foundation that we have created. Ultimately, the success of our efforts will be gauged by the extent to which future information-seekers will be able to find and use the intellectual treasures hosted by Boston College.

Bill Donovan
Digital Imaging & Curation Manager
O'Neill Library