Clearing the Air for Maintenance and Repair: Strategies, Experiences, Full Disclosure

James Smithies (james.smithies@kcl.ac.uk), King's College London and Arianna Ciula (arianna.ciula@kcl.ac.uk), King's College London and Jessica Otis (jotis2@gmu.edu), George Mason University and Faolan Cheslack-Postava (fcheslack@faolancp.com), George Mason University and Martin Holmes (mholmes@uvic.ca), University of Victoria and Stewart Arneil (sarneil@uvic.ca), University of Victoria and Greg Newton (gregster@uvic.ca), University of Victoria and Jasmine Mulliken (jasmine.mulliken@stanford.edu), Stanford University Press

1. Introduction

The digital humanities (DH) community has reached an inflection point. Conceptual issues related to DH are now routinely discussed, and a significant body of literature about DH tools and methods exists, but very little is said about the challenges of maintaining the projects and tools that result from DH activity. The truth is that many teams are struggling with decades of accumulated technical debt, and the natural process of technological entropy. Lack of openness about the problems of sustaining and, when appropriate, archiving DH projects is natural: our problems expose weaknesses at the heart of our community, make us feel insecure when comparing ourselves to other teams, prompt us to question our relationship with funding agencies, and raise questions about the sustainability of our core activities. But we need to discuss the issues so we can increase our understanding, share best practices, and advocate for change. Long-term technical maintenance can be daunting, but good planning, carefully considered processes, transparent and healthy relationships with administrators and IT departments, and some common sense can resolve most issues.

Our panel brings together four DH software engineering teams and initiatives, based in the United Kingdom, United States, and Canada, with responsibility for over 350 projects built over two decades. The panel aims to ‘clear the air’, by openly discussing the problems we face and detailing the security, maintenance, archiving, and sustainability solutions we have put in place to resolve them. The wide range of issues across the international DH community means that we can only initiate a conversation, but we hope our commitment to full disclosure, coupled with a degree of geographical breadth, will help set the tone for a new era of collaboration and information sharing. Our long-term goal is to foster a culture of sustainability across the DH community, and respect for those engaged in the essential work of maintenance and repair.

Our four panelists will describe the work of two DH centres and two archiving and sustainability projects, detailing the wide range of approaches they have adopted to manage technical as well as operational (human resources, administrative) and financial complexities. Common themes are apparent across the four teams, related to forward planning and consideration of application and data life-cycle management, but differences exist too. Sometimes differences relate to technical philosophy (a decision to support heterogeneous technologies, or a decision to focus on a more defined toolset); sometimes differences relate to the exigencies of local funding and operational realities (permanent versus fixed-term HR contracts, availability of internal or external maintenance funding). In all cases, however, our technical development is informed by a firm belief – resulting from hard experience – that maintenance and repair are integral to the art of making. Our panel members believe the next phase in the evolution of DH will require greater attention to the practical, epistemological, and methodological imperatives of maintenance and repair.

2. Paper One: We Need to Maintain 100 Projects, Without Funding? Really? A Pragmatic Approach to Archiving and Maintenance

King’s Digital Lab (KDL) was launched in November 2015. It was established to improve Digital Humanities (DH) software engineering process and quality, and ensure digital research is scalable and sustainable. It works in partnership with a DH department, and other departments across the Faculty of Arts & Humanities at King’s College London, and comprises a Director, Project Manager, Systems Manager, and a team of 10 analysts, UI/UX designers, and engineers. The team inherited ~100 projects when it was established, built using heterogeneous technologies over two decades. The vast majority of those projects had no funding for maintenance and were running outdated operating systems and software as a result. Several had experienced minor hacks, and the infrastructure they used was approaching end of life. The Lab ‘estate’, although representing a significant corpus of high-quality DH research attracting ~250,000 unique users per year, constituted a significant risk to DH at our university, and a security risk to the wider university network.

Over the last three years the team has undertaken a full audit of its projects and integrated archiving, sustainability, and research data management into its Software Development Lifecycle (SDLC). An internal process triaged the projects according to security risk, scholarly value, cultural and cultural heritage value, ‘brand’ value, and maintenance cost. Principal Investigators (PIs) were contacted and given a range of options to consider, from archiving their site to upgrading it and placing it under a Service Level Agreement (SLA), usually of between 2 and 5 years. Approximately 50 sites (most of which were of relatively low scholarly value, or merely proof of concept quality) were quickly archived, with the remainder being upgraded and moved to SLAs progressively over 18 months. A major infrastructure upgrade was implemented over the same period, providing 50TB of disk space, ~1TB RAM, and enterprise backup systems capable of supporting significant growth. The process of discussing projects with PIs and finding funding to upgrade and maintain projects was often difficult. The Faculty supported every project led by one of their staff members, and offered cost-recovery rates to all other projects, but many PIs had difficulty reconciling themselves to the need for ongoing maintenance and support or found it difficult to gain support from their administrators (even at cost-recovery level).
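To make the shape of such a triage concrete, the following is a minimal sketch in Python. It is not the Lab's actual process: the 1 to 5 scales, the way the scores are combined, the `archive_below` threshold, and the names `ProjectAssessment` and `triage` are all assumptions made for illustration. It records the five criteria named above, recommends archiving when combined value does not justify maintenance cost, and orders the results so that the highest security risks are addressed first.

```python
from dataclasses import dataclass


@dataclass
class ProjectAssessment:
    """Hypothetical audit record; every score is on an assumed 1 (low) to 5 (high) scale."""
    name: str
    security_risk: int
    scholarly_value: int
    heritage_value: int
    brand_value: int
    maintenance_cost: int


def triage(projects, archive_below=5):
    """Return (name, recommendation) pairs, highest security risk first.

    The rule is an illustrative assumption: a project is recommended for
    archiving when its combined value minus its maintenance cost falls
    below `archive_below`; otherwise it is recommended for upgrade under
    a Service Level Agreement.
    """
    ordered = sorted(projects, key=lambda p: p.security_risk, reverse=True)
    results = []
    for p in ordered:
        value = p.scholarly_value + p.heritage_value + p.brand_value
        if value - p.maintenance_cost < archive_below:
            results.append((p.name, "archive"))
        else:
            results.append((p.name, "upgrade under SLA"))
    return results
```

In practice the recommendation would only open the conversation with the PI; as described above, the final decision depended on discussion and on available funding.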

At the time of writing all but a handful of projects have been resolved, but the process highlighted significant gaps in understanding across the DH community along with policy issues related to funding of DH projects. Archiving and sustainability issues have been reduced to an acceptable level in the Lab, but risks remain (as they do for any similar digital team), and ongoing management will be required. Archiving and sustainability are now integral to the Lab’s software engineering process, from requirements elicitation during concept development to archiving or ongoing maintenance in the post-funding period. The Lab’s processes are aligned, in turn, to the University’s Research Data Management (RDM) process and an effort is being made to align technical design with the national web archive.

3. Paper Two: "Running God Knows How Many Versions of PHP": The Challenges of Successfully Sustaining Digital Projects

In 2019, the Roy Rosenzweig Center for History and New Media (RRCHNM) is celebrating its 25th year and a long legacy of both creating and sustaining digital history projects. Founded on the principle of democratizing history, RRCHNM is committed to creating open-source software and sustainable digital projects; millions of people use center-designed software such as Zotero, Omeka, and Tropy, while tens of millions visit the center's various project websites every year. Nor is this engagement limited to the center's most recent accomplishments; one of the center's older projects, the twenty-year-old History Matters, is still receiving over three million website hits annually.

While this widespread and continuous engagement with center software and websites is a testament to the intellectual value of the center's digital scholarship, it also intensifies the sustainability pressure inherent in all digital projects. The center's earliest projects involved bespoke code which requires considerable labor either to maintain or to later transition to a content management system, e.g. Omeka, which was built in part to streamline the creation and management of the center's digital projects. Even projects running on a CMS still require upkeep, as new versions of the system or its components are released; external links break when websites restructure or eliminate content; and older technologies such as Adobe Flash are deprecated. Furthermore, the servers on which these projects live eventually reach the end of their hardware's lifespans and the projects must be migrated to new servers to survive.

These technological challenges can be further exacerbated by logistical conditions that arise in an environment where collaborations happen across departments and institutions, projects have temporally limited funding support, and project team members are either soft-funded staff positions or graduate students who are (usually, though not always) transient members of an institution. While transdisciplinary and cross-institutional collaboration, grant funding, and graduate student involvement are generally seen as positive aspects of the digital humanities as a field, they lead to logistical challenges with respect to sustainability. The decentralization of knowledge across institutions and teams, as well as personnel discontinuity over time, leads to challenges in documenting and maintaining contextual knowledge around projects as they age, often requiring personnel to "reinvent the wheel" during the already challenging and un(der)funded process of technologically updating old projects.

Despite these challenges, RRCHNM has sustained its digital projects over the course of its existence and, thanks to the institutional support it receives from its associated department and college, as well as the leadership of its previous and current directors, is well positioned to continue sustaining its digital projects for the foreseeable future. This paper will discuss both the institutional conditions that enable RRCHNM to sustain its digital projects and the technical and logistical challenges that it must overcome when continuously sustaining a wide variety of digital projects created by hundreds of people over the course of two and a half decades.

4. Paper Three: Ruthless Principles for Digital Longevity

Project Endings is an SSHRC-funded collaboration which aims to provide practical solutions to issues attendant on ending a project and archiving the digital products of research, including not only data but also interactive applications and web-based publications. Endings is a collaboration between the Humanities Faculty and the Library at the University of Victoria, and endeavours to align the aims of faculty researchers producing projects and the archivists who will eventually be responsible for curating their work.

Using both practice-based methods and scholarly research, Endings is already producing recommended approaches (Holmes 2017; Arneil & Holmes 2017; Holmes & Takeda 2018) and diagnostic tools (Holmes & Takeda 2017) that will assist scholars in ensuring that their project will be completed, archivable, functional, and available well into the future.

The project has conducted a survey with 128 project leaders, and conducted 28 follow-up interviews to gain insight into the practical issues faced by DH scholars. Simultaneously it has been actively working on 'ending' several existing in-house projects (The Diary of Robert Graves, Le mariage sous L'Ancien Régime, The Map of Early Modern London, and a number of others) using a set of principles developed from our work. These principles focus on reducing technological overhead and applying software development best practices to the planning and construction of a project’s digital outputs. Our methodology is based on paring back the range of technologies used to the absolute minimum (HTML, CSS and JavaScript), and building completely static web materials with no dependence on any server-side technologies.

The Endings project divides digital projects into five primary components: data, products, processing, documentation, and release management. We aim at longevity primarily for data and products, but believe that this goal requires careful attention to processing, documentation and release management. We are developing preservation principles for each of these factors, and this presentation will discuss key components of the principles along with their justification and practicality.

Many of these principles are uncontroversial. For instance, principle 1.1, “Data is stored only in formats which conform to open standards and which are amenable to processing (TEI XML, GML, ODF, TXT)” would not be surprising to anyone. Others are more demanding and are likely to meet strong resistance from some members of a project team; programmers may be unsettled by the demand that there be “no dependence on external libraries: no JQuery, no AngularJS, no Bootstrap,” or puzzled by the requirement that “every page contains all the components it needs, so that it will function without the rest of the site if necessary, even though this means duplicating information across the site.” This “ruthless” set of maxims can make rapid development and deployment more difficult, but the principle of “hard now, easy later” is the only real guarantee of digital longevity for projects which, while they may be curated, are never likely to be actively maintained over the long term.
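One part of these maxims lends itself to automated checking: whether a page pulls scripts, stylesheets, images, or frames from another host. The following is a minimal sketch in Python, using only the standard library, and is not one of the Endings diagnostic tools. The names `ExternalResourceFinder` and `find_external_resources` and the choice of tags to inspect are assumptions for illustration. It detects only remotely hosted resources; a library copied into the site and served locally would pass this check even though the principle quoted above also rules it out.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Tags whose listed attribute loads a resource needed to render the page.
# Ordinary <a href> links are deliberately ignored: they are citations,
# not dependencies.
RESOURCE_ATTRS = {"script": "src", "link": "href", "img": "src", "iframe": "src"}


def is_external(url):
    """True for absolute or protocol-relative URLs, which name a remote host."""
    return bool(urlparse(url).netloc)


class ExternalResourceFinder(HTMLParser):
    """Collect (tag, url) pairs for resources loaded from another host."""

    def __init__(self):
        super().__init__()
        self.external = []

    def handle_starttag(self, tag, attrs):
        attr_name = RESOURCE_ATTRS.get(tag)
        if attr_name is None:
            return
        attr_map = dict(attrs)
        # Only stylesheet links are dependencies; rel="canonical" and similar are not.
        if tag == "link" and "stylesheet" not in (attr_map.get("rel") or "").lower():
            return
        url = attr_map.get(attr_name)
        if url and is_external(url):
            self.external.append((tag, url))


def find_external_resources(html_text):
    """Return every externally hosted resource referenced by one HTML page."""
    finder = ExternalResourceFinder()
    finder.feed(html_text)
    finder.close()
    return finder.external
```

Run over every file in a static build, an empty result for each page is a necessary, though not sufficient, sign that the site will keep working once third-party hosts disappear.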

5. Paper Four: Balancing Innovation & Persistence in Digital Scholarly Publications

In the past few years, several university presses have been awarded funding by the Andrew W. Mellon Foundation to meet the needs of digital humanists and social scientists who are pushing the bounds of traditional print publishing practices and seeking to output their arguments in a form that matches their methodologies. These multimodal, multilinear, open-access, web-based publications follow an editorial and production workflow parallel to that of traditional scholarly monographs and, as peer-reviewed scholarly works, carry the same weight in consideration of tenure and promotion for the scholars who create them. Stanford University Press (SUP) is pushing farther than others by allowing authors to choose their own web-based platforms or builds rather than offering ones designed in-house. This openness introduces challenges for preservation and persistence of the publication. But rather than limit the potential for innovative expression, SUP is investing in exploring possibilities in digital preservation for complex interactive works.

This program has seen the publication of four unique projects to date that could not possibly be rendered as print books but whose place in the scholarly record is just as important to their fields of study. It is important then that they can endure the specific threats that web-based digital content faces. The publisher needs to mitigate the potential fragility of these formats by keeping a handle on the complexity, but they must also support the innovative and courageous strides scholars are making as they rightly challenge the politics and limitations of the traditional print publishing model. Essentially, they must balance innovation with durability.

To intercept the dangers to persistence introduced in digital formats, SUP implemented a careful multi-phase set of guidelines for preservation and archiving. These start with pre-production technical recommendations that encourage but do not force authors to choose platforms and design elements that are durable and archivable. They also initiate a three-pronged preservation strategy during production and immediately after initial publication that can ensure viable access to and experience of the work once inevitable obsolescence creeps into the live hosted product. The three prongs are documentation and digital repository deposit, web archiving, and emulation.

While devoting resources not usually deployed in a university press context, SUP is also talking to authors before, during, and after the development and publication of their work and learning that expectations for their works’ longevity vary significantly. While most acknowledge and even embrace the risks and ephemerality of the digital, they also put trust in the Press to ensure the kind of persistence associated with traditional publications. Some expect to share the responsibility of persistence and others would be satisfied with a time stamp that covers academic milestones related to tenure and publishing expectations. A candid dialogue about existing and developing preservation practices in such programs will hopefully both assure scholars that publishers are aggressively pursuing preservation as part of the responsibility of publication, and also invite DH authors to share their expectations and concerns with the challenges digital presentation formats present.

Appendix A

Bibliography
  1. Arneil, S. and Holmes, M. (2017). Archiving form and function: preserving a 2003 digital project. Paper presented at the DPASSH Conference 2017: Digital Preservation for Social Sciences and Humanities, Brighton.
  2. Ciula, A., Nyhan, J. and Moulin, C. (2013). ESF Science Policy briefing on research infrastructures in the digital humanities: landscapes, ecosystems and cultures. Lexicon Philosophicum, 1: 277–87.
  3. Hodder, I. (2014). The Entanglements of Humans and Things: A Long-Term View. New Literary History, 45(1): 19–36.
  4. Holmes, M. and Takeda, J. (2017). Beyond Validation: Using Programmed Diagnostics to Learn About, Monitor, and Successfully Complete Your DH Project. Paper presented at the Digital Humanities 2017, Montreal https://dh2017.adho.org/abstracts/140/140.pdf.
  5. Holmes, M. and Takeda, J. (2018). Why do I need four search engines? Paper presented at the Japanese Association for Digital Humanities Conference, Tokyo https://conf2018.jadh.org/files/Proceedings_JADH2018.pdf#page=58.
  6. Holmes, M. (2017). Selecting Technologies for Long-Term Survival. Paper presented at the SHARP Conference 2017: Technologies of the Book, Victoria, BC https://github.com/projectEndings/Endings/raw/master/presentations/SHARP_2017/mdh_sharp_2017.pdf.
  7. Maxwell, J. W., Bordini, A. and Shamash, K. (2017). Reassembling Scholarly Communications: An Evaluation of the Andrew W. Mellon Foundation’s Monograph Initiative (Final Report, May 2016). The Journal of Electronic Publishing, 20(1) doi:10.3998/3336451.0020.101. http://hdl.handle.net/2027/spo.3336451.0020.101 (accessed 26 April 2019).
  8. Jackson, S. J. (2014). Rethinking Repair. In Gillespie, T., Boczkowski, P. J. and Foot, K. A. (eds), Media Technologies. The MIT Press, pp. 221–40.
  9. King’s Digital Lab (2019). KDL’s pragmatic approach to managing 100 Digital Humanities projects, and more... King’s Digital Lab https://www.kdl.kcl.ac.uk/our-work/archiving-sustainability/ (accessed 26 April 2019).
  10. Mulliken, J. (2018). 3 Approaches to the Preservation of Interactive Scholarly Works. SupDigital http://blog.supdigital.org/3-approaches-to-the-preservation-of-interactive-scholarly-works/ (accessed 26 April 2019).
  11. Nicholson, C. (2018). Keeping the lights on. Research Europe: 13.
  12. Nowviskie, B. (2015). Digital Humanities in the Anthropocene. Digital Scholarship in the Humanities, 30(suppl_1): i4–15 doi:10.1093/llc/fqv015.
  13. Rusbridge, C. (2007). Arts and Humanities Data Service decision. Digital Curation Centre http://www.dcc.ac.uk/news/arts-and-humanities-data-service-decision (accessed 26 April 2019).
  14. Russell, A. and Vinsel, L. (2016). Hail the Maintainers. Aeon https://aeon.co/essays/innovation-is-overvalued-maintenance-often-matters-more (accessed 9 June 2017).
  15. Smithies, J., Westling, C., Sichani, A.-M., Mellen, P. and Ciula, A. (2019). Managing 100 Digital Humanities Projects: Digital Scholarship & Archiving in King’s Digital Lab. Digital Humanities Quarterly, 12(4).
  16. Smithies, J. (2017). The Digital Humanities and the Digital Modern. Basingstoke: Palgrave Macmillan https://www.palgrave.com/gb/book/9781137499431.
  17. Smithies, J., Sichani, A.-M. and Westling, C. (2017). Preserving 30 years of Digital Humanities Work: The Experience of King’s College London Digital Lab. Paper presented at the DPASSH Conference 2017: Digital Preservation for Social Sciences and Humanities, Brighton.
  18. Smithies, J. (2019). The Continuum Approach to Career Development: Research Software Careers in King’s Digital Lab. King’s Digital Lab https://www.kdl.kcl.ac.uk/blog/rse-career-development/ (accessed 26 April 2019).
  19. Straumsheim, C. (2015). Researchers, university press directors emboldened by Mellon foundation interest in academic publishing. Inside Higher Ed https://www.insidehighered.com/news/2015/02/25/researchers-university-press-directors-emboldened-mellon-foundation-interest (accessed 26 April 2019).
  20. Waters, D. J. (2016). Monograph Publishing in the Digital Age. The Andrew W. Mellon Foundation https://mellon.org/resources/shared-experiences-blog/monograph-publishing-digital-age/ (accessed 26 April 2019).