
Project Endings: Early Impressions From Our Recent Survey On Project Longevity In DH

Stewart Arneil (sarneil@uvic.ca), University of Victoria, Canada; Martin Holmes (mholmes@uvic.ca), University of Victoria, Canada; Greg Newton (gregster@uvic.ca), University of Victoria, Canada

Despite the thousands of digital projects launched during the past 20 years, experts warn of a new “digital dark age” (Cerf, 2015; Davis, 2016) as our ability to produce digital information continues to outpace our capacity to preserve and access that knowledge for the long term, even (or especially) when using content management systems (Montoya, 2016).

Project Endings is a collaboration between the Humanities Faculty and the University Library which aims to provide practical solutions to issues attendant on ending a project and archiving the digital products of research, including not only data but also interactive applications and web-based publications. Project Endings endeavours to align the aims of faculty researchers producing projects and the archivists who will eventually be responsible for curating their work.

The project divides digital projects into five primary components: data, products, processing, documentation, and release management. We aim at longevity primarily for data and products, but believe that this goal requires careful attention to processing, documentation and release management. We are developing preservation principles for all of these factors, using practice-based methods (Holmes, 2017; Arneil and Holmes, 2017; Holmes and Takeda, 2018), diagnostic tools (Holmes and Takeda, 2017), and scholarly research as listed in the project bibliography at https://hcmc.uvic.ca/endings/EndingsListofReferences2015.pdf .

The project conducted a 30-question survey on the LimeSurvey platform (https://hcmc.uvic.ca/endings/survey.html) to discover how project leaders dealt with the issues of long-term sustainability for each of the five primary components. We promoted the survey to Canadian and international professional communities and received 128 responses. We then ran 25 detailed interviews with a sample of the respondents to get more information on the issues raised by the survey results.

Results of the survey show that concerns about longevity for digital humanities projects are not exaggerated. Of the survey respondents, 57% did not consider an endpoint for their project, despite the fact that project management principles include declarations of goals, timelines, and milestones (Zandhuis and Stellingwerf, 2013). In light of this, it is perhaps not surprising that 54% did not have long-term preservation plans. These findings suggest that many researchers do not distinguish between products generated to exploit the features of the processing environment and products generated to survive after active work on the project ends, or independently of development work in the project. Furthermore, only 32% considered “benchmarks for assessing progress” and 41% included precise timelines in their plans.

In a group of projects of which 74% were less than 10 years old and 58% were still in progress, 22% reported that project outputs had stopped working due to software obsolescence. This is in a field of projects in which 74% started with born-digital data. If a failure occurs during the active life of the project it might be repairable, but repair is much less likely if the project has ended.

The value of using a standardized data model is not universally recognized, with 14% of survey respondents not using one at all and 26% making up their own. Although a home-made data model is by definition not standardized, it may still be viable for a long time if well documented. 60% claimed to have a clearly documented data model, but 90% of those that had documentation considered it to be partial or inadequate, so it appears that a project’s data model is well documented in only about 50% of cases.

HTML is the most popular standard output for DH projects (68% of respondents used it), despite the continued popularity of PDF (45%), XML (38%), and various binary media formats (>65%). JavaScript is considered by many (30%) to be a major technology in their project. HTML and JavaScript are robust over the long term (Holmes, 2017), but if a project produces them only on the fly through a content management system (CMS) or database, then the longevity of the output depends on that of the CMS or database. 34% of the respondents used WordPress or Drupal, 31% used PHP/SQL databases, 38% used XML/XSLT/XQuery systems, and 41% used “other” software services and libraries. Some projects used more than one of these.
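One way to remove the dependency on a CMS or database is to write the output to static files as part of a build step, so that the published pages need nothing more than a web server or a file system. The following is a minimal sketch of that idea in Python using only the standard library; the function names `render_static_page` and `write_static_page`, and the page layout, are illustrative assumptions and are not taken from the survey or from Project Endings code.

```python
from html import escape
from pathlib import Path


def render_static_page(title, entries):
    """Return a complete HTML document as a string.

    `entries` is a list of (label, text) pairs; all values are escaped so
    the output is valid HTML regardless of the characters in the data.
    """
    items = "\n".join(
        f"    <li>{escape(label)}: {escape(text)}</li>" for label, text in entries
    )
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        f'<head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        "<body>\n"
        f"  <h1>{escape(title)}</h1>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )


def write_static_page(out_dir, filename, title, entries):
    """Write the rendered page to disk and return its path (hypothetical helper)."""
    out_path = Path(out_dir) / filename
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(render_static_page(title, entries), encoding="utf-8")
    return out_path
```

A project could run such a step over its data at each release, so that the archived product is the set of files written to disk rather than the system that generated them.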

Lack of ongoing funding was cited by 38% of respondents as the main obstacle to long-term preservation. Perhaps more surprisingly, 33% of respondents rated either lack of expertise or bad technology choices as their main obstacle, which may explain the results reported above regarding software obsolescence. Early results from the interviews suggest that CMS and other software libraries and services are the likeliest sources of software failure over time. We hope that further analysis of the interviews will tell us whether a more expert assessment of software and output choices would have mitigated the issue of lack of ongoing funding.

While a reassuringly high 42% of respondents reported that university services were responsible for long-term maintenance of the project’s work, an alarming 45% reported that this responsibility fell to the Principal Investigator or nobody, demonstrating either significant vulnerability or great confidence.

Our survey results suggest that there is limited use of project management (“What is PRINCE2?”, 2018; Sedlmayer et al., 2015) and software lifespan principles in DH projects. Results further suggest that researchers need a better understanding of the specific attributes of a project that are likely to facilitate long-term viability of the project data, outputs and documentation at minimal cost for those charged with preservation. Blurring the lines between data, processing, outputs and the management of those components over time can result in vulnerabilities for long-term preservability which may not be apparent until it is too late.

With all of this in mind, Project Endings is working on a suite of recommendations that will provide guidance on project structure and management with long-term viability as the goal. We are offering an online interactive questionnaire that assesses the long-term viability of each component in a project and provides recommendations for improving the prospects for long-term survival. Behind each question is the empirical evidence provided by survey/interview participants as well as the combined experience of the Project Endings team. The questionnaire is intended primarily to be a thought-provoking activity for project leaders and principal investigators. An early draft of the questionnaire is available at https://hcmc.uvic.ca/endings/questionnaire.htm .
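To illustrate the general shape of a component-by-component assessment, here is a minimal sketch in Python. The five component names come from the project's own division of digital projects; the questions, the advice strings, and the `assess` function are hypothetical and do not reproduce the content or logic of the actual Project Endings questionnaire.

```python
# The five primary components named by the project.
COMPONENTS = ("data", "products", "processing", "documentation", "release management")

# Hypothetical questions: (component, yes/no question, advice given on "no").
QUESTIONS = [
    ("data",
     "Does the project use a documented data model?",
     "Document the data model, or adopt a standardized one."),
    ("products",
     "Do the published outputs work without a CMS or database?",
     "Produce static outputs that do not depend on server-side software."),
    ("processing",
     "Can the outputs be rebuilt from the data with documented tools?",
     "Record the build process and the software versions it relies on."),
    ("documentation",
     "Is the project documentation complete enough for an archivist?",
     "Write documentation aimed at those who will curate the project."),
    ("release management",
     "Does the project have a defined endpoint?",
     "Define an endpoint and a long-term preservation plan."),
]


def assess(answers):
    """Map each component to a list of recommendations.

    `answers` maps question text to True (yes) or False (no).
    Unanswered questions are treated as "no" in this sketch.
    """
    report = {component: [] for component in COMPONENTS}
    for component, question, advice in QUESTIONS:
        if not answers.get(question, False):
            report[component].append(advice)
    return report
```

Calling `assess` with every question answered `True` yields an empty list of recommendations for each component, while each `False` or missing answer adds the corresponding advice under its component.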

Appendix A

Bibliography
  1. Arneil, S. and Holmes, M. (2017). Archiving form and function: preserving a 2003 digital project. Brighton, U.K.
  2. Cerf, V. (2015). Google’s Vint Cerf warns of ‘digital Dark Age’ http://www.bbc.com/news/science-environment-31450389 .
  3. Davis, R. C. (2016). Die Hard: The Impossible, Absolutely Essential Task of Saving the Web for Scholars. Skidmore College, Saratoga Springs, U.S.A. https://academicworks.cuny.edu/cgi/viewcontent.cgi?article=1077&context=jj_pubs .
  4. Holmes, M. (2017). Selecting Technologies for Long-Term Survival. Victoria, Canada https://github.com/projectEndings/Endings/raw/master/presentations/SHARP_2017/mdh_sharp_2017.pdf .
  5. Holmes, M. and Takeda, J. (2017). Beyond Validation: Using Programmed Diagnostics to Learn About, Monitor, and Successfully Complete Your DH Project. Montreal, Canada https://dh2017.adho.org/abstracts/140/140.pdf .
  6. Holmes, M. and Takeda, J. (2018). Why do I need four search engines? Tokyo, Japan https://conf2018.jadh.org/files/Proceedings_JADH2018.pdf#page=58 .
  7. Montoya, R. D. (2016). Advocating for Sustainability: Scaling-Down Library Digital Infrastructure. Journal of Library Administration, 56(5): 603–20. doi:10.1080/01930826.2016.1186969.
  8. Sedlmayer, M., Coesmans, P., Fuster, M., Schreiner, J. G., Gonçalves, M., Huynink, S., Jaques, T., et al. (eds). (2015). Individual Competence Baseline for Project, Programme & Portfolio Management. International Project Management Association http://products.ipma.world/wp-content/uploads/2016/03/IPMA_ICB_4_0_WEB.pdf#page=27 .
  9. Zandhuis, A. and Stellingwerf, R. (2013). ISO21500: Guidance on Project Management – a Pocket Guide. Van Haren Publishing https://www.vanharen.net/Samplefiles/9789087538095SMPL.pdf#page=21 .
  10. What is PRINCE2? (2018). Projects in Controlled Environments https://www.prince2.com/uk/what-is-prince2 .