"The Role Of Data Archives In The Humanities At The University Of Cologne"

Brigitte Mathiak (bmathiak@uni-koeln.de), University of Cologne, Data Center for the Humanities (DCH), Germany and Katja Metzmacher (katja.metzmacher@uni-koeln.de), University of Cologne, Data Center for the Humanities (DCH), Germany and Patrick Helling (patrick.helling@uni-koeln.de), University of Cologne, Data Center for the Humanities (DCH), Germany and Jonathan Blumtritt (jonathan.blumtritt@uni-koeln.de), University of Cologne, Data Center for the Humanities (DCH), Germany

There are three groups of stakeholders when it comes to research data: those who make data, those who use data, and those who build infrastructure to connect the two. In the literature, we find a lot of research on how to build infrastructure and how to share data (often written by the same group of people), yet there is relatively little research (but see Caria and Mathiak, 2018; Kern and Mathiak, 2015; Porter, 2013; Tenopir et al., 2011, 2015; Warwick et al., 2008) on what the users, or rather re-users, actually want and what they do. Most of these studies also do not focus on the Humanities. While in other fields of study research data sharing and reuse through data archives or journals is far more institutionalised (for differences between subject cultures, see Tenopir et al., 2011, 2015), this is not so in the subjects covered by the Humanities.

1. Methodology

To study the practices and attitudes of Humanities researchers towards sharing and reusing data, we conducted an online survey on research data management practices and needs. It consisted of three sections: a description of the data worked with or produced; reusing and sharing experiences, practices and attitudes; and knowledge and needs in the area of research data management. It was a follow-up study to our 2016 research data management survey (Mathiak and Kronenwett, 2017), adding a part on reusing and sharing experiences, practices and attitudes that was partly adapted from a survey conducted by the Specialised Information Service Social and Cultural Anthropology (Imeri and Danciu, 2017).

The online survey was available between 6 June and 8 July 2018. It was conducted by the Data Center for the Humanities (DCH), University of Cologne, in collaboration with the Cologne Competence Center for Research Data Management (C3RDM) and the Deans of the Faculties for Arts and Humanities and Human Studies (see note 1). It consisted of 36 closed questions; categorical items always offered the option of adding further answers. The sample consists of 268 responses, although some participants did not answer all questions. The sample covers all subject groups and departments of the two humanities faculties. For this paper, we focus on the questions on reusing and sharing experiences, practices, and attitudes.

2. Results

2.1. Reuse of Data

Over 80% of our participants rate the scientific benefit of searchable and reusable research data for their field of study as rather high, high, or very high (cf. fig. 1).

Figure 1: Rating of the scientific benefit of searchable and reusable data for the scholars’ field of study (N=240).

2.2. Purpose of reuse

Three purposes are rated highest in relation to the participants' own field of study: over 85% of the participants state that reuse of data is important for reconstructing results, generating questions, and comparison with similar data. In contrast, only 59% indicate that they would personally reuse data for reconstruction purposes. Generating new questions and comparison with similar data are likewise rated lower from the personal perspective (cf. fig. 2).

Figure 2: Purposes for which scholars would like to reuse data (N=167).

2.3. Access to reusable data

When researchers reuse data, only 22% have found it in an archive. The more common route is through other people: within their own research group (56%), from personally known colleagues (26%), or even from complete strangers (34%) (cf. fig. 3). Overall, less than 4% of the scholars in our sample categorically rejected the use of secondary research data.

Figure 3: Experience with secondary data use by source (N=219).

2.4. Handling research data

Only 34% have stored data in an archive, while at least 72% would consider doing so and only 0.5% cannot imagine storing data in an archive. However, only one quarter of the 34% who have stored their data in a data archive did so in an openly accessible way (cf. fig. 4).

Figure 4: Accessibility of stored data in data archives (N=70).
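
To put the two nested shares reported in this section into perspective, the following minimal sketch combines them into a share of the whole sample. It uses only the rounded percentages stated in the text, not the raw survey data, and it assumes that both figures refer to the same respondent base and that "one quarter" can be read as 0.25. The function name `combined_share` is a hypothetical helper introduced for illustration.

```python
# Shares taken from the text of section 2.4 (rounded values, not raw data).
stored_in_archive = 0.34   # respondents who have stored data in an archive
openly_accessible = 0.25   # assumption: "one quarter" of those depositors


def combined_share(outer: float, inner: float) -> float:
    """Share of the whole sample that falls into a nested subgroup."""
    return outer * inner


open_share = combined_share(stored_in_archive, openly_accessible)
print(f"Openly accessible deposits: {open_share:.1%} of respondents")
```

Under these assumptions, fewer than one in ten respondents has deposited data in an archive in an openly accessible way.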

2.5. Reasons for not storing

The main reasons for not storing data in a data archive are a lack of knowledge that this is possible (38%) and not finding an appropriate archive (24%) (cf. fig. 5).

Figure 5: Reasons for not having used a data archive so far (N=142).

2.6. Conditions for an adequate archive

When asked what conditions an archive would have to fulfil for researchers to deposit data there, data security received the most "very important" ratings, followed by the archive's confidentiality and professionalism, which were rated nearly identically. Looking at the combined importance ratings (very important/important/rather important), manageable effort for data curation, explicit (precise) agreements on licensing and usage, and the citability of data also exceed 90%. The next most important factors were specific security mechanisms for individual pieces of information and coverage of the additional costs for data curation and storage by research funding organisations.

The factors rated least important in the combined ratings, although more than 70% still rated them as very important, important, or rather important, were sophisticated access restrictions, information on who uses the data and for what purpose, and indexing the data set in different systems (the least important of all items).

Figure 6: Rating of factors when considering archiving one's own data in a data archive (N=208).

Assuming a data archive fulfilled all requirements, nearly 50% of the scholars answered that all research data should be stored. Surprisingly, this option was ranked highest (cf. fig. 7), which indicates a basic willingness to store data.

Figure 7: What kind of data should be stored in a data archive - if it fulfills all requirements (N=213).

2.7. Information sources for the evaluation of data archives

Finally, even though data archives see themselves as information brokers between data producers and data reusers, when researchers decide on a concrete archive to use, recommendations from colleagues and scientific organisations are most influential, followed by the popularity and reputation of the organisation that funds the archive (cf. fig. 8). Networks seem to play the major role in choosing a data archive.

Figure 8: Important factors for choosing a data archive (N=215).

3. Conclusions

Sharing data is common and important in the Humanities, but that does not mean that the data ends up in a data archive. Instead, most sharing happens within research groups, with personally known colleagues, and even with strangers, while data archives broker only about 20% of the research data transactions. This is similar to what other disciplines have found (cf. Fecher et al., 2015 for an excellent survey of this topic). For those who could share, there is a conflict of interest between adding to the knowledge commons and self-interest.

In recent years, several policies have been put in place to encourage data sharing beyond researchers' personal networks, e.g. the requirement of third-party funding agencies to submit a data management plan (European Commission, 2012; NSF, 2018) and journal data submission policies (McCullough, 2009; Savage and Vickers, 2009). Bibliometric studies have shown that sharing research data is associated with an increased citation rate (Piwowar et al., 2007). Yet, in the humanities, these are not the decisive motivations for publishing data. Third-party funding is not as prominent as in the "hard" sciences, and neither is the pressure to publish in high-ranking journals or to be cited frequently. As a consequence, data archives cannot rely on scholars to seek them out for data deposits. But there are other options.

Scholarly communities are key when it comes to finding and sharing data, but too much of the data gets lost due to insufficient storage policies. Data archives can help with storing and managing data, but they have to be integrated into the community, as indicated by fig. 8. Awareness of suitable archives is still not as high as one would like (cf. fig. 5), which can also be improved with community-based measures.

For our data center, we have decided to concentrate on the PhD students in our graduate school. What we have found is that they have quite different questions and problems than more senior scholars. While experienced scholars usually have a setup of tools and research data from previous projects as well as an established network of collaborators, PhD students have to cold-start their research in most cases. This gives us the opportunity to introduce them to the possibilities of re-using and sharing research data, while also educating them on digital tools and data management in general.

In our talk, we will introduce some more of the results from the survey and discuss more thoroughly the mechanisms of data sharing and non-sharing. We will also report on our experiences with addressing PhD students and discuss some of the other policies to raise awareness.

Appendix A

Bibliography
  1. Caria, F. and Mathiak, B. (2018). Nutzertests an kritischen Editionen – Print oder Digital? In Vogeler, G. (ed.). Kritik der digitalen Vernunft. Abstracts zur Jahrestagung des Verbandes Digital Humanities im deutschsprachigen Raum, 26.02. - 02.03.2018 an der Universität zu Köln, veranstaltet vom Cologne Center for eHumanities (CCeH). Universität zu Köln, Köln. doi: 10.18716/KUPS.8085
  2. European Commission. (2012). Scientific data: open access to research results will boost Europe’s innovation capacity. http://europa.eu/rapid/press-release_IP-12-790_en.htm?locale=en Last access: 27.11.2018.
  3. Fecher, B., Friesike, S. and Hebing, M. (2015). What drives academic data sharing? PLoS ONE 10(2): e0118053.
  4. Kern, D. and Mathiak, B. (2015). Are There Any Differences in Data Set Retrieval Compared to Well-Known Literature Retrieval? International Conference on Theory and Practice of Digital Libraries: 197-208.
  5. Mathiak, B. and Kronenwett, S. (2017). A Survey on Research Data at the Faculty of Arts and Humanities of the University of Cologne. In Digital Humanities 2017 Conference Abstracts: 294-98, https://dh2017.adho.org/abstracts/DH2017-abstracts.pdf, Last access: 27.11.2018.
  6. McCullough, B.D. (2009). Open Access Economics Journals and the Market for Reproducible Economic Research. Econ Anal Policy 39: 118–26.
  7. NSF (2018). Dissemination and Sharing of Research Results. https://www.nsf.gov/bfa/dias/policy/dmp.jsp Last access: 27.11.2018.
  8. Piwowar, H.A., Day, R.S. and Fridsma, D.B. (2007). Sharing detailed research data is associated with increased citation rate. PLoS ONE 2(3): e308.
  9. Porter, D. (2013). Medievalists and the Scholarly Digital Edition. Scholarly Editing 34: 1-26.
  10. Imeri, S. and Danciu, I. (2017). Open Data. Forschungsdatenmanagement in den ethnologischen Fächern. Auswertung einer Umfrage des Fachinformationsdienstes Sozial- und Kulturanthropologie an der Universitätsbibliothek der Humboldt-Universität zu Berlin 2016. Teil I: Statistiken, http://www.evifa.de/cms/ueber-evifa/forschungsdatenmanagement/, Last access: 27.11.2018.
  11. Savage, C.J. and Vickers, A.J. (2009). Empirical Study of Data Sharing by Authors Publishing in PLoS Journals. PLoS ONE 4: e7078. doi: 10.1371/journal.pone.0007078 PMID: 19763261.
  12. Tenopir, C. et al. (2011). Data Sharing by Scientists: Practices and Perceptions. PLoS ONE 6(6): e21101. doi: 10.1371/journal.pone.0021101.
  13. Tenopir, C. et al. (2015). Changes in Data Sharing and Data Reuse Practices and Perceptions among Scientists worldwide. PLoS ONE 10(8): e0134826. doi:10.1371/journal.pone.0134826.
  14. Warwick, C. et al. (2008). If You Build It Will They Come? The LAIRAH Study: Quantifying the Use of Online Resources in the Arts and Humanities through Statistical Analysis of User Log Data. Literary and Linguistic Computing 23(1):85-102.
Notes
1. Data Center for the Humanities (DCH), http://dch.phil-fak.uni-koeln.de, Accessed 27.11.2018; Cologne Competence Center for Research Data Management (C3RDM), https://fdm.uni-koeln.de, Accessed 27.11.2018.