Art History and Big Data: Complex Collaborations between Institutions and Researchers

Anne Helmreich (ahelmreich@getty.edu), Getty Research Institute, United States of America and Koenraad Brosens (koen.brosens@kuleuven.be), KU Leuven and Charles van den Heuvel (charles.van.den.heuvel@huygens.knaw.nl), Huygens Institute and Saskia Scheltjens (S.Scheltjens@rijksmuseum.nl), Rijksmuseum and Sandra van Ginhoven (SVanginhoven@getty.edu), Getty Research Institute, United States of America and Emily Pugh (epugh@getty.edu), Getty Research Institute, United States of America

INTRODUCTION

This panel contributes to “Complexities” by examining the topic of art history and big data in complex collaborations between institutions and researchers. With art museums and cultural heritage institutions now digitizing their collections and moving towards open access policies, scholars of art history and material culture can develop datasets at an unprecedented scale. At first glance, when compared to data in the natural and social sciences, big data in the cultural heritage community appears very different. Nevertheless, this panel argues for the relevance of the framework of big data for cultural heritage and art historical institutions and their data. Most critically, these data are sufficiently large and complex that they cannot be processed on local laptops or managed by a solo researcher, as has been the research practice in cultural and art histories up to now.

The case of art history is of particular interest for the digital humanities in the age of big data given the tension between the methodology of pattern recognition associated with big data and the strong art historical disciplinary tradition of close readings and contextualization of singular objects. This big data holds the promise of allowing scholars to study the history of works of art and the lives of artists collectively and to test critically the conclusions of previous generations of individual researchers who attempted to identify significant patterns in the study of the arts using relatively small data sets.

Adopting the framework of big data also aids in identifying significant issues for advancing the field of cultural heritage studies. Researchers in this sector, with this relatively new access to unprecedented amounts of data, are facing important questions regarding data standardization and data modeling, while recognizing the challenges of ambiguity in historical humanities data, in order to curate and to preserve this data sustainably within large institutional infrastructures and servers. This data will become even bigger when it is disseminated as Linked Open Data within the Semantic Web, which is a common aim shared across the panel. These complex problems must be addressed to make optimal use of these costly digitization and research infrastructure programs.

Furthermore, the emergence of large data sets in cultural heritage institutions requires critical reflection on epistemological, methodological, and analytical/hermeneutical issues concerning their use in research and education. We aim for a discussion around these topics that is relevant not only for art/cultural historians but also for digital humanists who are seeking to assemble analogous research datasets or to utilize such datasets in the Linked Open Data environment. We have identified several complexities as emerging from institutions investing in building big data infrastructures that are intended to serve researchers and students as well as broader disciplines and even potentially general audiences. We focus on complexities in data curation, user interfaces, and the skills and training that humanities researchers need.

Data Curation: Academic projects and individual researchers need not only data, but also data curation that supports specific research questions. Yet cultural heritage institutions often have a wider public mission to produce data sets for the general public, and sometimes regard data curation for specific research purposes as contradictory to their aim of providing “objective” data available to all.

User Interfaces: Academic projects and individual researchers need to analyse, annotate, and contextualize big data provided by these cultural heritage institutions for their own research and to store the results locally. Collection infrastructures often provide access to their big data via APIs and SPARQL endpoints. But not every project or individual researcher will have the equipment to run and align the data for their own purposes. Moreover, most humanities researchers lack the skills to query these big data sets using SPARQL. The focus of these big data interfaces on analyzing patterns does not allow for hermeneutic approaches that require data handling from multiple perspectives in continuous, iterative processes. The annotation and contextualization of incomplete and ambiguous data that result from bringing heterogeneous datasets together require the co-development of user interfaces by computer scientists, information specialists, and humanities scholars. These interfaces need to include the provenance of this data and to express differences in data quality.
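One way to picture the last requirement is a record in which every statement carries its source and a certainty level. The Python sketch below is illustrative only: the class name, the three certainty labels, the function name, and the sample values are all assumptions made for this example and are not drawn from any of the projects' data models.

```python
from dataclasses import dataclass

# Assumed certainty labels; a real project would define its own scale.
CERTAINTY_LEVELS = ("certain", "probable", "uncertain")


@dataclass(frozen=True)
class Statement:
    """One claim about an entity, kept together with its source and certainty."""
    subject: str
    attribute: str
    value: str
    source: str      # provenance: which dataset made the claim
    certainty: str   # data quality: one of CERTAINTY_LEVELS

    def __post_init__(self):
        if self.certainty not in CERTAINTY_LEVELS:
            raise ValueError(f"unknown certainty level: {self.certainty}")


def find_conflicts(statements):
    """Return the (subject, attribute) pairs for which sources give different values."""
    grouped = {}
    for st in statements:
        grouped.setdefault((st.subject, st.attribute), []).append(st)
    return {
        key: group
        for key, group in grouped.items()
        if len({st.value for st in group}) > 1
    }


# Hypothetical example: two datasets disagree about a painter's birth year.
statements = [
    Statement("painter-001", "birth_year", "1612", "dataset-A", "probable"),
    Statement("painter-001", "birth_year", "1615", "dataset-B", "uncertain"),
    Statement("painter-001", "city", "Antwerp", "dataset-A", "certain"),
]

conflicts = find_conflicts(statements)
for (subject, attribute), group in conflicts.items():
    for st in group:
        print(subject, attribute, st.value, st.source, st.certainty)
```

Keeping both conflicting values, rather than silently choosing one, leaves the interpretive decision with the researcher, which is closer to the hermeneutic approach described above.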

Skills and Training: What would it entail for cultural heritage institutions to be open to modeling and curating their (meta-)data together with researchers and to offer them user interfaces to enrich their data? What knowledge and skills will researchers need to acquire? For example, they would presumably need to work with standards, vocabularies, and ontologies to ensure their knowledge and expertise is computer-readable and interoperable. This will require changes in the working practices and education of humanities researchers who still have a strong focus on the peer-review publication model and whose training may not have prepared them for such perspectives on working with data. It also requires rethinking how we acknowledge various contributions to these complex collaborations.

Big Data Art Histories Projects in Europe and US: Sharing Expertise

This panel brings together projects in Europe and the US (listed below) that engage with art histories and big data, in particular Linked Open Data, in academic and cultural heritage institutions. These projects share an engagement with questions of art history and cultural histories (such as the behavior of the creative industries in historically important centres) as well as human-computer interaction and the digital humanities in general.

While initiated separately, as the projects have developed, the participants have come to recognize a common need and interest in developing research infrastructures that rely upon the preparation of data for research in standardized (and interoperable) ways, in accordance with the standards developed or used by the large organizations that support digital humanities research, such as DARIAH, CLARIN, and CLARIAH. For the art historical and cultural history community, the formulation of shared data standards is not only a question for academically based research projects but also for museums and libraries. How might, then, standards developed for museums, such as CIDOC/CIDOC-CRM, translate to the academic research community? What other ways of working with cultural heritage materials can be extended across the academic, archive, and museum communities and their admittedly inherently different original infrastructures? How do we negotiate and leverage our institutional legacies with respect to data, and its standardization, management, and presentation, while also looking forward to a shared future of Linked Open Data and large datasets? How might such efforts contribute to the larger effort to develop (inter-)national research infrastructures for digital humanities research?

PROJECTS:

“Golden Agents: Creative Industries and the Making of the Dutch Golden Age,” a partnership of the Huygens ING, Meertens Instituut, University of Amsterdam, University of Utrecht, Vrije University Amsterdam, Rijksmuseum, KB National Library of the Netherlands, City Archives of Amsterdam, RKD Netherlands Institute for Art History, and Lab1100, aims to establish a sustainable infrastructure for the study of the interactions between producers and consumers and between the various branches of the creative industries across the long Golden Age of the Dutch Republic. To this end, it uses a combination of semantic web and multi-agent technologies to link datasets of the production of the creative industries to essential indexed archival resources, such as 2 million scans of digitized notarial acts, including probate inventories, and other related archival documents, in order to investigate the consumption of cultural goods in all layers of society. The discussion on the complexities of this project will focus on the identification of name and geo-entities and the creation of user interfaces for data alignment and for the representation of the provenance, completeness, and level of certainty of heterogeneous data. (Charles van den Heuvel)

“Project Cornelia” (https://projectcornelia.be), an interdisciplinary research project funded by the University of Leuven and the Flemish Science Foundation (Belgium), examines the creative communities and industries located in seventeenth-century Antwerp and Brussels, a period of intense productivity particularly in the fields of painting and tapestry. “Project Cornelia” propagates slow digital art history. It develops hybrid, novel, and transferable strategies to analyze, explore, model, present, and visualize complex data (i.e. biggish, ambiguous, and incomplete archival data) with a threefold aim: to revisit traditional/analogue art historical questions; to raise new questions that could not have been asked, let alone addressed, before the digital turn; and to bring together art historians and computer scientists as they develop ways to navigate the complexities of the data universe presented by cultural heritage data. (Koenraad Brosens)

“The Getty Provenance Index,” a major research endeavor of the Getty Research Institute, assembles data relevant to the ownership, transfer, and exchange of works of art as documented in archival inventories, auction sales catalogues, and dealers’ stock books. It currently holds over 1.7 million records that are being transformed into Linked Open Data (LOD) in order to make this information more accessible and usable, and to support research on the dynamic relationships that governed the mobility of cultural artefacts from the early modern period to the mid-twentieth century. The discussion will focus on the challenges of moving from a transcription-based to an event-based data model and developing two interface systems (one for users and another for editors), as well as challenges stemming from linking and modeling disparate data (i.e. maintaining unique people identifiers, using existing or developing controlled vocabularies within the LOD ecosystem, balancing issues of completeness and usability regarding the data model, and communicating gaps and levels of completeness in the data). (Sandra van Ginhoven)
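As a rough illustration of what moving from a transcription-based to an event-based model can involve, the Python sketch below reshapes one flat, transcribed sale record into a single transfer-of-ownership event. It is a sketch under stated assumptions, not the Getty Provenance Index data model: every field name, the `TransferEvent` class, the `row_to_event` helper, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical transcription-style record: one flat row per catalogue entry,
# with every cell copied as text from the source document.
transcribed_row = {
    "lot": "12",
    "title": "Landscape with Figures",
    "seller": "Dealer A",
    "buyer": "Collector B",
    "sale_date": "1885-03-14",
    "price": "",  # left blank in the (invented) source
}


@dataclass
class TransferEvent:
    """Hypothetical event: an object passes from one party to another."""
    object_label: str
    from_party: Optional[str]
    to_party: Optional[str]
    date: Optional[str]
    price: Optional[str]
    source_note: str  # pointer back to the transcribed entry


def row_to_event(row: dict) -> TransferEvent:
    """Turn a flat transcribed row into a transfer event, keeping gaps explicit."""

    def value_or_none(key: str) -> Optional[str]:
        text = row.get(key, "").strip()
        return text or None  # blank cells become None rather than empty strings

    return TransferEvent(
        object_label=row["title"],
        from_party=value_or_none("seller"),
        to_party=value_or_none("buyer"),
        date=value_or_none("sale_date"),
        price=value_or_none("price"),
        source_note=f"lot {row['lot']}",
    )


event = row_to_event(transcribed_row)
print(event)
```

Recording a missing price as an explicit gap, and keeping a pointer back to the transcribed entry, is one simple way to communicate levels of completeness in the data.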

“Ed Ruscha’s Streets of Los Angeles Archive” is a research project of the Getty Research Institute built upon the archive of over half a million images of Los Angeles streets created between 1974 and 2010 by Ed Ruscha. Focusing on streets such as Hollywood Boulevard, Melrose Avenue, and most famously, “the Sunset Strip,” Ruscha generated a staggering collection of images that document the changing urban landscape of Los Angeles. Since the majority of images exist as unprocessed negatives and film contact sheets (that is, inherently unstable media subject to rapid deterioration and degradation), digitization enabled both preservation and access, but also created new challenges. The scale of the digital archive presents complexities: how to allow researchers to search 130,000 digital images in a way that is intuitive, comprehensible, and retains the nature of the archive (as opposed to a collection of individual images)? What tools could be leveraged to conduct analysis on hundreds of thousands of images? This initiative investigates new approaches to tackling the complexities presented by large-scale image-based archives, including how to generate metadata in tandem with image digitization, how to enrich image metadata with geospatial coordinates, and how to make such information accessible to researchers. (Anne Helmreich)

The Department of Research Services at the Rijksmuseum has embarked on a transformative project to align all collection information and data of the Rijksmuseum. This work entails data cleaning, metadata alignment, and data modeling in order to move data out of its former silos into a linked format that will enable new connections to be observed across the museum’s collections and beyond the institution’s walls. (Saskia Scheltjens)

Format: After an introduction by Anne Helmreich, each project representative will present a brief overview of their project. The panelists will then turn to overarching questions and themes that emerged as initial outcomes of their recently organized Lorentz workshop: Art Histories and Big Data (15-19 October 2018, ca. 50 art historians, computer scientists, and information specialists). The ensuing discussions, which we will open up to the digital humanities community at large, will inform the process of developing a white paper as an output of this workshop. The following topics are intended to prompt discussion of the emergence of large data sets in cultural heritage institutions in tandem with reflections on the epistemological, methodological, and analytical/hermeneutical issues concerning their use.

DISCUSSION:

1) How can academic and cultural heritage institutions with different missions and publics collaborate in developing new infrastructures that support the use of big data in art historical/cultural heritage studies? University-based research projects, for example, tend to produce data sets designed to support specific research questions whereas institutions need to realize projects in more generic ways that support multiple communities. Technical issues of data storage and retrieval, data modeling, data alignment, and complex problems of the representation of data quality and provenance of heterogeneous datasets with incomplete and ambiguous data will be addressed.

2) How can scholars analyse, annotate, and contextualize big data provided by these cultural heritage institutions for their own research and store the results locally? How might these enhanced datasets be associated with their data origins? These and related questions provide insight into the interfaces scholars need to interact with big data. How can we produce generic user interfaces that harness the capacities of Linked Open Data while sustaining current research hypotheses and stimulating new research questions? How can institutions that develop infrastructures for research optimally leverage the expertise of individual scholars?

3) How can the scholars of the future be best prepared? What are the implications of big data for education in art and cultural histories? How can we formulate common research questions, and develop curricula to train and prepare students for the study of art histories/cultural heritage in the digital era?