In this paper, we present a strategy for integrating existing heterogeneous language resources, such as texts and dictionaries, by connecting them and making them available to internal projects and third-party applications through (Web) APIs. We describe our approach in the context of the C-SALT initiative (Cologne South Asian Languages and Texts 1), which gathers projects and resources on South Asian languages hosted at the University of Cologne. To illustrate the potential of our approach, we first introduce VedaWeb, a web-based platform that provides access to ancient Indian texts composed in Vedic Sanskrit, the oldest form of ancient Indo-Aryan. We then describe the C-SALT APIs for dictionaries 2, which make several large Pāli and Sanskrit dictionaries available online. Building on that, we present the architecture behind these APIs, and we conclude by discussing the potential role of APIs in Digital Humanities (DH) projects.
The cornerstone of VedaWeb is a digital edition of the Rigveda, one of the oldest and most important texts of the Indo-European language family, which comprises approx. 160,000 words. VedaWeb can be accessed either via a web application 3 or directly via an API 4. VedaWeb provides several layers of linguistic and philological information alongside various editions of the text of the Rigveda. A search function with multiple linguistic parameters (including lemma, word form, morphological and metrical information) allows users to run queries across different levels of annotation by means of complex, combined search criteria. Besides the annotated version of the text, further layers include translations (including Geldner, 2003; Grassmann, 1876; Griffith, 1896; Renou, 1956-1969) as well as commentaries on the Rigveda (Oldenberg, 1909/1912; Renou, 1956-1969). Like the morphological annotations, all of these additional information layers can be accessed via full-text search as well as a more structured search function. The possibility of combining these layers is crucial for enabling novel perspectives on the data, e.g. by quantifying feature combinations or by identifying context-dependent phenomena such as different types of constructions. VedaWeb is meant to advance research in all areas of Vedic studies, for example in syntax (e.g. referential null objects (Keydana & Luraghi, 2012), non-configurationality (Reinöhl, 2016)), morphology (e.g. the Vedic vr̥kī-type (Widmer, 2007), ya-presents (Kulikov, 2012)) or word formation (e.g. compounds (Scarlata & Widmer, 2015)).
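The idea of combined search criteria across annotation layers can be illustrated with a minimal sketch. It does not reproduce the VedaWeb search implementation or its data model: the class names, field names, feature labels and the three toy verses (with simplified, unaccented transliteration and made-up identifiers) are assumptions chosen only for illustration. A verse matches when it satisfies the verse-level criterion (meter) and contains at least one token that satisfies all token-level criteria (lemma and morphological features) at once.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """One annotated word; the field names are illustrative assumptions."""
    form: str
    lemma: str
    morph: dict = field(default_factory=dict)


@dataclass
class Verse:
    """A verse with a hypothetical identifier and a meter label."""
    location: str
    meter: str
    tokens: list


def token_matches(token, lemma=None, **morph):
    """True if the token satisfies every given token-level criterion."""
    if lemma is not None and token.lemma != lemma:
        return False
    return all(token.morph.get(key) == value for key, value in morph.items())


def search(verses, meter=None, lemma=None, **morph):
    """Return the locations of verses matching all criteria combined."""
    hits = []
    for verse in verses:
        if meter is not None and verse.meter != meter:
            continue
        # All token-level criteria must hold on the same token.
        if any(token_matches(t, lemma, **morph) for t in verse.tokens):
            hits.append(verse.location)
    return hits


# Toy data, invented for this sketch (not taken from the VedaWeb corpus).
VERSES = [
    Verse("v1", "gayatri", [Token("agnim", "agni", {"case": "acc", "number": "sg"})]),
    Verse("v2", "tristubh", [
        Token("agnis", "agni", {"case": "nom", "number": "sg"}),
        Token("devas", "deva", {"case": "nom", "number": "pl"}),
    ]),
    Verse("v3", "gayatri", [Token("devam", "deva", {"case": "acc", "number": "sg"})]),
]

print(search(VERSES, lemma="agni", case="acc"))
print(search(VERSES, meter="gayatri", case="acc"))
```

Requiring the token-level criteria to hold on a single token is a design choice of this sketch: a query for the lemma agni in the nominative plural does not match the second toy verse, although that verse contains the lemma and, on another token, the requested features.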
An important feature of VedaWeb is the enrichment of the Rigveda text by linking each word to entries in the standard dictionary for the Rigveda by Hermann Grassmann (Grassmann, 1873). Instead of encapsulating the dictionary data in the application, our approach is to leave the resource ‘in place’ and obtain the data via the C-SALT APIs for Sanskrit Dictionaries 5.
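A minimal sketch of this ‘in place’ approach follows. The application keeps only lemmas and resolves them against a remote dictionary service when needed. The base URL, the path layout, the dictionary identifier `gra` and the shape of the JSON response are hypothetical and do not describe the actual C-SALT endpoints. The transport is passed in as a function so that the sketch runs without network access.

```python
import json
from urllib.parse import quote

# Hypothetical base URL and path layout; the real C-SALT endpoints may differ.
BASE_URL = "https://api.example.org/dicts/"


def entry_url(dictionary, lemma):
    """Build the (assumed) lookup URL for one lemma in one dictionary."""
    return f"{BASE_URL}{quote(dictionary)}/entries?lemma={quote(lemma)}"


def link_tokens(tokens, fetch, dictionary="gra"):
    """Attach dictionary entries to (form, lemma) pairs.

    `fetch` is any callable that takes a URL and returns JSON text, so the
    dictionary data stays with the remote service instead of being copied
    into the application. Each lemma is requested only once.
    """
    cache = {}
    linked = []
    for form, lemma in tokens:
        if lemma not in cache:
            cache[lemma] = json.loads(fetch(entry_url(dictionary, lemma)))
        linked.append({"form": form, "lemma": lemma, "entries": cache[lemma]})
    return linked


# Stand-in for an HTTP client; it records the URLs and returns a stub entry.
requested = []


def fake_fetch(url):
    requested.append(url)
    return json.dumps([{"id": "stub-entry"}])


linked = link_tokens([("agnim", "agni"), ("agnis", "agni")], fake_fetch)
print(requested)
print(linked[0]["entries"])
```

In a real setting, `fetch` could be a thin wrapper around `urllib.request.urlopen` that reads and decodes the response body.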
The C-SALT APIs for Dictionaries 6 have been developed to provide access to existing lexicographic resources in Pāli and Sanskrit without duplicating work or hosting efforts. The dictionaries available via these APIs are also accessible through traditional monolithic web applications, such as the Critical Pāli Dictionary Online 7 and the Cologne Digital Sanskrit Dictionaries 8, which are a product of a major Sanskrit digitization project (Kapp & Malten, 1997).
The APIs and the VedaWeb application are based on versions of the texts and dictionaries encoded in TEI 9 -XML 10. We employ a TEI schema 11 developed initially for the three most complex Sanskrit dictionaries (Apte, 1920; Böhtlingk & Roth, 1855-1875; Monier-Williams, 1899). By using a single TEI schema, we achieve not only data persistence but also a consistent structure for all dictionaries. While software such as frontend applications or APIs changes over time, TEI offers the DH community the safest way to ensure data persistence. For this reason, all the data accessed through the APIs is ultimately based on TEI files. The individual C-SALT projects use different technologies as ‘middleware’ between the TEI files and the endpoints, and also different Web API technologies: REST (Fielding, 2009) and GraphQL 12. Independently of the technology employed, our APIs focus on performance and on providing well-documented access to curated linguistic data.
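The role of such middleware can be sketched as a function that reads a TEI entry and returns a plain structure that an endpoint could serialize as JSON. The sample entry below is invented for this sketch and uses only generic elements of the TEI dictionary module (`entry`, `form`, `orth`, `sense`, `def`); it does not reproduce the C-SALT schema, and the output field names are assumptions.

```python
import json
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Invented sample entry; the actual C-SALT encoding may be structured differently.
SAMPLE = """<entry xmlns="http://www.tei-c.org/ns/1.0" xml:id="entry-001">
  <form><orth>agni</orth></form>
  <sense><def>fire</def></sense>
  <sense><def>the god of fire</def></sense>
</entry>"""


def entry_to_dict(xml_text):
    """Convert one TEI <entry> into a JSON-serializable dictionary."""
    entry = ET.fromstring(xml_text)
    orth = entry.find("tei:form/tei:orth", TEI_NS)
    senses = [
        "".join(sense.itertext()).strip()
        for sense in entry.findall("tei:sense", TEI_NS)
    ]
    return {
        "id": entry.get(XML_ID),
        "headword": orth.text if orth is not None else None,
        "senses": senses,
    }


print(json.dumps(entry_to_dict(SAMPLE), ensure_ascii=False, indent=2))
```

Because the TEI file remains the single source, the same conversion step could feed a REST resource or a GraphQL resolver without changing the underlying data.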
Developing APIs entails a separation of concerns. For the APIs themselves, this means well-curated data that can be accessed efficiently through a clearly defined structure. For web applications, it means focusing on a specific target audience and drawing, if required, on multiple APIs. We have described the potential use of APIs for lexicographic resources. There are several advantages to making the data accessible through APIs instead of encapsulating it within the application. Instead of forcibly homogenizing diverse data sets into a general data model, it is more efficient to provide a common interface for accessing them. This also opens up opportunities to employ the different resources in the context of other applications. The main goal in developing C-SALT is to keep all resources as modular as possible, so that they can be used and reused in different research scenarios. In the case of VedaWeb, this currently applies to the dictionaries involved, but we see the potential to transfer the concept to the other information layers as well, in particular the Rigveda text and its translations. In general, we believe that an API-based approach to digital resources and data in the Digital Humanities provides efficient access to data and encourages the reuse of available resources. It thus facilitates novel uses by other researchers while avoiding repetition of work and unnecessary redundancy of resource instances. Applications are transient, but the knowledge represented by the data may remain and be reused.
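The contrast between a general data model and a common interface can be made concrete with a small sketch. Two resources keep their own internal shape, and only a shared `lookup` method is agreed upon. All class names, resource names and sample glosses are hypothetical and are not part of the C-SALT code base.

```python
from typing import Protocol


class LexicalResource(Protocol):
    """The common interface: any resource that can look up a lemma."""

    def lookup(self, lemma: str) -> list: ...


class TableDictionary:
    """A resource stored as a mapping from headword to definitions."""

    def __init__(self, table):
        self._table = table

    def lookup(self, lemma):
        return list(self._table.get(lemma, []))


class RecordDictionary:
    """A resource stored as a flat list of records with its own field names."""

    def __init__(self, records):
        self._records = records

    def lookup(self, lemma):
        return [r["gloss"] for r in self._records if r["headword"] == lemma]


def lookup_everywhere(resources, lemma):
    """Query every resource through the shared interface only."""
    return {name: res.lookup(lemma) for name, res in resources.items()}


# Invented sample data; neither structure is converted into the other.
RESOURCES = {
    "dict_a": TableDictionary({"agni": ["fire"]}),
    "dict_b": RecordDictionary([
        {"headword": "agni", "gloss": "fire"},
        {"headword": "agni", "gloss": "god of fire"},
    ]),
}

print(lookup_everywhere(RESOURCES, "agni"))
```

A further resource can be added by implementing `lookup` alone, without migrating its data, which is the modularity argued for above.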
9 Text Encoding Initiative: https://tei-c.org/
10 Extensible Markup Language
11 https://github.com/cceh/c-salt_dicts_schema