Italian Trulli

ISEBEL an Intelligent Search Engine for Belief Legends

Distributed around the globe more databases of folktales, including belief legends, have come into existence. Combining them might open up new and exciting research possibilities. ISEBEL (Meertens Institute, 2019a) is a project aiming to create a search engine that makes exactly this possible by providing unified search over the participant's database, while dealing intelligently with the various languages. The following databases are currently providing their stories to the project:

  1. the Dutch Folktale Database (Meertens Institute, 2018),
  2. the Evald Tang Kristensen Collection (UCLA, 2018), and
  3. the WossiDia archive (Wossidlo Institute, 2018) (Rostock).

We are looking out to and working on including other interesting archives and languages throughout Europe - especially a few more Nordic countries already have shown interest, i.e., Iceland, Norway, Sweden and Estonia.

In the project a set of metadata elements about the stories in the databases have been identified - apart from the text, there is information about narrator, time and place of narrating (including geocodes), keywords/tags and the URL leading to the story and metadata in the original database. These elements are part of a core metadata schema, which already provides hooks to be easily extended with more (database specific) information. The databases use this to provide metadata about their stories. The central catalogue of ISEBEL (Meertens Institute, 2019b) runs a harvester periodically, which collects this metadata. The technical basis of this communication is provided by the OAI-PMH protocol.

The collected metadata is loaded and indexed by CKAN (CKAN Association, 2018), an open source data portal. The harvested data from each ‘data provider’ are stored as individual XML files by the harvester. The XML files, which contain single stories in each one of them, are imported using local API calls to CKAN. Researchers can then search for the data through CKAN and its underlying indexer, namely Solr (Apache Software Foundation, 2018).  

The project is still under heavy development to reach its design goals, e.g., to provide researchers with cross-language search results rather than in only one language. In order to achieve this goal, the translation and keywords extraction must work together with keywords mapping to interconnect the stories and keywords in different languages.  The domain specific keywords are the keywords specific to one language, which are either manually attached to the stories by the data provider, difficult to translate using machine translation or even not available in other language(s). Those keywords will be manually maintained and mapped to the computer generated keywords, discussed next, to make the mapping complete.

The computer generated keywords are those keywords extracted from the English story translated from the local language of the data provider. According to the test runs, keywords quality extracted from translated stories are much better than translated keywords. The main reason is that the entire story provides more context for an adequate translation.

At the moment ISEBEL is focussing on the specific genre of the traditional belief legend, mainly because all three databases have this folktale genre in common (almost 60.000 legends in total). These legends mainly deal with traditional folk belief in the supernatural, like ghosts, hauntings, devils, witches, wizards, spells, werewolves, nightmares, giants, trolls, goblins etc., as well as stories about hidden treasures, famous robbers, underground passages and sunken castles. The information on folktales collected from the initial three databases make it possible to analyze folktales and traditional folk belief in a large coastal region of the North Sea and the Baltic Sea, where especially Migratory Legends are interesting. Research can include geographic dispersion. For instance, the dispersion of legends about mermaids will most likely show narratives in the direct coastal areas. Another possible research could be gender related: what is the difference between male and female repertoires? How close do the legends play around the own home, and is there a difference here between male and female narrators?

Appendix A

Bibliography
  1. Apache Software Foundation (2018). Apache Solr . lucene.apache.org/solr/ (accessed on 26 November 2018).
  2. CKAN Association (2018). CKAN, the world’s leading Open Source data portal platform . ckan.org (accessed on 21 November 2018).
  3. Meertens Institute (2018). Nederlands VolksverhalenBank . verhalenbank.nl (accessed on 21 November 2018).
  4. Meertens Institute (2019a). ISEBEL . isebel.eu (accessed on 25 April 2019).
  5. Meertens Institute (2019b). ISEBEL central catalogue . search.isebel.eu (accessed on 25 April 2019).
  6. UCLA (2018). The Evald Tang Kristensen Collection . etkspace.scandinavian.ucla.edu (accessed on 21 November 2018).
  7. Wossidlo Institute (2018). WossiDiA: The Digital Wossidlo Archive . wossidia.de (accessed on 21 November 2018).
QiQing Ding (qiqing.ding@di.huc.knaw.nl), KNAW Humanities Cluster, Netherlands, The and Theo Meder (theo.meder@meertens.knaw.nl), Meertens Institute, Netherlands The and Menzo Windhouwer (menzo.windhouwer@di.huc.knaw.nl), KNAW Humanities Cluster, Netherlands, The