I’d like to use my first blog post to advertise the ultimate story book – which also happens to be one of the most successful works of information-based science: the Aarne-Thompson index.
Briefly, this index attempts nothing less than to identify and categorize every folktale. Although the first version of the index is more than a hundred years old, it has remained a useful tool for folklore research ever since.
While I am no folklorist, it seems to me that part of the success of the Aarne-Thompson index stems from timeless design decisions for organizing data. As these ideas extend beyond any single field of science, they are the topic of this post on data science.
Foremost, the basic unit of the Aarne-Thompson index is general and tailored to an intrinsic property of the studied objects: it categorizes tales by their narratives.
I suspect that a further reason for choosing narratives was to prevent conflicts. The index has been criticized for using narratives rather than the underlying intentions of the motifs they contain, or the historic roots of individual tales. Yet this criticism also shows that narratives are not the most exciting area of research in folklore. The index therefore does not codify, and thus does not interfere with, ongoing scientific discourse and unanswered, perhaps debated, research questions. Similarly, individual researchers have little incentive to promote a re-edit of the index according to their own research results and their interpretations thereof.
At the same time, the unit of narratives allows subtle discrimination without being too specific. For instance, “A race with the trickster’s son”, “A climbing contest with a squirrel”, and “A race is won by a look-alike helper” are different entries within the index. Each of those entries contains multiple tales, which have been discovered in different countries. Although all of these tales could also be described by the term “A contest won by betrayal”, that term would be too generic and consequently does not constitute an entry within the Aarne-Thompson index.
Equally important, the definition of an individual index entry is easy to understand: one does not need a PhD in folklore to correctly file a new tale describing a climbing competition against a squirrel, or to use the Aarne-Thompson index to find tales with such a narrative. Since the Aarne-Thompson index contains around 2,000 different narratives, there is a very high chance that a suitable narrative is already included. Indirectly, this also means that the discovery of novel narratives, and thus the potential formulation of a novel index entry, is restricted to the small subset of folklorists who are experts in the domain of narratives.
Perhaps even more importantly, the Aarne-Thompson index avoids rigorously forcing the objects of folklore research into an artificial frame (such as an index). For instance, the very last few pages of the index contain folktales for which one might debate whether they have a narrative at all. One example is “Shall I tell it again”, where one specific tale goes: “Once there was a cat, with its paws made of cloth, and its eyes turned back. Do you want me to tell it again?”.
In line with this thinking, the Aarne-Thompson index also respects major earlier classification principles while allowing some reasonable exceptions. For the most part, the index adheres to the separation between folktales and stories that were once believed to be true. Being an index of folklore, the Aarne-Thompson index omits most of the latter. As noted by the folklorist D. L. Ashliman, however, the index does contain religious stories lacking a definition of time and space, and stories that closely resemble invented tales, such as accounts of vampires.
Finally, the design of the index promotes its extension. When Aarne created the first version of the index, he centered his research on central Europe. While this was an obvious shortcoming of his work, I admire the way he dealt with the practical restrictions and limitations of research: rather than hiding the bias of his work, or including a poorly performed survey of more data sources, Aarne made the bias of his research evident. This choice encouraged a continuation of his work beyond the scope that he could have achieved by himself. Consequently, his index received two major updates after his lifetime (which also extended the list of authors and thus the name of the index).
From the practical perspective of a researcher in a field with a rapid turnover of databases and versions thereof, I find it particularly astonishing that Aarne created a framework that could be updated without the need to worry about backwards compatibility or subtle particularities of earlier versions. For instance, his index contains multiple mechanisms to deal with ambiguity without requiring discussions at the level of entries. Here, too, the design of the index reduces the risk that later scientific debates would compromise the structure of the index and lead to the emergence of incompatible versions of it. Unfortunately, a detailed discussion of these mechanisms would extend beyond the scope of a single blog post.
Perhaps the most direct evidence that the index was cleverly designed for later extension is its gaps, which are reserved for narratives that are not yet included. Each entry of the index has a specific number, and entries with similar narratives are listed close to each other and thus have similar numbers. Yet the index unavoidably contains major conceptual shifts between certain narratives (e.g. at the transition from magic tales to religious tales). At these transitions, the index contains large gaps in the numbering scheme. These gaps have allowed the integration of completely new narratives (or perhaps more accurately: previously unfiled narratives) without forcing a renumbering of existing entries or placing new narratives far from similar ones.
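For readers who think in code, the gap idea can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions: the class `GappedIndex`, its methods, and the entry numbers 100, 110 and 500 are all made up for this post and do not correspond to the real numbering of the Aarne-Thompson index.

```python
from bisect import bisect_left, insort


class GappedIndex:
    """Toy index whose entry numbers are deliberately sparse (hypothetical)."""

    def __init__(self):
        self._numbers = []  # entry numbers, kept sorted
        self._titles = {}   # entry number -> narrative

    def add(self, number, title):
        """File a narrative under an explicitly chosen, unused number."""
        if number in self._titles:
            raise ValueError(f"number {number} is already taken")
        insort(self._numbers, number)
        self._titles[number] = title

    def insert_after(self, neighbour, title):
        """File a new narrative directly after a similar existing entry."""
        pos = bisect_left(self._numbers, neighbour)
        if pos == len(self._numbers) or self._numbers[pos] != neighbour:
            raise KeyError(neighbour)
        candidate = neighbour + 1
        has_next = pos + 1 < len(self._numbers)
        if has_next and candidate >= self._numbers[pos + 1]:
            raise ValueError("no gap left after this entry")
        self.add(candidate, title)
        return candidate

    def entries(self):
        """Return all entries as (number, narrative) pairs in index order."""
        return [(n, self._titles[n]) for n in self._numbers]


# Made-up numbers: two similar narratives close together, then a large gap
# before an unrelated group of tales begins.
index = GappedIndex()
index.add(100, "A race with the trickster's son")
index.add(110, "A climbing contest with a squirrel")
index.add(500, "First narrative of an unrelated group")

before = index.entries()
new_number = index.insert_after(110, "A race is won by a look-alike helper")
```

In this sketch the new narrative lands right next to its closest relative, and every entry that was filed earlier keeps its number, so older references to the index stay valid.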
While the Aarne-Thompson index contains even more design decisions, I shall end my appraisal of its qualities here. Instead, I’d like to encourage any reader of this blog to open a commented edition, such as the one by D. L. Ashliman, to start reading the narrative of (almost) every known tale, and to discover how Aarne created a lasting information-based research tool.
Image credit: The New Ruffian, Creative Commons.