Digital editing of the cuneiform texts from Haft Tappeh

Cuneiform texts are transliterated and made available digitally. (Vanessa Liebler for the i3mainz, CC BY-SA 4.0)

The project “Digital editing of the cuneiform texts from Haft Tappeh” is dedicated to the transliteration and digital availability of more than 600 cuneiform texts from Haft Tappeh, Iran. The aim of the project is to elaborate and further develop a digital workflow that takes into account existing tools, international standards and computational-linguistic analysis methods.


Haft Tappeh (the ancient city of Kabnak) is located in southwestern Iran in the province of Khuzestan, about 15 km southeast of the ancient city of Susa. Its geographical position made Haft Tappeh an important site in Bronze Age history and culture. To date, large-scale excavations have uncovered more than 1,400 text fragments of cuneiform tablets in the Babylonian language, along with the architectural remains of a palace. Most of the texts are administrative documents, and their final linguistic editing is still pending.

In the first phase of the project, funded by the German Research Foundation (DFG), the 600 to 650 texts excavated by Behzad Mofidi-Nasrabadi of Johannes Gutenberg University Mainz will be digitally edited and prepared for machine reading. Processing the texts with contemporary methods and making the results openly available is intended to enable the investigation of the paleography, lexis, syntax, tablet formats, text categories, bureaucratic protocol and modus operandi of this important text corpus beyond the confines of Ancient Near Eastern Studies.

For this purpose, the i3mainz is developing a digital workflow for cuneiform tablets that starts with the existing 3D data and photographs of the tablets and digitally processes the contents using transliteration as well as computational-linguistic and semantic annotation. The focus here is not on the creation of a new portal to make cuneiform data available, but rather on the production of FAIR data that can be integrated into other repositories, whether existing or under construction. The acronym FAIR stands for findable, accessible, interoperable and reusable, and it refers to the internationally accepted principles for making research data available. The tools developed are made available in a Git repository, making it easier to replicate the workflows in other projects. Consequently, not only the data but also the software is findable, accessible, interoperable and reusable.


In February 2021, colleagues from the fields of Assyriology and Computational Linguistics who deal with digital editions, their infrastructural prerequisites, philological and linguistic requirements and concepts relevant to information science exchanged ideas at the DFG-funded virtual workshop “Status quo and current developments in digital cuneiform editions” organised by i3mainz. The aim of the workshop was to jointly develop solution strategies for the digital edition of cuneiform texts.

It was preceded by a workshop for young scholars with almost 60 participants from the Initiative for Digital Cuneiform Studies (IDCS), which was founded for this purpose. It was organised by the project members Eva Huber, Tim Brandes and Timo Homburg and funded by the programme “Small Subjects - Visibly Innovative” of the German Rectors’ Conference. The proceedings of the workshop were submitted for publication in the journal of the Cuneiform Digital Library Initiative at the end of 2021.

A number of other networking activities followed, including a coordination meeting with the Cuneiform Digital Library Initiative (CDLI) to agree details on the provision of the digital edition data from the Haft Tappeh project. By the end of 2021, the image data had been consolidated to such an extent that their transfer to the repository of Heidelberg University Library could be prepared. This was based on the metadata schema designed by a cross-project team of the i3mainz to document the creation process of 3D objects.

Initiated by a workshop of the CDLI, a deeper cooperation on the formalisation of cuneiform paleography has been taking place since August 2021. The impetus came from the publication in October 2021 of PaleoCodage, an encoding system that had already been developed in a previous project phase. These formalisations are of interest not only to the CDLI, but also to the W3C Ontolex-Lemon Working Group, especially its subgroup Multimodality. This group is concerned with the formalisation of dictionaries for different languages and writing systems and would like to develop a data model for the representation of writing systems across language boundaries.

Using the example of around 30,000 cuneiform tablets from the Hilprecht Collection, the Haft Tappeh team tested how annotations can be realised on two-dimensional image media. The result is a web application under development which, as “CuneiformAnnotator”, not only provides a basis for integrating these technologies into the “CuneiformWorkbench”, but also enables the classification of various cuneiform characters. The tool was tested in a crowdsourcing process via the platform Zooniverse as part of courses in Ancient Oriental Studies and by external cooperation partners. At the NFDI4Culture Plenary in November 2021, the Haft Tappeh team presented the concept of annotations on 3D models in a short talk entitled “Rich and sustainable annotations on 3D objects”.
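How an annotation on a two-dimensional image can be represented may be easier to follow with a small sketch. The example below assumes the W3C Web Annotation Data Model with a media-fragment selector for a rectangular image region; the text above does not state which format the CuneiformAnnotator actually uses. The function name, the image URL and the sign label are hypothetical and serve only as illustration.

```python
import json


def make_sign_annotation(image_url, x, y, width, height, sign_label):
    """Build a Web Annotation (as a dict) that classifies one rectangular
    image region as a cuneiform sign.

    Assumption: W3C Web Annotation Data Model with a FragmentSelector.
    This is an illustrative choice, not the project's documented format.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "classifying",
        # The body carries the classification, here simply a sign label.
        "body": {"type": "TextualBody", "value": sign_label},
        # The target points at a pixel rectangle inside the image.
        "target": {
            "source": image_url,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"xywh=pixel:{x},{y},{width},{height}",
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical image URL and sign label.
    annotation = make_sign_annotation(
        "https://example.org/tablets/HS-0001.jpg", 120, 80, 40, 35, "AN"
    )
    print(json.dumps(annotation, indent=2))
```

Because such a record is plain JSON-LD, it could be stored independently of the image and exchanged between tools, which fits the goal of producing reusable data rather than a new portal.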

A practical project within the framework of the inter-university Master’s programme Digital Methodology in the Humanities and Cultural Studies was dedicated to the back-projection of existing 2D annotations onto 3D models. The results of the practical project are currently under review in the CDLI Journal and the Journal of Open Archaeology Data (JOAD). At the Linked Pasts VII Symposium in December 2021, the ontology model of the Haft Tappeh project was presented in a poster contribution.

A second practical project was dedicated to the development of similarity metrics on 3D models of cuneiform tablets. The aim was to create a digital fingerprint of the 3D scans, which can be compared with objects in larger data repositories. The similarity to so-called reference bodies such as standard spheres or cuboids was measured and related to the 3D scans. Combined with features such as find locations or text content, the scans can be taken into account for further insights.
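The idea of a fingerprint relative to reference bodies can be illustrated with a minimal sketch. It is not the metric developed in the practical project, which is not described here. As a stand-in, it assumes two very simple comparisons on the vertices of a scan: the proportions of the axis-aligned bounding box (cuboid reference) and the relative spread of the vertex distances from the centroid (sphere reference). All function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def fingerprint(points):
    """Return a scale-invariant 3-tuple describing a 3D point cloud.

    points: sequence of (x, y, z) tuples, e.g. the vertices of a scan.
    Assumption: a deliberately simple descriptor for illustration only.
    """
    if not points:
        raise ValueError("point cloud is empty")

    # Cuboid reference: extents of the axis-aligned bounding box,
    # sorted from longest to shortest side.
    extents = sorted(
        (max(p[i] for p in points) - min(p[i] for p in points) for i in range(3)),
        reverse=True,
    )
    longest = extents[0]
    if longest == 0:
        raise ValueError("all points coincide")

    # Sphere reference: on a perfect sphere every vertex has the same
    # distance to the centroid, so the relative spread would be 0.
    centroid = tuple(mean(p[i] for p in points) for i in range(3))
    radii = [math.dist(p, centroid) for p in points]
    radial_spread = pstdev(radii) / mean(radii)

    return (extents[1] / longest, extents[2] / longest, radial_spread)


def fingerprint_distance(a, b):
    """Euclidean distance between two fingerprints (smaller = more similar)."""
    return math.dist(a, b)


if __name__ == "__main__":
    # Two toy "scans": the corners of a cube and of a flat, tablet-like slab.
    cube = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
    slab = [(x, y, z) for x in (0, 4) for y in (0, 2) for z in (0, 1)]
    print(fingerprint(cube))
    print(fingerprint(slab))
    print(fingerprint_distance(fingerprint(cube), fingerprint(slab)))
```

Because the descriptor only uses ratios, it does not change when a scan is uniformly scaled. It does depend on the orientation of the object, so a real comparison would first need to align the scans, for example along their principal axes.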

The 2022 project year was characterised by the expansion of international collaborations, for example with the Cuneiform Digital Library Initiative (CDLI). As part of the Google Summer of Code, students developed technologies for image annotation. The highlight was the workshop “Securing Data in Mesopotamia: New Technologies for Secured Cuneiform Texts” in Leiden in March. Kai-Christian Bruhn and Timo Homburg presented the results of the Haft Tappeh project and discussed the prospects for future cooperation with those present.

In 2023, together with the international DANES Network for Digital Cuneiform Research, founded in 2022, members of the Haft Tappeh project developed the data model for representing the digital paleography of the Haft Tappeh tablets in Wikidata.

In the summer of 2023, another scanning campaign took place in Iran to scan cuneiform tablets that had not yet been recorded in 3D and to document them within the project period.

The existing corpus was examined for personal names and other content and enriched with the published archaeological data. The personal names were published in the FactGrid database as linked open data. As part of the preparations for the publication of the corpus of annotated cuneiform tablet images, MaiCuBeDa, an annotation dataset from the Hilprecht Collection, was published in the summer.