From the Physical to the Digital Library
Creation and Application of a HTR Model for the Transcription of Giono’s Annotated Books
As part of the author’s archive, Giono’s physical library is a source of extratextual information that should be taken into account when interpreting his late novels, particularly with regard to his sociopolitical views. The volumes containing reading marks represent, in fact, an intertextual context for the novels, and the presence of numerous political works contrasts with the authorial image associated with a disengagement beginning at the end of the Second World War. Within the digitalization pipeline, which aims at the web publication of a digital edition of a selected number of political texts, the present article focusses on the extraction of machine-readable text from the image files, describing how the transcription process is carried out automatically through the creation and application of an HTR model with Transkribus. We describe the ground-truth material supplied, the parameters set, the training of the model, and the results of multiple training runs, and we provide examples of the transcriptions. The resulting model is ready to be used for future transcriptions, enabling the efficient digitalization of a large number of volumes from the author’s library as well as other documents from his archive.
RiCognizioni is published under a Creative Commons Attribution 4.0 International License.
Under the CC-BY licence, authors retain the copyright while allowing anyone to download, reuse, reprint, modify, distribute and/or copy their contribution. The work must be properly attributed to its author.
It is not necessary to request further permission from either the author or the journal board.