LANGUAGE TECHNOLOGIES & DIGITAL HUMANITIES CONFERENCE 2022 | SDJT – Slovensko društvo za jezikovne tehnologije

September 15–16, 2022

https://www.sdjt.si/jtdh-2022/en
Slovene page:
https://www.sdjt.si/jtdh-2022

The Slovenian Language Technologies Society (SDJT), the Centre for Language Resources and Technologies at the University of Ljubljana (CJVT), the Institute of Contemporary History (INZ) and the research infrastructures CLARIN.SI and DARIAH-SI organised the conference “Language Technologies and Digital Humanities” on 15th and 16th September 2022. The biennial conference “Language Technologies” was first held in 1998, with the thematic expansion to Digital Humanities introduced in 2016.

Invited speakers

Benoît Sagot

“Large-scale language models: challenges and perspective” [Video]

Abstract: The emergence of large-scale neural language models in Natural Language Processing (NLP) research and applications has improved the state of the art in most NLP tasks. However, training such models requires enormous computational resources and training data. The characteristics of the training data have an impact on the behaviour of the models trained on it, depending for instance on the data’s homogeneity and size. In this talk, I will speak about how we developed the large-scale multilingual OSCAR corpus. I will describe the lessons we learned while training the French language model CamemBERT, the first large-scale monolingual model for a language other than English, especially in terms of the influence of size and heterogeneity of the training corpus. I will also sketch out a few research questions related to biases in large-scale language models, with a focus on the impact of tokenisation and language imbalance, in the context of the BigScience initiative. I will conclude with my thoughts on the future of language models and their impact on NLP and other data processing fields (speech, vision).

Bio: Benoît Sagot, Directeur de Recherches (Senior Researcher) at Inria, is the head of the Inria project-team ALMAnaCH in Paris, France. A specialist in natural language processing (NLP) and computational linguistics, his research focuses on language modelling, language resource development, machine translation, text simplification, part-of-speech tagging and parsing, computational morphology and, more recently, digital humanities (computational historical linguistics and historical language processing). He has been the PI or co-PI of a number of national and international projects, and is the holder of a chair in the PRAIRIE institute dedicated to research in artificial intelligence. He is also the co-founder of two start-ups where he uses his expertise in NLP and data mining for the automatic analysis of employee survey results.

Eetu Mäkelä

“Designing computational systems to support humanities and social sciences research” [Video]

Abstract: From the viewpoint of the humanities and social sciences, collaborations with computer scientists often fail to deliver. In my research group, we have tried to understand why this is, and what to do about it. In this talk, I will discuss three key elements that we have discovered:
Often, datasets in the humanities and social sciences are not neatly representative of the object of interest. Systems need to provide ways in which to evaluate and counter the biases, confounders and noise in the data.
Often, there is also a large gap between what is in the data and what would be of interest. This gap needs to be bridged using algorithms, but care must be taken that a) what the algorithm produces actually matches the interest and b) its application does not introduce bias of its own (interestingly, the algorithm performance metrics of interest here often differ from those generally used in NLP/computer science).
On a process level, collaboration between researchers from different disciplines is hard due to discrepancies in expectations relating to all facets of research, from research questions through methodology to the publication of results. Projects and systems need to acknowledge this, and be designed to facilitate iterative movement in the right direction.

Bio: Eetu Mäkelä is an associate professor in Human Sciences–Computing Interaction at the University of Helsinki, and a docent (adjunct professor) in computer science at Aalto University. At the Helsinki Centre for Digital Humanities, he leads a research group that seeks to figure out the technological, processual and theoretical underpinnings of successful computational research in the humanities and social sciences. Additionally, he serves as a technological director at the DARIAH-FI infrastructure for computational humanities and is one of three research programme directors in the datafication research initiative of the Helsinki Institute for Social Sciences and Humanities. For his work, he has obtained a total of 19 awards, including multiple best paper awards in conferences and journals, as well as multiple open data and open science awards. He also has a proven track record in creating systems fit for continued use by their audience.

Pre-conference tutorials

On Wednesday, September 14th, 2022, two tutorials were held as part of the conference:

Topic modelling parliamentary debates before and during the COVID-19 pandemic

The tutorial introduces researchers in the humanities and social sciences to text mining and shows the value of such approaches for research in the aforementioned fields. The tutorial presents the particularities of parliamentary discourse and topic modelling by answering concrete research questions. The practical example is based on the freely accessible corpus of parliamentary debates ParlaMint and the Orange data mining software. No programming knowledge is required, but the participants will need their own laptop with Orange installed.

Lecturer: Ajda Pretnar Žagar

Boost your research with CLARIN.SI

This tutorial will introduce the CLARIN.SI research infrastructure, which facilitates the creation, processing, archiving, and reuse of language data from books, newspapers, social media, interviews, etc. We will demonstrate how to use the digital repository to find existing language resources relevant for your research questions as well as the most common tools to analyse them. We will also present the rich knowledge base and funding instruments offered by CLARIN that researchers can benefit from when dealing with legal, standardisation, annotation and other issues related to their language data. The tutorial is ideal for novice and experienced researchers from linguistics but also other fields that rely on collecting and analysing written and spoken language materials, such as literary studies, translation studies, history, media studies, anthropology, and sociology, who would like to become more familiar with the CLARIN.SI research infrastructure. [PDF]

Lecturers: Jakob Lenardič and Kristina Pahor de Maiti

Presentation of the interim results of the “Development of Slovene in a Digital Environment – Language Resources and Technologies” project

On Friday 16 September, at the end of the conference, a presentation of the project “Development of Slovene in a Digital Environment – Language Resources and Technologies” took place, in which the leaders of the work packages presented the current results of the project. [Video]

Thematic areas of the conference

The conference aims to bring together researchers from various backgrounds and methodological frameworks. The main topics will include but are not limited to:

Speech and other mono- and multilingual language technologies
Digital linguistics: translation studies, corpus linguistics, lexicology and lexicography, standardisation
Digital humanities and historical studies, ethnology, literary studies, musicology, cultural heritage, archaeology, and fine arts
Digital humanities in education and digital publishing

We welcome submissions that present guidelines, research, good practices, projects and results in these areas. The conference will also include invited lectures, a student section, and roundtables on topics related to the conference. The official languages of the conference will be Slovene and English.

Important dates

~~May 15th, 2022: Deadline for submission of papers and extended abstracts~~
~~May 30th, 2022: Extended deadline for submission of papers and extended abstracts~~
~~June 30th, 2022: Notification of acceptance~~
~~August 15th, 2022: Submission of final papers~~
~~August 16th, 2022: Registration deadline~~
~~September 15th-16th, 2022: Conference~~

Instructions for authors

The authors are invited to submit either a full paper or an extended abstract describing work to be presented at the conference. The extended abstract will be published in the book of abstracts and the full papers in the conference proceedings, both of which will be published on the conference website under the Creative Commons license at the beginning of the conference. We leave it up to the authors whether to submit their contributions anonymized or not.

The official languages of the conference are Slovene and English.

The extended abstracts should be 2–4 pages long and the full papers 6–8 pages, formatted according to the conference guidelines:

extended abstract: example, Word template
full paper: example, Word template, LaTeX template
templates are also available for papers written in Slovene; you can find them on the Slovene page of the conference.

Contributions are submitted through EasyChair via this link.

The authors of full papers should indicate if it is a student contribution by adding “student paper” to the list of keywords. All the co-authors of student papers should be students. These papers will be presented in a separate student session and will be eligible for the best student paper award.

Organisation

For more information, please contact the Organising Committee at mojca.sorn@inz.si.

Organising committee

Mojca Šorn, chair
Ana Cvek
Kaja Dobrovoljc
Jerneja Fridl
Katja Meden
Mihael Ojsteršek
Nataša Rozman

Programme committee

Steering committee

Darja Fišer (chair), Faculty of Arts, University of Ljubljana and Institute of Contemporary History
Simon Dobrišek, Faculty of Electrical Engineering, University of Ljubljana
Tomaž Erjavec, Jožef Stefan Institute
Andrej Pančur, Institute of Contemporary History
Matej Klemen (student section), Faculty of Computer and Information Science, University of Ljubljana
Aleš Žagar (student section), Faculty of Computer and Information Science, University of Ljubljana

Members of the programme committee

Špela Arhar Holdt, Faculty of Arts, University of Ljubljana
Petra Bago, Faculty of Arts, University of Zagreb
Vuk Batanović, Faculty of Electrical Engineering, University of Belgrade
Zoran Bosnić, Faculty of Computer and Information Science, University of Ljubljana
Narvika Bovcon, Faculty of Computer and Information Science, University of Ljubljana
Václav Cvrček, Institute of the Czech National Corpus, Charles University in Prague
Jaka Čibej, Faculty of Arts, University of Ljubljana
Helena Dobrovoljc, Fran Ramovš Institute of the Slovenian Language, ZRC SAZU
Kaja Dobrovoljc, Faculty of Arts, University of Ljubljana
Jerneja Fridl, ZRC SAZU
Polona Gantar, Faculty of Arts, University of Ljubljana
Vojko Gorjanc, Faculty of Arts, University of Ljubljana
Jurij Hadalin, Institute of Contemporary History
Miran Hladnik, Faculty of Arts, University of Ljubljana
Ivo Ipšić, University of Rijeka
Mateja Jemec Tomazin, Fran Ramovš Institute of the Slovenian Language, ZRC SAZU
Alenka Kavčič, Faculty of Computer and Information Science, University of Ljubljana
Iztok Kosem, Faculty of Arts, University of Ljubljana
Simon Krek, Artificial Intelligence Laboratory, Jožef Stefan Institute
Jakob Lenardič, Faculty of Arts, University of Ljubljana
Nikola Ljubešić, Department of Knowledge Technologies, Jožef Stefan Institute
Nataša Logar, Faculty of Social Sciences, University of Ljubljana
Matija Marolt, Faculty of Computer and Information Science, University of Ljubljana
Sanda Martinčić Ipšić, University of Rijeka
Maja Miličević Petrović, University of Bologna
Dunja Mladenić, Artificial Intelligence Laboratory, Jožef Stefan Institute
Matija Ogrin, Institute of Slovene Literature and Literary Sciences, ZRC SAZU
Matevž Pesek, Faculty of Computer and Information Science, University of Ljubljana
Dan Podjed, Institute of Slovenian Ethnology, ZRC SAZU
Senja Pollak, Department of Knowledge Technologies, Jožef Stefan Institute
Ajda Pretnar Žagar, Faculty of Computer and Information Science, University of Ljubljana
Marko Robnik Šikonja, Faculty of Computer and Information Science, University of Ljubljana
Tanja Samardžić, University of Zurich
Miha Seručnik, Milko Kos Historical Institute, ZRC SAZU
Mirjam Sepesy Maučec, Faculty of Electrical Engineering and Computer Science, University of Maribor
Marko Stabej, Faculty of Arts, University of Ljubljana
Branislava Šandrih Todorović, Faculty of Philology, University of Belgrade
Mojca Šorn, Institute of Contemporary History
Janez Štebe, Faculty of Social Sciences, University of Ljubljana
Simon Šuster, University of Melbourne
Daniel Vasić, University of Mostar
Darinka Verdonik, Faculty of Electrical Engineering and Computer Science, University of Maribor
Andrej Žgank, Faculty of Electrical Engineering and Computer Science, University of Maribor
Jerneja Žganec Gros, Alpineon d.o.o.
Branko Žitko, Faculty of Science, University of Split