Participation in the International Conference LREC 2026 and the CMLC-12 Workshop

From 11 to 16 May 2026, Palma de Mallorca, Spain, hosted the fifteenth edition of the Language Resources and Evaluation Conference (LREC 2026). Organised by the European Language Resources Association (ELRA), LREC is a leading international forum for language resources and natural language processing.

Dr. Ivelina Stoyanova of the Department of Computational Linguistics at the Institute for Bulgarian Language presented three co-authored papers, each reporting different results of the project’s work.

On 11 May 2026, two contributions by Svetla Koeva and Ivelina Stoyanova were presented at the 12th Workshop on Challenges in the Management of Large Corpora (CMLC-12), held as part of LREC 2026.

The talk “IfGPT, a Large Dataset Representing Bulgarian, with the Bulgarian National Corpus as Its Core” introduced the large-scale dataset being developed within the IfGPT project.

A poster, “Recent Developments of the Bulgarian National Corpus”, was also presented. It outlined the latest work on the Bulgarian National Corpus, one of the leading language resources for Bulgarian, which is maintained and developed at the Department of Computational Linguistics of the Institute for Bulgarian Language.

The main programme of LREC 2026 included a poster on the Bulgarian Massive Multitask Language Understanding Benchmark, joint work by Svetla Koeva, Ivelina Stoyanova, Dimiter Georgiev, Svetlozara Leseva, Valentina Stefanova, Maria Todorova, Tsvetana Dimitrova, Hristina Kukova, Mihaela Moskova and Tinko Tinchev. The poster was accompanied by a video presentation. The work introduces MMLU-BG, a Bulgarian benchmark for evaluating the general knowledge of large language models, developed within the IfGPT project.

The presentations generated lively discussion, reflecting how timely the evaluation of language models for low-resource languages such as Bulgarian has become.