Project
Infrastructure for Fine-tuning Pre-trained Large Language Models
Beneficiary
Start date: 12.12.2024
End date: 30.05.2026
Duration: 17.5 months
Total budget: BGN 437,446.38
Amount of EU funding: BGN 437,446.38 (100%)

Main goal
To develop a freely accessible infrastructure for the selection and pre-processing of large datasets for Bulgarian, as well as data tailored to specific industries, and for the fine-tuning of suitable freely available large language models for specific purposes.


Selection and pre-processing of large datasets for Bulgarian, as well as data tailored to particular industries, and fine-tuning of suitable freely available large language models for specific purposes.

Specification of criteria for the evaluation, comparison and selection of large language models.

Developing a component of the Infrastructure for the collection, filtering, anonymisation and deduplication of large, diverse and high-quality text data for Bulgarian.

Developing a component of the Infrastructure for the fine-tuning of pre-trained large language models for Bulgarian.

Developing a component of the Infrastructure for evaluating the fine-tuning of large language models for Bulgarian.

Reaching Technology Readiness Level 7 of the Infrastructure for Fine-Tuning Pre-Trained Large Language Models.

Open access to the results of the project for industry, academia and the general public.
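The data-preparation component described above combines filtering, anonymisation and deduplication. A minimal sketch of such a pipeline is shown below; the function name, thresholds and the regex-based e-mail masking are illustrative assumptions, not the project's actual implementation, which would use far more sophisticated methods (e.g. near-duplicate detection and named-entity anonymisation).

```python
import hashlib
import re

# Naive pattern for e-mail addresses; a real anonymiser would cover
# names, phone numbers, IDs, etc. (hypothetical example).
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")

def preprocess(docs, min_words=5):
    """Filter short documents, mask e-mails, drop exact duplicates."""
    seen = set()
    out = []
    for text in docs:
        text = " ".join(text.split())          # normalise whitespace
        if len(text.split()) < min_words:      # filter: too short to be useful
            continue
        text = EMAIL_RE.sub("[EMAIL]", text)   # naive anonymisation
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:                     # exact deduplication
            continue
        seen.add(digest)
        out.append(text)
    return out
```

Exact hashing only removes byte-identical documents; large web corpora typically also require near-duplicate detection (e.g. MinHash) to reach the diversity and quality the objective describes.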

News

Prof. Svetla Koeva: It is important how AI legislation will be handled