Selection and pre-processing of large datasets for Bulgarian, as well as tailored data for particular industries, and fine-tuning of suitable freely available large language models for specific purposes.
Specification of criteria for the evaluation, comparison and selection of large language models.
Developing a component of the Infrastructure for the collection, filtering, anonymisation and deduplication of large, diverse and high-quality text data for Bulgarian.
Developing a component of the Infrastructure for the fine-tuning of pre-trained large language models for Bulgarian.
Developing a component of the Infrastructure for evaluating the fine-tuning of large language models for Bulgarian.
Reaching Technology Readiness Level 7 of the Infrastructure for Fine-Tuning Pre-Trained Large Language Models.
Open access to the results of the project for industry, academia and the general public.