Objectives

The project aims to develop a freely accessible infrastructure for selecting and pre-processing large Bulgarian datasets, as well as data tailored to particular industries, and for fine-tuning suitable freely available large language models for specific purposes.

To achieve this main goal, the project sets the following objectives:

  • To provide a detailed description of the characteristics of large language models and a specification of the criteria for their evaluation, comparison and selection.
  • To develop an infrastructure component for the collection, filtering, anonymisation and deduplication of large, diverse and high-quality text data for Bulgarian.
  • To develop an infrastructure component for the fine-tuning of pre-trained large language models for Bulgarian.
  • To develop an infrastructure component for evaluating the fine-tuning of large language models for Bulgarian.
  • To reach Technology Readiness Level 7 for the Infrastructure for Fine-Tuning Pre-Trained Large Language Models by integrating all components into a prototype that demonstrates its operation in a real-world environment.