Deliverables
By federating Member States' efforts, this action will directly contribute to preserving linguistic and cultural diversity in Europe while effectively implementing the objectives of the European Common Data Infrastructure and Service MCP in the area of language technologies. By providing the necessary data and model adaptation capacities, the action will have a strong impact on the deployment of large language foundation models and their applications, such as generative AI. This federated effort will be organised around two work strands.
First work strand – Data collection & Fine-Tuning
The first work strand will support language data collection and the adaptation of existing large language foundation models to specific languages, domains or industries, so as to support the uptake of the latest language technologies by European actors.
Scope:
Data:
Leveraging the Common European Language Data Space and other relevant Data Spaces, this activity will, in compliance with the applicable legislation (e.g. copyright law and the GDPR), gather the necessary language data (text, audio, image and other modalities) from a broad array of European industrial, academic and institutional actors. It will provide data in sufficient quality and quantity that can be made available for building large language foundation models, ensuring coherent coverage of all the official languages of the Member States as well as the most socially and economically relevant ones. This will also include providing the data required to adapt such large language foundation models to specific languages, domains or industries. The action will also provide a repository of existing European large language foundation models as well as models adapted to specific languages, domains or industries. Once sufficiently advanced, the consortium may consider working on a future copyright infrastructure and related issues to allow efficient use of language and other data, while taking into account the interests of the rightsholders.
Fine-tuning:
The EuroHPC Joint Undertaking would provide access to its facilities for the adaptation and fine-tuning of the models when necessary. The consortium that will carry out this action should be composed of representatives of Member States; public and private organisations, SMEs and RTOs; entities with access to large compute capacities; and public and/or private data providers, such as the media or publishing industry.