September 27, 2023

Unbabel Releases New Open-Source Large Language Model, the First LLM Fine-Tuned to Predict Translation Quality

The state-of-the-art large language model (LLM) leads to increasingly accurate translations, supporting global expansion while saving time and money for businesses

Unbabel, the AI-powered Language Operations (LangOps) platform that helps businesses deliver multilingual customer experience at scale, today announced the public release of its new open-source Large Language Model (LLM), CometKiwi, in XL and XXL sizes, designed specifically to predict translation quality. As the first LLM fine-tuned for translation quality, CometKiwi offers exceptional performance and unprecedented quality estimation capabilities, contributing to increasingly accurate translations and improved cost savings while supporting global expansion.

Even with its recent rapid advancements, AI translation is not always 100% accurate, especially in complex use cases. Additionally, with the explosion of machine translation (MT) products trained on different data sets, some models perform better than others for certain use cases, and no single MT product on the market is best for all of them. This creates barriers for businesses trying to efficiently translate and localize content for a global audience. Unbabel’s CometKiwi addresses these problems by detecting and scoring translation quality, letting businesses determine when human intervention is appropriate and necessary, and when they can benefit from the efficiency of machine automation.

“By making these models available to the public, our goal is to promote collaboration, facilitate knowledge sharing, and drive further advancements in quality estimation techniques,” said João Graça, Co-Founder and Chief Technology Officer of Unbabel. “We firmly believe that CometKiwi will make a significant contribution to the growth and innovation of the machine translation field as a whole.”

Supporting up to 100 languages, CometKiwi XL and XXL are the largest quality estimation (QE) systems ever released, with 3.5B and 10.7B parameters respectively. Named after their open-source predecessors, OpenKiwi and COMET, both LLMs achieved first place in the WMT 2023 QE shared task, which included high-resource language pairs such as Chinese-English and English-German as well as low-resource language pairs such as Hebrew-English, English-Tamil, and English-Telugu.