Core42 Sets New Benchmark for Arabic Large Language Models with the Release of Jais 30B
Nov 09, 2023
Latest Jais model iteration shows stronger performance across content generation, summarization, and Arabic-English translation
ABU DHABI, UAE, Nov. 9, 2023 /PRNewswire/ -- Core42, a G42 company and the UAE-based national-scale enabler for cloud and generative AI, announced the launch of Jais 30B, the newest and most proficient version of its open-source Arabic Large Language Model (LLM). Featuring 30 billion parameters, this new iteration of Jais follows the release in August 2023 of the 13 billion parameter model, underscoring Core42’s commitment to providing a rich linguistic and culture-focused generative AI experience for the more than 400 million Arabic speakers worldwide.
Jais, born from the collaboration between Inception (now converged into Core42), Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), the world’s first graduate research university dedicated to AI, and Cerebras Systems, immediately set a benchmark in the Arabic LLM landscape. The model was trained on the Condor Galaxy-1 (CG-1) – one of the world’s fastest AI supercomputers, with 4 exaFLOPS of training compute, 54 million cores, and 64 nodes – built by G42 in partnership with Cerebras Systems. Jais 13B went from concept to a fine-tuned, leading open-source model in less than four months. Notably, the production training run for Jais 13B was completed in 21 days on CG-1.
The new Jais 30B model was trained on a substantially larger dataset than its predecessor, comprising 126 billion Arabic tokens, 251 billion English tokens, and 50 billion code tokens, and shows improved performance across all key indicators. Its answers are 160% longer and more detailed in Arabic and 233% longer in English, reflecting significant improvements in language generation. The model also performs better in summarization (improvements of 53% in Arabic and 85% in English) and formatting (130% in Arabic and 134% in English). Jais 30B’s performance is now on par with monolingual English models, and it outperforms most open-source models in foundation model evaluations.
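For readers who want the dataset figures in proportion, the short Python sketch below adds up the three token counts quoted above and reports each one as a share of the total. It assumes these three figures make up the whole training mix, which the release does not state explicitly; the names `TOKENS_BILLIONS` and `dataset_mix` are illustrative only.

```python
# Token counts quoted in the release, in billions.
# Assumption: these three figures are the complete training mix.
TOKENS_BILLIONS = {"Arabic": 126, "English": 251, "code": 50}


def dataset_mix(tokens):
    """Return (total, shares), where shares maps each name to its fraction of the total."""
    total = sum(tokens.values())
    shares = {name: count / total for name, count in tokens.items()}
    return total, shares


total, shares = dataset_mix(TOKENS_BILLIONS)
print(f"total: {total} billion tokens")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under that assumption the mix totals 427 billion tokens, of which roughly 29.5% is Arabic, 58.8% is English, and 11.7% is code.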
Jais 30B’s enhancements have been tested and validated using heuristic, cross-model comparison, and human evaluations, showing that the responses of the model’s fine-tuned iterations outperform those of Jais 13B 96% of the time in Arabic and 97% in English.
Reaffirming its dedication to responsible and safe AI practices, the development team has also further enhanced its processes and policies to guard against bias and the production of hateful or harmful content by the model, a process made easier by its open-source release.
Jais’s versatility and unique capabilities in the Arabic language domain have already shown promise in applications across various sectors, including telecommunications, energy, education, and healthcare, as well as in innovative solutions for the marketing communications industry.
Dr. Andrew Jackson, EVP, Chief AI Officer, Core42, said: “The launch of Jais 30B marks another significant milestone for Core42 and represents a giant leap forward for the Arabic-speaking world in harnessing the potential of generative AI. This release underscores the powerful synergy between Core42’s technological leadership, our extensive partner ecosystem, and our shared dedication to pushing the boundaries of what’s possible in the field of AI. I eagerly anticipate close collaboration with our customers and partners to explore new applications and continually enhance the model’s capabilities, as we intensify our efforts to create top-quality LLMs for various other languages.”
Andrew Feldman, CEO and co-founder, Cerebras Systems said: “Less than eight weeks after we introduced Jais 13B to the global Arabic-speaking community, the Core42 and Cerebras teams have delivered a new state-of-the-art LLM that is more than double in size. Jais 30B leverages the incredible, massive compute of Condor Galaxy 1 to set another record in bilingual performance and impressively fast training time.”
Jais 30B is available for download on Hugging Face.
Hugging Face foundational model: https://huggingface.co/core42/jais-30b-v1
Hugging Face chat model: https://huggingface.co/core42/jais-30b-chat-v1
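As a rough illustration of what downloading the model can look like in practice, the Python sketch below loads either checkpoint with the Hugging Face `transformers` library. The repository IDs come from the two URLs above; everything else is an assumption rather than Core42's documented procedure, including the helper names `load_jais` and `generate`, the use of `trust_remote_code=True` (on the guess that the repository ships its own model code), and the availability of enough memory for a 30-billion-parameter checkpoint.

```python
# Repository IDs taken from the Hugging Face URLs in this release.
BASE_MODEL_ID = "core42/jais-30b-v1"
CHAT_MODEL_ID = "core42/jais-30b-chat-v1"


def model_url(model_id):
    """Build the Hugging Face page URL for a repository ID."""
    return "https://huggingface.co/" + model_id


def load_jais(model_id=BASE_MODEL_ID):
    """Download and load the tokenizer and model (hypothetical helper)."""
    # Imported lazily so this file can be read or imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the repository ships custom code, hence trust_remote_code=True.
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    return tokenizer, model


def generate(tokenizer, model, prompt, max_new_tokens=64):
    """Generate a continuation of the prompt and return it as text."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(inputs["input_ids"], max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `load_jais(CHAT_MODEL_ID)` would fetch the chat variant instead of the foundational model. Nothing is downloaded until `load_jais` is called, and the model card on Hugging Face remains the authoritative source for loading options and prompt format.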
To learn more about how Jais 30B was trained and how it benchmarks against other models, read the model’s blog post on G42’s website: https://www.g42.ai/resources/publications/Jais-30B
About Core42
Core42 is the UAE-based national-scale enabler for cloud and generative AI, combining G42’s expertise across multiple technology domains into a common platform for Enterprise AI. Building on a comprehensive set of capabilities across cloud infrastructure and services, data and AI, high-performance computing, and digital services, our mission is to empower organizations, industries, and nations through the transformative power of AI.
For further information, visit www.core42.ai
View original content to download multimedia: https://www.prnewswire.com/news-releases/core42-sets-new-benchmark-for-arabic-large-language-models-with-the-release-of-jais-30b-301983174.html
SOURCE G42