There’s a brand new AI model family on the block, and it’s one of the few that can be reproduced from scratch.
On Tuesday, Ai2, the nonprofit AI research organization founded by the late Paul Allen, released OLMo 2, the second family of models in its OLMo series. (OLMo is short for “Open Language Model.”) While there’s no shortage of “open” language models to choose from (see: Meta’s Llama), OLMo 2 meets the Open Source Initiative’s definition of open source AI, meaning the tools and data used to develop it are publicly available.
The Open Source Initiative, the long-running institution that aims to define and “steward” all things open source, finalized its open source AI definition in October. But the first OLMo models, released in February, met the criteria as well.
“OLMo 2 [was] developed start-to-finish with open and accessible training data, open-source training code, reproducible training recipes, transparent evaluations, intermediate checkpoints, and more,” Ai2 wrote in a blog post. “By openly sharing our data, recipes, and findings, we hope to provide the open-source community with the resources needed to discover new and innovative approaches.”
There are two models in the OLMo 2 family: one with 7 billion parameters (OLMo 2 7B) and one with 13 billion parameters (OLMo 2 13B). Parameters roughly correspond to a model’s problem-solving abilities, and models with more parameters generally perform better than those with fewer.
Like most language models, OLMo 2 7B and 13B can perform a range of text-based tasks, like answering questions, summarizing documents, and writing code.
To train the models, Ai2 used a data set of 5 trillion tokens. Tokens represent bits of raw data; 1 million tokens is equal to about 750,000 words. The training set included websites “filtered for high quality,” academic papers, Q&A discussion boards, and math workbooks “both synthetic and human generated.”
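Using the article’s own conversion factor of roughly 750,000 words per million tokens, a quick back-of-the-envelope calculation puts the training set at about 3.75 trillion words:

```python
# Back-of-the-envelope scale check: 5 trillion tokens at ~0.75 words per token.
tokens = 5_000_000_000_000
words_per_token = 750_000 / 1_000_000  # ~0.75, per the article's conversion
words = tokens * words_per_token
print(f"{words:.2e} words")  # ~3.75e+12, i.e. about 3.75 trillion words
```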
Ai2 claims the result is models that are competitive, performance-wise, with open models like Meta’s Llama 3.1 release.
“Not only do we observe a dramatic improvement in performance across all tasks compared to our earlier OLMo model but, notably, OLMo 2 7B outperforms Llama 3.1 8B,” Ai2 writes. “OLMo 2 [represents] the best fully-open language models to date.”
The OLMo 2 models and all of their components can be downloaded from Ai2’s website. They’re available under the Apache 2.0 license, meaning they can be used commercially.
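If the weights are also mirrored on Hugging Face, as Ai2’s releases typically are, loading one of the models takes only a few lines with the transformers library. This is a minimal sketch, not an official quickstart; the repo ID below is an assumption, so check Ai2’s website for the canonical download location:

```python
# Minimal sketch: loading an OLMo 2 checkpoint with Hugging Face transformers.
# The repo ID is an assumption; consult Ai2's website for the official one.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-1124-7B"  # assumed Hugging Face repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "The Open Language Model (OLMo) project aims to"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```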
There’s been some debate recently over the safety of open models, what with Llama models reportedly being used by Chinese researchers to develop defense tools. When I asked Ai2 engineer Dirk Groeneveld in February whether he was concerned about OLMo being abused, he told me that he believes the benefits ultimately outweigh the harms.
“Yes, it’s possible open models may be used inappropriately or for unintended purposes,” he said. “[However, this] approach also promotes technical advancements that lead to more ethical models; is a prerequisite for verification and reproducibility, as those can only be achieved with access to the full stack; and reduces a growing concentration of power, creating more equitable access.”