A company creating math benchmarks for AI didn’t disclose that it had received funding from OpenAI until relatively recently, drawing allegations of impropriety from some in the AI community.
Epoch AI, a nonprofit primarily funded by Open Philanthropy, a research and grantmaking foundation, revealed on December 20 that OpenAI had supported the creation of FrontierMath. FrontierMath, a test with expert-level problems designed to measure an AI’s mathematical skills, was one of the benchmarks OpenAI used to demo its upcoming flagship AI, o3.
In a post on the forum LessWrong, a contractor for Epoch AI going by the username “Meemi” says that many contributors to the FrontierMath benchmark weren’t informed of OpenAI’s involvement until it was made public.
“The communication about this has been non-transparent,” Meemi wrote. “In my opinion Epoch AI should have disclosed OpenAI funding, and contractors should have clear information about the potential of their work being used for capabilities, when choosing whether to work on a benchmark.”
On social media, some users raised concerns that the secrecy could erode FrontierMath’s reputation as an objective benchmark. In addition to backing FrontierMath, OpenAI had access to many of the problems and solutions in the benchmark, a fact Epoch AI didn’t reveal prior to December 20, when o3 was announced.
In a reply to Meemi’s post, Tamay Besiroglu, associate director of Epoch AI and one of the organization’s co-founders, asserted that the integrity of FrontierMath hadn’t been compromised, but admitted that Epoch AI “made a mistake” in not being more transparent.
“We were restricted from disclosing the partnership until around the time o3 launched, and in hindsight we should have negotiated harder for the ability to be transparent to the benchmark contributors as soon as possible,” Besiroglu wrote. “Our mathematicians deserved to know who might have access to their work. Even though we were contractually limited in what we could say, we should have made transparency with our contributors a non-negotiable part of our agreement with OpenAI.”
Besiroglu added that while OpenAI has access to FrontierMath, it has a “verbal agreement” with Epoch AI not to use FrontierMath’s problem set to train its AI. (Training an AI on FrontierMath would be akin to teaching to the test.) Epoch AI also maintains a “separate holdout set” that serves as an additional safeguard for independent verification of FrontierMath benchmark results, Besiroglu said.
“OpenAI has … been fully supportive of our decision to maintain a separate, unseen holdout set,” Besiroglu wrote.
Still, muddying the waters somewhat, Epoch AI lead mathematician Elliot Glazer noted in a post on Reddit that Epoch AI hasn’t yet been able to independently verify OpenAI’s FrontierMath o3 results.
“My personal opinion is that [OpenAI’s] score is legit (i.e., they didn’t train on the dataset), and that they have no incentive to lie about internal benchmarking performances,” Glazer said. “However, we can’t vouch for them until our independent evaluation is complete.”
The saga is yet another example of how difficult it is to develop empirical benchmarks to evaluate AI, and to secure the necessary resources for benchmark development without creating the perception of a conflict of interest.