OpenInfer has raised $8 million in funding to redefine AI inference for edge applications.
It’s the brainchild of Behnam Bastani and Reza Nourai, who spent nearly a decade building and scaling AI systems together at Meta’s Reality Labs and Roblox.
Through their work at the forefront of AI and system design, Bastani and Nourai witnessed firsthand how deep system architecture enables continuous, large-scale AI inference. However, today’s AI inference remains locked behind cloud APIs and hosted systems, a barrier for low-latency, private, and cost-efficient edge applications. OpenInfer changes that. The company wants to be agnostic to the types of devices on the edge, Bastani said in an interview with GamesBeat.
By enabling the seamless execution of large AI models directly on devices, from SoCs to the cloud, OpenInfer removes these barriers, enabling inference on AI models without compromising performance.
The implication? Imagine a world where your phone anticipates your needs in real time, translating languages instantly, enhancing photos with studio-quality precision, or powering a voice assistant that truly understands you. With AI inference running directly on your device, users can expect faster performance, better privacy, and uninterrupted functionality no matter where they are. This shift eliminates lag and brings intelligent, high-speed computing to the palm of your hand.
Building the OpenInfer Engine: an AI agent inference engine

Since founding the company six months ago, Bastani and Nourai have assembled a team of seven, including former colleagues from their time at Meta. While at Meta, they built Oculus Link together, showcasing their expertise in low-latency, high-performance system design. Bastani previously served as Director of Architecture at Meta’s Reality Labs and led teams at Google focused on mobile rendering, VR, and display systems. Most recently, he was Senior Director of Engineering for Engine AI at Roblox. Nourai has held senior engineering roles in graphics and gaming at industry leaders including Roblox, Meta, Magic Leap, and Microsoft.
OpenInfer is building the OpenInfer Engine, what they call an “AI agent inference engine” designed for unmatched performance and seamless integration.
To accomplish the first goal of unmatched performance, the first release of the OpenInfer Engine delivers 2-3x faster inference compared to Llama.cpp and Ollama for distilled DeepSeek models. This boost comes from targeted optimizations, including streamlined handling of quantized values, improved memory access through enhanced caching, and model-specific tuning, all without requiring changes to the models.
To accomplish the second goal of seamless integration with simple deployment, the OpenInfer Engine is designed as a drop-in replacement, allowing users to switch endpoints simply by updating a URL. Existing agents and frameworks continue to function seamlessly, without any changes.
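To illustrate the drop-in idea the article describes, here is a minimal sketch. The URLs, model name, and payload shape are purely illustrative (OpenInfer has not published its API here); the sketch assumes the common pattern of a local inference server exposing the same request format as a hosted one, so that only the base URL changes.

```python
# Hypothetical sketch of a "switch endpoints by updating a URL" deployment.
# Both URLs and the model name below are placeholders, not real endpoints.

CLOUD_URL = "https://api.example-cloud.com/v1/chat/completions"  # hosted API
LOCAL_URL = "http://localhost:8080/v1/chat/completions"          # on-device engine

def build_request(base_url: str, prompt: str) -> dict:
    """Build the same request payload regardless of where inference runs."""
    return {
        "url": base_url,
        "json": {
            "model": "deepseek-distill",  # illustrative model name
            "messages": [{"role": "user", "content": prompt}],
        },
    }

cloud_cfg = build_request(CLOUD_URL, "Hello")
local_cfg = build_request(LOCAL_URL, "Hello")

# Only the URL differs; the payload the agent sends is identical,
# which is why existing agents keep working unchanged.
assert cloud_cfg["url"] != local_cfg["url"]
assert cloud_cfg["json"] == local_cfg["json"]
```

If the local engine really does mirror the hosted API's request and response format, swapping one configuration value is the entire migration, which is the claim the company is making.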
“OpenInfer’s advancements mark a major leap for AI developers. By significantly boosting inference speeds, Behnam and his team are making real-time AI applications more responsive, accelerating development cycles, and enabling powerful models to run efficiently on edge devices. This opens new possibilities for on-device intelligence and expands what’s possible in AI-driven innovation,” said Ernestine Fu Mak, Managing Partner at Brave Capital and an investor in OpenInfer.
OpenInfer is pioneering hardware-specific optimizations to drive high-performance AI inference on large models, outperforming industry leaders on edge devices. By designing inference from the ground up, they’re unlocking higher throughput, lower memory usage, and seamless execution on local hardware.
Future roadmap: Seamless AI inference across all devices
OpenInfer’s launch is well-timed, especially in light of recent DeepSeek news. As AI adoption accelerates, inference has overtaken training as the primary driver of compute demand. While innovations like DeepSeek reduce computational requirements for both training and inference, edge-based applications still struggle with performance and efficiency due to limited processing power. Running large AI models on consumer devices demands new inference methods that enable low-latency, high-throughput performance without relying on cloud infrastructure, creating significant opportunities for companies optimizing AI for local hardware.
“Without OpenInfer, AI inference on edge devices is inefficient due to the absence of a clear hardware abstraction layer. This challenge makes deploying large models on compute-constrained platforms extremely difficult, pushing AI workloads back to the cloud, where they become costly, slow, and dependent on network conditions. OpenInfer revolutionizes inference at the edge,” said Gokul Rajaram, an investor in OpenInfer. Rajaram is an angel investor and currently a board member of Coinbase and Pinterest.
In particular, OpenInfer is uniquely positioned to help silicon and hardware vendors enhance AI inference performance on devices. Enterprises needing on-device AI for privacy, cost, or reliability can leverage OpenInfer, with key applications in robotics, defense, agentic AI, and model development.
In mobile gaming, OpenInfer’s technology enables ultra-responsive gameplay with real-time adaptive AI. On-device inference allows for reduced latency and smarter in-game dynamics. Players will enjoy smoother graphics, AI-powered personalized challenges, and a more immersive experience that evolves with every move.
“At OpenInfer, our vision is to seamlessly integrate AI into every surface,” said Bastani. “We aim to establish OpenInfer as the default inference engine across all devices, powering AI in self-driving cars, laptops, mobile devices, robots, and more.”
OpenInfer has raised an $8 million seed round as its first round of financing. Investors include Brave Capital, Cota Capital, Essence VC, Operator Stack, StemAI, Oculus VR co-founder and former CEO Brendan Iribe, Google DeepMind Chief Scientist Jeff Dean, Microsoft Experiences and Devices Chief Product Officer Aparna Chennapragada, angel investor Gokul Rajaram, and others.
“The current AI ecosystem is dominated by a few centralized players who control access to inference through cloud APIs and hosted services. At OpenInfer, we’re changing that,” said Bastani. “Our name reflects our mission: we’re ‘opening’ access to AI inference, giving everyone the ability to run powerful AI models locally, without being locked into expensive cloud services. We believe in a future where AI is accessible, decentralized, and truly in the hands of its users.”