The conversation in Silicon Valley is shifting. While Large Language Models (LLMs) like ChatGPT have dominated the digital sphere, delivering incredible utility for white-collar tasks, the consensus among AI pioneers is that these models are approaching a ceiling. The true race for machine “superintelligence” and the massive economic opportunity it represents now centres on a different architecture: World Models.
For venture capitalists and technology leaders, understanding this shift is crucial, as the potential market for World Models is estimated to be almost the size of the global economy, a staggering $100 trillion if we can create intelligence that understands and operates in the physical world.
What Exactly is a World Model?
Today’s AIs are book smart. Everything they know they learned from available language, images and videos. To evolve further, they have to get street smart. That requires “world models.”
Christopher Mims, WSJ
World models (or world simulators) are AI systems designed to gain “street smarts” rather than just “book smarts”. They are internal models of an environment that an AI agent learns and uses to predict what will happen next.
Unlike LLMs, which are trained primarily on text and language, World Models are developed to navigate the physical domain by learning from data streams of real or simulated environments, including videos and robotic inputs.
In essence, a world model allows an AI agent to “imagine” or simulate its world internally. It functions as a core prediction engine: taking the current state of the world and a proposed action, and predicting the resulting next state. This mechanism is viewed as critical because it encapsulates the understanding of the dynamics of the real world, including physics, time, motion, force, and spatial properties.
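To make the "current state plus action in, next state out" idea concrete, here is a minimal sketch. Everything in it is an illustrative assumption: the `State` class, the `predict_next_state` function and the hand-written Newtonian physics are stand-ins, whereas a real world model would learn its dynamics from data rather than have them coded by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Hypothetical state of a 1-D point mass."""
    position: float  # metres
    velocity: float  # metres per second


def predict_next_state(state: State, force: float,
                       mass: float = 1.0, dt: float = 0.1) -> State:
    """Toy prediction engine: (current state, action) -> next state.

    The dynamics are hard-coded here purely for illustration; a learned
    world model would replace this body with a trained network.
    """
    acceleration = force / mass
    velocity = state.velocity + acceleration * dt
    position = state.position + velocity * dt
    return State(position=position, velocity=velocity)


# "Imagine" three steps of pushing with a constant force of 1 newton.
state = State(position=0.0, velocity=0.0)
for _ in range(3):
    state = predict_next_state(state, force=1.0)
```

Chaining the function on its own output, as the loop does, is what lets an agent roll a scenario forward without touching the real environment.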
At their core, seminal World Models often comprise three parts:
- Vision Model (V): Compresses raw observations (like images) into a concise, compact latent representation.
- Memory Model (M): Predicts how the environment will change over time, using a recurrent neural network (RNN) to carry context and forecast the next state.
- Controller (C): The decision-maker, a smaller neural network that takes the compact state and context to choose the next action.
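The three parts above can be wired together roughly as follows. This is a sketch under loud assumptions: the class names, layer sizes and random untrained weights are all hypothetical, average pooling stands in for a learned encoder, and a single tanh recurrent cell stands in for the RNN. It shows only how data flows between V, M and C, not how any published system is implemented.

```python
import math
import random


class VisionModel:
    """V: compresses a raw observation into a compact latent vector.

    Assumption: average pooling over equal chunks stands in for a learned
    encoder; the observation length must be a multiple of latent_size.
    """
    def __init__(self, latent_size: int):
        self.latent_size = latent_size

    def encode(self, observation: list[float]) -> list[float]:
        chunk = len(observation) // self.latent_size
        return [sum(observation[i * chunk:(i + 1) * chunk]) / chunk
                for i in range(self.latent_size)]


class MemoryModel:
    """M: a minimal recurrent cell that carries context in a hidden state
    and predicts the next latent. Weights are random, not trained."""
    def __init__(self, latent_size: int, hidden_size: int, rng: random.Random):
        in_size = latent_size + 1 + hidden_size  # latent + action + previous hidden
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(in_size)]
                         for _ in range(hidden_size)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
                      for _ in range(latent_size)]
        self.hidden = [0.0] * hidden_size

    def step(self, latent: list[float], action: float) -> list[float]:
        inputs = latent + [action] + self.hidden
        self.hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs)))
                       for row in self.w_hidden]
        return [sum(w * h for w, h in zip(row, self.hidden))
                for row in self.w_out]


class Controller:
    """C: a small linear policy over the latent state and memory context."""
    def __init__(self, latent_size: int, hidden_size: int, rng: random.Random):
        self.weights = [rng.uniform(-0.5, 0.5)
                        for _ in range(latent_size + hidden_size)]

    def act(self, latent: list[float], hidden: list[float]) -> float:
        features = latent + hidden
        return math.tanh(sum(w * x for w, x in zip(self.weights, features)))


rng = random.Random(0)
vision = VisionModel(latent_size=4)
memory = MemoryModel(latent_size=4, hidden_size=8, rng=rng)
controller = Controller(latent_size=4, hidden_size=8, rng=rng)

observation = [rng.random() for _ in range(64)]   # stand-in for image pixels
latent = vision.encode(observation)               # V: compress
action = controller.act(latent, memory.hidden)    # C: decide
predicted_latent = memory.step(latent, action)    # M: forecast next state
```

Note that the controller only ever sees the compact latent and the memory's hidden state, never the raw observation, which is why it can be kept small.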
Why This Architecture is Crucial for AGI and Robotics
The focus on world models signals a high-stakes bet on solving fundamental challenges LLMs struggle with, particularly those involving logical consistency and spatial reasoning.
World models are important because they enable:
- Efficient and Safe Learning: Rather than costly or dangerous real-world trial-and-error, agents can learn by simulating vast scenarios in their “imagination”. This process, a form of model-based reinforcement learning, is key for training embodied AIs.
- Reasoning and Planning: Proponents, like Meta’s Yann LeCun (one of the “godfathers” of modern AI), argue that LLMs will never achieve human-like ability to reason and plan. World models aim to bridge this gap by enabling an AI to determine a sequence of actions needed to achieve a goal, not just based on observed patterns, but based on a deeper understanding of cause-and-effect.
- The Rise of Physical AI: World models are considered an important step for advancing self-driving cars, robotics, and other so-called AI agents that must operate safely in the real world. Nvidia CEO Jensen Huang asserts that the next major growth phase will come with “physical AI,” driven by these new models.
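The planning idea in the bullets above, choosing a sequence of actions by trying them out in "imagination" first, can be sketched with a deliberately simple technique called random shooting. The assumptions are stated plainly: `world_model` is a hypothetical hand-coded stand-in for a learned model, the cost function is invented for illustration, and random shooting is one basic planner among many rather than the method any lab named here uses.

```python
import random


def world_model(position: float, velocity: float, force: float,
                dt: float = 0.1) -> tuple[float, float]:
    """Hypothetical stand-in for a learned model: a 1-D point mass of unit mass."""
    velocity = velocity + force * dt
    position = position + velocity * dt
    return position, velocity


def imagined_cost(position: float, velocity: float,
                  plan: list[float], goal: float) -> float:
    """Roll a candidate plan forward inside the model and score the end state."""
    for force in plan:
        position, velocity = world_model(position, velocity, force)
    # Illustrative cost: end near the goal and be close to stationary.
    return abs(position - goal) + 0.1 * abs(velocity)


def plan_by_random_shooting(position: float, velocity: float, goal: float,
                            horizon: int = 20, candidates: int = 500,
                            seed: int = 0) -> tuple[list[float], float]:
    """Sample random action sequences and keep the one with the lowest imagined cost."""
    rng = random.Random(seed)
    # Start from the do-nothing plan so the result is never worse than idling.
    best_plan = [0.0] * horizon
    best_cost = imagined_cost(position, velocity, best_plan, goal)
    for _ in range(candidates):
        plan = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        cost = imagined_cost(position, velocity, plan, goal)
        if cost < best_cost:
            best_plan, best_cost = plan, cost
    return best_plan, best_cost


plan, cost = plan_by_random_shooting(position=0.0, velocity=0.0, goal=1.0)
idle_cost = imagined_cost(0.0, 0.0, [0.0] * 20, goal=1.0)
```

All 500 candidate plans are evaluated inside the model, so no real-world action is taken until a plan has been chosen. That is the safety and efficiency argument in miniature.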
The Leading Players in the World Model Race
The global race for superintelligence has pivoted, with several giants and well-funded startups making massive investments:
- Google DeepMind: They are pioneering research with general-purpose systems like Genie 3, which generates open-world virtual landscapes from text prompts. Unlike previous video generation models, Genie 3 generates video frame by frame, taking past interactions into account to maintain environmental consistency, and allows real-time interaction and navigation. Google’s video generation model, Veo 3, also relies on world models for physical accuracy.
- Meta: The company trains its V-JEPA models on raw video content to replicate how children learn passively by observing the world. Meta’s long-term AI projects, spearheaded by Yann LeCun, rely heavily on this architecture.
- Nvidia: Leveraging its long history in video game simulation, Nvidia views World Foundation Models (WFMs) as critical to their future growth. Their Omniverse platform creates and runs simulations, assisting their push toward robotics. Nvidia also offers Cosmos, a platform of state-of-the-art WFMs designed to accelerate the development of physical AI systems.
- World Labs: Founded by AI pioneer Fei-Fei Li, this startup has already raised $230 million to focus on large world models. It is developing a model to generate video game-like 3D environments from a single image.
- MBZUAI (Mohamed bin Zayed University of Artificial Intelligence): Unveiled PAN, an ambitious model aiming to revolutionize reasoning by predicting the next world state and simulating infinitely diverse realities. PAN excels at long-horizon simulations and multi-agent systems.
But it’s not just the big tech companies or AI research centres that are active in this space. For example, Toronto-based Waabi constructed an entire world, called Waabi World, just to train AIs to drive trucks. Another example is Israeli startup Ottopia, which leverages generative and simulation technology for remote vehicle operation, relying on real-time sensor fusion and 3D modelling of environments. AAI, or “Double AI”, the secretive startup founded by Amnon Shashua, who also co-founded Mobileye, is rumoured to be working on this as well. Despite being in stealth, it’s already a unicorn, following an undisclosed round of funding by Lightspeed Ventures. Another Israeli startup in this space is Decart AI, valued at $1.3 billion just a year after its founding. At Remagine Ventures, we invested in Playo AI, which has strong capabilities in the world generation space, but remains in stealth.
The Connection to Gaming: A Necessary Virtual Playground
Virtual environments that can be simulated and interacted with are core features of most games, and world models are used to create them. These models allow AI agents to understand, plan, and act within game worlds by simulating outcomes and predicting possible futures. Game engines themselves are often seen as practical implementations of world models, providing multimodal feedback (graphics, physics, audio) that mimics the real world.
For investors looking for near-term returns, the entertainment industry is a direct application of world models. Critically, gaming is not just an application; it is the crucible where the next generation of AI will be forged:
- Game Design: Developers use world models to generate dynamic, interactive levels, planets, or storylines that respond to player choice and behaviour.
- Simulated Training Environments: Training sophisticated, embodied AI requires safe, virtual spaces where agents can fail repeatedly and learn what they need to achieve their goals. Gaming provides these “realistic virtual playgrounds”. World models help train AI agents safely by letting them practice in rich, simulated environments before deploying to real-world tasks (e.g., robotics, autonomous vehicles), with game engines often used as the testbed.
- AI Agents and NPCs: Advanced AI agents powered by world models can plan and strategise like human players, providing smarter opponents or more lifelike companions.
- Data Acquisition: Massive sources of data, such as gameplay and user actions captured by services like Medal.tv, are invaluable to frontier AI labs seeking to build AGI and AI that can pilot robots. Niantic, for example, has mapped 10 million locations by gathering information through games like Pokémon Go, using this real-world data to build its world model.
- Interactive Content Generation: World models are enabling a new wave of generative media. They are used to create interactive and realistic scenes.
  - Startups like Runway use world models to create gaming settings, complete with personalized stories and characters generated in real time.
  - Models can generate realistic 3D worlds on demand.
  - World models inherently solve problems that plague earlier video generation models, such as maintaining realistic physics and consistency over time.
There are two main approaches to the use of world models in gaming: 1) video-based approaches, which generate frame sequences in response to input but struggle with persistence and consistency, and 2) native 3D approaches, which build explicit spatial structure and support real-time manipulation.
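The persistence gap between the two approaches can be caricatured in a few lines. This is a deliberately crude sketch, and both classes are hypothetical: real video models carry far richer context than a short window of frames, and real 3D systems store far more than named coordinates. The point is only to show why explicit structure makes "remembering" and direct manipulation cheap.

```python
from collections import deque


class FrameWindowWorld:
    """Video-style caricature: the only memory is the last few frames."""
    def __init__(self, context_frames: int = 3):
        self.frames: deque[set[str]] = deque(maxlen=context_frames)

    def observe(self, visible_objects: set[str]) -> None:
        # Older frames silently fall out of the window once it is full.
        self.frames.append(set(visible_objects))

    def remembered(self) -> set[str]:
        return set().union(*self.frames)


class SceneWorld:
    """Native-3D caricature: objects live in an explicit, persistent structure."""
    def __init__(self):
        self.objects: dict[str, tuple[float, float, float]] = {}

    def place(self, name: str, position: tuple[float, float, float]) -> None:
        self.objects[name] = position

    def move(self, name: str, delta: tuple[float, float, float]) -> None:
        x, y, z = self.objects[name]
        dx, dy, dz = delta
        self.objects[name] = (x + dx, y + dy, z + dz)


# The player sees a red door, then walks down a corridor for three frames.
video = FrameWindowWorld(context_frames=3)
video.observe({"red door"})
for _ in range(3):
    video.observe({"corridor"})

scene = SceneWorld()
scene.place("red door", (0.0, 0.0, 2.0))
scene.move("red door", (1.0, 0.0, 0.0))  # direct manipulation of the object
```

After the walk, `video.remembered()` no longer contains the door, while `scene.objects` still holds it at its updated position.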
In the context of gaming, world models promise to lower the barrier to creating dynamic, richly interactive game worlds that can evolve, remember, and respond to player actions, blurring the line between designer-crafted levels and AI-generated universes.
Startup Opportunities: Where VC Money Should Flow
Building world models demands huge amounts of data and computing power, and is still considered an unsolved technical challenge. This creates ripe opportunities for innovative startups:
- Synthetic Data Generation & Curation: Physical AI systems (like autonomous vehicles and factory robots) require visually, spatially, and physically accurate data. WFMs are essential tools for generating this high-quality, pre-labeled synthetic data efficiently at scale, filling gaps in real-world datasets. Startups focused on advanced data curation (filtering, annotation, deduplication) and tokenization for multimodal data will be essential infrastructure players.
- Specialized Simulation Platforms: While Nvidia and Google build general models, startups can focus on domain-limited, high-fidelity simulations. In gaming, Third Dimension and Intangible developed 3D world models to scale game creation. Other verticals include healthcare, manufacturing, and industrial safety.
- Real-Time Interaction and Embodied Agents: Developing the Controller component—the sophisticated decision-maker—that leverages the outputs of foundation world models to navigate and execute complex, long-horizon tasks is a rich area. This includes building generalist agents (like Google DeepMind’s SIMA agent research) that interact robustly with WFM-created environments.
- Gaming and Metaverse Infrastructure: Developing tools that rapidly generate consistent, interactive 3D environments, character behaviors, and narratives in real-time for gaming and extended reality applications will capture significant value.
World models empower AI to learn by “dreaming” and planning, moving beyond simple prediction toward true reasoning. As LLMs saturate the digital domain, WFMs are the key to unlocking the physical domain—a shift that promises to revolutionize industries from robotics to manufacturing and represents the most valuable bet in frontier AI today.
As the field of World Models evolves, it is increasingly seen as central for enhancing the reliability, adaptability, and reasoning capabilities of AI systems, pushing the limits of what autonomous agents can achieve in both real and virtual worlds.

