A new study reveals that artificial intelligence models trained on physics simulations develop internal representations of physical concepts that can be directly manipulated, much like turning a dial to adjust a machine's behavior. This finding, from researchers at the University of Cambridge, NYU, and the Flatiron Institute, shows that these AI systems don't just memorize patterns but learn abstract principles that can be controlled causally. By extracting and injecting simple 'concept directions' into the model's activation space, scientists can now steer simulations to add or remove features like vortices, diffusion, or even change their speed, opening new possibilities for interactive scientific exploration and model auditing.
The key insight is that physics foundation models, like large language models, encode physical concepts as linear directions in their internal activation space. The researchers focused on Walrus, a transformer model pretrained on The Well, a massive 15 TB collection of physics simulations spanning fluid dynamics, astrophysics, and biological systems. They found that by computing the difference between the average activations of two contrasting physical regimes, such as vortex flow versus laminar flow, they could isolate a single direction representing that physical feature. Injecting this direction back into the model during inference allowed them to causally control the simulation's output, adding or suppressing the targeted feature with precision.
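To make that concrete, here is a minimal PyTorch sketch of the difference-of-means computation. The function name, tensor shapes, and placeholder activations are illustrative assumptions, not the paper's actual code; in practice the activations would be collected from Walrus forward passes on contrasting datasets.

```python
import torch

def concept_direction(acts_with: torch.Tensor,
                      acts_without: torch.Tensor) -> torch.Tensor:
    """Delta direction between two regimes, given activations of
    shape (batch, tokens, dim) collected from the same layer."""
    mu_with = acts_with.mean(dim=(0, 1))        # mean vector, regime with the feature
    mu_without = acts_without.mean(dim=(0, 1))  # mean vector, regime without it
    return mu_with - mu_without                 # a single (dim,) concept direction

# Placeholder activations; in practice these come from forward passes on
# contrasting simulations (e.g. shear flow with vs. without vortices).
acts_vortex = torch.randn(8, 256, 512)   # (batch, tokens, dim)
acts_laminar = torch.randn(8, 256, 512)
vortex_direction = concept_direction(acts_vortex, acts_laminar)
```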
The methodology involved four main steps, adapted from techniques used in language-model interpretability. First, the team selected contrasting simulation files from The Well, such as shear-flow datasets with and without vortices. They then extracted activations from Walrus during forward passes, specifically from the final processor block before the decoder, where abstract representations are likely stored. Next, they computed 'delta' concept directions by averaging the activations within each group and subtracting one mean from the other. Finally, they injected these directions back into the model using a steering function that modifies activations with a scaling factor, testing the causal impact on predictions (a sketch of this step follows below). This approach builds on the linear representation hypothesis, which posits that features are represented as linear directions in activation space, and leverages activation steering to demonstrate control.
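A hedged sketch of the injection step using a PyTorch forward hook follows. The module path `model.processor_blocks[-1]`, the helper `make_steering_hook`, and the scaling value are assumptions standing in for Walrus's real interfaces, which the paper does not spell out here; `model` and `initial_conditions` are placeholders for a loaded model and its input.

```python
def make_steering_hook(direction: torch.Tensor, alpha: float):
    direction = direction / direction.norm()  # unit-normalize the concept direction

    def hook(module, inputs, output):
        # Returning a value from a forward hook replaces the module's output:
        # every activation vector is shifted along the concept direction.
        return output + alpha * direction

    return hook

# Steer the assumed final processor block during inference, then clean up.
handle = model.processor_blocks[-1].register_forward_hook(
    make_steering_hook(vortex_direction, alpha=4.0)
)
steered_prediction = model(initial_conditions)  # inference with steering active
handle.remove()                                 # restore the unmodified model
```

Removing the hook afterward matters: the intervention is meant to be a temporary, reversible probe of the model, not a permanent modification.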
The results, detailed in figures throughout the paper, show striking visual effects. In Figure 2, negative injection of the vortex direction progressively suppressed vortical structures in shear flow, transforming them into smooth laminar flow as the steering strength increased. Conversely, Figure 3 demonstrates that positive injection induced vortices in laminar regimes, with well-formed structures appearing at moderate steering strengths. The researchers also isolated other concepts: diffusion steering (Figure 4) made fluid interfaces more diffuse or sharper, while speed steering changed the temporal progression of simulations, causing vortices to form earlier or later. Most remarkably, concept directions transferred across unrelated physical systems: the vortex direction, derived from shear flow, induced rotational features in Euler quadrant flows and even transformed chemical gliders into spiral patterns in Gray-Scott reaction-diffusion systems (Figure 6), suggesting the model learned abstract, domain-general representations.
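Reproducing such a figure amounts to sweeping the steering strength. A hypothetical loop, reusing `make_steering_hook` from above (with `save_frames` standing in for whatever visualization helper the reader prefers), might look like this:

```python
# Negative alpha suppresses the feature; positive alpha induces it.
for alpha in (-8.0, -4.0, 0.0, 4.0, 8.0):
    handle = model.processor_blocks[-1].register_forward_hook(
        make_steering_hook(vortex_direction, alpha)
    )
    rollout = model(initial_conditions)            # steered prediction
    handle.remove()
    save_frames(rollout, tag=f"alpha_{alpha:+}")   # assumed plotting helper
```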
These findings have significant implications for AI-enabled scientific discovery and model interpretability. They increase confidence that physics foundation models learn genuine physical principles rather than superficial correlations, since the steerable features align with human-understandable concepts. This opens avenues for counterfactual exploration (asking 'what if' scenarios in simulations), real-time error correction, and auditing a model's understanding through targeted interventions. For instance, researchers could test how a model responds to adding diffusion to a system or speeding up a process, gaining insight into its internal reasoning. The transferability of concepts across domains hints at the potential for general-purpose scientific AI tools that apply learned principles flexibly, much as humans abstract laws from specific examples.
However, the study acknowledges limitations, particularly regarding the physical realism of steered outputs. The paper notes that when spatial dimensions are included in transfer steering, outcomes can appear less physically plausible, such as mirrored field changes under positive and negative injection. In contrast, spatially-averaged directions tend to produce more natural-looking results. Additionally, the distance of the initial conditions from the desired regime affects success; forcing features into very different conditions requires higher steering strengths that may distort other fields. The researchers emphasize that further work is needed to definitively assess the physical consistency of these interventions and to explore layer dependence, as features in intermediate layers might offer different steering capabilities. Despite these open questions, the study provides early evidence that scientific models can be made interpretable and controllable, bridging a gap between AI and fundamental physics understanding.
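For readers who want to probe that spatial-averaging distinction, here is a hedged illustration of the two direction variants, assuming the token axis of the earlier activations can be reshaped onto the simulation grid as (batch, H, W, dim); the shapes are illustrative only.

```python
# Placeholder reshape onto a 16x16 grid (256 tokens -> H=16, W=16).
acts_vortex_grid = acts_vortex.reshape(8, 16, 16, 512)
acts_laminar_grid = acts_laminar.reshape(8, 16, 16, 512)

# Spatially-resolved direction: one vector per grid cell. The paper reports
# this variant can yield less physically plausible steering in transfer.
dir_spatial = acts_vortex_grid.mean(dim=0) - acts_laminar_grid.mean(dim=0)  # (H, W, dim)

# Spatially-averaged direction: a single vector shared across the grid,
# which tends to produce more natural-looking results.
dir_avg = dir_spatial.mean(dim=(0, 1))  # (dim,)
```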
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.