
AI Attacks on Self-Driving Cars Often Fail in Real Systems

New research reveals that adversarial patches can trick AI models to stop vehicles, but steering attacks are frequently blocked by built-in safety rules and controllers, challenging assumptions about autonomous driving vulnerabilities.

AI Research
March 26, 2026
4 min read

A new study exposes critical gaps in how the security of autonomous vehicles against AI-driven attacks is assessed. Researchers from Carnegie Mellon University systematically tested adversarial machine learning attacks on self-driving car systems, finding that many previously feared threats are less effective in practice than assumed. Their work, conducted using the CARLA driving simulator and its public leaderboard of top-performing AI agents, shows that while some attacks can reliably halt vehicles, others are neutralized by the non-AI components that real-world driving systems rely on for safety. This finding challenges the common focus on attacking isolated AI models and underscores the need to evaluate entire driving pipelines to understand true risks.

The key finding is that adversarial patches—crafted images placed in the environment to mislead AI—can successfully stop autonomous vehicles, but attempts to steer them off course often fail. The researchers evaluated two attack objectives: stopping the vehicle and causing it to steer erroneously, such as veering into opposing lanes or missing turns. They tested these against three open-source driving agents from the CARLA Leaderboard: TCP, NEAT, and Rails. For stopping attacks, optimized patches that accounted for lighting, color, and resolution effects in the simulator consistently caused all three agents to bring the vehicle to a halt, triggering route completion failures. However, steering attacks proved far less effective, as agent-specific modules like PID controllers and GPS-based rules frequently overruled the malicious AI predictions.
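The two attack objectives can be pictured as loss functions an attacker minimizes over a patch. A minimal sketch, with hypothetical function names and model outputs (the paper does not publish these exact formulations):

```python
# Illustrative attack losses for the two objectives described above.
# "predicted_speed" and "predicted_aim_angle" stand in for outputs of
# an agent's ML model; names are hypothetical, not from the paper.

def stop_attack_loss(predicted_speed):
    """Stopping attack: drive the model's predicted target speed to zero,
    so the controller brings the vehicle to a halt."""
    return predicted_speed ** 2

def steer_attack_loss(predicted_aim_angle, target_angle=-45.0):
    """Steering attack: pull the predicted aim angle toward a malicious
    target, e.g. hard left into the opposing lane."""
    return (predicted_aim_angle - target_angle) ** 2
```

A patch that minimizes `stop_attack_loss` makes the model "see" a reason to brake, which reliably halted all three agents; a patch minimizing `steer_attack_loss` can corrupt the model's prediction, yet, as the study found, downstream controllers often refuse to act on it.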

The methodology leveraged the CARLA simulator and its leaderboard to create a holistic, reproducible evaluation framework. Unlike prior work that attacked ML models in isolation or used highly customized systems, this study inserted adversarial patches into the simulator at runtime using texture streaming, without modifying any agent code. The patches were optimized under a white-box threat model, where attackers have access to the target ML model's parameters, and were designed to minimize specific loss functions for stopping or steering. Critical adjustments included projecting patches to simulate real-world appearances, perturbing colors to account for lighting, and blurring to mimic camera resolution effects. This allowed the team to test attacks across multiple driving scenarios, lighting conditions, and locations in CARLA's Town 2 environment.
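The general shape of this optimization loop can be sketched on a toy model. Everything below is illustrative: a fixed linear scorer stands in for the agent's ML model, and the random scaling and blur mimic the lighting, color, and resolution adjustments the paper describes; this is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy white-box "driving model": a fixed linear scorer over a flattened
# 64-pixel patch; its weights w are known to the attacker, matching the
# paper's white-box threat model. All names here are illustrative.
w = rng.normal(size=64)

def simulate_conditions(patch, rng):
    """Random lighting/color scaling plus a crude blur, standing in for
    the paper's adjustments for lighting, color, and camera resolution."""
    lit = patch * rng.uniform(0.7, 1.3)
    blurred = np.convolve(lit, np.ones(3) / 3, mode="same")
    return np.clip(blurred, 0.0, 1.0)

def stop_score(patch):
    """Higher score = model more inclined to brake (toy stand-in)."""
    return float(w @ patch)

# Gradient ascent on the stop score. For this linear scorer the gradient
# is simply w; the blur/clip transforms are ignored in the gradient here
# for simplicity.
patch = rng.uniform(0.0, 1.0, size=64)
for _ in range(200):
    patch = np.clip(patch + 0.05 * w, 0.0, 1.0)

# Average scores over many simulated conditions: a patch optimized with
# the conditions in mind should score far higher than random patches.
att_mean = np.mean([stop_score(simulate_conditions(patch, rng))
                    for _ in range(50)])
base_mean = np.mean([stop_score(simulate_conditions(rng.uniform(0, 1, 64), rng))
                     for _ in range(50)])
print(att_mean > base_mean)
```

Optimizing through randomized environmental transforms, rather than against a single clean image, is what separated the successful "lighting-optimized" patches from the naive ones in the paper's experiments.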

The analysis, detailed in figures from the paper, shows stark differences between attack types. For stopping attacks, as shown in Figure 3, a lighting-optimized patch successfully stopped the TCP agent, whereas naive or random patches failed. Similarly, Figures 4a and 4b demonstrate successful stops against the Rails and NEAT agents. In contrast, steering attacks often manipulated the ML model's predictions—such as inducing a left-turning aim angle—but did not translate into actual steering commands. Figures 5 and 6 illustrate this disconnect: adversarial patches altered predicted aim angles, but the vehicle's steering commands remained unchanged due to GPS-based rules and PID controllers. Only when the researchers manually disabled these rules, shown as black lines in the figures, did some steering attacks succeed, highlighting how agent design can mitigate threats.
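The disconnect between a corrupted prediction and the actual steering command can be captured in a few lines. This is a toy pipeline with illustrative names and thresholds, not the agents' real code: a GPS-based rule gates the ML output before a controller turns it into a command.

```python
def steering_command(predicted_aim, route_heading, rules_enabled=True,
                     max_deviation=10.0, gain=0.02):
    """Toy post-ML pipeline: a GPS-based rule (when enabled) rejects aim
    angles that stray too far from the planned route heading, then a
    proportional term stands in for the PID controller. All names and
    thresholds are hypothetical."""
    if rules_enabled and abs(predicted_aim - route_heading) > max_deviation:
        predicted_aim = route_heading  # rule overrides the attacked prediction
    # map the aim angle to a steering value clamped to [-1, 1]
    return max(-1.0, min(1.0, gain * predicted_aim))

attacked_aim = -45.0   # patch pushes the predicted aim angle hard left
route = 0.0            # GPS says the route continues straight

with_rules = steering_command(attacked_aim, route, rules_enabled=True)
without_rules = steering_command(attacked_aim, route, rules_enabled=False)
print(with_rules, without_rules)  # the attack only bites with rules disabled
```

With the rule enabled the vehicle holds its course regardless of the attacked prediction, mirroring why the figures only show successful steering attacks after the researchers disabled the GPS-based rules.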

The implications are significant for both AI security and autonomous vehicle development. This research suggests that focusing solely on hardening ML models may be insufficient; safety also depends on the broader system architecture, including non-AI modules like controllers and rules. The study proposes creating a standardized leaderboard for evaluating adversarial robustness in driving agents, similar to the CARLA Leaderboard, to foster stronger attacks and more resilient designs. It also points to future directions, such as exploring cross-modal attacks that target sensor-fusion models combining RGB and LiDAR, and expanding threat models to consider agent-specific code and configurations.

Limitations of the work include its reliance on the CARLA simulator, which, while widely used, may not fully capture real-world complexities. The study evaluated only three agents and focused on RGB-based attacks, leaving out multi-modal scenarios. Additionally, the threat model assumed white-box access to ML models and non-adaptive patches, which may not reflect all real-world attacker capabilities. The researchers note that steering attacks were impossible in at least 30% of locations in Town 2 due to GPS-based rules alone, indicating that environmental factors can further constrain vulnerabilities. These gaps underscore the need for continued research in more diverse and high-fidelity settings.

Original Source

Read the complete research paper

View on arXiv

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
