AIResearch

Transformer-Guided DRL Cuts eVTOL Training Time by 75%

New AI method slashes data needs for urban air mobility trajectory optimization, boosting efficiency and accuracy.

AI Research
November 20, 2025
3 min read

Urban air mobility is poised to revolutionize city transportation, with electric vertical take-off and landing (eVTOL) aircraft offering a quiet, efficient alternative to ground traffic. A major hurdle, however, is the high energy consumption during takeoff, which demands optimal trajectory design to minimize power use. Traditional control methods like dynamic programming struggle with the complex, nonlinear dynamics of eVTOL systems, while deep reinforcement learning (DRL) can handle these dynamics but often requires massive amounts of training data, slowing down development. In a breakthrough, researchers have integrated transformers, the neural networks known for sequence modeling, with DRL to drastically reduce training time and improve energy efficiency, potentially accelerating the adoption of eVTOLs in crowded urban skies.

To tackle the training inefficiency of DRL in eVTOL trajectory optimization, the team developed a transformer-guided DRL framework. This approach uses a transformer model trained on 1,000 optimal takeoff trajectories generated via NASA's Dymos framework, which simulates eVTOL dynamics under varying flight conditions like acceleration limits and efficiency factors. The transformer, with its self-attention mechanism, learns temporal patterns in control variables—power and wing angle—and generates action proposal distributions at each time step. These distributions guide the DRL agent, specifically a soft actor-critic (SAC) algorithm, by narrowing the action space to realistic, energy-efficient options, reducing the exploration burden. The DRL agent then selects actions as z-scores from these distributions, operating in a Gymnasium-based environment that enforces takeoff constraints, such as reaching 305 meters altitude and 67 m/s horizontal velocity, while penalizing energy use through a reward function designed to encourage minimal consumption and safe flight paths.
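The z-score mechanism described above can be illustrated with a minimal sketch. This is our own construction, not the paper's code: the function names, the placeholder mean and spread values, and the two-dimensional normalized action box are all assumptions for illustration. The idea it shows is the one the article describes: the transformer emits a proposal distribution over the controls (power and wing angle) at each step, and the SAC agent outputs z-scores that are mapped into that distribution, shrinking the region it must explore.

```python
import numpy as np

def transformer_proposal(step):
    """Stand-in for the transformer: returns (mu, sigma) for the two
    normalized controls (power, wing angle) at a given time step.
    The numbers here are illustrative placeholders."""
    mu = np.array([0.8, 0.3])       # proposal means
    sigma = np.array([0.05, 0.02])  # proposal spreads
    return mu, sigma

def guided_action(z, step):
    """Map the DRL agent's z-score output into the transformer's
    proposal distribution: action = mu + z * sigma."""
    mu, sigma = transformer_proposal(step)
    return mu + z * sigma

# With z clipped to, say, [-2, 2], the agent can only reach
# mu +/- 2*sigma -- a far smaller region than the full normalized
# action box a vanilla DRL agent would have to explore.
action = guided_action(np.array([1.0, -1.0]), step=0)
```

A vanilla SAC agent samples directly from the full action space; here the same agent output is reinterpreted relative to a learned, trajectory-aware prior, which is what cuts the exploration burden.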

The results demonstrate a significant leap in performance: the transformer-guided DRL agent learned optimal takeoff trajectories in just 4.57 million time steps, a roughly 75% reduction compared to the 19.79 million steps required by vanilla DRL. In terms of energy accuracy, the new method achieved 97.2% alignment with simulation-based optimal references, slightly outperforming vanilla DRL's 96.3%. For instance, in a verification case, the transformer-guided approach consumed 1,740 watt-hours over 21.5 seconds, closely matching the reference trajectory's 1,693 watt-hours over 19.6 seconds, and both approaches successfully met the takeoff conditions without violating safety constraints. This efficiency gain highlights how the transformer's ability to model sequential control data streamlines the learning process, enabling faster convergence to near-optimal solutions in complex, constrained environments.
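The 97.2% figure is consistent with a simple relative-error metric against the reference energy; the check below uses the watt-hour values quoted above, but the exact metric the authors used is our assumption.

```python
# Illustrative check of the reported energy-accuracy figure using the
# verification-case numbers from the article. The relative-error form
# of the metric is an assumption on our part.
e_guided = 1740.0  # Wh, transformer-guided takeoff
e_ref = 1693.0     # Wh, simulation-based optimal reference
accuracy = 1 - abs(e_guided - e_ref) / e_ref
print(f"{accuracy:.1%}")  # 97.2%
```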

The implications of this research extend beyond eVTOLs to broader fields like robotics and autonomous systems, where efficient training is critical for real-world deployment. By cutting data requirements, transformer-guided DRL could lower computational costs and accelerate the development of AI-driven control systems, making urban air mobility more viable and sustainable. It also opens doors for applications in cybersecurity, where rapid adaptation to dynamic threats is essential, and in general AI, by demonstrating how hybrid models can overcome sample inefficiency in reinforcement learning. As cities grapple with congestion, this innovation could pave the way for safer, greener aerial transportation, supported by regulatory advancements from bodies like the FAA.

Despite its successes, the study has limitations, such as its focus on a specific eVTOL model and a limited set of flight conditions, which may not capture all real-world variabilities like weather or obstacle avoidance. Future work will test the framework under diverse scenarios and incorporate additional safety constraints, potentially exploring alternative DRL or transformer architectures to enhance robustness. The integration of sequential modeling with reinforcement learning marks a promising step toward data-efficient AI, but further validation is needed to ensure scalability and reliability in practical urban environments, where unexpected factors could impact performance.

Original Source

Read the complete research paper

View on arXiv

About the Author

Guilherme A.


Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.

Connect on LinkedIn