The Next Frontier in Mobility AI: How Trajectory Foundation Models Are Unlocking Smarter Cities

In the sprawling digital ecosystems of modern cities, every Uber ride, food delivery, and scooter trip generates a trail of spatio-temporal data—sequences of GPS coordinates and timestamps known as trajectories. These datasets hold the key to optimizing everything from traffic flow to urban planning, but until recently, analyzing them required specialized, task-specific models that struggled with generalization across diverse applications. Enter trajectory foundation models (TFMs), a burgeoning subclass of spatio-temporal foundation models inspired by the success of large language models, which promise to learn universal representations from massive unlabeled trajectory data. As detailed in a comprehensive 2025 tutorial paper by researchers from Aalborg University, Chongqing University of Posts and Telecommunications, and East China Normal University, TFMs are poised to revolutionize intelligent transportation by enabling a single model to handle tasks like travel time estimation, traffic analysis, trajectory similarity computation, and trajectory generation, all while reducing reliance on costly manual annotations. This shift mirrors the transformative impact of foundation models in NLP and computer vision, but applied to the complex, dynamic world of human and vehicle movement, where spatial patterns intertwine with temporal rhythms to create rich yet noisy data streams.
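
To make the definition concrete, here is a minimal sketch of how a single trajectory might be represented in Python. The class and field names (TrajectoryPoint, Trajectory, traj_id) are illustrative assumptions, not a schema from the paper, and the coordinates are made-up sample values. Real datasets often carry extra attributes such as speed, heading, or road-segment IDs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrajectoryPoint:
    """One GPS fix: latitude and longitude in degrees, timestamp in seconds."""
    lat: float
    lon: float
    t: float


@dataclass
class Trajectory:
    """A time-ordered sequence of GPS fixes belonging to one trip (hypothetical schema)."""
    traj_id: str
    points: List[TrajectoryPoint]

    def __post_init__(self) -> None:
        # Reject sequences whose timestamps go backwards.
        times = [p.t for p in self.points]
        if any(later < earlier for earlier, later in zip(times, times[1:])):
            raise ValueError("points must be sorted by timestamp")

    def duration(self) -> float:
        """Elapsed seconds between the first and last fix."""
        if len(self.points) < 2:
            return 0.0
        return self.points[-1].t - self.points[0].t


trip = Trajectory("trip-1", [
    TrajectoryPoint(57.0488, 9.9217, 0.0),
    TrajectoryPoint(57.0492, 9.9230, 15.0),
    TrajectoryPoint(57.0501, 9.9251, 30.0),
])
print(len(trip.points), trip.duration())  # 3 30.0
```

The ordering check in __post_init__ reflects the "sequence of coordinates and timestamps" definition: a trajectory is only meaningful if its fixes are time-ordered.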

The methodology behind TFMs hinges on self-supervised learning (SSL) techniques, which allow models to extract high-quality representations from vast volumes of unlabeled trajectory data without human annotation. The paper categorizes these approaches into four primary learning paradigms: contrastive learning, generative learning, generative contrastive learning (a hybrid approach), and causal learning. Contrastive learning-based TFMs, such as TrajRL and MM-Path, construct multiple views of the same trajectory (for example, by augmenting GPS data with satellite images) and use contrastive objectives to pull similar representations closer while pushing dissimilar ones apart, thereby capturing global trajectory patterns. Generative learning models, like RED and Toast, employ masked autoencoders or similar architectures to reconstruct corrupted or masked parts of trajectories, focusing on local structural details and improving robustness to data sparsity. Hybrid models, such as LightPath and START, combine these strategies to leverage both global and local information, though balancing the two objectives remains a challenge. Causal learning approaches, exemplified by TrajCL, integrate causal inference to mitigate spurious correlations arising from geospatial context, improving interpretability and generalization in tasks like trajectory classification.
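
The contrastive objective described above is commonly implemented as an InfoNCE-style loss. The sketch below is a generic, dependency-free version operating on precomputed embeddings. It assumes the two views have already been encoded into fixed-length vectors and uses in-batch negatives with a temperature of 0.1 (both assumptions). It is not the exact loss used by TrajRL or MM-Path, whose encoders, augmentations, and loss details may differ.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(view_a: List[Vector], view_b: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss over a batch.

    view_a[i] and view_b[i] are embeddings of two views of trajectory i (the
    positive pair); every view_b[j] with j != i acts as a negative for view_a[i].
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log softmax probability assigned to the positive pair.
        total += log_denominator - logits[i]
    return total / n


# Toy embeddings: in practice these would come from two encoders
# (e.g. a GPS encoder and an image encoder) applied to the same trips.
gps_views = [[1.0, 0.0], [0.0, 1.0]]
image_views = [[0.9, 0.1], [0.1, 0.9]]
aligned = info_nce(gps_views, image_views)
shuffled = info_nce(gps_views, image_views[::-1])
print(aligned < shuffled)  # True
```

When the views are correctly paired the loss is close to zero; reversing the pairing makes each positive look like a negative and the loss rises sharply, which is exactly the signal that pulls matching views together during training.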

Results from the surveyed literature demonstrate that TFMs significantly outperform traditional task-specific models across a wide range of downstream applications. For travel time estimation, models like PIM and RED have shown strong generalization by learning from map-matched or raw GPS trajectories, with RED's Transformer-based masked autoencoder achieving notable accuracy gains. In traffic analysis, multi-modality models like MM-Path, which integrates GPS trajectories with image trajectories derived from satellite data, outperform single-modality approaches by capturing richer environmental context. Trajectory similarity computation benefits from frameworks like TrajRL, which uses multi-faceted temporal features and contrastive learning to improve retrieval accuracy on large-scale datasets. Trajectory recovery and generation tasks, supported by models like GTR and ControlTraj, leverage generative techniques to infer missing points or synthesize realistic movement patterns, which is useful for data augmentation and simulation. The paper highlights that multi-modality models generally excel by combining data types (GPS, grid, textual, and image trajectories), though current implementations typically limit themselves to two modalities, constraining their ability to handle more complex real-world scenarios with heterogeneous data sources.
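
Once a TFM maps each trajectory to a vector, similarity search reduces to comparing embeddings. The sketch below assumes cosine similarity and a brute-force scan over a small dictionary of hypothetical embeddings; it illustrates the retrieval step in general, not TrajRL's actual pipeline.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query: List[float],
                  database: Dict[str, List[float]],
                  k: int = 3) -> List[Tuple[str, float]]:
    """Return the k trajectory IDs whose embeddings are most similar to query."""
    scored = [(traj_id, cosine(query, emb)) for traj_id, emb in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical pre-computed trajectory embeddings.
embeddings = {
    "t1": [0.9, 0.1, 0.0],
    "t2": [0.1, 0.9, 0.0],
    "t3": [0.7, 0.3, 0.1],
}
result = top_k_similar([1.0, 0.0, 0.0], embeddings, k=2)
print([traj_id for traj_id, _ in result])  # ['t1', 't3']
```

The linear scan costs O(N) per query; at the large-scale settings the paragraph mentions, the same embeddings would more likely be served from an approximate nearest-neighbour index.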

The implications of TFMs extend far beyond academic research, offering tangible benefits for urban mobility, sustainability, and responsible AI deployment. By enabling more accurate travel time predictions and traffic flow modeling, these models can reduce congestion, lower emissions, and support greener urban planning initiatives, aligning with the paper's emphasis on sustainability-oriented AI. In logistics and ride-sharing services, TFMs could optimize routing and fleet management, while in public safety, they might enhance anomaly detection in movement patterns. The integration of causal learning also addresses ethical concerns by reducing biases from geospatial confounders, promoting fairness and transparency in applications like insurance pricing or law enforcement. Moreover, the shift toward foundation models reduces the need for extensive labeled datasets, lowering barriers for smaller organizations and fostering innovation in smart city technologies. As the authors note, this evolution toward "spatio-temporal general intelligence" could catalyze cross-disciplinary collaborations between data management communities and industry practitioners, driving the next wave of intelligent transportation systems.

Despite their promise, TFMs face significant limitations that must be addressed to realize their full potential. The paper identifies key challenges, including the noise and sparsity inherent in raw GPS data, the errors introduced by map-matching processes, and the difficulty of scaling multi-modality models to more than two data types. Hybrid approaches struggle to balance generative and contrastive objectives, and, depending on their design, current models often overlook either long-range mobility patterns or fine-grained local details. Scalability to large, noisy datasets remains underexplored, and the environmental impact of training massive foundation models raises sustainability concerns, prompting calls for more efficient architectures and green optimization techniques. Additionally, the static nature of most pre-training processes limits adaptability to evolving data or new domains, highlighting the need for continued pre-training strategies that incorporate incremental learning without catastrophic forgetting.
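
Noise in raw GPS data is usually tackled before any model sees it. One common heuristic, shown here as an illustration rather than anything prescribed by the paper, discards fixes that would require an implausible speed to reach from the previous kept fix. The 50 m/s threshold and the (lat, lon, t) tuple layout are assumptions; real pipelines tune the threshold per transport mode and often follow this step with map-matching, which the paper notes can introduce its own errors.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (lat_deg, lon_deg, t_seconds), assumed layout
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def drop_speed_outliers(points: List[Point], max_speed_mps: float = 50.0) -> List[Point]:
    """Keep a fix only if reaching it from the last kept fix implies a plausible speed."""
    if not points:
        return []
    kept = [points[0]]
    for lat, lon, t in points[1:]:
        prev_lat, prev_lon, prev_t = kept[-1]
        dt = t - prev_t
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp
        speed = haversine_m(prev_lat, prev_lon, lat, lon) / dt
        if speed <= max_speed_mps:
            kept.append((lat, lon, t))
    return kept


raw = [
    (57.0488, 9.9217, 0.0),
    (57.0492, 9.9230, 10.0),
    (57.2000, 9.9230, 20.0),  # spurious jump of roughly 17 km in 10 s
    (57.0500, 9.9245, 30.0),
]
cleaned = drop_speed_outliers(raw)
print([t for _, _, t in cleaned])  # [0.0, 10.0, 30.0]
```

Because each fix is compared against the last kept one, a single spike does not poison the rest of the trip. The trade-off is that the first fix is always trusted, so a noisy starting point would need a separate check.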

Looking ahead, the paper outlines several promising research directions that could propel TFMs into new frontiers. Quantum trajectory data representation emerges as a cutting-edge avenue, leveraging quantum computing's superposition and parallelism to create compact, expressive encodings that enhance scalability and energy efficiency, potentially unlocking novel learning paradigms unattainable in classical settings. Work on responsible foundation models for trajectories emphasizes the need for fairness, transparency, and reduced carbon footprints, advocating techniques like knowledge distillation and adaptive fine-tuning to ensure ethical deployment. Continued pre-training of TFMs calls for lifelong learning frameworks that incrementally incorporate new data, maintaining relevance in dynamic mobility environments. The authors also hint at intersections with quantum AI for spatio-temporal modeling, suggesting that future work could blend these advancements to build more robust, sustainable, and intelligent systems. As trajectory data continues to explode in volume and variety, TFMs stand at the cusp of transforming how we understand and optimize movement in an increasingly connected world, provided researchers tackle these open challenges with innovation and responsibility.
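
Knowledge distillation, one of the techniques the paper advocates for reducing the footprint of TFMs, trains a small student model to match the softened output distribution of a large teacher. The sketch below uses the widely used soft-target formulation (temperature-scaled softmax, KL divergence, and a T² scaling factor); the paper does not specify this exact formulation. The three-class logits, the hypothetical classification head they stand for, and the temperature of 2.0 are made-up assumptions.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax; higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float],
                      teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


teacher = [2.0, 1.0, 0.1]        # e.g. logits from a large pre-trained TFM (hypothetical)
close_student = [1.9, 1.1, 0.2]
far_student = [0.1, 1.0, 2.0]
print(distillation_loss(close_student, teacher) < distillation_loss(far_student, teacher))  # True
```

In a full training loop this term is typically mixed with an ordinary task loss on ground-truth labels, so the student learns both from data and from the teacher's softer "dark knowledge" about which classes are similar.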

Source: Yang, S.B., Sun, Y., Cheng, Y., Lin, Y., Torp, K., Hu, J. 2025. Spatio-Temporal Trajectory Foundation Model: Recent Advances and Future Directions. In ACM Conference Proceedings.

About the Author

Guilherme A.