Hidden Markov Models Predict Tourist Movements from Social Media Data

AI Research
March 26, 2026

In an era where social media has become a digital diary for travelers, researchers are using its geotagged data to predict tourist behavior. A new study from an international team at Léonard de Vinci Research Center and Tianjin University demonstrates how Hidden Markov Models (HMMs) can forecast where tourists will visit next, using Paris as a testing ground. The approach has potential applications in tourism marketing, from personalized recommendations to urban planning. The research tackles the challenge of understanding tourist movements through the digital traces they leave on platforms like TripAdvisor, Booking, and Instagram, where users frequently share photos and reviews with embedded geographical information.

The methodology begins with a data analysis phase that defines what constitutes a tourist and a tourist stay. According to the researchers, a tourist stay is a succession of days on which a tourist publishes at least one comment per day, with stays merged if the break between comments is seven days or less. From over 1 million reviews posted in Paris between 2013 and 2019, the team extracted 11,471 sequences of visited places, focusing on six major landmarks: Arc de Triomphe, Notre Dame Cathedral, Eiffel Tower, Luxembourg Gardens, Quai de Seine, and the Louvre Museum. These sequences were then modeled using a Frequency Prefix Tree (FPT), which hierarchically represents the order and frequency of visited locations. With 599 nodes, however, this initial tree was too large for practical use.
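To make the data preparation step concrete, here is a minimal Python sketch of how reviews might be grouped into stays and then counted in a frequency prefix tree. It is an illustration under stated assumptions, not the study's code: the function and class names (`split_into_stays`, `PrefixTreeNode`, `build_prefix_tree`, `count_nodes`) and the sample reviews are hypothetical, and the seven-day rule is interpreted simply as "start a new stay when the gap between consecutive reviews exceeds seven days".

```python
from datetime import date

MAX_GAP_DAYS = 7  # article: breaks of seven days or less are merged into one stay


def split_into_stays(reviews):
    """Group one user's (date, place) reviews into stays.

    Assumption: a new stay starts whenever the gap to the previous review
    exceeds MAX_GAP_DAYS. Each stay is a list of places in date order.
    """
    stays = []
    previous_day = None
    for day, place in sorted(reviews):
        if previous_day is None or (day - previous_day).days > MAX_GAP_DAYS:
            stays.append([])
        stays[-1].append(place)
        previous_day = day
    return stays


class PrefixTreeNode:
    """One node of a frequency prefix tree (hypothetical layout)."""

    def __init__(self):
        self.count = 0      # number of sequences sharing this prefix
        self.end_count = 0  # number of sequences ending exactly here
        self.children = {}  # place name -> PrefixTreeNode


def build_prefix_tree(sequences):
    """Insert every sequence into the tree, counting shared prefixes."""
    root = PrefixTreeNode()
    for sequence in sequences:
        node = root
        node.count += 1
        for place in sequence:
            node = node.children.setdefault(place, PrefixTreeNode())
            node.count += 1
        node.end_count += 1
    return root


def count_nodes(node):
    """Total number of nodes in the tree, root included."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


# Made-up reviews for a single user, for illustration only.
reviews = [
    (date(2019, 5, 1), "Eiffel Tower"),
    (date(2019, 5, 2), "Louvre"),
    (date(2019, 5, 20), "Eiffel Tower"),  # 18-day gap: starts a new stay
    (date(2019, 5, 21), "Notre Dame"),
]
stays = split_into_stays(reviews)
tree = build_prefix_tree(stays)
```

In this toy example both stays begin at the Eiffel Tower, so they share a single child under the root. The same sharing of common prefixes is what keeps a prefix tree smaller than a flat list of sequences, although with thousands of real sequences it can still grow to hundreds of nodes.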

To overcome this complexity, the researchers developed a novel machine learning approach called the Relaxed Alergia algorithm, a form of grammatical inference adapted for big data. This algorithm recursively merges compatible nodes in the FPT by comparing the relative frequencies of visited places, ultimately reducing the structure to a more manageable Hidden Markov Model. The resulting HMM contains 37 nodes with 18 end nodes, where each node represents a state in the tourist's journey and arcs between nodes are weighted with probabilities of transitioning from one location to another. The team validated this model by comparing the probability of each sequence in the original data set to its probability in the HMM, finding an initial mean absolute percentage error (MAPE) of 20.8%. The model was then refined through an update process using the Baum-Welch algorithm.
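The article does not spell out how the relaxed variant decides that two nodes are compatible. As a point of reference, the classic ALERGIA algorithm uses a Hoeffding-bound test on relative frequencies, applied recursively to the nodes' children. The Python sketch below shows that classic test only; the dict-based node layout, the function names, and the `alpha` default are assumptions made for illustration, and the study's relaxed criterion may differ.

```python
import math


def hoeffding_compatible(f1, n1, f2, n2, alpha=0.05):
    """Classic ALERGIA test: are the frequencies f1/n1 and f2/n2 close enough?"""
    if n1 == 0 or n2 == 0:
        return True  # no observations, so no evidence against merging
    bound = math.sqrt(0.5 * math.log(2.0 / alpha)) * (
        1.0 / math.sqrt(n1) + 1.0 / math.sqrt(n2)
    )
    return abs(f1 / n1 - f2 / n2) < bound


def nodes_compatible(a, b, alpha=0.05):
    """Recursively compare two prefix-tree nodes.

    Assumed node layout: {"count": int, "end": int, "children": {place: node}}.
    """
    # Compare how often sequences stop at each node.
    if not hoeffding_compatible(a["end"], a["count"], b["end"], b["count"], alpha):
        return False
    # Compare how often each place is visited next, then recurse.
    for place in set(a["children"]) | set(b["children"]):
        child_a = a["children"].get(place)
        child_b = b["children"].get(place)
        freq_a = child_a["count"] if child_a is not None else 0
        freq_b = child_b["count"] if child_b is not None else 0
        if not hoeffding_compatible(freq_a, a["count"], freq_b, b["count"], alpha):
            return False
        if child_a is not None and child_b is not None:
            if not nodes_compatible(child_a, child_b, alpha):
                return False
    return True


def leaf(n):
    """Helper: a node where all n sequences end."""
    return {"count": n, "end": n, "children": {}}


# Made-up counts, for illustration only.
a = {"count": 100, "end": 20, "children": {"Louvre": leaf(80)}}
b = {"count": 90, "end": 20, "children": {"Louvre": leaf(70)}}
c = {"count": 100, "end": 90, "children": {"Louvre": leaf(10)}}

similar = nodes_compatible(a, b)    # similar behavior: candidates for merging
different = nodes_compatible(a, c)  # very different stopping rates
```

Nodes `a` and `b` stop and continue at nearly the same rates, so the test accepts them as merge candidates, while `a` and `c` differ too much. Repeating such merges across the tree is what shrinks a large prefix tree into a compact automaton.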

The results show that the updated HMM significantly improves prediction accuracy, reducing the MAPE to 8.9%, with errors ranging from 1.2% to 25%. For example, the model can predict suffixes (future visited places) based on a tourist's current sequence, such as forecasting a visit to the Louvre after seeing the Arc de Triomphe. In tests, the MAPE for these predictions dropped from 36% in the initial HMM to 9.1% after updating, demonstrating the model's adaptability and relevance. The researchers note that while some anomalies, like the placement of the Louvre in sequences, initially caused higher errors, the update process effectively compensated for these, making the HMM a robust reflection of real-world tourist behavior.
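The evaluation described above can be sketched in a few lines of Python. The toy model below is a small probabilistic automaton with made-up probabilities, not the 37-node HMM from the study, and the names `sequence_probability`, `mape`, and `most_likely_next` are hypothetical. It shows how a sequence probability is read off the model, how MAPE compares it with the observed frequency, and how the most likely next place is picked for a given prefix.

```python
def sequence_probability(model, sequence, start="s0"):
    """Probability that the model generates exactly this sequence and stops."""
    state, prob = start, 1.0
    for place in sequence:
        arc = model[state]["next"].get(place)
        if arc is None:
            return 0.0  # the model has no such transition
        step_prob, state = arc
        prob *= step_prob
    return prob * model[state]["end"]


def mape(observed, model, start="s0"):
    """Mean absolute percentage error between observed and model probabilities."""
    errors = [
        abs(p_obs - sequence_probability(model, seq, start)) / p_obs
        for seq, p_obs in observed.items()
    ]
    return 100.0 * sum(errors) / len(errors)


def most_likely_next(model, prefix, start="s0"):
    """Follow the prefix through the model and return the likeliest next place."""
    state = start
    for place in prefix:
        arc = model[state]["next"].get(place)
        if arc is None:
            return None
        state = arc[1]
    arcs = model[state]["next"]
    if not arcs:
        return None
    return max(arcs, key=lambda place: arcs[place][0])


# Toy model: state -> stop probability and {place: (probability, next state)}.
# All numbers are invented for illustration.
model = {
    "s0": {"end": 0.0, "next": {"Arc de Triomphe": (0.5, "s1"),
                                "Eiffel Tower": (0.5, "s2")}},
    "s1": {"end": 0.25, "next": {"Louvre": (0.5, "s2"),
                                 "Eiffel Tower": (0.25, "s2")}},
    "s2": {"end": 1.0, "next": {}},
}

# Invented observed relative frequencies of two sequences.
observed = {
    ("Arc de Triomphe", "Louvre"): 0.20,
    ("Eiffel Tower",): 0.50,
}

error = mape(observed, model)
suggestion = most_likely_next(model, ["Arc de Triomphe"])
```

Here the first sequence is overestimated by the model (0.25 against an observed 0.20) and the second is matched exactly, so the MAPE comes out at about 12.5%. The suggested next stop after the Arc de Triomphe is the Louvre, mirroring the kind of suffix prediction the article describes.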

This research has profound implications for the tourism industry, offering a flexible tool that can be tailored to specific tourist profiles based on nationality, age, or gender. Unlike traditional methods such as deep learning or data mining, which struggle with short sequences or lose rare behaviors, the HMM approach preserves all possible movements and adapts to new data. Future work aims to compare this approach with deep learning techniques and to develop multiple decision tools for different tourist segments. The study underscores how machine learning can transform social media data into actionable insights, potentially changing how destinations manage demand and engage visitors. Demessance et al., 2025, Hidden Markov Model for Tourism.

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
