Researchers have developed Game-TARS, an AI system that learns to play video games and perform digital tasks using the same keyboard and mouse inputs as people. Because it does not rely on specialized programming for each game, the AI can adapt to new environments without custom code. The findings, detailed in a recent paper, show that Game-TARS achieves near-human performance in various games and outperforms leading models such as GPT-5 and Gemini-2.5-Pro, suggesting a path toward more versatile AI assistants for everyday computing.
Game-TARS operates by processing screen images and generating low-level actions such as mouse movements and key presses, mimicking how humans control computers. The system was trained on a dataset of 500 billion tokens, including 20,000 hours of gameplay and other computer-use trajectories. A key innovation is its 'sparse-thinking' strategy, in which the AI reasons internally only at critical decision points and otherwise acts directly, balancing depth of reasoning with efficiency. This method reduces computational cost while maintaining high performance: in the paper's experiments, it used fewer tokens per step than alternatives.
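The sparse-thinking idea can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `MouseMove`, `KeyPress`, and `Step` types, the `run_episode` function, and the `needs_thought`, `think`, and `act` callbacks are all hypothetical names introduced here. In particular, a hand-written `needs_thought` rule stands in for whatever the trained model does to decide when reasoning is worthwhile.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Union


@dataclass(frozen=True)
class MouseMove:
    """Relative mouse movement in pixels (hypothetical action type)."""
    dx: int
    dy: int


@dataclass(frozen=True)
class KeyPress:
    """A single key press (hypothetical action type)."""
    key: str


Action = Union[MouseMove, KeyPress]


@dataclass
class Step:
    thought: Optional[str]  # None when the agent acted without reasoning
    action: Action


def run_episode(
    observations: Sequence[str],
    needs_thought: Callable[[str], bool],
    think: Callable[[str], str],
    act: Callable[[str, Optional[str]], Action],
) -> List[Step]:
    """Act on every observation, but reason only when needs_thought says so."""
    steps = []
    for obs in observations:
        # Assumption: a simple predicate decides when to reason. In a real
        # agent the model itself would make this decision.
        thought = think(obs) if needs_thought(obs) else None
        steps.append(Step(thought, act(obs, thought)))
    return steps


# Toy usage: text strings stand in for screen images.
observations = ["corridor", "corridor", "fork in path", "corridor"]
steps = run_episode(
    observations,
    needs_thought=lambda obs: "fork" in obs,
    think=lambda obs: "choose the left branch",
    act=lambda obs, thought: KeyPress("a") if thought else KeyPress("w"),
)
thought_count = sum(step.thought is not None for step in steps)
```

In this toy run the agent produces a thought on only one of four steps, which is the source of the token savings: routine steps emit an action alone.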
Training began with a continual pre-training phase that used a decaying loss function to prioritize learning from novel actions over repetitive ones, improving the model's ability to handle complex tasks. In post-training, techniques such as reinforcement fine-tuning and multimodal prompts improved instruction following and long-term memory. For instance, the AI can retain and compress past interactions using a dual-layer memory system, allowing it to manage tasks that require extended planning, such as navigating open-world games or executing multi-step commands.
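One way to picture a decaying loss of this kind is to shrink the weight of each training step the longer the same action has been repeated. The Python sketch below assumes a geometric decay with a hypothetical factor `gamma`; the function names and the exact decay schedule are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence


def decay_weights(actions: Sequence[str], gamma: float = 0.5) -> List[float]:
    """Weight each step by gamma ** (number of immediately preceding repeats).

    A new action gets weight 1.0; each consecutive repeat of the same
    action is down-weighted by another factor of gamma (assumed schedule).
    """
    weights = []
    run_length = 0
    for i, action in enumerate(actions):
        if i > 0 and action == actions[i - 1]:
            run_length += 1
        else:
            run_length = 0  # the action changed, so the decay resets
        weights.append(gamma ** run_length)
    return weights


def weighted_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of per-step losses."""
    return sum(l * w for l, w in zip(losses, weights)) / sum(weights)


# Toy trajectory: holding "w" (walk forward) three times, then a jump.
actions = ["w", "w", "w", "jump", "w"]
weights = decay_weights(actions, gamma=0.5)
# weights == [1.0, 0.5, 0.25, 1.0, 1.0]
```

With these weights, the jump and the first press of each run count fully toward the loss, while the held key contributes progressively less, so long stretches of identical input do not dominate training.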
Results from benchmarks like Minecraft and web games demonstrate Game-TARS's robustness. In Minecraft, it achieved success rates up to 72% in exploration tasks, doubling the performance of previous state-of-the-art models. On unseen games, it nearly matched human players in racing and survival scenarios, highlighting its generalization capabilities. The paper notes that scaling the training data and model size consistently improved outcomes, reinforcing that simple, unified representations combined with large-scale data are effective for building generalist agents.
This advance has implications for automating routine computer tasks, from software use to data management, potentially increasing productivity in fields like coding and GUI automation. However, the study acknowledges limitations: the model's performance depends on the quality and diversity of its training data, and its reasoning may still lag behind human intuition in unpredictable scenarios. Future work could explore integrating more real-world data to close these gaps, paving the way for AI that assists seamlessly in digital environments.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.