A breakthrough in artificial intelligence research demonstrates that robots can learn complex skills from entirely simulated data, matching the performance of models trained on expensive, real-world datasets. This finding, from researchers at Shanghai AI Laboratory and Peking University, challenges the long-held assumption that physical robot interactions are essential for training capable AI systems. By creating a massive, high-fidelity synthetic dataset called InternData-A1, the team has shown that simulation can provide a scalable and accessible alternative to resource-intensive real-world data collection, potentially democratizing advanced robotics research.
The core finding is that a Vision-Language-Action (VLA) model pre-trained exclusively on the synthetic InternData-A1 dataset performs comparably to the leading model, known as π0, which was trained on a closed-source, large-scale real-robot dataset called the π-dataset. The researchers used the same architecture as π0, pre-training a new model from scratch using only InternData-A1, and then evaluated it across a broad suite of tasks. In simulation, the synthetic-pretrained model achieved a 60.0% average success rate across 49 tasks in easy settings and 26.5% in hard settings, outperforming the official π0 model by 5% and 6.5%, respectively. In real-world tests on five regular tasks and four dexterous tasks spanning three different robot embodiments, the InternData-A1 model matched π0's performance, with an average success rate of 63% across all nine tasks, as detailed in Figure 5 of the paper. This marks the first evidence that synthetic data alone can rival the strongest real-world data for pre-training generalist robot policies.
The methodology centers on InternData-A1, a dataset comprising over 630,000 trajectories and 7,433 hours of simulated robot interaction. It covers 4 robotic embodiments, 18 distinct skills, 70 tasks, and 227 scenes, including manipulation of rigid, articulated, deformable, and fluid objects. The data was generated through a fully autonomous, decoupled simulation pipeline that separates environment construction, skill composition, domain randomization, and rendering. As illustrated in Figure 3, users can design tasks by retrieving assets like robots and objects from a library and composing modular atomic skills—such as pick, place, or handover—via simple configuration commands. The pipeline uses the CuRobo motion planner to interpolate actions and includes extensive domain randomization, such as perturbing camera views and lighting, to enhance visual diversity. Optimizations, including stage decoupling and dynamic resource scheduling, allowed the system to produce 209.7 hours of robot data per day on 8 RTX 4090 GPUs at a cost below $0.003 per episode, enabling scalable synthesis with minimal manual tuning.
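The composition step described above can be pictured as building a task configuration from reusable pieces. The sketch below is purely illustrative: the class names (`AtomicSkill`, `TaskConfig`), asset names, and randomization knobs are hypothetical stand-ins, not the paper's actual API, and it shows only the "compose modular atomic skills via configuration" idea, not the simulator itself.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of decoupled task composition. All names are
# illustrative assumptions, not the InternData-A1 pipeline's real API.

@dataclass
class AtomicSkill:
    name: str    # e.g. "pick", "place", "handover"
    target: str  # asset the skill acts on

@dataclass
class TaskConfig:
    robot: str   # embodiment retrieved from an asset library
    scene: str   # scene retrieved from an asset library
    skills: list = field(default_factory=list)
    randomize: dict = field(default_factory=dict)  # domain-randomization knobs

    def add_skill(self, name, target):
        # Append one modular atomic skill; return self to allow chaining.
        self.skills.append(AtomicSkill(name, target))
        return self

# Compose a long-horizon task from modular atomic skills, with
# camera and lighting randomization configured alongside it.
task = (TaskConfig(robot="franka_arm", scene="kitchen_01",
                   randomize={"camera_jitter": 0.05, "lighting": "random"})
        .add_skill("pick", "mug")
        .add_skill("handover", "mug")
        .add_skill("place", "shelf"))

print([s.name for s in task.skills])  # → ['pick', 'handover', 'place']
```

A motion planner (CuRobo, in the paper) would then turn each skill in the chain into joint-space trajectories, while the rendering stage applies the randomization settings independently.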
The results extend beyond matching baseline performance. In sim-to-real transfer experiments, the model demonstrated surprising zero-shot capability, achieving success rates over 50% on ten selected tasks without any real-world fine-tuning. For example, tasks like closing a microwave or sweeping trash achieved 87% and 60% success rates, respectively, using only 500 simulated episodes for post-training, as shown in Figure 7. Analysis revealed that for simpler tasks involving basic skills like pick-and-place, 200 simulated episodes could match the performance of 200 real episodes, while more complex tasks required up to 1,600 simulated episodes to achieve parity—a simulation-to-real data ratio within 8:1. The paper also includes ablation studies, summarized in Table 4, showing that all components of the dataset—pick-and-place tasks, articulation manipulation, base tasks, and long-horizon tasks—contribute meaningfully to pre-training effectiveness, with removal of any part leading to performance drops.
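The episode-count comparison above reduces to simple arithmetic, made explicit below. The episode numbers come from the article; the helper function is an illustrative sketch, not something defined in the paper.

```python
# Back-of-the-envelope check of the sim-to-real data ratios reported above.
# Episode counts are from the article; the helper itself is illustrative.

def sim_to_real_ratio(sim_episodes: int, real_episodes: int) -> float:
    """How many simulated episodes are needed per real episode to reach parity."""
    return sim_episodes / real_episodes

# Simple pick-and-place tasks: 200 sim episodes matched 200 real ones.
print(sim_to_real_ratio(200, 200))   # → 1.0

# Hardest tasks: up to 1,600 sim episodes against the 200-episode baseline.
print(sim_to_real_ratio(1600, 200))  # → 8.0, the "within 8:1" bound
```

Since simulated episodes cost under $0.003 each, even the worst-case 8:1 ratio keeps synthetic data far cheaper than teleoperated real-robot collection.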
The implications are significant for the field of embodied AI, as the results lower the barrier to large-scale robotic data creation. Real-world data collection is costly, requiring skilled operators, specialized hardware, and extensive labor, making it inaccessible to most research groups. InternData-A1, by contrast, is open-sourced along with its generation pipeline, offering a reproducible and affordable supplement for training generalist robot models. This could accelerate innovation in areas like household assistance, logistics, and manufacturing, where robots need to handle diverse objects and tasks. The success in sim-to-real transfer, particularly for dexterous tasks like folding clothes or unscrewing caps, suggests that high-fidelity simulation can bridge the gap to real-world deployment, reducing reliance on physical trials.
However, the research acknowledges limitations. Due to constraints in physics simulators, highly dexterous tasks such as tying shoelaces or threading a needle remain challenging to simulate accurately. The paper notes that future work will need to expand task diversity and dexterity to further establish simulation as a cornerstone for VLA models. Additionally, while the dataset shows strong performance, it may not capture all nuances of real-world dynamics, and the sim-to-real gap, though minimized, still exists for the most complex interactions. These factors highlight areas for improvement as simulation technology advances.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.