A new AI system can predict the future path of a surgical needle during suturing procedures by analyzing video footage, offering a potential boost to robotic-assisted surgery without relying on complex robotic data. Developed by researchers from the University of Macau and other institutions, the approach treats the needle tip as an agent navigating through pixel space, learning from expert demonstrations to anticipate movements step by step. This innovation addresses a critical challenge in minimally invasive surgery, where accurate trajectory forecasting can enhance safety and efficiency by providing real-time guidance to surgeons, especially those with less experience. By operating directly on visual inputs, the system bypasses the need for kinematic signals from specific robotic platforms, making it applicable to a wide range of surgical videos and potentially improving access to high-quality care.
The key finding of the research, detailed in a paper titled "SutureAgent: Learning Surgical Trajectories via Goal-conditioned Offline RL in Pixel Space," is that framing needle trajectory prediction as a sequential decision-making problem significantly improves accuracy. The SutureAgent system reduces Average Displacement Error (ADE) by 58.6% compared to the strongest baseline, Behavioral Cloning, as shown in experiments on a kidney wound suturing dataset. This metric, which measures the average point-wise distance between predicted and actual trajectories in pixels, dropped from 128.15 to 52.95 under one testing scenario. SutureAgent also excels in Final Displacement Error (FDE) and Discrete Fréchet Distance (FD), indicating better endpoint accuracy and overall shape fidelity, crucial for tasks like suturing where precise targeting is essential.
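The three metrics above have standard definitions: ADE averages point-wise distances, FDE compares only the endpoints, and the discrete Fréchet distance captures overall curve similarity. A minimal sketch of how they can be computed for two pixel-space trajectories (the implementations below are standard formulations, not code from the paper):

```python
import numpy as np

def ade(pred, gt):
    """Average Displacement Error: mean point-wise Euclidean distance (pixels)."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=1)))

def fde(pred, gt):
    """Final Displacement Error: distance between last predicted and true points."""
    return float(np.linalg.norm(pred[-1] - gt[-1]))

def discrete_frechet(p, q):
    """Discrete Frechet Distance via dynamic programming (curve-shape similarity)."""
    n, m = len(p), len(q)
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=2)  # pairwise distances
    ca = np.full((n, m), np.inf)
    ca[0, 0] = d[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            prev = min(ca[i - 1, j] if i > 0 else np.inf,
                       ca[i, j - 1] if j > 0 else np.inf,
                       ca[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
            ca[i, j] = max(prev, d[i, j])
    return float(ca[-1, -1])

# Toy predicted vs. ground-truth needle-tip trajectories (illustrative values).
pred = np.array([[10.0, 10.0], [20.0, 15.0], [30.0, 22.0]])
gt = np.array([[12.0, 11.0], [21.0, 17.0], [33.0, 20.0]])
print(ade(pred, gt), fde(pred, gt), discrete_frechet(pred, gt))
```

Lower is better for all three; FD is the most sensitive to a trajectory drifting off the ground-truth path even when its average error stays small.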
To achieve this, the researchers employed a goal-conditioned offline reinforcement learning framework that converts sparse annotations into dense reward signals. Each surgical trajectory in the dataset includes only nine manually annotated keyframes, which are insufficient for direct supervised learning. The team used cubic spline interpolation to generate dense per-frame reference positions from these keyframes, assigning confidence scores based on proximity to annotated frames. The SutureAgent model combines an observation encoder, which processes 128x128 RGB crops centered on the needle tip along with a guidance heatmap, with a Transformer to capture temporal dependencies. It then uses a Conservative Q-Learning (CQL) policy to autoregressively predict future waypoints, with actions defined by discrete directions and continuous step magnitudes, conditioned on goals derived from expert data during training.
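The densification step described above can be sketched as follows. The paper specifies cubic spline interpolation and proximity-based confidence scores but not the exact confidence schedule, so the exponential decay and its `tau` parameter below are assumptions for illustration:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def densify_keyframes(key_frames, key_xy, n_frames, tau=5.0):
    """Interpolate sparse (frame, x, y) annotations into dense per-frame
    reference positions, with a confidence score for each frame.

    The confidence schedule (exponential decay with distance to the nearest
    annotated frame, rate tau) is a hypothetical choice, not from the paper.
    """
    cs_x = CubicSpline(key_frames, key_xy[:, 0])
    cs_y = CubicSpline(key_frames, key_xy[:, 1])
    frames = np.arange(n_frames)
    dense = np.stack([cs_x(frames), cs_y(frames)], axis=1)
    # Confidence is highest at annotated frames and decays in between.
    dist = np.min(np.abs(frames[:, None] - np.asarray(key_frames)[None, :]), axis=1)
    conf = np.exp(-dist / tau)
    return dense, conf

# Nine annotated keyframes per trajectory, as in the dataset (positions made up).
key_frames = np.array([0, 12, 25, 40, 55, 70, 82, 95, 110])
key_xy = np.random.default_rng(0).uniform(0, 128, size=(9, 2))
dense, conf = densify_keyframes(key_frames, key_xy, n_frames=111)
```

The dense references then serve as reward signals for offline RL, with the confidence scores down-weighting frames far from any human annotation.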
The results, based on a dataset of 1,158 trajectories from 50 patients, demonstrate robust performance across various metrics. In tests with six observed keyframes and three predicted ones, SutureAgent achieved an ADE of 52.95 pixels, FDE of 80.22 pixels, and FD of 81.97 pixels, outperforming all baselines including diffusion-based and imitation learning approaches. Under more challenging conditions with three observed and six predicted keyframes, it maintained advantages with a 31.3% reduction in ADE over Behavioral Cloning. Statistical analysis, including violin plots and cumulative distribution functions, shows that 90% of SutureAgent's predictions fall below an ADE of 100 pixels, compared to 45% for the next-best baseline, highlighting its consistency and lower error variance. Qualitative visualizations in Figure 2 reveal that the model produces smoother, more accurate trajectories that closely match ground-truth paths in diverse surgical scenes.
The implications of this work extend to real-world surgical assistance, where it could support anticipatory planning and safer motion execution in robot-assisted procedures. By relying solely on endoscopic video, the approach avoids the "kinematic bottleneck" that limits transferability between different robotic systems, making it suitable for existing clinical archives. This aligns with the trend toward task-level autonomy in surgery, helping surgeons operate under cognitive constraints like limited attention and time pressure. The researchers note that future work will involve extending the framework to broader laparoscopic procedures and validating it through ex-vivo experiments on robotic platforms, paving the way for practical integration into surgical workflows.
Despite its successes, the study has limitations outlined in the paper. The model requires sparse keyframe annotations, which, though less burdensome than dense labeling, still necessitate expert input. During inference, ground-truth future positions are unavailable, so the system uses polynomial extrapolation from observed points as pseudo-guidance, which may introduce errors in complex scenarios. Additionally, the evaluation is currently limited to a kidney wound suturing dataset, and generalization to other surgical tasks or environments remains untested. The offline reinforcement learning approach, while stable, depends on the quality of expert demonstrations and may not fully capture all nuances of real-time surgical dynamics, suggesting areas for further refinement in future research.
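The pseudo-guidance workaround mentioned above can be illustrated with a small sketch: fit a low-order polynomial to the observed needle-tip positions and extrapolate it to future frames. The paper states only that polynomial extrapolation replaces the unavailable ground-truth goal; the degree and horizon used here are assumptions:

```python
import numpy as np

def extrapolate_goal(obs_frames, obs_xy, future_frames, degree=2):
    """Fit per-axis polynomials to observed positions and extrapolate them
    to future frames as a pseudo-goal. Degree 2 is an assumed choice."""
    coeff_x = np.polyfit(obs_frames, obs_xy[:, 0], degree)
    coeff_y = np.polyfit(obs_frames, obs_xy[:, 1], degree)
    fx = np.polyval(coeff_x, future_frames)
    fy = np.polyval(coeff_y, future_frames)
    return np.stack([fx, fy], axis=1)

# Six observed positions (illustrative), extrapolated three frames ahead.
obs_frames = np.arange(6)
obs_xy = np.array([[10, 10], [14, 12], [18, 15], [22, 19], [26, 24], [30, 30]], float)
pseudo_goal = extrapolate_goal(obs_frames, obs_xy, np.array([6, 7, 8]))
```

As the limitation notes, such extrapolation assumes the recent motion pattern continues, which can mislead the policy when the needle changes direction abruptly.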
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.