In remote environments where bandwidth is severely limited, such as space exploration or battlefield operations, transmitting high-quality visual data has long been a major challenge. Now, researchers have developed an AI system that can send images at roughly 0.001 bits per pixel, achieving performance comparable to traditional methods while using only about 10% of their bandwidth. This could enable real-time visual analysis and decision-making in scenarios where communication resources are scarce.
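To put a rate of 0.001 bits per pixel in perspective, the short Python sketch below converts a bit rate into a payload size. The function name and the 1024 x 512 resolution are illustrative assumptions, not values taken from the paper.

```python
import math

def payload_bits(width: int, height: int, bpp: float) -> float:
    """Total bits needed to send a width x height image at a given bits-per-pixel rate."""
    return width * height * bpp

# Hypothetical 1024 x 512 frame (resolution chosen for illustration only).
bits = payload_bits(1024, 512, 0.001)
size_bytes = math.ceil(bits / 8)
print(f"{bits:.1f} bits, about {size_bytes} bytes")
```

Under these assumptions the whole frame fits in about 66 bytes, whereas the same frame stored as uncompressed 24-bit RGB would take 1,572,864 bytes.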
The key finding is that the system, called Generative Semantic Coding (GSC), can reconstruct images that support accurate analysis without requiring pixel-perfect reproduction. Instead of transmitting entire images, it sends only the most essential visual information (a small number of channels selected based on structural similarity) along with a text caption. The receiver uses these to generate a reconstructed image that preserves the details needed for tasks like depth estimation, object detection, and semantic segmentation.
The methodology integrates text guidance from a multi-modal large language model with structural information extracted from the original image. The system uses an encoder to generate a representation of the image, then dynamically selects the most significant channels, typically between 1 and 16, to transmit. These channels, combined with the text caption, guide a diffusion-based image generation process at the receiver. The approach builds on controllable diffusion models, using a trainable module to inject the encoded guidance and ensure structural consistency in the reconstructed image.
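The channel selection step can be pictured with a minimal Python sketch. It is not the authors' implementation: it assumes each encoder channel is a flat list of values in [0, 1], that a reference structure map of the same length is available, and that channels are ranked with a simplified single-window SSIM. The names global_ssim and select_channels are hypothetical.

```python
def _mean(values):
    values = list(values)
    return sum(values) / len(values)

def global_ssim(x, y, c1=1e-4, c2=9e-4):
    """Simplified SSIM computed over one window covering the whole signal."""
    mx, my = _mean(x), _mean(y)
    vx = _mean((a - mx) ** 2 for a in x)
    vy = _mean((b - my) ** 2 for b in y)
    cov = _mean((a - mx) * (b - my) for a, b in zip(x, y))
    numerator = (2 * mx * my + c1) * (2 * cov + c2)
    denominator = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return numerator / denominator

def select_channels(channels, reference, k):
    """Return the indices of the k channels most similar to the reference."""
    scores = [global_ssim(ch, reference) for ch in channels]
    ranked = sorted(range(len(channels)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Toy data (assumed): a step edge as the reference structure.
reference = [0.0, 0.0, 1.0, 1.0]
channels = [
    [0.0, 0.0, 1.0, 1.0],  # matches the edge exactly
    [1.0, 1.0, 0.0, 0.0],  # inverted edge
    [0.5, 0.5, 0.5, 0.5],  # flat, carries no structure
]
print(select_channels(channels, reference, k=2))
```

In this toy case the matching channel ranks first, the flat channel second, and the inverted channel last, so only the first two indices would be transmitted for k=2. A real system would score 2D feature maps with windowed SSIM and choose k adaptively.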
Results from three fundamental computer vision tasks demonstrate the system's effectiveness. For depth estimation on the KITTI dataset, GSC achieved a δ1 score of 0.796 at just 0.0069 bits per pixel (bpp), outperforming PerCo313 operating at 0.0329 bpp. On semantic segmentation with CityScapes, the method achieved 85.25% accuracy at 0.0011 bpp, compared to 61.87% for PICS at 0.0038 bpp. For object detection on COCO2017, it reached 0.894 precision at 0.0044 bpp, while MS-ILLM350 achieved 0.857 at 0.0496 bpp. In these comparisons the system matches or exceeds the baselines while using roughly 3.5 to 11 times less bandwidth.
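The bandwidth savings implied by these figures can be checked with a few lines of Python. The numbers are copied from the results quoted above; the variable and function names are illustrative.

```python
# (task, GSC bpp, baseline name, baseline bpp), as quoted in the article.
comparisons = [
    ("depth estimation (KITTI)", 0.0069, "PerCo313", 0.0329),
    ("semantic segmentation (CityScapes)", 0.0011, "PICS", 0.0038),
    ("object detection (COCO2017)", 0.0044, "MS-ILLM350", 0.0496),
]

def bandwidth_ratio(gsc_bpp: float, baseline_bpp: float) -> float:
    """How many times more bits per pixel the baseline uses than GSC."""
    return baseline_bpp / gsc_bpp

ratios = {task: bandwidth_ratio(gsc, base) for task, gsc, _, base in comparisons}

for task, gsc, name, base in comparisons:
    share = 100 * gsc / base  # GSC rate as a percentage of the baseline rate
    print(f"{task}: {name} uses {ratios[task]:.1f}x the bits; GSC needs {share:.0f}% of that")
```

The ratios come out to about 4.8x, 3.5x, and 11.3x, so the "10% of the bandwidth" headline figure corresponds most closely to the object detection comparison.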
This work matters most in real-world applications where transmission resources are limited but analysis needs are high. In planetary exploration, for example, a robot on Mars might have limited power and bandwidth for sending data back to Earth, while the receiving station has abundant computational resources. Similarly, in battlefield scenarios, drones or robots need to transmit visual information for navigation and decision-making without access to high-bandwidth connections. The method is designed to keep visual analysis accurate under these extreme conditions.
Limitations include the potential for redundant or noisy information in the transmitted channels, which may not always improve performance. The paper notes that sending more channels does not necessarily lead to better results, and that selecting the top-ranked channels does not significantly improve performance in some cases. Future work will focus on eliminating these redundant elements to achieve a more flexible balance between compression efficiency and analysis quality.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.