
AI Breakthrough Makes Compressed Videos Sharper and Faster

AI Research
November 22, 2025
4 min read

In an era where 4K and 8K video streaming is becoming the norm, the relentless push for higher compression ratios to save bandwidth and storage has often come at a steep cost: visual artifacts like blurring, blocking, and ringing that degrade the viewer's experience. While deep learning has offered solutions, most existing methods are 'non-blind,' requiring precise knowledge of encoding parameters like Quantization Parameters (QPs) and a separately trained model for each setting. This approach falls short in real-world scenarios, such as video transcoding or streaming with Digital Rights Management (DRM), where QPs may be unavailable, leaving a gap for more adaptive technologies. Enter a groundbreaking study from researchers at Nanjing University of Information Science and Technology and Tampere University, which introduces a blind quality enhancement framework that not only tackles these limitations but also dynamically optimizes computational efficiency, promising smoother, higher-quality video without the usual trade-offs.

The proposed framework centers on two innovative components: a Degradation Representation Learning (DRL) module and a hierarchical termination mechanism. The DRL module employs a pretrained encoder to extract multi-scale degradation representations from compressed video frames, using a dual-supervision strategy that combines contrastive learning and classification. Contrastive learning, via an InfoNCE loss, pulls together similar artifact regions and pushes apart dissimilar ones, sharpening the discrimination of local distortion patterns, while classification with a cross-entropy loss imposes semantic constraints that stabilize the distortion-level representations. This disentangles degradation from content, providing fine-grained, spatially aware guidance for artifact removal. In parallel, the hierarchical termination mechanism adapts computational resources to the detected degradation level: lightly compressed regions undergo fewer processing stages, while heavily distorted ones receive more, cutting inference time by up to 50% for low-QP videos (e.g., QP22) compared to high-QP ones (e.g., QP42). The artifact reduction itself leverages a dual-branch architecture: one branch uses Multi-Swin Transformers for global context modeling, and the other applies multi-scale dilated convolutions for local detail recovery, ensuring comprehensive exploitation of spatiotemporal dependencies.
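To make the dual-supervision idea more concrete, here is a minimal PyTorch sketch of how a degradation encoder might be trained with an InfoNCE contrastive term plus a cross-entropy distortion-level term. The architecture, tensor shapes, and the way positives, negatives, and QP-bucket labels are formed are illustrative assumptions for exposition, not the authors' implementation.

```python
# Illustrative sketch of dual supervision (contrastive + classification)
# for degradation representation learning; all names and shapes are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DegradationEncoder(nn.Module):
    """Toy encoder mapping a frame patch to a degradation embedding
    and a distortion-level logit vector (e.g., one class per QP bucket)."""
    def __init__(self, embed_dim=128, num_levels=5):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.proj = nn.Linear(128, embed_dim)   # head for contrastive learning
        self.cls = nn.Linear(128, num_levels)   # head for distortion-level classification

    def forward(self, x):
        feat = self.backbone(x)
        return F.normalize(self.proj(feat), dim=1), self.cls(feat)

def info_nce(anchor, positive, negatives, tau=0.07):
    """InfoNCE: pull the positive embedding close, push negatives away."""
    pos = (anchor * positive).sum(dim=1, keepdim=True) / tau   # (B, 1)
    neg = anchor @ negatives.t() / tau                         # (B, K)
    logits = torch.cat([pos, neg], dim=1)
    labels = torch.zeros(anchor.size(0), dtype=torch.long)     # positive is index 0
    return F.cross_entropy(logits, labels)

encoder = DegradationEncoder()
anchor_patch = torch.randn(8, 3, 64, 64)    # patches from compressed frames
positive_patch = torch.randn(8, 3, 64, 64)  # patches with similar artifacts
negative_bank = torch.randn(32, 128)        # embeddings of dissimilar patches
qp_labels = torch.randint(0, 5, (8,))       # distortion-level labels known at training time

z_a, logits_a = encoder(anchor_patch)
z_p, _ = encoder(positive_patch)

# Dual supervision: contrastive term on artifact similarity
# plus cross-entropy on the distortion level.
loss = info_nce(z_a, z_p, F.normalize(negative_bank, dim=1)) \
       + F.cross_entropy(logits_a, qp_labels)
loss.backward()
```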
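The degradation-adaptive compute allocation can likewise be pictured as an early-exit loop over enhancement stages. The sketch below assumes a scalar degradation score in [0, 1] coming from the DRL module and a simple proportional stage budget; the paper's hierarchical termination mechanism is more elaborate, so treat this purely as an illustration of the idea.

```python
# Hedged sketch of degradation-adaptive early termination: lightly compressed
# inputs exit after a few enhancement stages, heavily distorted ones use all of
# them. Stage modules and the exit rule are illustrative assumptions.
import torch
import torch.nn as nn

class AdaptiveEnhancer(nn.Module):
    def __init__(self, num_stages=4, channels=64):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
            for _ in range(num_stages)
        )

    def forward(self, feat, degradation_score):
        # Map a [0, 1] degradation score (from the DRL module) to a stage budget:
        # lightly compressed content uses 1-2 stages, heavily distorted content all of them.
        budget = max(1, int(round(degradation_score * len(self.stages))))
        for stage in self.stages[:budget]:
            feat = feat + stage(feat)   # residual refinement per stage
        return feat

enhancer = AdaptiveEnhancer()
features = torch.randn(1, 64, 128, 128)
light = enhancer(features, degradation_score=0.3)   # terminates early (1 stage)
heavy = enhancer(features, degradation_score=0.9)   # runs every stage
```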

Experimental results on the MFQEv2 dataset, using the HEVC and VVC compression standards, underscore the framework's superiority. In blind quality enhancement, it achieved a 110% improvement in PSNR gain over the state-of-the-art blind method at QP22, jumping from 0.31 dB to 0.65 dB, and it led in SSIM at higher QPs such as QP37 and QP42. Notably, it maintained robust performance on unseen QPs (e.g., QP20, 25, 30, 35, and 40), with minimal drops (only 0.03 dB in PSNR for HEVC QP35 versus the seen QP37), while competitors such as FBCNN degraded more noticeably. Efficiency metrics showed that hierarchical termination halved inference time for QP22 (0.6 hours versus 1.2 hours for QP42) and reduced TFLOPs by an average of 73.7% compared to FBCNN across resolutions. Qualitative assessments further highlighted its edge, with visualizations showing better preservation of textures and details, such as natural skin shading and fine horse-tail strands, where competing methods produced over-smoothed or blurry outputs.

The implications of this research are profound for industries reliant on high-quality video, from streaming services and online education to surveillance and virtual reality. By enabling a single model to handle various compression levels blindly, it reduces deployment costs and enhances adaptability in unpredictable environments. The dynamic computational allocation could lower energy consumption and latency, critical for real-time applications, while the improved visual continuity, evidenced by suppressed frame-to-frame quality fluctuations, promises a more immersive user experience. As video data continues to explode, this approach paves the way for smarter, more efficient AI-driven enhancement that doesn't sacrifice performance for practicality, potentially influencing future standards in video codecs and streaming protocols.

Despite its advancements, the study acknowledges limitations, such as the dependency on pretraining the DRL module, which may require extensive data, and the focus on specific datasets like MFQEv2, which leaves generalizability to other compression types or real-world noise unverified. Future work could explore integrating the framework with emerging video formats or extending it to handle artifacts beyond compression. Nonetheless, it represents a significant leap in blind video quality enhancement, balancing innovation with practical efficiency, and sets a new benchmark for AI in multimedia processing.

Original Source

Read the complete research paper on arXiv.

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.

Connect on LinkedIn