Face super-resolution (FSR), the process of enhancing low-resolution facial images to high-resolution versions, is a critical technology with wide-ranging applications from surveillance to digital forensics. However, existing approaches have been hamstrung by a fundamental limitation: they typically work only at fixed up-sampling scales and are highly sensitive to variations in input image size. This rigidity makes them impractical for real-world scenarios where facial images come in diverse resolutions, such as in security footage or social media. A new study from researchers at National Chiao Tung University introduces ARASFSR (Arbitrary-Resolution and Arbitrary-Scale Face Super-Resolution), a framework that overcomes these constraints using implicit neural representation networks. This breakthrough enables super-resolution at any desired scale while maintaining robustness across different input resolutions, potentially revolutionizing how we process facial images in unconstrained environments.
The ARASFSR framework employs an implicit image function that maps coordinates to RGB values, allowing for continuous image representation at arbitrary resolutions. At its core, the framework takes 2D deep features, local relative coordinates, and up-sampling scale ratios as inputs to predict RGB values for each target pixel. To address the spectral bias problem, where neural networks struggle to capture high-frequency details, the researchers developed a local frequency estimation module that predicts high-frequency information about facial texture. This module uses an encoder-decoder architecture to estimate image-specific high-frequency latent codes in a probabilistic manner, enhancing the network's ability to reconstruct fine facial details like skin texture and hair strands.
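The idea of an implicit image function can be illustrated with a minimal sketch: an MLP that maps a deep feature, a local relative coordinate, and an up-sampling scale ratio to an RGB value, so any target resolution can be produced simply by querying more coordinates. This is not the authors' exact architecture; the dimensions, weights, and `implicit_fn` name are illustrative assumptions.

```python
import numpy as np

# Sketch of an implicit image function (assumed dimensions, random weights):
# an MLP maps [deep feature, local relative coordinate, scale ratio] -> RGB.
rng = np.random.default_rng(0)

FEAT_DIM, HIDDEN = 64, 128
W1 = rng.standard_normal((FEAT_DIM + 2 + 2, HIDDEN)) * 0.05  # +2 coord, +2 scale
W2 = rng.standard_normal((HIDDEN, 3)) * 0.05                 # 3 output channels = RGB

def implicit_fn(feature, rel_coord, scale):
    """Predict an RGB value for one query point.

    feature   : (FEAT_DIM,) deep feature from the encoder (e.g. EDSR)
    rel_coord : (2,) coordinate of the query pixel relative to the
                nearest feature-map cell, normalized to [-1, 1]
    scale     : (2,) up-sampling ratio along height and width
    """
    x = np.concatenate([feature, rel_coord, scale])
    h = np.maximum(x @ W1, 0.0)   # ReLU hidden layer
    return h @ W2                  # raw RGB prediction

# Query an 8x8 grid of sub-pixel positions around one feature cell at x4 scale:
# the same function answers any scale just by changing the query coordinates.
feature = rng.standard_normal(FEAT_DIM)
ys, xs = np.meshgrid(np.linspace(-1, 1, 8), np.linspace(-1, 1, 8))
coords = np.stack([ys.ravel(), xs.ravel()], axis=1)
rgb = np.array([implicit_fn(feature, c, np.array([4.0, 4.0])) for c in coords])
print(rgb.shape)  # (64, 3): one RGB triple per queried sub-pixel
```

Because the coordinate input is continuous, nothing in this formulation ties the output to a fixed up-sampling factor, which is what enables arbitrary-scale reconstruction.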
Additionally, the system incorporates a global coordinate modulation module that provides crucial facial structure guidance. Unlike patch-based approaches that lack global context, this module uses positional encoding to map normalized coordinates to higher dimensions, then modulates the implicit function through periodic activations. This gives the network a global view of facial landmarks, allowing it to leverage prior knowledge about facial structure regardless of input size variations. The architecture also includes a skip connection to preserve high-frequency information and uses Charbonnier loss for training stability. Together, these components enable the system to handle both arbitrary up-sampling scales and varying input resolutions without retraining.
Quantitative evaluations demonstrate ARASFSR's superiority over existing methods. On the CelebAHQ dataset, ARASFSR achieved PSNR scores of 40.1954 dB at ×1.5 scale and 32.4799 dB at ×8 scale when using EDSR as the encoder, outperforming state-of-the-art implicit representation methods such as LIIF, LTE, and DIINN. ARASFSR showed particular strength in out-of-distribution scenarios, maintaining performance even at scales far beyond those seen during training. In real-world tests on surveillance-style datasets like CelebAHQ-NN-JPEG and SCface, ARASFSR produced clearer facial landmarks with fewer artifacts than existing methods, which often suffered from misplaced eyes or distorted facial features when input resolution changed.
The implications of this research extend across multiple domains where facial image quality matters. In security and surveillance, it could enhance low-quality footage from cameras at varying distances, improving identification accuracy. For digital forensics, it offers tools to recover details from degraded evidence. The entertainment industry could use it for restoring archival footage or enhancing visual effects. More fundamentally, ARASFSR demonstrates how implicit neural representations can be adapted for domain-specific tasks by incorporating structural priors, suggesting similar approaches could benefit other computer vision applications requiring flexibility in scale and resolution.
Despite its advancements, ARASFSR has limitations that warrant further investigation. The research primarily focused on aligned facial images, leaving unaligned or heavily occluded faces as directions for future work. The computational requirements of implicit representation networks, while not explicitly quantified in the paper, may present practical deployment hurdles in resource-constrained environments. Additionally, while the framework shows robustness to input size variations, extreme resolution differences or severe compression artifacts might still degrade performance. The authors acknowledge these areas for improvement while emphasizing their framework's significant step toward practical, flexible face super-resolution.
Reference: Tsai et al. (2025) Arbitrary-Resolution and Arbitrary-Scale Face Super-Resolution with Implicit Representation Networks. arXiv preprint.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.