
AI Fixes Flickering in Virtual Reality Videos

A new method uses error clustering and grouped splats to create smoother, more realistic dynamic scenes for AR and VR, achieving state-of-the-art quality.

AI Research
March 26, 2026
3 min read

As augmented and virtual reality technologies become more prevalent, the demand for hyperrealistic immersive content is growing. Traditional methods for creating such content often struggle with dynamic scenes, producing visual artifacts like flickering and blurred details that break the sense of immersion. Researchers from Seoul National University have developed a novel approach that addresses these issues by improving how AI reconstructs moving objects in 3D space, resulting in smoother and more accurate videos for entertainment, training, and virtual experiences.

The key finding of this research is that existing 4D Gaussian Splatting methods, which are used to render dynamic scenes, often fail to accurately reconstruct moving objects due to ambiguous pixel correspondences and inadequate splat densification in dynamic regions. The new method introduces two complementary components: elliptical error clustering to pinpoint areas needing correction, and grouped 4D Gaussian Splatting to ensure consistent mapping between splats and dynamic objects. This approach significantly improves temporal consistency and perceptual rendering quality, as demonstrated by a 0.39 dB increase in PSNR on the Technicolor Light Field dataset compared to the previous state of the art.
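The grouped-transform idea, in which every splat in a dynamic group shares one group-level rigid motion applied on top of each splat's fixed relative offset, can be sketched in a few lines of NumPy. All names and values below are illustrative, not taken from the paper's implementation:

```python
import numpy as np

def compose_grouped_transform(group_R, group_t, rel_offsets):
    """Apply a shared group-level rigid motion to per-splat static offsets.

    group_R: (3, 3) rotation of the whole dynamic group at this frame
    group_t: (3,) translation of the group at this frame
    rel_offsets: (N, 3) each splat's fixed position relative to the group
    Returns the (N, 3) world positions of the group's splats.
    """
    return rel_offsets @ group_R.T + group_t

# Toy example: a group of three splats rotated 90 degrees about z, then shifted.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([1.0, 0.0, 0.0])
offsets = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
positions = compose_grouped_transform(R, t, offsets)
```

Because every splat in a group shares the same rigid motion, the group moves coherently from frame to frame, which is what keeps the object-splat correspondence stable over time.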

The methodology involves a two-step process. First, the researchers train a Grouped 4D Gaussian Splatting model, which uses shared dynamic transforms for groups of splats to improve object-splat correspondence across frames. This is achieved by decomposing each splat's transform into a group-level motion and a relative static transform, with dynamic groups identified based on displacement vectors and spatial overlap. Second, an error correction stage is triggered periodically, in which rendering errors are detected and clustered into elliptical shapes using DBSCAN based on spatial locality and color similarity. These error clusters are then categorized into missing-color or occlusion types through cross-view color consistency analysis, allowing targeted corrections via backprojection addition or foreground splitting.
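The error-clustering step can be illustrated with scikit-learn's DBSCAN: cluster high-error pixels on combined spatial and color features, then summarize each cluster as an ellipse via its mean and covariance. This is a minimal sketch; the function name, feature weights, and DBSCAN parameters are assumptions for illustration, not the paper's settings:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_error_pixels(coords, colors, spatial_w=1.0, color_w=30.0,
                         eps=3.0, min_samples=8):
    """Cluster rendering-error pixels by spatial locality and color similarity,
    then summarize each cluster as an ellipse (mean + 2x2 covariance).

    coords: (N, 2) pixel positions of high-error pixels
    colors: (N, 3) RGB values at those pixels, in [0, 1]
    The weights balance pixel distance against color distance.
    """
    feats = np.hstack([coords * spatial_w, colors * color_w])
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(feats)
    ellipses = []
    for k in set(labels) - {-1}:           # label -1 marks DBSCAN noise
        pts = coords[labels == k]
        mean = pts.mean(axis=0)
        cov = np.cov(pts.T)                # covariance defines the ellipse axes
        ellipses.append((mean, cov))
    return labels, ellipses

# Toy example: two well-separated blobs of error pixels with identical color
# should yield two elliptical clusters.
rng = np.random.default_rng(0)
coords = np.vstack([rng.normal([10, 10], 0.5, (20, 2)),
                    rng.normal([100, 100], 0.5, (20, 2))])
colors = np.full((40, 3), 0.5)
labels, ellipses = cluster_error_pixels(coords, colors)
```

Fitting an ellipse to each cluster gives a compact region on which a targeted fix, such as backprojecting new splats or splitting foreground ones, can be applied.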

The results show substantial improvements in both quantitative metrics and visual quality. On the Technicolor Light Field dataset, the method achieved a PSNR of 34.04, a DSSIM1 of 0.0401, and an LPIPS of 0.0809, outperforming previous approaches such as Ex4DGS and STG. Qualitative comparisons, shown in Figure 5 of the paper, reveal enhanced detail in dynamic areas, such as clearer teeth, sharper clothing stripes, and better-defined car window boundaries. The method also demonstrated better temporal stability, with a tPSNR of 37.60 versus 37.43 for the baseline, reducing flickering and jitter in rendered videos. Visualizations of the error correction process, in Figures 6 and 7, show how elliptical clustering and splat addition effectively address missing colors and occlusions.

The implications of this research are significant for industries that rely on high-quality dynamic scene rendering, such as virtual reality, film production, and simulation training. By improving temporal consistency and detail reconstruction, the method enables more realistic and immersive experiences without the visual artifacts that can cause discomfort or break immersion. The publicly available source code supports further research and practical applications, potentially accelerating advances in real-time rendering technologies. This work addresses a critical bottleneck in dynamic novel view synthesis, making it easier to incorporate real-world scenes into computer graphics for hyperrealistic content.

However, the method has limitations. It is best suited to rigid or geometrically contiguous deformations and assumes fixed colors per splat with keyframed motion, so it may not handle significant appearance changes well. Translucent objects and volumetric effects, such as flames, remain challenging, as the paper notes. The approach also requires more training time and memory than the baselines: training time rises from 1.90 to 3.21 hours and memory usage from 2.98 to 3.37 GB on an NVIDIA RTX A6000 GPU. Despite these constraints, the underlying techniques provide a flexible framework that could be adapted to more complex representations in future work.

Original Source

Read the complete research paper

View on arXiv

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.

Connect on LinkedIn