Creating detailed 3D models from a handful of photos has long been a major challenge in computer vision, limiting applications in virtual reality, digital preservation, and robotics. Traditional methods often struggle with blurry textures, distorted geometry, or overfitting when only a few input images are available, as they lack sufficient viewpoints to understand a scene fully. Now, researchers from Zhejiang Sci-Tech University and China Jiliang University have developed CuriGS, a curriculum-guided framework that enhances 3D Gaussian Splatting to tackle this problem by intelligently expanding the training data during the learning process. This approach allows AI to generate more realistic and geometrically consistent 3D reconstructions from extremely sparse inputs, such as just three photos, opening doors for more efficient and accessible 3D modeling in real-world scenarios.
The key finding of this research is that CuriGS significantly improves the quality of 3D scene reconstruction from limited photos by introducing and managing virtual viewpoints, called student views, during training. The framework generates these pseudo-views by slightly perturbing the positions of the original camera poses, creating new angles that the AI can use to learn better scene geometry and appearance. By progressively increasing the diversity of these virtual viewpoints and only promoting the highest-quality ones into the training set, CuriGS helps the model avoid overfitting to the few available images and enhances its ability to generalize to unseen views. This results in sharper details, reduced texture drift, and more accurate geometry compared to previous state-of-the-art methods, as demonstrated across multiple benchmark datasets.
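The selective-promotion idea can be illustrated with a minimal sketch. The function below is hypothetical (the paper's actual scoring combines structural, perceptual, and image-quality signals); it simply ranks candidate student views by a quality score and promotes only the top-scoring ones that pass a threshold into the training set:

```python
def promote_students(candidates, scores, top_k=2, threshold=0.5):
    """Hypothetical selection step: rank candidate student views by a
    quality score and promote only the best ones into the training set.
    `candidates` and `scores` are parallel lists; names and thresholds
    here are illustrative, not from the paper."""
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    # Keep the top_k views, but drop any whose score falls below the floor.
    return [view for score, view in ranked[:top_k] if score >= threshold]

# Toy example: four candidate views with mock quality scores.
views = ["student_1", "student_2", "student_3", "student_4"]
quality = [0.9, 0.4, 0.7, 0.6]
promoted = promote_students(views, quality)
print(promoted)  # the two highest-scoring views above the threshold
```

The threshold prevents low-quality pseudo-views from polluting the training set even when few candidates exist, which mirrors the paper's emphasis on promoting only reliable student views.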
The methodology behind CuriGS involves a three-stage process centered on a curriculum schedule that guides the AI's learning. First, for each original camera view, known as a teacher view, the system generates groups of student views with different levels of perturbation—small adjustments to the camera's angle and position—simulating new viewpoints around the original ones. During training, the curriculum starts with student views that have minimal perturbations to ensure stability, then gradually unlocks groups with larger perturbations, exposing the model to increasingly diverse viewpoints over time. At each iteration, a subset of student views is randomly sampled from the active group and optimized using depth-correlation and co-regularization constraints to enforce geometric consistency, while a multi-signal metric evaluates their quality based on structural similarity, perceptual similarity, and image quality.
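The pose-perturbation and curriculum-unlocking steps described above can be sketched as follows. This is a simplified illustration, not the paper's implementation: the noise scales, the unlock interval, and the function names are assumptions chosen for clarity.

```python
import numpy as np

def perturb_pose(position, rotation_deg, level, rng):
    """Generate one student view by jittering a teacher camera's position
    and orientation. The noise magnitude grows with the curriculum level,
    so higher levels produce more diverse (and harder) viewpoints.
    Scales below are illustrative assumptions."""
    sigma_t = 0.01 * level   # translation noise, in scene units
    sigma_r = 1.0 * level    # rotation noise, in degrees
    new_position = position + rng.normal(0.0, sigma_t, size=3)
    new_rotation = rotation_deg + rng.normal(0.0, sigma_r, size=3)
    return new_position, new_rotation

def active_level(iteration, unlock_every=2000, max_level=3):
    """Curriculum schedule: start at the mildest perturbation level and
    unlock one larger level every `unlock_every` iterations."""
    return min(1 + iteration // unlock_every, max_level)

rng = np.random.default_rng(seed=0)
teacher_position = np.zeros(3)
teacher_rotation = np.zeros(3)

for it in (0, 2500, 6000):
    level = active_level(it)
    pos, rot = perturb_pose(teacher_position, teacher_rotation, level, rng)
    print(f"iter {it}: level {level}, student position {np.round(pos, 3)}")
```

Starting from small perturbations keeps early training close to the well-constrained teacher views; unlocking larger levels later widens viewpoint coverage only after the scene geometry has stabilized, which is the core stability-versus-diversity trade-off the curriculum manages.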
Results from extensive experiments show that CuriGS outperforms existing methods in rendering fidelity and geometric consistency under sparse-view conditions. On the LLFF dataset with only three input views, CuriGS achieved a PSNR of 21.10 dB and an SSIM of 0.732, the highest among compared 3DGS variants, indicating better pixel-level accuracy and structural similarity. Visual comparisons in Figure 4 reveal that CuriGS produces sharper details and reduces texture drift compared to baselines like FSGS and CoR-GS. For large-scale scenes in the MipNeRF-360 dataset with 24 views, it attained a PSNR of 24.10 dB and an SSIM of 0.759, with qualitative results in Figure 5 showing improved perceptual fidelity and reduced color shifts. On the object-centric DTU dataset with three views, CuriGS achieved the highest SSIM of 0.870 and preserved fine geometric structures better than other methods, as illustrated in Figure 6.
Ablation studies further validate the effectiveness of the curriculum guidance, as shown in Table 2. Without it, performance dropped significantly; for example, on the DTU dataset with three views, PSNR decreased from 22.65 dB to 18.46 dB, and SSIM fell from 0.947 to 0.922. This demonstrates that the progressive introduction and selective promotion of student views are crucial for mitigating overfitting and enhancing generalization. The framework's ability to maintain steady performance gains throughout training, as visualized in Figure 7, contrasts with models lacking curriculum guidance, whose metrics deteriorate as they memorize the limited viewpoints.
The implications of this research are substantial for fields requiring efficient 3D reconstruction from limited data, such as virtual reality, where quick scene capture from few angles can enhance user experiences, or cultural heritage preservation, where detailed models must be created from scarce historical photographs. By enabling high-quality reconstructions with fewer input images, CuriGS reduces the time, cost, and equipment needed for 3D modeling, making it more accessible for small-scale projects or real-time applications. The curriculum-guided approach also sets a new direction for virtual-view learning in AI, potentially inspiring similar strategies in other tasks like medical imaging or autonomous navigation, where data scarcity is a common challenge.
However, the study acknowledges limitations, including dependence on the quality of generated student views and the need for careful hyperparameter tuning in the curriculum schedule. The framework's performance may vary with scene complexity or extreme sparsity, and it relies on pretrained depth estimation for regularization, which could introduce biases if the estimator is misaligned with the domain. Future work could explore adaptive curriculum designs or integrate additional cues like semantic information to further improve robustness and applicability across diverse real-world scenarios.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.