
AI Creates Consistent 3D Vector Graphics from Single Images

A new method generates multi-view vector graphics with geometric and color consistency, reducing path deviations by 26.5% and color variations by 83.5% compared to existing tools.

AI Research
March 27, 2026
4 min read

Designers and artists often struggle to create consistent vector graphics from multiple viewpoints, requiring time-consuming manual adjustments that can lead to geometric and stylistic inconsistencies. A new AI framework addresses this by automatically generating multi-view Scalable Vector Graphics (SVGs) from a single input, offering a scalable route for asset creation and semantic vector editing. This approach bridges generative modeling with structured vector representation, enabling practical applications like turntable-style visualization and multi-view logo generation without the need for extensive retraining or manual intervention.

The researchers developed a three-stage framework that produces multi-view SVGs with strong geometric and color consistency from a single SVG input. First, the input SVG is rasterized and lifted to a 3D representation using the Trellis model, which renders the object under target camera poses to produce multi-view images. This step ensures geometrically plausible novel views by leveraging 3D Gaussian splatting for efficient rendering. Next, a spatial memory mechanism extends the temporal memory of Segment Anything 2 (SAM2) to establish part-level correspondences across neighboring views, yielding cleaner and more consistent vector paths and color assignments. Finally, during raster-to-vector conversion, path consolidation and structural optimization reduce redundancy while preserving boundaries and semantics, resulting in compact and editable SVGs.
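To make the three-stage flow concrete, here is a minimal structural sketch of the pipeline. Every function here is a hypothetical stand-in (the real system uses Trellis, Spatial SAM2, and VTracer, none of which are called here); the stubs only illustrate how data flows from one stage to the next.

```python
# Hypothetical sketch of the three-stage pipeline; all stage functions
# are illustrative stand-ins, not the paper's actual implementation.

def rasterize(svg: str) -> str:
    # Stand-in: a real pipeline would render the SVG to a bitmap.
    return f"raster({svg})"

def render_views(raster: str, poses: list) -> list:
    # Stand-in for Trellis: lift to 3D and render one image per camera pose.
    return [f"{raster}@pose{p}" for p in poses]

def segment_parts(views: list) -> list:
    # Stand-in for Spatial SAM2: per-view part masks with consistent IDs.
    return [{"view": v, "parts": ["body", "head"]} for v in views]

def vectorize(seg: dict) -> dict:
    # Stand-in for VTracer plus consolidation: one compact SVG per view.
    return {"view": seg["view"], "svg": f"<svg parts={len(seg['parts'])}/>"}

def generate_multiview_svgs(svg: str, poses: list) -> list:
    views = render_views(rasterize(svg), poses)
    return [vectorize(s) for s in segment_parts(views)]

result = generate_multiview_svgs("logo.svg", [0, 45, 90])
```

The key structural point is that part identities are established once, across views, before vectorization, so each output SVG shares the same part decomposition.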

Methodologically, the framework begins by converting the input SVG into a raster image and rendering 3D-consistent multi-view rasters using Trellis, with a lightweight LoRA-tuned FLUX model applied for appearance harmonization to increase intra-region homogeneity and boundary contrast. The refined rasters are processed by a Spatial SAM2 module, which replaces temporal adjacency with spatial nearest-neighbor traversal on the viewing sphere to maintain part-level consistency. This involves uniformly sampling camera viewpoints, using a pseudo-sequential traversal to minimize geometric discontinuities, and applying a residual loop to detect missing regions in uncovered foreground areas. The segmented rasters are then vectorized per part using VTracer, followed by vector-domain consolidation that sparsifies colors, cleans micro paths, and aligns colors to a reference palette from the input SVG using the CIEDE2000 color-difference metric.
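The viewpoint sampling and pseudo-sequential traversal can be sketched with standard techniques: a Fibonacci lattice for roughly uniform points on a sphere, and a greedy nearest-neighbor ordering so that consecutive views differ as little as possible. This is a simplified sketch, not the paper's exact procedure.

```python
import math

def fibonacci_sphere(n):
    # Roughly uniform viewpoints on the unit sphere (Fibonacci lattice).
    # Assumes n >= 2.
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        y = 1.0 - 2.0 * i / (n - 1)          # y runs from 1 down to -1
        r = math.sqrt(max(0.0, 1.0 - y * y)) # radius of the latitude circle
        theta = golden * i                   # golden-angle increment
        pts.append((r * math.cos(theta), y, r * math.sin(theta)))
    return pts

def greedy_traversal(points, start=0):
    # Order views so each step moves to the nearest unvisited viewpoint,
    # keeping the geometric change between consecutive views small.
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        cur = points[order[-1]]
        nxt = min(unvisited,
                  key=lambda j: sum((a - b) ** 2
                                    for a, b in zip(cur, points[j])))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

views = fibonacci_sphere(8)
order = greedy_traversal(views)  # a permutation of the 8 view indices
```

In the actual framework, segmentation masks would then be propagated along this ordering instead of along SAM2's default temporal sequence.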

The experiments show that the framework outperforms Adobe Illustrator's Turntable (Beta) in both quantitative and qualitative measures. Quantitatively, as reported in Table 1, the approach reduces the path-count deviation from the input SVG by 26.5%, lowers the variation in color usage between adjacent viewpoints by 83.5%, and decreases the average number of paths by 11.6%. Qualitatively, Figure 2 illustrates that Adobe Turntable exhibits geometric inconsistencies, structural confusion, and color drift, whereas the new method maintains coherent geometry, clear part separation, and consistent appearance across views. Ablation studies in Figures 3 and 4 confirm that Spatial SAM2 enhances segmentation robustness compared to the original SAM2 tracking mode, which suffers from error accumulation, and compared to segmentation-free baselines that fail to reconstruct parts reliably.
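The two consistency metrics reported above are straightforward to compute; here is a minimal sketch of plausible definitions (the paper may define them slightly differently), where path-count deviation is the relative difference from the input SVG's path count and color variation is the average palette difference between neighboring views.

```python
def path_count_deviation(n_input, n_output):
    # Relative deviation of the output path count from the input SVG's.
    return abs(n_output - n_input) / n_input

def adjacent_color_variation(view_palettes):
    # Average number of colors that differ between neighboring views,
    # measured as the symmetric set difference of their palettes.
    diffs = [len(set(a) ^ set(b))
             for a, b in zip(view_palettes, view_palettes[1:])]
    return sum(diffs) / len(diffs)

# Toy example: three views, the last swapping one palette color.
dev = path_count_deviation(100, 90)                       # 0.1
var = adjacent_color_variation([["red", "gold"],
                                ["red", "gold"],
                                ["red", "blue"]])         # 1.0
```

Lower values on both metrics mean the multi-view SVGs stay structurally and chromatically closer to the single input.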

The implications of this work are significant for creative workflows, as it enables scalable generation of editable vector assets for applications like turntable visualization and multi-view icon design. By producing SVGs with fewer paths and more stable colors, the method improves editability and reduces manual effort, potentially streamlining design pipelines in industries such as advertising and digital art. The framework's ability to maintain cross-view consistency without retraining makes it a practical tool for real-world use, though it currently focuses on single-object inputs and may struggle with intricate topologies or open-path structures.

Limitations of the approach include constraints from current segmentation models, particularly on objects with intricate topology or numerous fine-grained components, as noted in the paper's conclusion. The framework does not yet extend to scene-level vector graphics requiring instance reasoning or world-coordinate camera transformations, and vectorization remains most reliable for closed, well-formed paths. Future work could address these issues by developing stronger segmentation priors and more robust cross-view correspondence strategies, but for now, the method represents a substantial step forward in bridging raster-based synthesis with structured vector representations.

Original Source

Read the complete research paper

View on arXiv

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.

Connect on LinkedIn