In the rapidly evolving landscape of generative AI, where large language models have repeatedly matched or surpassed human performance in verbal creativity tasks, a new study exposes a critical frontier where machines still fall short: visual imagination. Research published by a team of neuroscientists and computer scientists from institutions including the Bellvitge Biomedical Research Institute and the Computer Vision Center reveals that when it comes to generating creative images from abstract prompts, human artists and even non-experts consistently outperform AI systems. This finding challenges the prevailing narrative of AI's creative supremacy and suggests that visual creativity depends on distinctly human capacities that current models cannot readily replicate. The study, which directly compared human participants with a state-of-the-art Stable Diffusion model, provides compelling evidence that the human-AI gap in creativity is far from closed, particularly in domains requiring perceptual nuance and contextual sensitivity beyond mere computational recombination.
The researchers employed a rigorous four-phase methodology centered on the Test of Creative Imagery Abilities (TCIA), a validated assessment tool for visual creativity. In Phase I, they recruited 27 visual artists and 26 non-artists, presenting them with 12 abstract stimuli and asking them to mentally generate images before selecting one to draw on paper, resulting in 660 human-generated images. Phase II involved generating comparable images using a fine-tuned Stable Diffusion XL Base 1.0 model under two conditions: Human-Inspired GenAI, where prompts included specific ideas from human participants, and Self-Guided GenAI, with only basic, abstract prompts. This design allowed the team to investigate the impact of human guidance on AI creativity, addressing criticisms about AI's lack of true creative agency. The final dataset comprised 1,000 images across four categories, which were then evaluated in Phases III and IV by 255 human raters and GPT-4o, respectively, using five creativity dimensions: Liking, Vividness, Originality, Aesthetics, and Curiosity.
The results revealed a clear and consistent creativity gradient: Visual Artists > Non-Artists ≥ Human-Inspired GenAI > Self-Guided GenAI. Human-generated images were rated significantly higher than AI-generated ones overall, with visual artists achieving the highest scores. Crucially, the Human-Inspired GenAI condition, which incorporated human ideas into prompts, performed on par with non-artists, while the Self-Guided GenAI group lagged substantially behind. According to the paper, pairwise comparisons showed significant differences between most categories, except between Non-Artists and Human-Inspired GenAI, where the difference was non-significant. This highlights the marked effect of human prompting on AI's creative output. Meanwhile, GPT-4o's ratings diverged sharply from human judgments: while human raters gave conservative, discriminating scores across categories, GPT-4o assigned more generous ratings with less discrimination, sometimes even favoring AI-generated images. Factor analysis confirmed that human and AI raters operated on fundamentally different patterns, with human ratings showing greater consistency and agreement.
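The reported gradient can be illustrated with a toy calculation. The ratings below are invented purely for demonstration (they are not the study's data); the sketch simply shows how averaging per-image scores within each of the four categories and sorting the means would recover a ranking like the one the authors report:

```python
# Illustrative sketch only: the rating values are hypothetical, not taken
# from the paper. Each list holds made-up per-image creativity scores
# (1-5 scale, averaged over the five dimensions: Liking, Vividness,
# Originality, Aesthetics, Curiosity).

from statistics import mean

ratings = {
    "Visual Artists":       [4.2, 4.5, 4.1, 4.4],
    "Non-Artists":          [3.6, 3.8, 3.5, 3.7],
    "Human-Inspired GenAI": [3.5, 3.7, 3.6, 3.4],
    "Self-Guided GenAI":    [2.8, 2.6, 2.9, 2.7],
}

# Sort categories by mean rating, highest first, to form the gradient.
gradient = sorted(ratings, key=lambda k: mean(ratings[k]), reverse=True)
print(" > ".join(gradient))
```

With these invented numbers, the printed ordering matches the study's gradient; in the actual paper the gap between Non-Artists and Human-Inspired GenAI was statistically non-significant, which a simple sort of means does not capture.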
These findings have profound implications for our understanding of AI creativity and its limitations. The study suggests that AI's previous successes in verbal divergent thinking tasks—where it often outperforms humans—may not generalize to visual domains due to fundamental differences in how text and images are processed. Text is sequential and symbolic, easily captured by probability-based transformer architectures, whereas images are high-dimensional and spatial, requiring complex visual understanding. The researchers argue that human creativity is deeply embedded in evolutionary biology, involving real-world interaction, contextual awareness, and perceptual skills that AI currently lacks. The improvement seen with human-guided prompting indicates that AI can exhibit creative abilities when properly directed but does not yet achieve autonomous human-level performance. This challenges the notion of AI as an independent creative agent and underscores the importance of human-AI collaboration in creative workflows.
Despite its robust design, the study acknowledges several limitations. The sample sizes, while adequate, were relatively modest, particularly for the human artist and non-artist groups. The research focused on a specific type of visual creativity task—imagining from abstract stimuli—which may not capture all aspects of visual creativity, such as narrative or emotional expression. Additionally, the use of a single AI model (Stable Diffusion) and a single AI rater (GPT-4o) limits generalizability to other architectures or multimodal systems. The paper notes that future studies should explore a wider range of AI models, more diverse creative tasks, and longitudinal assessments to track how these gaps evolve with advancing technology. Nevertheless, the study provides a critical benchmark for evaluating AI creativity beyond verbal tasks, highlighting the need for broader, more nuanced measures that account for the complexity of human creative cognition.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.