AI Spots Construction Hazards Without Training

A new AI method uses text and images to identify safety risks on construction sites, achieving high accuracy without costly data or fine-tuning, offering a low-cost tool for preventing accidents.

AI Research
November 21, 2025
3 min read
Construction sites are among the most dangerous workplaces, with falls, equipment strikes, and electrocutions causing numerous injuries and fatalities each year. Traditional safety checks rely on manual inspections, which are slow and can miss hidden risks, but a new AI approach could change that. This research introduces a multimodal framework that combines large language and vision-language models to automatically detect hazards from accident reports and site images, providing a faster, data-driven way to enhance safety without the need for expensive training or specialized datasets.

In the first case study, the researchers used GPT-4o-mini to analyze OSHA accident reports, extracting details like injury types and causes, and classifying incidents into 43 categories such as falls or electrocutions. The model processed 100 reports from a larger dataset of 28,000, achieving 89% accuracy in categorizing accidents based on a taxonomy derived from safety standards. This textual pipeline transformed unstructured narratives into structured data, identifying common hazards like falls from roofs or struck-by objects, which helps in understanding patterns that lead to accidents.
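The extraction step described above can be sketched as a small prompt-and-parse loop. This is an illustrative sketch, not the paper's actual code: the category names, JSON field names, and fallback behavior are assumptions, and the model call itself is omitted so only the prompt construction and response validation are shown.

```python
import json

# Illustrative subset of accident categories; the paper's taxonomy has 43.
CATEGORIES = ["fall from roof", "struck-by object", "electrocution", "caught-in/between"]

def build_extraction_prompt(report_text: str) -> str:
    """Compose a prompt asking the model to return structured JSON."""
    return (
        "Extract the injury type, cause, and accident category from the OSHA "
        "report below. Reply with JSON keys: injury_type, cause, category "
        f"(one of {CATEGORIES}).\n\nReport:\n{report_text}"
    )

def parse_extraction(reply: str) -> dict:
    """Validate the model's JSON reply; fall back to 'unknown' on bad output."""
    try:
        record = json.loads(reply)
    except json.JSONDecodeError:
        return {"injury_type": "unknown", "cause": "unknown", "category": "unknown"}
    # Reject hallucinated categories outside the fixed taxonomy.
    if record.get("category") not in CATEGORIES:
        record["category"] = "unknown"
    return record
```

In practice, `build_extraction_prompt` would be sent to GPT-4o-mini and `parse_extraction` applied to each reply, turning free-text narratives into rows of structured data.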

For the visual analysis, the team employed GPT-4o Vision to examine construction site images through a step-by-step reasoning process. Starting with scene descriptions, the model predicted accident scenarios, filtered high-risk hazards, and localized objects like missing personal protective equipment or dangerous tools using bounding boxes. In tests on 10 images, it correctly identified hazards such as suspended chains posing struck-by risks or workers on roofs without fall protection, with a self-review mechanism refining the annotations for better accuracy.
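The filtering and self-review stages of that visual pipeline can be sketched as simple post-processing over the model's proposed detections. This is a minimal sketch under stated assumptions: the detection dictionary shape, the risk scores, and the 0.7 threshold are hypothetical, and the vision-model call that produces the detections is not shown.

```python
def clip_box(box, width, height):
    """Clip a (x1, y1, x2, y2) box to the image bounds."""
    x1, y1, x2, y2 = box
    return (max(0, x1), max(0, y1), min(width, x2), min(height, y2))

def self_review(detections, width, height, min_risk=0.7):
    """Keep only high-risk hazards and sanitize their bounding boxes,
    mimicking the filter-then-refine pass described in the paper."""
    reviewed = []
    for det in detections:
        if det["risk"] < min_risk:
            continue  # drop low-risk proposals in the filtering stage
        reviewed.append(dict(det, box=clip_box(det["box"], width, height)))
    return reviewed
```

A second model pass could then be asked to confirm or reject each surviving annotation, which is the role the self-review mechanism plays in the study.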

The second case study explored cost-effective alternatives using open-source models Molmo-7B and Qwen2-VL-2B, evaluated on the ConstructionSite10k dataset for PPE compliance. By testing 10 semantically equivalent prompts per image and using majority voting, Qwen2-VL-2B achieved a precision of 67.2%, recall of 98.0%, and F1 score of 72.6%, outperforming larger models like GPT in rule-based checks. This shows that lightweight models, when paired with clear, focused prompts, can deliver reliable safety assessments, making AI tools accessible for smaller firms with limited resources.
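The majority-voting step can be sketched in a few lines. The tie-breaking rule below (defaulting to "no", i.e. flagging non-compliance) is an assumption for illustration, not necessarily the paper's choice; each answer would come from one of the 10 prompt variants sent to Qwen2-VL-2B.

```python
from collections import Counter

def majority_vote(answers):
    """Aggregate yes/no compliance answers across prompt variants.
    Ties default to 'no' (the conservative call) -- an assumed rule."""
    counts = Counter(answers)
    return "yes" if counts["yes"] > counts["no"] else "no"
```

Voting across semantically equivalent prompts smooths out the prompt sensitivity of small models, which is what lets a 2B-parameter model compete with much larger ones on rule-based checks.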

This framework's implications are significant for improving workplace safety: it allows safety officers to quickly analyze historical data and real-time images to spot risks without technical expertise. By avoiding the need for fine-tuning or large annotated datasets, it reduces barriers to adoption and could integrate with existing systems like BIM for automated inspections. However, limitations include sensitivity to report formats and prompt wording, as well as the small sample sizes tested, which may limit generalization across diverse sites. Future work could expand to real-time video analysis and broader datasets to enhance reliability.

Original Source

Read the complete research paper

View on arXiv

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.

Connect on LinkedIn