
AI Can Now Detect News Quality With 87% Accuracy, Study Finds

AI Research
March 26, 2026
4 min read

In an era where misinformation floods digital platforms and the sheer volume of news content overwhelms human evaluators, a groundbreaking study from Lakehead University offers a technological lifeline. Researchers have demonstrated that machine learning and deep learning models can effectively distinguish between perceived lower-quality and higher-quality news articles, achieving up to 87.44% accuracy. This work, leveraging a massive dataset of over 1.4 million English articles from Common Crawl spanning 2018 to 2024, taps into expert consensus ratings to train algorithms on linguistic patterns. By moving beyond manual fact-checking to automated quality assessment, the study addresses a critical gap in scalable content moderation, potentially empowering platforms and readers to filter noise from signal in real time. Its implications extend from academic research tools to practical applications in news aggregation and social media, signaling a shift toward AI-driven curation in the fight against low-quality information.

The methodology behind this research is both rigorous and innovative, starting with a custom web parser that extracts article text by scoring HTML sections based on structural heuristics such as paragraph count and link density. This parser, designed to isolate main content while discarding irrelevant elements like footers and ads, processed millions of articles to build a clean dataset. Each article was then labeled using website-level quality scores derived from expert evaluations aggregated by Lin et al., who applied Principal Component Analysis (PCA) to create PC1 scores ranging from 0.0 (lowest quality) to 1.0 (highest quality). A binary classification was established by splitting articles at the median PC1 score of 0.8301, resulting in balanced classes of approximately 706,000 articles each. For analysis, the team extracted 194 linguistic features per article using SpaCy, covering part-of-speech tags, dependency roles, and named-entity recognition, while also fine-tuning deep learning models such as DistilBERT and ModernBERT with context lengths up to 512 tokens.
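To make the parser's content-extraction step concrete, here is a minimal sketch of a section-scoring heuristic of the kind described above: reward paragraph-rich blocks and penalize link-dense ones. The `Section` class, the weights, and the scoring formula are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Section:
    """A candidate HTML block with pre-computed counts."""
    paragraphs: int  # number of <p> tags inside the block
    links: int       # number of <a> tags inside the block
    words: int       # total word count of the block's text

def section_score(s: Section) -> float:
    """Score a block: reward paragraph-rich text, penalize link-dense
    navigation and footer areas. Weights are illustrative only."""
    if s.words == 0:
        return 0.0
    link_density = s.links / max(s.paragraphs, 1)
    return s.paragraphs * (1.0 / (1.0 + link_density))

def pick_main_content(sections: list[Section]) -> Section:
    """Keep the highest-scoring block as the article body."""
    return max(sections, key=section_score)

# A link-heavy nav bar loses to a paragraph-rich article body:
nav = Section(paragraphs=1, links=25, words=40)
body = Section(paragraphs=12, links=3, words=900)
main = pick_main_content([nav, body])
assert main is body
```

The same idea generalizes to any tree of HTML blocks: score every candidate, keep the winner, and discard the rest before labeling and feature extraction.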

Results from the study reveal a clear hierarchy in model performance, with deep learning significantly outpacing traditional machine learning approaches. Among baseline classifiers, Random Forest achieved the best results, with an accuracy of 0.7355 and a ROC-AUC of 0.8131, while Gaussian Naïve Bayes and Logistic Regression lagged behind, likely struggling with the high-dimensional feature set. In contrast, transformer-based models excelled: ModernBERT-large, with a 256-token context, topped the charts with an accuracy of 0.8744, a ROC-AUC of 0.9593, and an F1 score of 0.8739. DistilBERT-base also performed strongly, improving from 0.8478 accuracy at 256 tokens to 0.8685 at 512 tokens, highlighting the benefit of longer context for capturing article nuances. These gains translate to thousands of additional articles correctly classified, with ModernBERT-large boosting accuracy by 13.89 percentage points over Random Forest while maintaining balanced precision and recall across quality classes.
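To see why the reported accuracy (0.8744) and F1 (0.8739) sit so close together, it helps to compute both from the same confusion-matrix counts. The counts below are illustrative, not the paper's actual confusion matrix; with balanced classes like the study's median split, the two metrics naturally track each other.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 for a binary classifier,
    computed from raw confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts over 10,000 balanced test articles:
m = binary_metrics(tp=4370, fp=620, fn=635, tn=4375)
print({k: round(v, 4) for k, v in m.items()})
```

On unbalanced data, accuracy and F1 can diverge sharply, which is why the paper's balanced median split makes its headline accuracy figure easier to interpret.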

The implications of this research are profound for both technology and society, offering a scalable tool to combat the spread of low-quality news without relying on labor-intensive human review. For researchers, the models provide a convenient way to filter and analyze large news datasets, enhancing studies on media trends or misinformation. In practical terms, platforms could integrate such AI systems to flag or deprioritize content from lower-quality sources, aiding readers in making informed choices about what they consume. However, the study carefully notes that its focus is on perceived quality based on linguistic style, not factual accuracy or reliability, distinguishing it from fake news detection efforts. This nuance is crucial, as it avoids the pitfalls of AI overreach into truth-claim adjudication while still addressing stylistic markers that correlate with expert judgments of quality.

Despite its successes, the study acknowledges several limitations that warrant caution and further exploration. The reliance on website-level labels means individual article quality may vary within domains, though the researchers argue this trade-off is necessary given the impracticality of manual article-level rating. Additionally, the models do not assess content veracity, leaving room for complementary approaches that combine stylistic analysis with fact-checking. Future work could explore multi-class quality assessments for more granular insights, extend context lengths in ModernBERT models with enhanced hardware, or test additional state-of-the-art architectures. As AI continues to evolve, this research takes a foundational step toward automated quality control in news, balancing technological promise with ethical considerations in an increasingly complex information landscape.

McElroy et al., "Classification of worldwide news articles perceived quality, 2018-2024," Lakehead University, 2024.

Original Source

Read the complete research paper

View on arXiv

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.

Connect on LinkedIn