AI Finds Hidden Patterns in Neural Networks

As artificial intelligence systems increasingly influence high-stakes decisions in healthcare, finance, and criminal justice, understanding how they reach conclusions becomes critical. Current explanation methods often fail to capture how multiple features interact, leaving practitioners unable to answer crucial questions about bias, redundancy, and causality. The Modules of Influence (MoI) framework addresses this gap by transforming individual feature attributions into a network that reveals collaborative feature groups, enabling more actionable and transparent AI auditing.

The researchers behind MoI observe that AI models frequently rely on coordinated groups of features rather than isolated variables. The approach builds an explanation graph from standard attribution methods such as SHAP and LIME, then applies community detection algorithms to identify modules: sets of features that are consistently influential together. These modules capture higher-order structure that flat attribution lists miss, such as income-related features clustering separately from education and occupation groups in socioeconomic datasets.

The methodology begins with per-instance feature attributions collected into a matrix. Researchers compute co-influence weights between features using measures like magnitude-cosine similarity or correlation, then sparsify the resulting graph to retain only the strongest connections. Community detection algorithms like Leiden or Infomap partition the graph into modules, with hyperparameters selected to maximize stability across data resamples. The approach supports multiple affinity definitions and works with various attribution methods, making it flexible across different AI systems.
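The pipeline described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy attribution matrix, and the 0.8 threshold are all hypothetical, sparsification is done with a simple weight cutoff, and connected components stand in for a real community detection algorithm such as Leiden or Infomap.

```python
import math
from itertools import combinations

def magnitude_cosine(attr, i, j):
    """Cosine similarity between the absolute attribution columns of features i and j."""
    a = [abs(row[i]) for row in attr]
    b = [abs(row[j]) for row in attr]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def co_influence_graph(attr, threshold=0.8):
    """Sparsified graph: keep only edges whose weight is at least `threshold` (assumed cutoff)."""
    n_features = len(attr[0])
    edges = {}
    for i, j in combinations(range(n_features), 2):
        w = magnitude_cosine(attr, i, j)
        if w >= threshold:
            edges[(i, j)] = w
    return edges

def connected_modules(n_features, edges):
    """Group features into connected components (a crude stand-in for Leiden or Infomap)."""
    parent = list(range(n_features))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    groups = {}
    for f in range(n_features):
        groups.setdefault(find(f), []).append(f)
    return sorted(groups.values())

# Toy attribution matrix: rows are instances, columns are features (made-up values).
attributions = [
    [0.9, -0.8, 0.0, 0.1],
    [0.7, 0.6, 0.1, 0.0],
    [0.0, 0.1, 0.8, -0.9],
    [0.1, 0.0, -0.6, 0.7],
]
edges = co_influence_graph(attributions, threshold=0.8)
modules = connected_modules(len(attributions[0]), edges)
print(modules)  # [[0, 1], [2, 3]]
```

Taking absolute values before the cosine means two features count as co-influential when they matter on the same instances, even if their attributions have opposite signs. In practice the attribution matrix would come from SHAP or LIME, and the partition step would be replaced by a proper community detection library.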

Across synthetic and real-world datasets, MoI recovers planted modules more accurately than baseline methods. In fairness-focused applications, the framework identifies modules with elevated Bias Exposure Index (BEI) values, a measure of how strongly specific feature groups mediate disparities between demographic groups. Targeted interventions on these high-BEI modules reduced equalized-odds gaps by up to 23% with minimal impact on overall accuracy. The method also enables dimensionality compression: module-aggregated representations retained 91% of the original predictive performance while reducing the feature dimension from 128 to 18.
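For readers unfamiliar with the equalized-odds gap mentioned above, the sketch below computes one common version of it: the larger of the true-positive-rate and false-positive-rate differences between two groups. The article does not say which exact definition the authors use, so this choice, the function names, and the toy labels are assumptions for illustration only.

```python
def rates(y_true, y_pred):
    """Return (true-positive rate, false-positive rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr

def equalized_odds_gap(y_true, y_pred, group):
    """Larger of the TPR gap and FPR gap between group 0 and group 1 (assumed definition)."""
    by_group = {}
    for g in (0, 1):
        idx = [k for k, v in enumerate(group) if v == g]
        by_group[g] = rates([y_true[k] for k in idx], [y_pred[k] for k in idx])
    tpr_gap = abs(by_group[0][0] - by_group[1][0])
    fpr_gap = abs(by_group[0][1] - by_group[1][1])
    return max(tpr_gap, fpr_gap)

# Made-up labels, predictions, and group membership.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
gap = equalized_odds_gap(y_true, y_pred, group)
print(gap)  # 0.5
```

A module-level intervention would be judged by recomputing this gap on the model's predictions before and after the intervention.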

This modular perspective matters because it shifts AI explanation from individual features to actionable groups. Practitioners can identify problematic modules for regularization, prioritize data collection for underrepresented feature combinations, or compress models without significant performance loss. The approach makes AI auditing more practical by providing concrete intervention points: attenuating an entire module rather than tweaking dozens of individual features.
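Two of the interventions named above, attenuating a module and compressing features to one value per module, can be illustrated as follows. This is a hedged sketch: the function names, the shrink-toward-baseline rule, and the use of a mean as the aggregate are assumptions, since the article does not specify how the authors attenuate or aggregate.

```python
def attenuate_module(row, module, factor=0.5, baseline=None):
    """Shrink the features in `module` toward a baseline.

    factor=1.0 leaves the row unchanged; factor=0.0 replaces the module's
    features with the baseline. The baseline defaults to all zeros (an assumption).
    """
    if baseline is None:
        baseline = [0.0] * len(row)
    out = list(row)
    for f in module:
        out[f] = baseline[f] + factor * (row[f] - baseline[f])
    return out

def aggregate_modules(row, modules):
    """Compress a feature vector to one value per module (mean of its members, assumed)."""
    return [sum(row[f] for f in m) / len(m) for m in modules]

# Hypothetical modules over four features, and one input row.
modules = [[0, 1], [2, 3]]
x = [2.0, 4.0, 1.0, 3.0]

x_soft = attenuate_module(x, modules[0], factor=0.5)
x_small = aggregate_modules(x, modules)
print(x_soft)   # [1.0, 2.0, 1.0, 3.0]
print(x_small)  # [3.0, 2.0]
```

Soft attenuation with a factor between 0 and 1 keeps the modified input closer to the original than zeroing a whole group of features would.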

Limitations include dependence on attribution method choices and background data selection, which can affect module stability. The framework generates hypotheses about feature interactions but requires follow-up causal validation to establish mechanisms. Researchers caution against interpreting module influence as causal without additional experimental evidence and note that hard ablations may produce unrealistic inputs when masking feature groups.
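One way to check how sensitive the modules are to these choices is to compare the partitions obtained from two runs, for example with different attribution methods or background samples. The sketch below uses the Rand index, the fraction of feature pairs on which two partitions agree. The article does not state which stability measure the authors use, so the metric, the function names, and the example partitions are assumptions.

```python
from itertools import combinations

def labels_from_modules(modules, n_features):
    """Map each feature index to the id of the module that contains it."""
    labels = [None] * n_features
    for module_id, members in enumerate(modules):
        for f in members:
            labels[f] = module_id
    return labels

def rand_index(modules_a, modules_b, n_features):
    """Fraction of feature pairs grouped consistently (together or apart) in both partitions."""
    a = labels_from_modules(modules_a, n_features)
    b = labels_from_modules(modules_b, n_features)
    pairs = list(combinations(range(n_features), 2))
    agree = sum(1 for i, j in pairs if (a[i] == a[j]) == (b[i] == b[j]))
    return agree / len(pairs)

# Hypothetical partitions from two runs over the same four features.
run_1 = [[0, 1], [2, 3]]
run_2 = [[0, 1, 2], [3]]
score = rand_index(run_1, run_2, n_features=4)
print(score)  # 0.5
```

A score of 1.0 means the two runs produced identical modules; lower values indicate that the grouping depends on the attribution or background-data choice.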

The MoI framework represents a significant step toward module-centric explainable AI, providing tools to discover, quantify, and intervene on feature groups that collectively influence model predictions. By making these collaborative patterns visible and actionable, the method supports more transparent and trustworthy AI deployment across critical domains.

About the Author

Guilherme A.