In the relentless pursuit of more efficient artificial intelligence, model pruning has emerged as a critical technique for slashing computational demands without sacrificing performance. Structured pruning, which removes entire neurons or channels from neural networks, faces a fundamental challenge: importance is often diffusely spread across the representation space, making it difficult to identify what can be safely eliminated. This distribution problem limits how aggressively models can be compressed before accuracy plummets, creating a bottleneck for deploying AI systems on resource-constrained devices. The paper 'Change-of-Basis Pruning via Rotational Invariance' introduces a novel approach that addresses this by leveraging geometric transformations to concentrate importance into specific dimensions, enabling far more aggressive pruning. By redesigning neural network architectures to be invariant to these transformations, the researchers demonstrate that models can maintain high accuracy even when stripped of the vast majority of their parameters, potentially revolutionizing how we build and deploy efficient AI.
At the heart of this innovation lies the introduction of two-subspace radial activations (TSRAs), a new family of activation functions designed to be invariant to orthogonal linear transformations applied independently within two separate subspaces. This rotational invariance is crucial because it allows change-of-basis (CoB) transformations—essentially rotations of the activation space—to be merged into surrounding weights without adding extra parameters or computational overhead. The methodology involves modifying standard architectures like VGG-16 by replacing non-invariant components; for instance, max-pooling is swapped for average-pooling, and BatchNorm is replaced with unlearned RMSNorm to maintain compatibility. In experiments on the CIFAR-10 dataset, the researchers trained models with TSRAs using the AdamW optimizer, as these activations proved less effective with standard SGD, and employed activation-magnitude importance scores computed via L2-norms across sampled data. The CoB transformations were applied using PCA to align dimensions with maximal variance, concentrating importance in a way that makes structured pruning more effective and reliable.
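The core mechanics can be illustrated with a small NumPy sketch. This is not the paper's TSRA or VGG-16 pipeline; it uses a simple single-subspace radial activation (an assumption for illustration) to show why rotational invariance lets a PCA-derived change of basis be absorbed into the adjacent weight matrices with no extra parameters, and how L2-norm activation-magnitude importance is then computed per hidden dimension:

```python
import numpy as np

rng = np.random.default_rng(0)

def radial_act(h):
    """Illustrative radial activation (NOT the paper's TSRA): rescales each
    activation vector by a function of its norm only. Because it depends on
    the norm alone, it commutes with any orthogonal transform R:
    radial_act(h @ R.T) == radial_act(h) @ R.T."""
    norms = np.linalg.norm(h, axis=-1, keepdims=True)
    return np.tanh(norms) / np.clip(norms, 1e-8, None) * h

# Toy two-layer block: y = W2 @ act(W1 @ x), on sampled data X.
d_in, d_hid, d_out, n = 8, 16, 4, 256
W1 = rng.normal(size=(d_hid, d_in))
W2 = rng.normal(size=(d_out, d_hid))
X = rng.normal(size=(n, d_in))
H = X @ W1.T  # pre-activations over the sample

# Change of basis via PCA: rotate the hidden space so that variance
# (and hence importance) concentrates in a few leading dimensions.
_, _, Vt = np.linalg.svd(H - H.mean(axis=0), full_matrices=False)
R = Vt  # orthogonal matrix; rows are principal directions

# Merge the rotation into the surrounding weights -- no new parameters.
W1_cob = R @ W1     # rotate layer-1 outputs into the PCA basis
W2_cob = W2 @ R.T   # counter-rotate layer-2 inputs

# Rotational invariance of the activation makes the merge exact:
Y     = radial_act(X @ W1.T)     @ W2.T
Y_cob = radial_act(X @ W1_cob.T) @ W2_cob.T
assert np.allclose(Y, Y_cob)

# Activation-magnitude importance: L2 norm over the sample, per hidden dim.
imp = np.linalg.norm(radial_act(X @ W1_cob.T), axis=0)
print("top-4 hidden dims by importance:", np.argsort(imp)[::-1][:4])
```

With a non-invariant activation like ReLU, the `assert` would fail: the rotation could not be folded into the weights without changing the network's function, which is precisely the obstacle the TSRA family is designed to remove.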
The results from layerwise fixed-ratio pruning are striking, showing that CoB-augmented models maintain robustness across a wide range of compression levels. Before fine-tuning, the baseline model with ReLU activations saw accuracy drop to chance levels (10%) by 70-80% pruning, whereas the CoB approach retained over 85% accuracy at 70% pruning and 72.5% at 80% pruning, representing improvements of up to 75.1 percentage points. After fine-tuning, CoB consistently outperformed the baseline, with accuracy gains widening at higher prune ratios—for example, +14.15 points at 95% pruning, where the model retained 64.55% accuracy despite having only 1.7M parameters left. Under threshold-based pruning strategies, CoB enabled even more extreme compression, removing 90-96% of parameters while keeping accuracy drops to just 1-6% below the unpruned baseline. For instance, with proportion-of-maximum thresholding, pruning 96.52% of parameters resulted in only a 3.18-point accuracy loss, underscoring how CoB reshapes activation distributions to concentrate importance into fewer dimensions.
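The two selection strategies compared above can be sketched concisely. The function names and the example scores below are illustrative assumptions, not the paper's implementation; they show the difference between layerwise fixed-ratio pruning (keep a fixed top fraction per layer) and proportion-of-maximum thresholding (keep any dimension scoring at least a fraction of the layer's maximum):

```python
import numpy as np

def fixed_ratio_mask(scores, prune_ratio):
    """Keep the top (1 - prune_ratio) fraction of dimensions in this layer."""
    k = max(1, int(round(len(scores) * (1 - prune_ratio))))
    keep = np.argsort(scores)[::-1][:k]
    mask = np.zeros(len(scores), dtype=bool)
    mask[keep] = True
    return mask

def prop_of_max_mask(scores, tau):
    """Keep dimensions whose importance is at least tau * the layer max."""
    return scores >= tau * scores.max()

# Hypothetical per-dimension importance scores for one layer.
scores = np.array([10.0, 4.0, 0.3, 0.1, 7.0, 0.02])

print(fixed_ratio_mask(scores, prune_ratio=0.5))  # keeps top 3 dims
print(prop_of_max_mask(scores, tau=0.05))          # keeps scores >= 0.5
```

The thresholding variant adapts the pruned count to each layer's score distribution, which is why it pairs well with CoB: once importance is concentrated into a few dimensions, most scores fall far below the threshold and can be removed with little accuracy cost.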
These findings have profound implications for the future of AI deployment, particularly in edge computing and mobile applications where computational resources are scarce. By extending the reliable pruning frontier from roughly 30% to 70% of parameters without fine-tuning, and enabling up to 96% compression with minimal post-prune recovery, this approach could drastically reduce the energy and memory footprints of neural networks. The ability to maintain high accuracy under such aggressive sparsification suggests that rotational invariance could become a cornerstone of efficient model design, influencing everything from large language models to real-time vision systems. Moreover, the proof-of-concept with TSRAs opens the door to exploring other invariant activation families, potentially leading to even greater gains in model compressibility and performance in resource-constrained environments.
Despite these promising results, the study acknowledges several limitations that temper its immediate applicability. The use of TSRAs introduces a slight accuracy drop of 4.52% compared to ReLU-based controls in unpruned models, indicating a trade-off between invariance and baseline performance. Additionally, the researchers did not perform a thorough analysis of optimal TSRA parameters or explore weight initialization schemes tailored to these activations, leaving room for improvement in future work. The modifications required for rotational invariance, such as swapping out max-pooling and BatchNorm, may not seamlessly transfer to all architectures, and the width saturation bounds of TSRAs—though higher than those of radial rescaling functions—could still constrain their use in very wide networks. These factors highlight that while CoB pruning via rotational invariance is a groundbreaking step, it remains an early-stage concept needing further refinement to achieve broad adoption.
Source: Rangaraju, V., & Ning, A. (2025). Change-of-Basis Pruning via Rotational Invariance.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.