QASER: How Quantum Circuits Stay Accurate at Greater Depth

TL;DR

QASER is a reinforcement-learning method that designs quantum circuits that are both shallower and more accurate, easing the depth-versus-accuracy trade-off that has held back practical quantum computing.

Quantum computing has long promised to solve problems beyond the reach of classical machines, but a persistent bottleneck has been the trade-off between circuit depth and accuracy. Deeper circuits often yield higher precision for tasks like quantum chemistry simulations, yet on today's noisy hardware they are more vulnerable to errors, because the overall fidelity of a circuit decays with each additional gate. This dilemma has slowed progress toward practical quantum applications, as existing methods force a choice between reliability and performance. Researchers from Aalto University and the University of Helsinki have now introduced QASER, a reinforcement learning (RL) framework that sidesteps this compromise through a carefully engineered reward function. Their approach, detailed in a recent preprint, enables the automatic design of quantum circuits that are both shallower and more accurate, a potential step forward for scalable quantum architecture search (QAS) in the post-NISQ era.
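The point that errors compound with depth can be made concrete with a deliberately simple model. This sketch assumes every gate fails independently with the same fidelity f, so a circuit of n gates has an overall fidelity of roughly f^n; real devices have correlated, gate-dependent errors, and the 99% per-gate figure below is hypothetical.

```python
# Illustrative model (an assumption, not from the QASER paper): each gate
# succeeds independently with fidelity f, so n gates give roughly f**n.


def estimated_circuit_fidelity(gate_fidelity: float, num_gates: int) -> float:
    """Crude whole-circuit fidelity estimate under independent gate errors."""
    if not 0.0 <= gate_fidelity <= 1.0:
        raise ValueError("gate_fidelity must lie in [0, 1]")
    if num_gates < 0:
        raise ValueError("num_gates must be non-negative")
    return gate_fidelity ** num_gates


if __name__ == "__main__":
    # Hypothetical per-gate fidelity of 99%, for illustration only.
    for n in (10, 50, 100, 200):
        print(f"{n:>4} gates -> ~{estimated_circuit_fidelity(0.99, n):.3f}")
```

Under this model a 99% per-gate fidelity leaves roughly 0.90 after 10 gates but only about 0.37 after 100, which is why removing gates matters as much as adding expressive ones.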

At the heart of QASER is a reward-engineering strategy that addresses key limitations of prior RL-based QAS methods. Traditional approaches, such as those using sparse or linear rewards, often optimize a single metric like energy, producing circuits that are either too deep or not accurate enough. For instance, earlier RL rewards might penalize excessive depth only after a hard threshold is crossed, which encourages agents to chase small energy gains by adding redundant gates, a phenomenon known as reward hacking. QASER addresses this by combining three costs (energy, depth, and gate count) in an exponential reward function that tracks the historical worst-case values of these metrics. Specifically, the reward is defined as R(s_t) = exp(E_min / E(s_t) + M_{D,t} / (D(s_t) + 1) + M_{C,t} / (C(s_t) + 1)), where E(s_t) is the estimated energy of the circuit at step t, D(s_t) is its depth, C(s_t) is its gate cost, and M_{f,t} (here M_{D,t} and M_{C,t}) is the worst-case value of metric f encountered so far during training. This max-tracking design is intended to reward the agent for improving all metrics at once, stabilizing learning and preventing overfitting to any single cost.
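A minimal Python sketch of the reward as written above may make the max-tracking idea more concrete. The names `WorstCaseTracker` and `exponential_reward`, the initial tracker values, the choice to update the maxima before scoring, and the assumption that energies are positive (for example, shifted by a constant so that E_min / E rises toward 1 as E approaches E_min) are all illustrative choices, not details confirmed by the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class WorstCaseTracker:
    """Running maxima (worst cases) of circuit depth and gate cost.

    The default starting values are placeholders for this sketch and are
    not taken from the QASER paper.
    """
    max_depth: float = 1.0
    max_cost: float = 1.0

    def update(self, depth: int, cost: int) -> None:
        # Keep the largest depth and cost observed so far.
        self.max_depth = max(self.max_depth, depth)
        self.max_cost = max(self.max_cost, cost)


def exponential_reward(energy: float, e_min: float, depth: int, cost: int,
                       tracker: WorstCaseTracker) -> float:
    """R = exp(E_min/E + M_D/(D+1) + M_C/(C+1)), following the article's formula.

    Assumes positive, shifted energies with e_min <= energy.
    """
    if energy <= 0 or e_min <= 0:
        raise ValueError("this sketch expects positive, shifted energies")
    tracker.update(depth, cost)  # assumption: maxima are updated before scoring
    exponent = (e_min / energy
                + tracker.max_depth / (depth + 1)
                + tracker.max_cost / (cost + 1))
    return math.exp(exponent)


if __name__ == "__main__":
    # Hypothetical values: same energy, but the second circuit carries one
    # redundant gate that also adds a layer of depth.
    tracker = WorstCaseTracker(max_depth=20, max_cost=30)
    lean = exponential_reward(1.05, 1.0, depth=10, cost=12, tracker=tracker)
    padded = exponential_reward(1.05, 1.0, depth=11, cost=13, tracker=tracker)
    print(f"lean circuit:   {lean:.2f}")
    print(f"padded circuit: {padded:.2f}")
```

At equal energy, the padded circuit scores lower because both its depth and cost terms shrink; a reward that penalizes depth only past a hard threshold would leave that padding unpenalized until the threshold is reached.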

Results from benchmarking QASER on quantum chemistry problems are striking, showing gains in both efficiency and accuracy. In noiseless simulations of the 6-qubit LiH (6-LiH) and 8-qubit H2O (8-H2O) problems, QASER achieved average errors as low as 6.5e-5 and 4.3e-4, respectively, outperforming the state-of-the-art CRLQAS by up to an order of magnitude in some cases. Crucially, it did so while reducing resource demands: for the 10-qubit H2O system (10-H2O), QASER used an average of 103.1 CNOT gates and a depth of 81.3, compared with 112.5 gates and a depth of 86.6 for CRLQAS, roughly 8% fewer CNOTs and 6% less depth without sacrificing precision. In noisy simulations with depolarizing error models, QASER converged faster and reached higher rewards, with errors of around 8e-3 after training versus about 5e-2 for CRLQAS. In warm-start experiments with TensorRL-QAS, QASER's exponential reward also produced smoother error reduction and more compact circuits, achieving competitive accuracy at the shallowest depths observed, as few as 4 layers in certain configurations.
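The relative savings implied by the 10-H2O averages, and the gap between the approximate noisy-simulation errors, can be checked with a few lines of arithmetic. The helper `relative_reduction` is a hypothetical name introduced here; the inputs are the figures quoted above.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Percentage by which `new` is smaller than `baseline`."""
    return 100.0 * (baseline - new) / baseline


# 10-H2O averages quoted in the article (CRLQAS baseline vs QASER).
cnot_cut = relative_reduction(112.5, 103.1)
depth_cut = relative_reduction(86.6, 81.3)

# Approximate post-training errors in the noisy setting (CRLQAS / QASER).
error_ratio = 5e-2 / 8e-3

print(f"CNOT reduction:  {cnot_cut:.1f}%")
print(f"Depth reduction: {depth_cut:.1f}%")
print(f"Noisy error ratio: {error_ratio:.2f}x")
```

This prints a CNOT reduction of 8.4%, a depth reduction of 6.1%, and an error ratio of 6.25x, so for this system the resource savings are modest while the noisy-setting accuracy gain is substantial.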

The implications of QASER extend broadly across quantum computing, particularly as the field moves toward fault-tolerant systems. By easing the trade-off between depth and accuracy, the framework could accelerate applications in quantum chemistry, error correction, and optimization, where efficient circuits are critical for limiting the impact of noise on real hardware. The authors highlight that it is QASER's reward engineering, rather than an algorithmic overhaul, that enables scalable QAS for systems of up to 20 qubits, as demonstrated in the warm-start tests. The approach also reduces computational overhead by requiring fewer simulator queries and improves robustness, making it suitable for error-prone NISQ devices and beyond. As quantum hardware evolves, tools like QASER could become essential for automating circuit design, potentially shortening development cycles for quantum algorithms and bringing practical quantum advantage closer to reality.

Despite its promising results, QASER has limitations that warrant further investigation. The framework's performance relies heavily on proper initialization of the max-tracking variables, and poor settings could lead to unstable training or slower convergence. Moreover, while it has been tested on quantum chemistry benchmarks, its applicability to other domains, such as cryptography or machine learning, remains unverified. RL training also remains computationally demanding: although QASER's efficiency reduces the cost, the experiments still required substantial GPU resources in the form of AMD MI250X GPUs. Future work could explore adaptive reward scaling or integration with other optimization techniques to handle larger qubit counts and more diverse noise models. Nonetheless, QASER represents a significant step in reward engineering for quantum RL, underscoring that careful reward design can drive progress where architectural changes alone fall short.
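To get a feel for why initialization could matter, one can hold a single circuit fixed and vary only the starting value of the worst-case trackers. This sketch assumes both trackers are seeded with the same constant before any circuit is observed, uses a hypothetical circuit (depth 5, gate cost 8, E_min / E = 0.9), and introduces the hypothetical helper `seeded_reward`; it illustrates the scale sensitivity of an exponential reward and is not the authors' analysis.

```python
import math


def seeded_reward(e_ratio: float, max_depth: float, max_cost: float,
                  depth: int, cost: int) -> float:
    """exp(e_ratio + max_depth/(depth+1) + max_cost/(cost+1)).

    `e_ratio` stands in for E_min / E(s_t). Returns math.inf instead of
    raising when the exponent is too large for a Python float.
    """
    exponent = e_ratio + max_depth / (depth + 1) + max_cost / (cost + 1)
    try:
        return math.exp(exponent)
    except OverflowError:
        return math.inf


if __name__ == "__main__":
    for seed in (10.0, 100.0, 10_000.0):
        # Assumption: both trackers start at `seed`, which already exceeds this
        # circuit's depth and cost, so a max-update would leave them unchanged.
        reward = seeded_reward(0.9, seed, seed, depth=5, cost=8)
        print(f"seed {seed:>8.0f}: reward = {reward:.3g}")
```

Under these assumptions the same circuit's reward changes by more than ten orders of magnitude between the first two seeds, and the largest seed overflows a double-precision float altogether, which shows how strongly the starting values can shape the learning signal.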

Reference: Moflic et al., 2025, arXiv preprint.

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
