Quantum computers promise to solve problems beyond the reach of classical machines, but a fundamental question remains: how do we know when they are truly behaving quantumly? Researchers have developed a new benchmark that treats a quantum computer as a single, large quantum object, testing its ability to maintain quantum behavior as it scales up. This approach, based on a foundational concept called macrorealism, provides a direct measure of a quantum computer's 'quantumness'—the very property that gives it an edge over classical systems. By applying this test to current hardware, the team has not only pushed the limits of quantum verification but also offered a practical tool for comparing different machines as they grow in size and complexity.
The core finding is that quantum computers undergo a quantum-to-classical transition as more qubits are involved. Under ideal conditions, the violation of macrorealism (a test that distinguishes quantum from classical behavior) should be independent of the number of qubits. In real quantum computers with noise and imperfections, however, this violation decreases as the qubit count increases, eventually dropping to zero. The researchers introduced a metric called the NDC Performance Metric, which identifies the maximum number of qubits, denoted N_NDC, for which a quantum computer can still demonstrate non-classical behavior with high confidence. For example, on IBM's ibm_marrakesh quantum computer, this metric reached up to 38 qubits using one protocol, a significant leap from previous tests limited to just 5 qubits.
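To make the idea behind the metric concrete, here is a minimal sketch of how such an N_NDC could be extracted from per-qubit-count violation estimates. The function name, the 3-sigma confidence rule, and all numbers below are assumptions for illustration, not the paper's exact definition:

```python
import numpy as np

def n_ndc(qubit_counts, violations, std_errs, sigma=3.0):
    """Largest N whose measured violation is still `sigma` standard errors above zero."""
    best = 0
    for n, v, s in zip(qubit_counts, violations, std_errs):
        if v > sigma * s:       # non-classical with high confidence at this size
            best = max(best, n)
    return best

# Made-up numbers: a violation that decays with qubit count over a flat noise floor.
ns = np.arange(2, 41)
v = 0.2 * np.exp(-(ns - 2) / 12)   # hypothetical decay of the NDC violation
err = np.full_like(v, 0.005)       # hypothetical shot-noise level per estimate
print(n_ndc(ns, v, err))           # -> the largest N still clearly non-classical
```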
To implement this benchmark, the team designed two protocols that involve consecutive parity measurements on groups of qubits. Parity measurements check whether the total state of multiple qubits is even or odd, a collective property that probes quantum coherence across the entire system. The first, called the H-protocol, uses a classically controlled Hadamard gate to toggle an intermediate measurement on and off, minimizing unwanted classical disturbances. The second, the M-protocol, involves a mid-circuit measurement that directly tests the irreversible collapse of the quantum state, a key process for error correction in future quantum computers. Both protocols were optimized to run efficiently on hardware with linear nearest-neighbor connectivity, using circuits whose depth grows linearly with the number of qubits and a universal gate set, including the Clifford+T operations essential for fault-tolerant computing.
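The following Qiskit sketch shows the general flavor of such circuits: an ancilla-based parity measurement on N qubits, either collapsed mid-circuit (M-protocol flavor) or gated by a classically controlled Hadamard (H-protocol flavor). This is a toy construction under my own assumptions, not the authors' optimized circuits:

```python
from qiskit import QuantumCircuit, QuantumRegister, ClassicalRegister

def parity_probe(n, protocol="M"):
    """Toy circuit: entangle n qubits, then probe their joint parity mid-circuit."""
    q = QuantumRegister(n, "q")
    anc = QuantumRegister(1, "anc")
    c = ClassicalRegister(n + 2, "c")   # n final bits, 1 mid bit, 1 toggle bit
    qc = QuantumCircuit(q, anc, c)

    # GHZ-like preparation so parity is a genuinely collective property.
    qc.h(q[0])
    for i in range(n - 1):
        qc.cx(q[i], q[i + 1])

    # XOR the joint parity of all n qubits into the ancilla.
    for i in range(n):
        qc.cx(q[i], anc[0])

    if protocol == "M":
        # M-protocol flavor: an explicit mid-circuit measurement that
        # irreversibly collapses the parity information.
        qc.measure(anc[0], c[n])
    else:
        # H-protocol flavor: a Hadamard controlled by a classical bit
        # (here c[n + 1], assumed to be set before this point) toggles
        # whether the ancilla measurement disturbs the system.
        with qc.if_test((c[n + 1], 1)):
            qc.h(anc[0])
        qc.measure(anc[0], c[n])

    qc.measure(q, c[:n])   # final statistics, compared with/without disturbance
    return qc
```

Comparing the final-readout statistics of runs with and without the intermediate disturbance is what drives the test, as sketched after the results below.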
The results, detailed in Figures 3 and 4 of the paper, show a clear decline in quantum behavior with increasing qubit numbers. For instance, on ibm_marrakesh, the NDC violation dropped from around 0.2 for small qubit counts to near zero by 38 qubits for the H-protocol, and to zero by 17 qubits for the M-protocol. Similarly, on the older ibm_brisbane quantum computer, the violations vanished at 14 and 6 qubits for the two protocols, respectively. This roughly three-fold improvement between hardware generations highlights the progress in device quality. Crucially, the researchers ensured their tests were free from the 'clumsiness loophole', in which classical noise could mimic quantum effects, by reducing classical disturbances to statistical noise levels, as confirmed by control experiments at specific rotation angles where quantum predictions yield zero violation.
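One natural way to quantify a no-disturbance-condition (NDC) violation is a statistical distance between final-outcome distributions with and without the intermediate measurement. The estimator below (total variation distance over Qiskit-style counts dictionaries) is an illustrative assumption, not necessarily the paper's exact quantifier:

```python
def ndc_violation(counts_with, counts_without):
    """Total-variation distance between two measurement-counts dictionaries."""
    n_w = sum(counts_with.values())
    n_wo = sum(counts_without.values())
    outcomes = set(counts_with) | set(counts_without)
    return 0.5 * sum(
        abs(counts_with.get(k, 0) / n_w - counts_without.get(k, 0) / n_wo)
        for k in outcomes
    )

# Hypothetical shot counts from runs with and without the mid-circuit measurement.
counts_with = {"000": 480, "111": 470, "010": 50}
counts_without = {"000": 510, "111": 490}
print(round(ndc_violation(counts_with, counts_without), 3))  # -> 0.05
```

Under this reading, the clumsiness control works as follows: at rotation angles where quantum mechanics predicts zero violation, any residual value of this estimator above shot noise would flag classical disturbance rather than genuine quantumness.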
The implications of this work are twofold: it provides a foundational benchmark for assessing quantum computers as they scale, and it advances experimental tests of quantum mechanics itself. By measuring collective quantum coherence and mid-circuit measurement fidelity, the protocol checks essential requirements for scalable, universal quantum computation. For the broader field, it offers a standardized way to compare different quantum platforms, potentially guiding improvements in coherence times and gate errors. Moreover, by demonstrating macrorealism violation with up to 38 qubits, an order of magnitude beyond prior limits, it pushes experimental tests of quantum theory into more macroscopic regimes, exploring the boundary between quantum and classical physics.
Despite these advances, the study has limitations. The tests were conducted on specific IBM quantum computers, and the results may vary across other hardware platforms due to differences in noise profiles and connectivity. The protocols assume ideal conditions for certain parameters, such as a rotation angle of π/4 to achieve an N-independent violation, which might not hold perfectly in all implementations. Additionally, while the protocols are designed to be scalable, their performance in future fault-tolerant systems with logical qubits remains to be validated. The researchers note that error mitigation was not applied, focusing on raw hardware performance, which means the metrics reflect current technological limits rather than optimized outcomes.