Universitat Politècnica de Catalunya. Doctorat en Física Computacional i Aplicada
Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors
Barcelona Supercomputing Center
2025-05-28
We propose a novel Reinforcement Learning (RL) method for optimizing quantum circuits using graph-theoretic simplification rules of ZX-diagrams. The agent, trained using the Proximal Policy Optimization (PPO) algorithm, employs Graph Neural Networks to approximate the policy and value functions. We demonstrate the capacity of our approach by comparing it against the best-performing ZX-calculus-based algorithm for the problem at hand. After training on small Clifford+T circuits of 5 qubits and a few tens of gates, the agent consistently improves on the state of the art for this type of circuit, for at least up to 80 qubits and 2100 gates, whilst remaining competitive in terms of computational performance. Additionally, we illustrate the versatility of the agent by incorporating additional optimization routines into the workflow during training. This improves on the state-of-the-art two-qubit gate count for multiple structured quantum circuits from relevant applications, which are much larger than the circuits the agent trains on and have different gate distributions. These results convey the potential of tailoring the reward function to the specific characteristics of each application and hardware backend. Our approach is a valuable tool for the implementation of quantum algorithms in the near-term, noisy intermediate-scale quantum (NISQ) regime.
A.G-S received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 951911 (AI4Media). This work was supported by the Agència de Gestió d’Ajuts Universitaris i de Recerca through the DI grant (No. 2020-DI00063) and by MICIU/AEI/10.13039/501100011033/ FEDER, UE.
Peer Reviewed
Postprint (published version)
Article
English
UPC subject areas::Computer science::Computer applications::Computer applications in physics and engineering; UPC subject areas::Computer science::Artificial intelligence::Machine learning; Reinforcement Learning (RL); Quantum circuits; Proximal Policy Optimization (PPO); Graph Neural Networks
https://quantum-journal.org/papers/q-2025-05-28-1758/
info:eu-repo/grantAgreement/EC/H2020/951911/EU/A European Excellence Centre for Media, Society and Democracy/AI4Media
http://creativecommons.org/licenses/by/4.0/
Open Access
Attribution 4.0 International