Abstract
The recently introduced Quantum Lego framework provides a powerful method for
generating complex quantum error correcting codes (QECCs) out of simple ones.
We gamify this process and unlock a new avenue for code design and discovery
using reinforcement learning (RL). One benefit of RL is that we can specify
\textit{arbitrary} properties of the code to be optimized. We train on two such
properties: maximizing the code distance and minimizing the probability of
logical error under biased Pauli noise. For the first, we show that the trained
agent identifies ways to increase code distance beyond naive concatenation,
saturating the linear programming bound for CSS codes on 13 qubits. With a
learning objective to minimize the logical error probability under biased Pauli
noise, we find the best known CSS code for this task on $\lesssim 20$ qubits.
Compared to other (locally deformed) CSS codes, including Surface, XZZX, and 2D
Color codes, our $[[17,1,3]]$ code construction actually has \textit{lower}
adversarial distance, yet better protects the logical information, highlighting
the importance of the choice of QECC desiderata. Lastly, we comment on how this RL framework
can be used in conjunction with physical quantum devices to tailor a code
without explicit characterization of the noise model.