Abstract
Accurate free energy simulations based on combined quantum mechanical and molecular mechanical (QM/MM) potentials are essential for understanding reaction mechanisms in complex environments. Achieving ab initio QM/MM accuracy at the cost of more affordable semiempirical QM/MM methods, thereby enabling efficient sampling, remains a major challenge. To address this, we previously introduced a Δ-machine-learning approach employing Gaussian process regression (GPR) with QM-solute-based molecular descriptors. Here, we extend this approach by using atomic environment descriptors constructed from atom-centered symmetry functions, which incorporate MM-solvent contributions into the GPR input features. Molecular similarity is inferred through a system-specific sum kernel. We trained our models using both an energy-only GPR scheme and a GPR with derivative observation (GPRwDO) scheme that incorporates force information with heteroscedastic noise. On-the-fly model deployment in Chemistry at HARvard Macromolecular Mechanics (CHARMM)-based molecular dynamics simulations is enabled through a GPflow/pyCHARMM interface. We evaluated these approaches on the solution-phase SN2 Menshutkin reaction, using AM1/MM and B3LYP/MM as the base and target levels. The optimized models reduce AM1/MM potential energy errors from ∼13.1 to 1.4 (energy-only GPR) and 2.2 (GPRwDO) kcal/mol, with the corresponding force errors reduced from ∼14.6 to 4.4 and 2.1 (kcal/mol)/Å. The energy-only GPR model predicts a free energy barrier of 14.3 and a reaction free energy of -30.2 kcal/mol, whereas the GPRwDO model predicts 12.7 and -28.7 kcal/mol, both in excellent agreement with high-level benchmarks. Analyses of free energy paths, potentials of mean force, internal forces, and radial distribution functions reveal broad improvements in energetics, force description, and solvation structure. The AM1-GPR(wDO)/MM approaches reach target-level accuracy with an ∼100-fold acceleration.