Abstract
Because planning and operating modern power systems typically requires repeated power flow analysis over numerous scenarios, deep learning (DL) applications to power flow analysis have gained momentum. However, DL models need a large volume of training data to achieve satisfactory performance. In power systems, collecting enough training samples from the real world is very challenging, so the samples must instead be generated by numerical simulation. While much effort has been devoted to improving DL model design, little attention has been paid to the curation of training data. To address this gap and reduce data curation cost, we propose a data-centric approach that selects a small number of beneficial samples to boost DL performance. It builds on and improves the influence function to estimate, with high computational efficiency, the influence of training samples on DL performance, and then uses these estimates to guide the generation of beneficial samples. We implement the proposed data-centric learning approach on a physics-informed graph neural network (GNN) model for power flow analysis. Simulation results on the IEEE 300-bus test power system demonstrate the effectiveness of the proposed method over the traditional approach.