Abstract
In modern database systems, query scheduling is a critical task: deciding the order in which to execute a set of queued queries so as to optimize performance and resource utilization. This is challenging because a scheduler must consider a wide range of factors, including the complexity of the queries being processed, the availability and utilization of computing resources, and the interactions between different queries. To address these challenges, we leverage machine learning and artificial intelligence theory to propose an efficient query scheduling framework that adapts to both sequential and concurrent environments.
Our framework is designed to improve the buffer hits of executed queries, that is, the number of data blocks a query finds in the buffer pool of the database system. By scheduling queries so that they take advantage of these cached data blocks instead of reading from disk, we can optimize query execution and enhance system performance. To achieve this, we apply deep reinforcement learning techniques to build a learned scheduler that understands the overlap between cached data blocks and the data access patterns of incoming queries. The scheduler learns a workload-specific scheduling policy that maximizes buffer hits in an adaptive and dynamic fashion. In the five years of this doctoral work, we designed and developed two systems: \system, which supports scheduling of sequential query execution, and \systemtwo, which supports scheduling for concurrent queries. Both systems use the same underlying machine-learning-driven framework and share the goal of increasing the buffer hits of executed queries.
We evaluated our framework on two open-source database systems, PostgreSQL and MySQL, and compared its performance to traditional hand-crafted query scheduling approaches. Additionally, we built a Query Time Predictor (QTP) scheduler as a comparison baseline. Our results show that our learning-based scheduling framework significantly improves query performance and throughput in both sequential and concurrent environments, outperforming both the traditional scheduling approaches and the QTP scheduler. These results demonstrate the effectiveness of our approach and suggest that machine learning techniques can provide a powerful solution to the challenge of query scheduling in modern database systems. This work represents a promising intersection of machine learning and database research and has the potential to open up new directions for optimizing query execution and building noninvasive learned components on top of database systems.
In addition, this dissertation explores the integration of multiagent systems into query scheduling to further enhance the adaptability and performance of database systems. We investigate how a collaborative scheduling strategy using multiple agents can improve overall system efficiency and present experimental results that demonstrate the viability of this approach.