Abstract
This article provides an overview of our research on data exploration. Our work aims to facilitate
interactive exploration tasks in many big data applications in the scientific, biomedical and healthcare
domains. We argue for a shift towards learning-based exploration techniques that automatically steer
the user towards interesting data areas based on relevance feedback on database samples, with the
goal of efficiently identifying all database objects that match the user's interest.
Our research applies machine learning theory to the new setting of interactive data exploration and
develops new optimizations to support “automated” data exploration with high performance over large
databases. In this article, we discuss a suite of techniques that draw insights from machine learning
algorithms to guide the exploration of a big data space and leverage the knowledge of exploration
patterns to optimize query processing inside the database.