Abstract
This paper studies a discrete-time mean-variance model based on reinforcement
learning. Compared with its continuous-time counterpart in \cite{zhou2020mv},
the discrete-time model makes more general assumptions about the asset's return
distribution. Using entropy to measure the cost of exploration, we derive the
optimal investment strategy, whose density function is also of Gaussian type.
We also design a corresponding reinforcement learning algorithm.
Both simulation experiments and empirical analysis indicate that our
discrete-time model is better suited to real-world data than the
continuous-time model.