Abstract
Recent advances in aligning large language models (LLMs) with human preferences and values have dramatically expanded the capabilities of artificial intelligence in natural language understanding and generation. Despite their impressive performance, however, these models often lack the reflective and deliberative qualities necessary for effective human-AI collaboration. Traditional policy optimization methods, such as Reinforcement Learning from Human Feedback (RLHF), Proximal Policy Optimization (PPO), and Direct Preference Optimization (DPO), primarily focus on maximizing task-related rewards or aligning outputs with human preferences. These approaches tend to neglect the critical epistemic dimension of alignment: the ability of an AI system to reason about, question, and update its underlying beliefs. In this paper, we propose a novel framework termed Frictive Policy Optimization (FPO), which explicitly incorporates "friction" as a desirable property in the policy optimization process for LLMs. Beyond fostering reflective deliberation, our approach also challenges the conventional expectation that autonomous agents must always comply with human commands. By integrating mechanisms that incentivize appropriate non-compliance, which we term "beneficial disobedience", FPO equips AI systems with the capacity to question potentially harmful or ill-advised instructions. This dual focus on epistemic alignment and responsible disobedience paves the way for more robust, safe, and collaborative human-AI interactions.