EXPLAINABLE REINFORCEMENT LEARNING FOR HIGH-STAKES DECISION SYSTEMS: DEVELOPING INTERPRETABLE RL MODELS FOR AUTONOMOUS VEHICLES, HEALTHCARE, OR FINANCE

Authors

  • Sai Srinivas Matta, MS in CS Candidate, Campbellsville University, USA
  • Manish Bolli, MS in CS Candidate, University of Central Missouri, USA

DOI:

https://doi.org/10.63125/53crx355

Keywords:

Explainable Reinforcement Learning, Interpretability, Trust, Accountability, High-Stakes Systems

Abstract

This study on Explainable Reinforcement Learning (XRL) for High-Stakes Decision Systems: Developing Interpretable RL Models for Autonomous Vehicles, Healthcare, or Finance was conducted to investigate how interpretability in reinforcement learning enhances performance, trust, and accountability in critical decision-making environments. The research reviewed and synthesized findings from 126 peer-reviewed papers spanning the past decade, focusing on the integration of explainability mechanisms into reinforcement learning models applied to safety-critical and ethically sensitive domains. The study aimed to identify quantitative relationships between key explainability constructs (fidelity, stability, and comprehensibility) and measurable human or system outcomes such as decision accuracy, response time, trust calibration, and perceived accountability. Using a mixed quantitative framework, the research combined simulation-based performance data, human-centered evaluation metrics, and statistical modeling to assess how explainable RL architectures perform relative to non-explainable counterparts. The findings revealed that explainable reinforcement learning models consistently outperformed traditional opaque systems across all three domains: in autonomous vehicles, explanations improved driver response times and reduced intervention rates; in healthcare, they enhanced clinician confidence and treatment decision accuracy; and in finance, they improved risk-adjusted returns and investor trust. Regression and correlation analyses demonstrated that explanation fidelity strongly predicted decision accuracy, while explanation stability and comprehensibility were significant predictors of trust and accountability. Repeated-measures ANOVA further confirmed statistically significant improvements in user trust and performance under explainable conditions, with large effect sizes. The study also identified several persistent challenges, including the trade-off between interpretability and performance, variability in user comprehension, and limitations in real-time explanation delivery. Overall, the review and empirical analysis provide a comprehensive account of how explainable reinforcement learning contributes to safer, more transparent, and ethically accountable AI-driven decision systems. The insights derived from the 126 reviewed studies establish a robust foundation for developing future XRL frameworks capable of balancing performance optimization with human interpretability in complex, high-stakes environments such as autonomous vehicles, clinical systems, and financial analytics.
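To make the abstract's analysis pipeline concrete, the sketch below illustrates the two statistical steps it describes: an ordinary least squares regression of decision accuracy on the three explainability constructs, and a repeated-measures ANOVA comparing trust under explainable versus opaque conditions. This is a minimal, hypothetical illustration using pandas and statsmodels; the column names, sample sizes, and simulated effect directions are assumptions chosen for demonstration and do not reproduce the study's actual data.

# Hypothetical sketch of the abstract's analysis pipeline.
# All data below are simulated; variable names are illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
n = 126  # mirrors the number of reviewed studies, purely for illustration

# Simulated per-study scores for the three explainability constructs.
df = pd.DataFrame({
    "fidelity": rng.uniform(0.5, 1.0, n),
    "stability": rng.uniform(0.5, 1.0, n),
    "comprehensibility": rng.uniform(0.5, 1.0, n),
})
# Decision accuracy driven mostly by fidelity, matching the reported result.
df["accuracy"] = (0.6 * df["fidelity"] + 0.2 * df["stability"]
                  + 0.1 * df["comprehensibility"]
                  + rng.normal(0, 0.05, n))

# Step 1: OLS regression -- which constructs predict decision accuracy?
model = smf.ols("accuracy ~ fidelity + stability + comprehensibility",
                data=df).fit()
print(model.summary())

# Step 2: repeated-measures ANOVA -- each (simulated) participant rates
# trust under both conditions, so "condition" is a within-subject factor.
long = pd.DataFrame({
    "subject": np.repeat(np.arange(40), 2),
    "condition": np.tile(["explainable", "opaque"], 40),
})
long["trust"] = np.where(long["condition"] == "explainable",
                         rng.normal(4.2, 0.5, len(long)),
                         rng.normal(3.4, 0.5, len(long)))
anova = AnovaRM(long, depvar="trust", subject="subject",
                within=["condition"]).fit()
print(anova.summary())

On real data, the regression coefficients would quantify how strongly each construct predicts accuracy, and the ANOVA F-test would indicate whether the explainable condition yields a reliable within-subject gain in trust.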

Published

2022-12-14

How to Cite

Sai Srinivas Matta, & Manish Bolli. (2022). EXPLAINABLE REINFORCEMENT LEARNING FOR HIGH-STAKES DECISION SYSTEMS: DEVELOPING INTERPRETABLE RL MODELS FOR AUTONOMOUS VEHICLES, HEALTHCARE, OR FINANCE. Journal of Sustainable Development and Policy, 1(04), 31–70. https://doi.org/10.63125/53crx355
