Offline Conservative RL for Transaction Authorization: Smartly Balancing Fraud Risk and Customer Friction
DOI: https://doi.org/10.70393/6a6574626d.333932
ARK: https://n2t.net/ark:/40704/JETBM.v3n1a01
Disciplines: Business Analytics
Subjects: Econometric Modeling
References: 20
Keywords: Offline Reinforcement Learning, Cost-Sensitive Credit Risk Optimization, User-Centric Financial Decision Systems, Conservative Q-Learning (CQL)

Abstract
This study instantiates credit strategy optimization at the transaction authorization layer, with a three-way action space: approve, review, and decline. Within an offline conservative RL (CQL) framework, we jointly optimize fraud loss, the operational burden of manual review, and the customer friction caused by false positives and delays through a unified multi-objective cost function. Using a public credit-card transaction dataset with severe class imbalance, the learned policy reduces total cost relative to cost-sensitive supervised baselines while offering favorable trade-offs along a Pareto frontier spanning risk, operations, and friction. We detail the MDP design (state featurization, action space, and cost weights) and show that CQL mitigates overestimation of out-of-distribution actions in the offline setting. The results indicate that conservative RL is a practical path to transaction-level credit decision-making that balances fraud risk with operational efficiency and user impact.
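
To make the cost structure concrete, the following is a minimal sketch of how a per-transaction, multi-objective cost could be composed over the three actions. The weights and field names (w_review, w_friction, amount) are illustrative assumptions for exposition, not the paper's actual parameterization.

def transaction_cost(action: str, is_fraud: bool, amount: float,
                     w_review: float = 2.0,     # assumed operational cost per manual review
                     w_friction: float = 1.0    # assumed friction cost of a false decline
                     ) -> float:
    """Cost (lower is better) of one authorization decision.

    Illustrative sketch only; the paper's exact cost weights differ.
    """
    if action == "approve":
        # Approving fraud loses the transaction amount;
        # approving a legitimate transaction is the zero-cost outcome.
        return amount if is_fraud else 0.0
    if action == "review":
        # Manual review carries a fixed operational burden either way.
        return w_review
    if action == "decline":
        # Declining fraud is free; declining a legitimate customer
        # creates friction (a false positive).
        return 0.0 if is_fraud else w_friction
    raise ValueError(f"unknown action: {action}")

Sweeping w_review and w_friction against the fraud-loss term is one natural way to trace out the risk/operations/friction Pareto frontier described above.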
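The conservative term that CQL adds to standard Q-learning can likewise be sketched. Following Kumar et al. [19], the regularizer pushes down a soft-maximum of Q-values over all actions while pushing up Q on the logged actions, which discourages optimistic estimates for out-of-distribution actions. This is a schematic PyTorch-style sketch for the discrete three-action case, under assumed tensor shapes; it is not the authors' implementation.

import torch
import torch.nn.functional as F

def cql_loss(q_net, states, actions, td_targets, alpha: float = 1.0):
    """Conservative Q-learning loss for discrete actions, after Kumar et al. [19].

    states:     (B, state_dim) feature batch from the offline transaction log
    actions:    (B,) logged action indices (approve / review / decline), dtype long
    td_targets: (B,) Bellman targets computed from a target network
    alpha:      weight on the conservative regularizer

    Here Q estimates expected return (negative cost), so maximizing Q
    corresponds to minimizing the multi-objective cost.
    """
    q_all = q_net(states)                                        # (B, 3)
    q_data = q_all.gather(1, actions.unsqueeze(1)).squeeze(1)    # (B,)

    # Standard TD error on logged (state, action) pairs.
    td_loss = F.mse_loss(q_data, td_targets)

    # Conservative penalty: soft-max over all actions (logsumexp) minus
    # Q on the actions actually taken in the data. Minimizing this keeps
    # Q-values on unseen actions from being overestimated.
    conservative = (torch.logsumexp(q_all, dim=1) - q_data).mean()

    return td_loss + alpha * conservative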
References
[1] Khraishi, R., & Okhrati, R. (2022, November). Offline deep reinforcement learning for dynamic pricing of consumer credit. In Proceedings of the Third ACM International Conference on AI in Finance (pp. 325–333).
[2] So, M. M., & Thomas, L. C. (2011). Modelling the profitability of credit cards by Markov decision processes. European Journal of Operational Research, 212(1), 123–130.
[3] Sewak, M. (2019). Temporal difference learning, SARSA, and Q-learning: Some popular value approximation-based reinforcement learning approaches. In Deep reinforcement learning: Frontiers of artificial intelligence (pp. 51–63). Springer.
[4] Sha, F., Ding, C., Zheng, X., et al. (2025). Weathering the policy storm: How trade uncertainty shapes firm financial performance through innovation and operations. International Review of Economics & Finance, 104274.
[5] Deng, X. (2025). Cooperative optimization strategies for data collection and machine learning in large-scale distributed systems. In 2025 4th International Symposium on Computer Applications and Information Technology (ISCAIT) (pp. 2151–2154). IEEE.
[6] Trench, M. S., Pederson, S. P., Lau, E. T., Ma, L., Wang, H., & Nair, S. K. (2003). Managing credit lines and prices for Bank One credit cards. Interfaces, 33(5), 4–21.
[7] Wiesemann, W., Kuhn, D., & Rustem, B. (2013). Robust Markov decision processes. Mathematics of Operations Research, 38(1), 153–183.
[8] Tan, C., Gao, F., Song, C., Xu, M., Li, Y., & Ma, H. (2024). Highly reliable CI-JSO based densely connected convolutional networks using transfer learning for fault diagnosis. Journal of Information Systems Engineering and Management. https://doi.org/10.52783/jisem.v10i4.12207
[9] Tan, C., Gao, F., Song, C., Xu, M., Li, Y., & Ma, H. (2024). Proposed damage detection and isolation from limited experimental data based on a deep transfer learning and an ensemble learning classifier. Journal of Information Systems Engineering and Management. https://doi.org/10.52783/jisem.v10i4.12206
[10] Han, X., & Dou, X. (2025). User recommendation method integrating hierarchical graph attention network with multimodal knowledge graph. Frontiers in Neurorobotics, 19, 1587973.
[11] Zhuang, R. (2025). Evolutionary logic and theoretical construction of real estate marketing strategies under digital transformation. Economics and Management Innovation, 2(2), 117–124.
[12] Yang, Z., et al. (2025). RLHF fine-tuning of LLMs for alignment with implicit user feedback in conversational recommenders. arXiv. https://arxiv.org/abs/2508.05289
[13] Deng, X., & Yang, J. (2025). Multi-layer defense strategies and privacy-preserving enhancements for membership reasoning attacks in a federated learning framework. In 2025 5th International Conference on Computer Science and Blockchain (CCSB) (pp. 278–282). IEEE.
[14] Tan, C. (2024). The application and development trends of artificial intelligence technology in automotive production. Artificial Intelligence Technology Research, 2(5).
[15] Zhang, L., & Meng, Q. (2025, September). User portrait-driven smart home device deployment optimization and spatial interaction design. In 2025 5th International Conference on Artificial Intelligence, Automation and High Performance Computing (AIAHPC) (pp. 724–728). IEEE.
[16] Yang, H., Tian, Y., Yang, Z., Wang, Z., Zhou, C., & Li, D. (2025). Research on model parallelism and data parallelism optimization methods in large language model-based recommendation systems. arXiv. https://arxiv.org/abs/2506.17551
[17] Gonzalez, J., Tran, V., Meredith, J., Xu, I., Penchala, R., Vilar-Ribó, L., et al. (2025). How it begins: Initial response to opioids strongly predicts self-reported opioid use disorder. medRxiv.
[18] Wozabal, D., & Hochreiter, R. (2012). A coupled Markov chain approach to credit risk modeling. Journal of Economic Dynamics and Control, 36(3), 403–415.
[19] Kumar, A., Zhou, A., Tucker, G., & Levine, S. (2020). Conservative Q-learning for offline reinforcement learning. Advances in Neural Information Processing Systems, 33, 1179–1191.
[20] Mendonca, R., Geng, X., Finn, C., & Levine, S. (2020). Meta-reinforcement learning that is robust to distributional shifts via model identification and experience relabeling. arXiv. https://arxiv.org/abs/2006.07178
License
Copyright (c) 2026. The author retains copyright and grants the journal the right of first publication.

This work is licensed under a Creative Commons Attribution 4.0 International License.