A Reproducible Baseline for Forecasting High-Frequency Realized Volatility with Order-Flow Features

Authors

  • Zhangqi Liu, Brown University

DOI:

https://doi.org/10.70393/6a6574626d.333538

ARK:

https://n2t.net/ark:/40704/JETBM.v2n6a02

Disciplines:

Finance

Subjects:

Corporate Finance

References:

28

Keywords:

Realized Volatility, Limit Order Book, Order-flow Imbalance, Gradient Boosting, Monotone Constraints, Explainable Machine Learning, SHAP, Time-series Cross-validation

Abstract

This paper proposes an interpretable and reproducible baseline model for forecasting high-frequency realized volatility from limit order book data, balancing methodological rigor with the requirements of regulated financial practice. We combine economically motivated covariates with a monotonically constrained gradient boosting model that encodes directional priors drawn from microstructure theory. The evaluation scheme integrates rolling windows and time-constrained k-fold cross-validation to assess cross-domain performance. Nevertheless, some sensitivity to regime shifts may remain, and the exclusion of news and cross-platform signals may limit coverage, which indicates a need for further research.
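The ingredients named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the top-of-book definition of imbalance, and the `gap` parameter are all assumptions made for illustration, and "time-constrained" cross-validation is read here as folds in which every training observation precedes the test block.

```python
import math


def realized_volatility(prices):
    """Square root of the sum of squared log returns within one interval."""
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return math.sqrt(sum(r * r for r in log_returns))


def book_imbalance(bid_size, ask_size):
    """Top-of-book size imbalance in [-1, 1].

    Assumption: a simple proxy for order-flow imbalance; the paper's exact
    feature definition is not given in the abstract.
    """
    total = bid_size + ask_size
    return 0.0 if total == 0 else (bid_size - ask_size) / total


def time_ordered_folds(n_samples, n_folds, gap=0):
    """Yield (train_indices, test_indices) pairs in chronological order.

    Every training index precedes its test block. `gap` (hypothetical
    parameter) drops the last few training observations before each test
    block to limit leakage from overlapping windows.
    """
    fold_size = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        test_start = k * fold_size
        test_end = n_samples if k == n_folds else test_start + fold_size
        train_end = max(0, test_start - gap)
        yield list(range(train_end)), list(range(test_start, test_end))


if __name__ == "__main__":
    print(realized_volatility([100.0, 101.0, 100.0]))
    print(book_imbalance(300, 100))
    for train, test in time_ordered_folds(10, 4, gap=1):
        print(train, test)
```

Within each fold, a gradient boosting model with per-feature monotonicity constraints could then be fitted on the training indices. Several common libraries expose such constraints (for example, the `monotone_constraints` parameter in LightGBM and XGBoost); which library and which constraint signs the paper uses is not stated in the abstract.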

Author Biography

Zhangqi Liu, Brown University

Brown University, 02912, USA.

Published

2025-12-21

How to Cite

Liu, Z. (2025). A Reproducible Baseline for Forecasting High-Frequency Realized Volatility with Order-Flow Features. Journal of Economic Theory and Business Management, 2(6), 8–18. https://doi.org/10.70393/6a6574626d.333538

Section

Articles
