Empirical Evaluation of Large Language Models for Asset‑Return Prediction

Authors

  • Bingxing Wang, Shanghai Jingzhuo Investment Management Co., Ltd.

DOI:

https://doi.org/10.70393/616a736d.333035

ARK:

https://n2t.net/ark:/40704/AJSM.v3n4a03

Disciplines:

Economics

Subjects:

Behavioral Economics

References:

47

Keywords:

Large Language Models, Asset‑return Prediction, Textual‑sentiment Factor, Machine Learning, Information Ratio, Interpretability

Abstract

In an era of exploding financial-market information and rapid algorithmic iteration, traditional asset-return forecasting models struggle to exploit unstructured text. Using cross-asset data (equities, Treasuries and commodity futures) from 2004 to 2024, we build an integrated prediction framework that fuses semantic factors extracted by Large Language Models (LLMs) with price-volume and macro-numerical factors, and benchmark it against logistic regression (Logit), Random Forest, LightGBM and a bidirectional LSTM. A comprehensive evaluation using weighted F₁, ROC-AUC, Information Ratio and Sharpe Ratio shows that (i) LLM-based semantic factors significantly improve directional accuracy (F₁ +20.5%, ROC-AUC +11.9%); (ii) after a 3 bp transaction cost, the LLM-driven long–short portfolio achieves an annualised Information Ratio of 0.96 and a Sharpe Ratio of 1.17, markedly outperforming all baselines; (iii) robustness checks confirm this edge across high-volatility regimes, asset classes and text-lag scenarios; and (iv) combining SHAP with attention visualisation traces keyword-level contributions, enhancing interpretability. Our results provide reproducible, quantifiable evidence for large-scale LLM deployment in quantitative investing and point to future work on model compression, slippage estimation and multimodal extension.
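For readers who wish to reproduce the cost-adjusted performance metrics cited above, the following minimal Python sketch shows one conventional way to compute annualised Sharpe and Information Ratios for a daily long–short portfolio net of a 3 bp cost charged on turnover. The cost model, weighting scheme, benchmark and column layout are illustrative assumptions, not the paper's exact implementation.

```python
# Illustrative sketch only: annualised Sharpe and Information Ratio for a
# daily long-short portfolio, net of a 3 bp transaction cost on turnover.
# The benchmark series and weight/return DataFrames are hypothetical inputs.
import numpy as np
import pandas as pd

TRADING_DAYS = 252
COST_BPS = 3e-4  # 3 basis points per unit of turnover (assumed cost model)

def net_long_short_returns(weights: pd.DataFrame,
                           asset_returns: pd.DataFrame) -> pd.Series:
    """Gross portfolio return minus turnover-based transaction costs."""
    gross = (weights.shift(1) * asset_returns).sum(axis=1)
    turnover = weights.diff().abs().sum(axis=1)
    return gross - COST_BPS * turnover

def annualised_sharpe(returns: pd.Series, rf_daily: float = 0.0) -> float:
    """Annualised Sharpe Ratio from daily net returns."""
    excess = returns - rf_daily
    return np.sqrt(TRADING_DAYS) * excess.mean() / excess.std(ddof=1)

def annualised_information_ratio(returns: pd.Series,
                                 benchmark: pd.Series) -> float:
    """Annualised Information Ratio versus a benchmark return series."""
    active = returns - benchmark
    return np.sqrt(TRADING_DAYS) * active.mean() / active.std(ddof=1)
```

With daily portfolio weights and asset returns in hand, `annualised_sharpe(net_long_short_returns(w, r))` and `annualised_information_ratio(...)` yield figures comparable in construction to the 1.17 and 0.96 reported in the abstract, though the paper's own rebalancing and cost assumptions may differ.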

Author Biography

Bingxing Wang, Shanghai Jingzhuo Investment Management Co., Ltd.

Shanghai Jingzhuo Investment Management Co., Ltd., China.

Published

2025-07-13

How to Cite

Wang, B. (2025). Empirical Evaluation of Large Language Models for Asset‑Return Prediction. Academic Journal of Sociology and Management, 3(4), 18–25. https://doi.org/10.70393/616a736d.333035

Section

Articles
