Benchmarking Learned Cardinality Estimation Techniques for Analytical Query Processing in Data Warehouses
DOI: https://doi.org/10.70393/6a6374616d.343134
ARK: https://n2t.net/ark:/40704/JCTAM.v3n3a01
Disciplines: Software Systems
Subjects: Other
References: 21
Keywords: Learned Cardinality Estimation, Data Warehouse, Query Optimization, Benchmark Evaluation
Abstract
Cardinality estimation remains one of the most critical yet error-prone components of query optimization in modern data warehouses. Recent advances in machine learning have produced a diverse family of learned cardinality estimators that demonstrate substantial accuracy improvements on standard benchmarks. Yet existing evaluations predominantly rely on third-normal-form schemas, leaving their effectiveness on star and snowflake schemas—the backbone of analytical data warehousing—largely unexplored. This paper presents a systematic empirical evaluation of seven representative learned cardinality estimation methods spanning three paradigmatic categories: query-driven, data-driven, and hybrid approaches. All methods are benchmarked against the PostgreSQL histogram-based estimator on three complementary datasets: TPC-DS with its native snowflake schema, STATS-CEB with real-world relational data, and IMDB/JOB as the established cross-study reference. The evaluation encompasses estimation accuracy measured by Q-Error and P-Error, inference latency, training cost, model compactness, end-to-end query execution time, and robustness under simulated ETL batch insertions. Results indicate that hybrid methods, particularly FactorJoin, achieve the strongest accuracy on data warehouse workloads with a median Q-Error of 1.74 on TPC-DS, while data-driven methods such as FLAT and BayesCard offer a favorable balance between accuracy and inference speed. BayesCard and FactorJoin exhibit the highest resilience to data updates, with median Q-Error increasing by fewer than 1.5 points after a 50% data insertion. These findings provide actionable guidance for practitioners seeking to deploy learned cardinality estimation in production data warehouse environments.
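For readers unfamiliar with the accuracy metric named above, the following is a minimal sketch of Q-Error, assuming the definition commonly used in the cardinality estimation literature: the larger of estimate/actual and actual/estimate, so a perfect estimate scores 1. The abstract does not spell out the exact formula used in the paper, so the clamping to one row and the sample (estimated, actual) pairs below are illustrative assumptions, not values from the evaluation.

```python
from statistics import median


def q_error(estimated: float, actual: float) -> float:
    """Return the multiplicative factor by which an estimate is off (>= 1)."""
    # Assumption: clamp both sides to at least one row so that empty
    # results or zero estimates do not cause a division by zero.
    est = max(estimated, 1.0)
    act = max(actual, 1.0)
    return max(est / act, act / est)


# Hypothetical (estimated, actual) cardinalities, for illustration only.
pairs = [(100, 100), (50, 200), (3000, 1000), (10, 1)]
errors = [q_error(e, a) for e, a in pairs]

# Benchmarks typically report the median (and tail percentiles) per workload.
median_q = median(errors)
print(errors, median_q)
```

Because the metric is symmetric, a 4x underestimate and a 4x overestimate receive the same score, which is why a median Q-Error close to 1 indicates estimates that are typically within a small factor of the true cardinality.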
References
[1] Leis, V., Gubichev, A., Mirchev, A., Boncz, P., Kemper, A., & Neumann, T. (2015). How good are query optimizers, really? Proceedings of the VLDB Endowment, 9(3), 204–215.
[2] Zhou, X., Chai, C., Li, G., & Sun, J. (2022). Database meets artificial intelligence: A survey. IEEE Transactions on Knowledge and Data Engineering, 34(3), 1096–1116.
[3] Han, Y., Wang, H., Chen, L., Dong, Y., Chen, X., Yu, B., Yang, C., & Qian, W. (2024). ByteCard: Enhancing ByteDance's data warehouse with learned cardinality estimation. In Proceedings of the 2024 ACM SIGMOD International Conference on Management of Data.
[4] Kipf, A., Kipf, T., Radke, B., Leis, V., Boncz, P., & Kemper, A. (2019). Learned cardinalities: Estimating correlated joins with deep learning. In Proceedings of the 9th Biennial Conference on Innovative Data Systems Research (CIDR).
[5] Negi, P., Marcus, R., Kipf, A., Mao, H., Tatbul, N., Kraska, T., & Alizadeh, M. (2021). Flow-Loss: Learning cardinality estimates that matter. Proceedings of the VLDB Endowment, 14(11), 2019–2032.
[6] Yang, Z., Liang, E., Kamsetty, A., Wu, C., Duan, Y., Chen, X., Abbeel, P., Hellerstein, J. M., Krishnan, S., & Stoica, I. (2019). Deep unsupervised cardinality estimation. Proceedings of the VLDB Endowment, 13(3), 279–292.
[7] Yang, Z., Kamsetty, A., Luan, S., Liang, E., Duan, Y., Chen, X., & Stoica, I. (2020). NeuroCard: One cardinality estimator for all tables. Proceedings of the VLDB Endowment, 14(1), 61–73.
[8] Hilprecht, B., Schmidt, A., Kulessa, M., Molina, A., Kersting, K., & Binnig, C. (2020). DeepDB: Learn from data, not from queries! Proceedings of the VLDB Endowment, 13(7), 992–1005.
[9] Zhu, R., Wu, Z., Han, Y., Zeng, K., Pfadler, A., Qian, Z., Zhou, J., & Cui, B. (2021). FLAT: Fast, lightweight and accurate method for cardinality estimation. Proceedings of the VLDB Endowment, 14(9), 1489–1502.
[10] Wu, P., & Cong, G. (2021). A unified deep model of learning from both data and queries for cardinality estimation. In Proceedings of the 2021 ACM SIGMOD International Conference on Management of Data (pp. 2009–2022).
[11] Wu, Z., Negi, P., Alizadeh, M., Kraska, T., & Madden, S. (2023). FactorJoin: A new cardinality estimation framework for join queries. Proceedings of the ACM on Management of Data, 1(1).
[12] Wang, X., Qu, C., Wu, W., Wang, J., & Zhou, Q. (2021). Are we ready for learned cardinality estimation? Proceedings of the VLDB Endowment, 14(9), 1640–1654.
[13] Han, Y., Wu, Z., Wu, P., Zhu, R., Yang, J., Tan, L. W., Zeng, K., Cong, G., Qin, Y., Pfadler, A., Qian, Z., Zhou, J., Li, J., & Cui, B. (2022). Cardinality estimation in DBMS: A comprehensive benchmark evaluation. Proceedings of the VLDB Endowment, 15(4), 752–765.
[14] Kim, K., Jung, J., Seo, I., Han, W.-S., Choi, K., & Chong, J. (2022). Learned cardinality estimation: An in-depth study. In Proceedings of the 2022 ACM SIGMOD International Conference on Management of Data (pp. 1214–1227).
[15] Zhang, J., Zhang, C., Li, G., & Chai, C. (2021). Learned cardinality estimation: A design space exploration and a comparative evaluation. Proceedings of the VLDB Endowment, 15(1), 85–97.
[16] Wu, Z., Shaikhha, A., Zhu, R., Zeng, K., Han, Y., & Zhou, J. (2020). BayesCard: Revitalizing Bayesian frameworks for cardinality estimation. arXiv preprint arXiv:2012.14743.
[17] Li, P., Wei, W., Zhu, R., Ding, B., Zhou, J., & Lu, H. (2023). ALECE: An attention-based learned cardinality estimator for SPJ queries on dynamic workloads. Proceedings of the VLDB Endowment, 17(2), 197–210.
[18] Wang, J., Chai, C., Liu, J., & Li, G. (2021). FACE: A normalizing flow based cardinality estimator. Proceedings of the VLDB Endowment, 15(1), 72–84.
[19] Marcus, R., Negi, P., Mao, H., Tatbul, N., Alizadeh, M., & Kraska, T. (2021). Bao: Making learned query optimization practical. In Proceedings of the 2021 ACM SIGMOD International Conference on Management of Data (pp. 1275–1288).
[20] Sun, J., & Li, G. (2019). An end-to-end learning-based cost estimator. Proceedings of the VLDB Endowment, 13(3), 307–319.
[21] Negi, P., Marcus, R., Kipf, A., Mao, H., Tatbul, N., Kraska, T., & Alizadeh, M. (2023). Robust query driven cardinality estimation under changing workloads. Proceedings of the VLDB Endowment, 16(7), 1520–1533.
License
Copyright (c) 2026 The author retains copyright and grants the journal the right of first publication.

This work is licensed under a Creative Commons Attribution 4.0 International License.









