Generative AI Models: Theoretical Foundations and Algorithmic Practices

Yongnian Cao; Xuechun Yang; Rui Sun

doi:10.70393/6a69656173.323633

Authors

Yongnian Cao, TikTok Inc
Xuechun Yang, TikTok Inc
Rui Sun, TikTok Inc

DOI:

https://doi.org/10.70393/6a69656173.323633

ARK:

https://n2t.net/ark:/40704/JIEAS.v3n1a01

Disciplines:

Artificial Intelligence Technology

Subjects:

Natural Language Processing

References:

30

Keywords:

Generative AI, Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Diffusion Models, Probabilistic Modeling, KL Divergence, Evidence Lower Bound (ELBO), Adversarial Optimization

Abstract

Generative models are a new paradigm for machine learning, allowing computers to create realistic data in many categories, including text, images, and physics simulations. This paper examines the theory, algorithms, and applications of generative models, with particular focus on well-established techniques such as VAEs, GANs, and diffusion models. It stresses the importance of probabilistic generative modelling and of related tools from information theory and optimization (e.g., KL divergence, the ELBO, and adversarial optimization). We cover algorithmic practices such as optimization techniques, multimodal and conditional generation, and efficient data-driven strategies, and demonstrate the impact of these methods in real-world applications including text, image, and audio generation, industrial design, and scientific discovery. However, the field still faces significant challenges: training instability, the need for very large computational resources, and the lack of a consistent, unified treatment across applications. The paper closes with an optimistic outlook on future directions, such as more sample-efficient learning, architectures that scale, and cohesive theoretical frameworks. By combining theoretical understanding with practical implications, the paper explores generative AI technologies and their potential to transform industries and scientific disciplines.
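
The KL divergence and ELBO named in the abstract can be made concrete with a small sketch. The code below is not taken from the paper. It assumes the common VAE setting of a diagonal Gaussian approximate posterior and a standard normal prior, for which the KL term has a well-known closed form. The function names and the `recon_log_likelihood` argument are hypothetical, chosen for illustration only.

```python
import math


def kl_diag_gaussian_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), in nats.

    Assumption: diagonal Gaussian posterior and standard normal prior,
    the usual textbook VAE setup (not necessarily the paper's).
    """
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def elbo(recon_log_likelihood, mu, log_var):
    """Single-sample ELBO estimate: E_q[log p(x|z)] - KL(q(z|x) || p(z)).

    `recon_log_likelihood` is a hypothetical placeholder for the decoder's
    log-likelihood of the data under one sampled latent z.
    """
    return recon_log_likelihood - kl_diag_gaussian_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    # A posterior that already equals the prior contributes zero KL.
    print(kl_diag_gaussian_to_standard_normal([0.0, 0.0], [0.0, 0.0]))
    # Shifting the mean away from zero makes the KL penalty positive.
    print(elbo(-10.0, [1.0], [0.0]))
```

Maximizing this quantity trades reconstruction quality against keeping the approximate posterior close to the prior, which is the role the KL term plays in VAE training.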

Author Biographies

Yongnian Cao, TikTok Inc

TikTok Inc, USA.

Xuechun Yang, TikTok Inc

TikTok Inc, USA.

Rui Sun, TikTok Inc

TikTok Inc, USA.
