A Differential Privacy-Based Mechanism for Preventing Data Leakage in Large Language Model Training
DOI: https://doi.org/10.70393/616a736d.323732
ARK: https://n2t.net/ark:/40704/AJSM.v3n2a04
Disciplines: Management
Subjects: Human Resource Management
References: 1
Keywords: Large Language Model, Differential Privacy, Data Leakage Prevention, Privacy-preserving Machine Learning

Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing tasks, yet they face significant challenges in protecting sensitive information during training. This paper presents a novel differential privacy-based mechanism for preventing data leakage in LLM training processes. The proposed system introduces a dynamic privacy budget allocation strategy integrated with adaptive noise injection mechanisms, specifically designed for transformer architectures. The mechanism implements a multi-layered protection framework that combines real-time monitoring capabilities with automated response systems. Through comprehensive experimental evaluation on models ranging from 100M to 175B parameters, our approach demonstrates superior performance in privacy protection while maintaining model utility. The system achieves a 99.2% detection rate for potential data leakages with a minimal false alarm rate of 0.8%, representing a significant improvement over traditional approaches. Performance analysis reveals that the proposed mechanism maintains model accuracy within 1.8% of non-private baselines while providing strong privacy guarantees. The implementation reduces computational overhead by 35% compared to conventional differential privacy methods. Our research establishes new benchmarks in privacy-preserving machine learning, particularly for large-scale language models, and provides a practical framework for secure AI system deployment.
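The abstract does not specify the mechanism's implementation, but the combination of adaptive noise injection and a privacy budget can be sketched in the style of DP-SGD: clip each per-sample gradient, average, add calibrated Gaussian noise, and track cumulative budget spend. All names and parameters below (`clip_norm`, `noise_multiplier`, the per-step epsilon accounting) are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def clip_and_noise(per_sample_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """DP-SGD-style step (illustrative, not the paper's mechanism):
    clip each per-sample gradient to clip_norm, average, then add
    Gaussian noise scaled to the clipping bound."""
    rng = np.random.default_rng(0) if rng is None else rng
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose L2 norm exceeds the clipping bound.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    # Noise standard deviation is tied to the sensitivity (clip_norm / batch size).
    sigma = noise_multiplier * clip_norm / len(per_sample_grads)
    return mean_grad + rng.normal(0.0, sigma, size=mean_grad.shape)

class PrivacyBudget:
    """Toy epsilon accountant: spends a fixed epsilon per step and
    refuses further training once the total budget is exhausted."""
    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def spend(self, eps):
        if self.spent + eps > self.total:
            raise RuntimeError("privacy budget exhausted")
        self.spent += eps
```

A real system would replace the toy accountant with a moments/RDP accountant and compute per-sample gradients inside the training framework; this sketch only shows where clipping, noise, and budget checks sit relative to one another.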
Published
Versions
- 2025-07-01 (2)
- 2025-03-18 (1)
License
Copyright (c) 2025. The author retains copyright and grants the journal the right of first publication.

This work is licensed under a Creative Commons Attribution 4.0 International License.