Data Quality Control in Semiconductor Manufacturing through Automated ETL Processes and Class Imbalance Handling Techniques
DOI: https://doi.org/10.70393/6a69656173.333532
ARK: https://n2t.net/ark:/40704/JIEAS.v3n6a03
Disciplines: Information Science
Subjects: Data Management
References: 26
Keywords: Automated ETL Processes, Data Quality Control, Missing Data Imputation, Class Imbalance Handling, Synthetic Minority Over-sampling Technique, Feature Selection, Yield Prediction, Predictive Maintenance
Abstract
In semiconductor manufacturing, data quality is crucial for maintaining high production efficiency and product consistency. However, missing values, noise, and class imbalance in sensor data complicate quality control. This paper proposes a framework that automates data cleaning and quality control by integrating ETL processes, advanced interpolation techniques for missing values, and class imbalance handling methods. A feature selection mechanism based on a voting strategy is introduced to optimize model predictions. Experiments on real semiconductor manufacturing data indicate that the proposed method improves data quality and the accuracy of yield prediction and defect detection. The work contributes to data quality control in semiconductor manufacturing and offers a practical approach for future research in industrial data management and predictive maintenance.
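As an illustration of the class imbalance handling named in the keywords (Synthetic Minority Over-sampling Technique), the following is a minimal sketch of SMOTE-style interpolation, not the paper's implementation. The function name smote_oversample, its parameters, and the sample readings are assumptions made for this example; the abstract does not specify the configuration the authors used.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature sensor readings for the rare (defective) class.
defective = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.2), (1.1, 2.1)]
new_points = smote_oversample(defective, n_new=6)
```

Each synthetic point lies on a line segment between two existing minority samples, so the rare class is enlarged without simply duplicating rows. In practice a maintained library implementation would normally be preferred over a hand-written one.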
License
Copyright (c) 2025 The author retains copyright and grants the journal the right of first publication.

This work is licensed under a Creative Commons Attribution 4.0 International License.







