A Systematic Review of Computer Vision-Based Virtual Conference Assistants and Gesture Recognition

Weikun Lin

doi:10.5281/zenodo.13889718

Authors

Weikun Lin, Shandong University of Science and Technology

ARK:

https://n2t.net/ark:/40704/JCTAM.v1n4a04

Disciplines:

Computer Science

Subjects:

Computer Vision

References:

40

Keywords:

Gesture Recognition, Computer Vision, Deep Learning

Abstract

This review examines the technical background and implementation methods of gesture recognition for computer vision-based virtual conference assistants. Deep learning-based gesture recognition algorithms are well suited to real-time video streams: they extract gesture features from each frame and classify them to infer user intent. For instance, analyzing gesture images with Convolutional Neural Networks (CNNs) can improve both recognition accuracy and real-time performance. Combining optical flow methods with object detection additionally allows the user's hand movements to be tracked in real time, which leads to more precise recognition. However, changes in ambient lighting, cluttered backgrounds, and the diversity of user gestures can all reduce recognition accuracy, so algorithms need continued optimization to make the system more robust and adaptable. The design of a virtual conference assistant should also account for the friendliness and usability of the user interface, so that users of varying technical skill can operate the system with ease.
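As a minimal illustration of the "extract features, then classify" idea described in the abstract, the following sketch runs the forward pass of a toy convolutional classifier in plain Python. It is not the method of any reviewed system: the kernels, weights, labels, and the 6x6 toy frame are all hypothetical, hand-set values chosen only so that the data flow (convolution, ReLU, global average pooling, linear layer, softmax) is easy to follow. A real gesture recognizer would learn these parameters from labeled data and operate on full video frames.

```python
import math

def conv2d(image, kernel):
    """Valid 2D cross-correlation of a grayscale image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(fmap):
    """Element-wise ReLU on a 2D feature map."""
    return [[max(0.0, v) for v in row] for row in fmap]

def global_avg_pool(fmap):
    """Collapse a 2D feature map to a single scalar feature."""
    return sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(image, kernels, weights, biases):
    """Feature extraction (conv + ReLU + pooling) followed by a linear classifier."""
    features = [global_avg_pool(relu(conv2d(image, k))) for k in kernels]
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)

# Hypothetical, hand-set parameters (a trained model would learn these).
KERNELS = [
    [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],    # responds to vertical edges
    [[-1, -1, -1], [0, 0, 0], [1, 1, 1]],    # responds to horizontal edges
]
WEIGHTS = [[1.0, -1.0], [-1.0, 1.0]]         # one row of weights per class
BIASES = [0.0, 0.0]
LABELS = ["vertical", "horizontal"]          # stand-ins for gesture classes

# Toy 6x6 "frame" containing a vertical bar in columns 2 and 3.
frame = [[1.0 if c in (2, 3) else 0.0 for c in range(6)] for _ in range(6)]

probs = classify(frame, KERNELS, WEIGHTS, BIASES)
predicted = LABELS[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

In a real-time setting, a function like `classify` would be called on each frame (or on a hand region cropped by a detector), and the per-frame predictions would typically be smoothed over time before being interpreted as a user command.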

Author Biography

Weikun Lin, Shandong University of Science and Technology, China.
