A Review of Multimodal Interaction Technologies in Virtual Meetings
DOI: https://doi.org/10.5281/zenodo.13988124
ARK: https://n2t.net/ark:/40704/JCTAM.v1n4a08
Disciplines: Artificial Intelligence
Subjects: Natural Language Processing
References: 34
Keywords: Virtual Meetings, Multimodal Interaction, User Experience, Natural Language Processing, Technological Development

Abstract
Multimodal interaction technologies enhance human-to-human virtual meetings, such as those between call centers in one part of the world and customers in another, by enabling people to interact regardless of the language they speak. This addresses problems of culture and equity and encourages greater interaction and participation. Real-time language translation and transcription are therefore central functions of multimodal technologies. Beyond these, emotion analysis, natural language processing, and sentiment analysis allow the host to understand how participants feel and to identify the main ideas expressed in audio or video recordings. Participants can also record notes, from which meeting minutes are generated using natural language processing. Converting audio and video into text in this way significantly improves the efficiency and quality of meetings while preserving natural human engagement.
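The minutes-generation step described above (turning a transcript into concise minutes with NLP) can be sketched with a crude frequency-based extractive summarizer. This is an illustrative stand-in, not the method used by any system cited here; the transcript text and the function name `summarize_transcript` are hypothetical.

```python
import re
from collections import Counter

def summarize_transcript(transcript, top_n=2):
    """Pick the top_n sentences whose words occur most often overall
    (a naive frequency heuristic standing in for a real NLP summarizer)."""
    sentences = [s.strip() for s in re.split(r"[.!?]", transcript) if s.strip()]
    freq = Counter(re.findall(r"[a-z]+", transcript.lower()))
    # Score each sentence by the corpus frequency of its words.
    scored = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in re.findall(r"[a-z]+", s.lower())),
        reverse=True,
    )
    return scored[:top_n]

transcript = (
    "The budget review is our main topic. "
    "We agreed to finalize the budget by Friday. "
    "Lunch options were also discussed briefly."
)
minutes = summarize_transcript(transcript)
```

A production system would instead use an abstractive model over a speech-recognition transcript, but the pipeline shape (transcribe, score, select) is the same.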
Multimodal interaction is a technology that achieves human-computer interaction by integrating multiple sensory modalities, such as vision, hearing, and touch.
Its notable features include the richness of the information conveyed and a heightened sense of user engagement.
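One common way to integrate such modalities is late fusion: each modality produces its own per-label scores (for example, emotion estimates from webcam and microphone), which are then combined by a weighted average. The sketch below assumes made-up scores and weights purely for illustration.

```python
def fuse_emotion_scores(modality_scores, weights):
    """Late fusion: weighted average of per-modality emotion scores.
    modality_scores maps modality -> {label: score}; weights maps modality -> weight."""
    labels = next(iter(modality_scores.values())).keys()
    total_w = sum(weights[m] for m in modality_scores)
    return {
        lab: sum(weights[m] * modality_scores[m][lab] for m in modality_scores) / total_w
        for lab in labels
    }

# Hypothetical per-modality emotion estimates for one participant.
scores = {
    "vision": {"happy": 0.7, "neutral": 0.3},   # e.g. facial expression model
    "audio":  {"happy": 0.4, "neutral": 0.6},   # e.g. speech prosody model
}
weights = {"vision": 0.6, "audio": 0.4}
fused = fuse_emotion_scores(scores, weights)
```

Early fusion (concatenating raw features before classification) is the main alternative; late fusion is simpler and degrades gracefully when one modality drops out, which suits unreliable webcam or microphone streams in virtual meetings.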
Copyright (c) 2024 The author retains copyright and grants the journal the right of first publication.
This work is licensed under a Creative Commons Attribution 4.0 International License.