AI-Powered Commentary and Camera Direction in E-Sports
DOI: https://doi.org/10.9781/ijimai.2026.6566

Keywords: AI-driven Commentary, Camera Control, Computer Vision, E-sports Analytics, Neural Architectures

Abstract
In the rapidly advancing world of e-sports, real-time, AI-driven commentary and camera direction open up transformative possibilities for improving spectator engagement and comprehension of live events. This paper proposes an autonomous system that both generates dynamic commentary and controls the spectator camera for live-streamed e-sports matches, focusing on League of Legends (LoL), a popular Multiplayer Online Battle Arena (MOBA) game. The system builds on GPT-4o with Vision and OpenAI's TTS API. A major challenge tackled is synchronizing commentary with real-time camera movements; this is addressed with a camera-tracking and scene-change detection algorithm that uses computer vision techniques to adapt the commentary to changing scenes in real time. We further present two neural architectures for AI-driven camera control: a 2D Convolutional-LSTM (Conv-LSTM) model that analyzes spatial and temporal information in separate stages, and a 3D CNN model that fuses both to forecast camera movements more holistically. Evaluations on fluency, relevance, and strategic-depth metrics show that the integrated system improves the viewer experience by providing coherent, in-depth narratives contextually aligned with game dynamics, and the proposed models are evaluated quantitatively on how well they capture spectator camera-movement patterns.
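
The scene-change detector itself is not reproduced on this page. As a minimal illustrative sketch (our assumption, not the authors' implementation), a cut in the spectator stream can be flagged by comparing HSV color histograms of consecutive frames with OpenCV; the correlation metric and the 0.6 threshold below are placeholder choices:

import cv2

def detect_scene_changes(video_path, threshold=0.6):
    """Yield frame indices where a scene change is suspected."""
    cap = cv2.VideoCapture(video_path)
    prev_hist = None
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Summarize each frame as a 2D hue/saturation histogram.
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Low correlation between consecutive histograms suggests a cut.
            similarity = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL)
            if similarity < threshold:
                yield idx  # commentary should be re-anchored at this frame
        prev_hist = hist
        idx += 1
    cap.release()

A detector along these lines gives the commentary pipeline the signal it needs to re-align narration whenever the spectator view jumps to a new scene.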
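
Likewise, the two camera-control architectures are only named in the abstract. The following PyTorch sketch shows the general shape of each family under assumed layer sizes and an assumed (x, y) camera-position output; it is not the paper's reported configuration:

import torch
import torch.nn as nn

class ConvLSTMCameraModel(nn.Module):
    # A 2D CNN extracts per-frame spatial features; an LSTM then models the
    # temporal sequence separately (spatial and temporal analysis in stages).
    def __init__(self, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)       # assumed (x, y) camera target

    def forward(self, clip):                   # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1))   # (B*T, 64)
        out, _ = self.lstm(feats.view(b, t, -1))
        return self.head(out[:, -1])           # predict from the last timestep

class Cnn3DCameraModel(nn.Module):
    # 3D convolutions process space and time jointly, fusing both cues in
    # every layer rather than in separate stages.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=(3, 5, 5), stride=(1, 2, 2),
                      padding=(1, 2, 2)), nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(64, 2),                  # assumed (x, y) camera target
        )

    def forward(self, clip):                   # clip: (B, 3, T, H, W)
        return self.net(clip)

Under this framing, one plausible training setup regresses both models against logged spectator camera coordinates with a mean-squared-error loss, which would match the abstract's quantitative evaluation of how well camera-movement patterns are captured.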
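
Finally, a hedged sketch of how the commentary-generation and voicing stages could be wired together with GPT-4o with Vision and OpenAI's TTS API, as the abstract describes. The model names, prompt, and voice below are assumptions rather than the paper's exact setup:

import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def commentate_frame(jpeg_bytes: bytes, game_context: str) -> str:
    """Ask GPT-4o to produce one line of live commentary for a frame."""
    b64 = base64.b64encode(jpeg_bytes).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"You are a League of Legends caster. Context: {game_context}. "
                         "Give one energetic sentence of live commentary."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        max_tokens=60,
    )
    return resp.choices[0].message.content

def speak(text: str, out_path: str = "commentary.mp3") -> None:
    """Convert a commentary line to audio via the TTS endpoint."""
    audio = client.audio.speech.create(model="tts-1", voice="alloy", input=text)
    # Newer SDK versions prefer client.audio.speech.with_streaming_response.
    audio.stream_to_file(out_path)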