AI Powered Commentary and Camera Direction in E-Sports

Authors

Swathi Jamjala Narayanan, Kevin Winston Joseph, Devansh Sirohi, Harsh Chaudhary, Hitesh Shivkumar

DOI:

https://doi.org/10.9781/ijimai.2026.6566

Keywords:

AI-driven Commentary, Camera Control, Computer Vision, E-sports Analytics, Neural Architectures

Abstract

Real-time, AI-driven commentary and camera direction offer revolutionary possibilities for improving spectator engagement and comprehension of live events in the rapidly advancing world of e-sports. This paper proposes an autonomous system designed both to generate dynamic commentary and to control the spectator camera for live-streamed e-sports matches, focusing specifically on League of Legends (LoL), a popular Multiplayer Online Battle Arena (MOBA) game. The system incorporates GPT-4o with Vision and OpenAI’s TTS API. One of the major challenges tackled is synchronizing commentary with real-time camera movements; this is addressed with a camera tracking and scene change detection algorithm that uses computer vision techniques to adapt the commentary to changing scenes in real time. Further, two neural architectures for AI-driven camera control are presented: a 2D Convolutional-LSTM (Conv-LSTM) model that performs independent spatial and temporal analysis, and a 3D CNN model that combines these features to forecast camera movements more holistically. Evaluations on fluency, relevance, and strategic depth metrics show that the integrated system improves the viewer experience by providing deep, coherent narratives that are contextually aligned with game dynamics. The proposed models are also evaluated quantitatively on their ability to capture spectator camera movement patterns.
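The abstract describes a scene change detection step that keeps commentary aligned with camera cuts. The paper's implementation is not reproduced here; as a minimal sketch of one common approach to this task (per-frame intensity-histogram differencing, with all function names and the threshold value hypothetical):

```python
import numpy as np

def frame_histogram(frame: np.ndarray, bins: int = 32) -> np.ndarray:
    """Normalized grayscale intensity histogram of one frame (H x W array, values 0-255)."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)

def detect_scene_changes(frames, threshold: float = 0.4):
    """Return frame indices where the L1 distance between consecutive
    frame histograms exceeds `threshold`, flagging likely scene cuts."""
    cuts = []
    prev = frame_histogram(frames[0])
    for i, frame in enumerate(frames[1:], start=1):
        cur = frame_histogram(frame)
        if np.abs(cur - prev).sum() > threshold:
            cuts.append(i)
        prev = cur
    return cuts
```

A commentary pipeline could poll such a detector and re-prompt the language model whenever a cut is flagged, so that narration tracks the newly visible scene rather than the previous one.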


Author Biographies

Swathi Jamjala Narayanan, Vellore Institute of Technology University

She received her Ph.D. from Vellore Institute of Technology in 2015. She is currently designated Professor G in the School of Computer Science and Engineering, VIT Vellore, India, and has 17 years of teaching experience in computer science. Her research interests include soft computing, pattern recognition, machine learning, and data mining. She received the Best Ph.D. Thesis award from the Computer Society of India. She is a member of the International Association of Engineers and a lifetime member of the Computer Society of India and the Soft Computing Research Society.

Kevin Winston Joseph, Vellore Institute of Technology University

He is currently in the final year of his B.Tech in Computer Science and Engineering with a specialization in Data Science at Vellore Institute of Technology, Vellore, India. His research interests include foundational and applied deep learning.

Devansh Sirohi, Vellore Institute of Technology University

He is currently in the final year of his B.Tech in Computer Science and Engineering with a specialization in Data Science at Vellore Institute of Technology, Vellore, India. His current research interests are data mining, machine learning, and cloud computing.

Harsh Chaudhary, Vellore Institute of Technology University

He is currently pursuing his B.Tech in Computer Science with a specialization in Information Security at Vellore Institute of Technology, India, and is expected to graduate in 2024. With a keen interest in generative AI, NLP, and cybersecurity, he has been actively engaged in research projects and academic pursuits.

Hitesh Shivkumar, Vellore Institute of Technology University

He is currently pursuing his B.Tech in Computer Science at Vellore Institute of Technology, Vellore, India. His research areas include Generative AI, LLMs, Cloud Development, and Stochastic Modeling. He is actively engaged in multiple research projects alongside his academic pursuits.

References

[1] S. Block, F. Haack, “Esports: a new industry,” in Proceedings of the 20th International Scientific Conference Globalization and its Socio-Economic Consequences, SHS Web of Conferences, Zilina, Slovak Republic, vol. 92, 2021, p. 04002, EDP Sciences, doi: https://doi.org/10.1051/shsconf/20219204002.

[2] N. Renella, M. Eger, “Towards automated video game commentary using generative AI,” in Proceedings of the AIIDE Workshop on Experimental Artificial Intelligence in Games, CEUR Workshop, Salt Lake City, Utah, USA, vol. 3626, 2023, pp. 341–350. url: https://ceur-ws.org/Vol-3626/paper7.pdf.

[3] Z. Wang, N. Yoshinaga, “Esports data-to-commentary generation on large-scale data-to-text dataset,” arXiv preprint arXiv:2212.10935, 2022.

[4] X. Qi, C. Li, Z. Liang, J. Liu, C. Zhang, Y. Wei, L. Yuan, G. Yang, L. Huang, M. Li, “MCS: an in-battle commentary system for MOBA games,” in Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 2022, pp. 2962–2967. url: https://aclanthology.org/2022.coling-1.262/.

[5] O. Olarewaju, A. V. Kokkinakis, S. Demediuk, J. Robertson, I. Nölle, S. Patra, D. Slawson, A. Chitayat, A. Coates, B. Kirman, et al., “Automatic generation of text for match recaps using esport caster commentaries,” in Proceedings of International Conference of Natural Language Computing, CS IT – Computer Science Conference Proceedings, Sydney, Australia, 2020, pp. 117–131. Virtual conference, doi: https://doi.org/10.5121/CSIT.2020.101811.

[6] H. Huang, J. H. Xu, X. Ling, P. Paliyawan, “Sentence punctuation for collaborative commentary generation in esports live-streaming,” in Proceedings of IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 2022, pp. 1–2. url: https://arxiv.org/abs/2110.12416.

[7] T. Ishigaki, G. Topić, Y. Hamazono, H. Noji, I. Kobayashi, Y. Miyao, H. Takamura, “Generating racing game commentary from vision, language, and structured data,” in Proceedings of the 14th International Conference on Natural Language Generation, Aberdeen, Scotland, UK, 2021, pp. 103–113, doi: https://doi.org/10.18653/v1/2021.inlg-1.11.

[8] C. Nimpattanavong, P. Taveekitworachai, I. Khan, T. V. Nguyen, R. Thawonmas, W. Choensawat, K. Sookhanaphibarn, “Am I fighting well? Fighting game commentary generation with ChatGPT,” in Proceedings of the 13th International Conference on Advances in Information Technology, ACM, Bangkok, Thailand, 2023, pp. 1–7, doi: https://doi.org/10.1145/3628454.3629551.

[9] M. Czaplicki, “Live commentary in a football video game generated by an AI,” Master’s thesis, University of Twente, Business IT BSc programme, Enschede, The Netherlands, July 2023. url: https://purl.utwente.nl/essays/96001.

[10] Y. Taniguchi, Y. Feng, H. Takamura, M. Okumura, “Generating live soccer-match commentary from play data,” in Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, Hawaii, USA, vol. 33, 2019, pp. 7096–7103, doi: https://doi.org/10.1609/aaai.v33i01.33017096.

[11] C. Chan, C. Hui, W. Siu, S. Chan, H. A. Chan, “To start automatic commentary of soccer game with mixed spatial and temporal attention,” in TENCON 2022-2022 IEEE Region 10 Conference (TENCON), Hong Kong SAR, China, 2022, pp. 1–6, IEEE, doi: https://doi.org/10.1109/TENCON55691.2022.9978078.

[12] E. Marrese-Taylor, Y. Hamazono, T. Ishigaki, G. Topić, Y. Miyao, I. Kobayashi, H. Takamura, “Open-domain video commentary generation,” in Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, 2022, pp. 7326–7339, doi: https://doi.org/10.18653/v1/2022.emnlp-main.495.

[13] S. Ma, L. Cui, D. Dai, F. Wei, X. Sun, “Livebot: Generating live video comments based on visual and textual contexts,” in Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, Hawaii, USA, vol. 33, 2019, pp. 6810–6817. url: https://arxiv.org/abs/1809.04938, doi: https://doi.org/10.1609/aaai.v33i01.33016810.

[14] H. Wu, G. J. F. Jones, F. Pitié, “Knowing where and what to write in automated live video comments: A unified multi-task approach,” in Proceedings of the 2021 International Conference on Multimodal Interaction, Montréal QC, Canada, 2021, pp. 619–627, doi: https://doi.org/10.1145/3462244.3479942.

[15] J. Chen, J. Ding, W. Chen, Q. Jin, “Knowledge enhanced model for live video comment generation,” in 2023 IEEE International Conference on Multimedia and Expo (ICME), Brisbane, Australia, 2023, pp. 2267–2272, IEEE, doi: https://doi.org/10.1109/ICME55011.2023.00387.

[16] D. Dai, “Live video comment generation based on surrounding frames and live comments,” arXiv preprint arXiv:1808.04091, 2018.

[17] J. H. Xu, Y. Cai, Z. Fang, P. Paliyawan, “Promoting mental well-being for audiences in a live-streaming game by highlight-based bullet comments,” in 2021 IEEE 10th Global Conference on Consumer Electronics (GCCE), Kyoto, Japan, 2021, pp. 383–385, IEEE, doi: https://doi.org/10.1109/GCCE53005.2021.9621853.

[18] A. Dosovitskiy, et al., “An image is worth 16×16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.

[19] G. Bertasius, H. Wang, L. Torresani, “Is space-time attention all you need for video understanding?,” in Proceedings of 38th International Conference on Machine Learning, Virtual (online), vol. 2, 2021, p. 4. url: https://arxiv.org/abs/2102.05095.

[20] C. Wu, Y. Li, K. Mangalam, H. Fan, B. Xiong, J. Malik, C. Feichtenhofer, “MeMViT: Memory-augmented multiscale vision transformer for efficient long-term video recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 2022, pp. 13587–13597. url: https://arxiv.org/abs/2201.08383. doi: https://doi.org/10.1109/CVPR52688.2022.01322.

[21] L. Wei, L. Xie, W. Zhou, H. Li, Q. Tian, “Mvp: Multimodality-guided visual pre-training,” 2022. [Online]. Available: https://arxiv.org/abs/2203.05175.

[22] Z. Tong, Y. Song, J. Wang, L. Wang, “VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training,” Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093, 2022. url: https://arxiv.org/abs/2203.12602.

[23] M. H. Mohd Noor, S. Y. Tan, M. N. Ab Wahab, “Deep temporal Conv-LSTM for activity recognition,” Neural Processing Letters, vol. 54, no. 5, pp. 4027–4049, 2022, doi: https://doi.org/10.1007/s11063-022-10799-5.

[24] D. Tran, L. Bourdev, R. Fergus, L. Torresani, M. Paluri, “Learning spatiotemporal features with 3d convolutional networks,” in Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 2015, pp. 4489–4497, doi: https://doi.org/10.1109/ICCV.2015.510.

[25] Y. Ding, L. L. Zhang, C. Zhang, Y. Xu, N. Shang, J. Xu, F. Yang, M. Yang, “LongRoPE: Extending LLM context window beyond 2 million tokens,” 2024. [Online]. Available: https://arxiv.org/abs/2402.13753.

[26] Y. Gao, Y. Xiong, X. Gao, K. Jia, J. Pan, Y. Bi, Y. Dai, J. Sun, M. Wang, H. Wang, “Retrieval-augmented generation for large language models: A survey,” 2024. [Online]. Available: https://arxiv.org/abs/2312.10997.

[27] V. Karpukhin, B. Oğuz, S. Min, P. Lewis, L. Wu, S. Edunov, D. Chen, W. Yih, “Dense passage retrieval for open-domain question answering,” 2020. [Online]. Available: https://arxiv.org/abs/2004.04906.

Published

2026-02-19
How to Cite

Jamjala Narayanan, S., Winston Joseph, K., Sirohi, D., Chaudhary, H., and Shivkumar, H. (2026). AI Powered Commentary and Camera Direction in E-Sports. International Journal of Interactive Multimedia and Artificial Intelligence, 9(6), 116–125. https://doi.org/10.9781/ijimai.2026.6566