Reinforcement Learning in Third-Person Environments: A Hybrid Q-Learning and Sensorimotor Approach

  • Unique Paper ID: 179239
  • Volume: 11
  • Issue: 12
  • PageNo: 8136-8140
  • Abstract:
  • The field of reinforcement learning (RL) has evolved significantly, particularly in first-person games and simulation environments such as VizDoom, where agents can be trained directly from raw pixels. Transferring RL to third-person action games such as Devil May Cry 3, however, introduces additional challenges: partial observability, high-dimensional pixel data, and the need for stylistic fighting behavior. This work introduces a learning framework that, in contrast to earlier methods dependent on raw sensor feedback, uses learned latent dynamics (world models) to model the game world and train policies in a dense, imagination-based representation space. A convolutional neural network processes gameplay frames into low-dimensional latent vectors describing key environmental features, including opponent and player positions. These latent representations are then passed to a Recurrent State-Space Model (RSSM) to predict future states and rewards. Within this latent space, an actor-critic agent selects actions by balancing short-term tactics against long-term planning. Compared with end-to-end pixel-based learning, this process substantially improves training efficiency while enabling adaptive play. The reward system is optimized not only for success but also for flair: achieving high-ranked performances (e.g., S-rank) through combo diversity and spatial awareness. A modular training paradigm partitions combat, exploration, and boss battles into separate learning tasks, enabling targeted optimization and later integration. The method demonstrates improved agent performance, flexibility, and potential generalization to other visually rich, multi-objective tasks, advancing AI toward real-time decision making in visually dense settings with changing objectives.
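The pipeline the abstract describes (encode a frame to a latent vector, roll an RSSM forward, and let the actor choose among imagined futures) can be sketched as follows. This is a minimal NumPy illustration only: all dimensions, weight matrices, and function names are hypothetical stand-ins for components the paper trains end-to-end, and the "actor" here is a one-step greedy search rather than a learned actor-critic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 64x64 grayscale frame, a 16-dim latent,
# a 32-dim RSSM hidden state, and 4 discrete actions.
FRAME, LATENT, HIDDEN, ACTIONS = 64 * 64, 16, 32, 4

# Random stand-ins for weights the real framework would learn.
W_enc = rng.normal(0, 0.01, (LATENT, FRAME))   # CNN encoder stand-in
W_h = rng.normal(0, 0.1, (HIDDEN, HIDDEN))     # RSSM recurrent weights
W_z = rng.normal(0, 0.1, (HIDDEN, LATENT))     # latent-input weights
W_a = rng.normal(0, 0.1, (HIDDEN, ACTIONS))    # action-input weights
w_r = rng.normal(0, 0.1, HIDDEN)               # reward-prediction head

def encode(frame):
    """Compress a raw frame to a low-dimensional latent vector."""
    return np.tanh(W_enc @ frame)

def rssm_step(h, z, action):
    """Deterministic RSSM-style update: next hidden state from (h, z, a)."""
    a_onehot = np.eye(ACTIONS)[action]
    return np.tanh(W_h @ h + W_z @ z + W_a @ a_onehot)

def imagined_reward(h):
    """Predicted reward for a hidden state (reward head stand-in)."""
    return float(w_r @ h)

def act(h, z):
    """Greedy actor stand-in: pick the action with the best imagined outcome."""
    return max(range(ACTIONS), key=lambda a: imagined_reward(rssm_step(h, z, a)))

frame = rng.random(FRAME)      # fake gameplay frame
h = np.zeros(HIDDEN)           # initial RSSM hidden state
z = encode(frame)
a = act(h, z)                  # choose an action in latent space
h = rssm_step(h, z, a)         # roll the world model forward
```

Because planning happens entirely in the compact latent space, each imagined step costs a few small matrix products instead of a full environment frame, which is the source of the training-efficiency gain the abstract claims.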

Copyright & License

Copyright © 2025. Authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{179239,
        author = {Harsh Mittal and Kumar Namah and Ashmit Pandey},
        title = {Reinforcement Learning in Third-Person Environments: A Hybrid Q-Learning and Sensorimotor Approach},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {11},
        number = {12},
        pages = {8136--8140},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=179239},
        abstract = {The field of reinforcement learning (RL) has evolved significantly, particularly in first-person games and simulation environments such as VizDoom, where agents can be trained directly from raw pixels. Transferring RL to third-person action games such as Devil May Cry 3, however, introduces additional challenges: partial observability, high-dimensional pixel data, and the need for stylistic fighting behavior. This work introduces a learning framework that, in contrast to earlier methods dependent on raw sensor feedback, uses learned latent dynamics (world models) to model the game world and train policies in a dense, imagination-based representation space. A convolutional neural network processes gameplay frames into low-dimensional latent vectors describing key environmental features, including opponent and player positions. These latent representations are then passed to a Recurrent State-Space Model (RSSM) to predict future states and rewards. Within this latent space, an actor-critic agent selects actions by balancing short-term tactics against long-term planning. Compared with end-to-end pixel-based learning, this process substantially improves training efficiency while enabling adaptive play. The reward system is optimized not only for success but also for flair: achieving high-ranked performances (e.g., S-rank) through combo diversity and spatial awareness. A modular training paradigm partitions combat, exploration, and boss battles into separate learning tasks, enabling targeted optimization and later integration. The method demonstrates improved agent performance, flexibility, and potential generalization to other visually rich, multi-objective tasks, advancing AI toward real-time decision making in visually dense settings with changing objectives.},
        keywords = {Reinforcement Learning, Q-Learning, Sensorimotor Control, Devil May Cry 3, Game AI, Style Evaluation, Vision-Based Learning, Modular Training},
        month = {May},
        }
