Later, during an exchange with Professor Wu from Tsinghua University, he mentioned that their improved agents, trained with reinforcement learning (RL), performed better in the Werewolf game.
"Strategic Play in the Werewolf Game by Language Agents through Reinforcement Learning" (LANGUAGE AGENTS WITH REINFORCEMENT LEARNING FOR STRATEGIC PLAY IN THE WEREWOLF GAME) https://arxiv.org/pdf/2310.18940.pdf
I had previously bookmarked the paper but never had time to read it carefully until today, when I finally had the opportunity to study it thoroughly ✍️.
Abstract
This paper explores how to build intelligent agents for the Werewolf game with reinforcement learning (RL). Professor Wu's agent first uses an LLM to infer hidden roles and potential deception and to generate a set of strategically diverse candidate actions. An RL policy, learned through population-based training, then selects the final action from the candidates, strengthening the agent's decision-making. By combining LLMs with an RL policy, the agent produces diverse strategies, achieves the highest win rate against other LLM-based agents, and remains robust when playing against adversarial human players.
Framework
Existing LLM-based agents rarely consider the exploitability of their own behavior: they tend to take actions with clear strategic patterns, making them easy for real human players to exploit during competitions.
Professor Wu proposed a framework that combines large language models (LLMs) and reinforcement learning (RL) to build strategic language agents.
The agent uses an LLM to organize key information, infer hidden roles, and generate a set of diverse candidate actions. An RL policy, learned through population-based training, then produces the final action from the candidates, achieving strong strategic play.
An LLM-based agent equipped with such an RL policy, giving it strategic thinking capability, is called a Strategic Language Agent.
Deductive Reasoning: using the LLM to organize key information and deduce hidden roles and intentions.
Diverse Action Generation: prompting the LLM to provide a set of strategically diverse candidate actions.
Population-based RL Training: learning an RL policy by playing against itself, its past versions, and a pool of diverse agents.
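The overall pipeline described above — an LLM proposes diverse candidate actions and a learned RL policy picks the final one — can be sketched roughly as follows. This is a minimal illustration, not the paper's code: the function names, the candidate list, and the toy scoring function are all assumptions standing in for the real LLM prompting step and the trained policy network.

```python
import math
import random

def propose_candidates(observation):
    """Stand-in for the LLM step: in the real system, the LLM is prompted
    with the game state and returns strategically diverse actions."""
    return ["vote_player_3", "stay_silent", "accuse_player_5"]

def softmax(scores):
    """Convert policy scores into a probability distribution over candidates."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_action(observation, score_fn, rng=random.Random(0)):
    """RL policy step: score each LLM-generated candidate, then sample
    from the resulting softmax distribution."""
    candidates = propose_candidates(observation)
    probs = softmax([score_fn(observation, a) for a in candidates])
    return rng.choices(candidates, weights=probs, k=1)[0]

# Toy scorer standing in for a trained policy network.
action = select_action({"round": 2}, lambda obs, a: 0.1 * len(a))
print(action)
```

Sampling from a distribution over candidates, rather than always taking the top-scored action, is one way such an agent can avoid the fixed behavioral patterns that make pure-LLM agents exploitable.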
Experimental Results:
Win-rate comparisons between the different agents.
Bold numbers indicate that Professor Wu's agent is more robust than all the weakened agent variants; underlined numbers show that, in single-player evaluation, it achieved higher win rates than the average human player.
Bold numbers indicate that Professor Wu's RL policy also improves agents built on LLMs unseen during training.
To intuitively demonstrate the benefit of RL training, the action distributions of agents with and without the RL policy are compared, and their behaviors are analyzed under three scenarios to highlight the differences.
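One simple way to quantify the kind of action-distribution comparison described above is to estimate each agent's empirical action distribution from game logs and measure its entropy — a flatter, higher-entropy distribution is harder for opponents to predict and exploit. The sketch below is illustrative only; the helper names and toy logs are assumptions, not the paper's analysis code.

```python
import math
from collections import Counter

def action_distribution(actions):
    """Empirical distribution over actions from a log of plays."""
    counts = Counter(actions)
    total = len(actions)
    return {a: c / total for a, c in counts.items()}

def entropy(dist):
    """Shannon entropy (in nats) of an action distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

# Toy logs: a pure-LLM agent that almost always votes the same way,
# versus an RL-trained agent that mixes its actions.
llm_only = ["vote_3"] * 9 + ["pass"]
with_rl = ["vote_3", "pass", "vote_5", "vote_3", "pass", "vote_5"]

print(entropy(action_distribution(llm_only)))
print(entropy(action_distribution(with_rl)))
```

Here the mixed-strategy log yields a higher entropy, matching the paper's qualitative finding that RL training makes the agent's behavior less predictable.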
Comparison with Other Prompting Techniques:
Comparison with Self-play: