Embodied LLM Agents Learn to Cooperate in Organized Teams

Tsinghua University, Princeton University, Penn State University, Oregon State University

TL;DR: This paper demonstrates that a hierarchically organized multi-LLM-agent team with a designated or elected leader achieves superior team efficiency, which can be further improved by the proposed dual-LLM Criticize-Reflect framework.

Emerging cooperative behaviors when Agent_3 is designated as the leader of the 3×GPT-4 team.

Abstract

Large Language Models (LLMs) have emerged as integral tools for reasoning, planning, and decision-making, drawing upon their extensive world knowledge and proficiency in language-related tasks. LLMs thus hold tremendous potential for natural language interaction within multi-agent systems to foster cooperation. However, LLM agents tend to over-report and comply with any instruction, which may result in information redundancy and confusion in multi-agent cooperation.

Inspired by human organizations, this paper introduces a framework that imposes prompt-based organization structures on LLM agents to mitigate these problems. Through a series of experiments with embodied LLM agents, our results highlight the impact of designated leadership on team efficiency, shedding light on the leadership qualities displayed by LLM agents and their spontaneous cooperative behaviors.

Further, we harness the potential of LLMs to propose enhanced organizational prompts, via a Criticize-Reflect process, resulting in novel organization structures that reduce communication costs and enhance team efficiency.

Motivation

LLMs are not explicitly designed for multi-agent cooperation:

  • LLM agents tend to send redundant messages to other agents, incurring high token costs and information redundancy;
  • LLM agents tend to obey other agents' suggestions; when agents communicate in a disordered way, the result is chaos.

Example of disorganized communication and interruption, without a designated leader. In a team of three GPT-4 agents, two agents engaged in unnecessary communication and made disordered decisions due to the lack of a predefined organization.

Drawing on prior studies of human collaboration, we design a novel framework that offers the flexibility to prompt and organize LLM agents into various team structures, facilitating versatile inter-agent communication.

Specifically, we study two research questions:

  1. What role do organizational structures play in multi-LLM-agent systems?
  2. Can we optimize these organizational structures to support efficient multi-agent coordination?


Multi-LLM-agent architecture

We propose the following embodied multi-LLM-agent architecture to enable organized teams of ≥ 3 agents to communicate, plan, and act in physical/simulated environments. Borrowing insights from Co-LLM-Agents, we adopt four standard modules: Configurator, Perception Module, Memory Module, and Execution Module.
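
To make the module composition concrete, here is a minimal Python sketch of an agent's four modules. All class, field, and method names (Configurator, observe, update, act, and so on) are hypothetical placeholders chosen for clarity, not the paper's implementation.

# A minimal sketch of the four agent modules (names are hypothetical).
from dataclasses import dataclass, field

@dataclass
class Configurator:
    # Static setup: agent identity, role, and the textual organization prompt.
    name: str
    role: str                    # e.g., "leader" or "follower"
    organization_prompt: str

class PerceptionModule:
    def observe(self, env) -> str:
        # Turn the agent's partial observation (PO) into text for the LLM prompt.
        raise NotImplementedError

@dataclass
class MemoryModule:
    # Rolling record of what the agent has heard and done.
    dialogue_history: list = field(default_factory=list)
    action_history: list = field(default_factory=list)

    def update(self, message=None, action=None):
        if message is not None:
            self.dialogue_history.append(message)
        if action is not None:
            self.action_history.append(action)

class ExecutionModule:
    def act(self, env, action: str):
        # Map the LLM's chosen high-level action to environment commands.
        raise NotImplementedError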

Multi-LLM-agent architecture. (a) The modules of an LLM agent and the composition of prompts. (b) Two phases in one time step: Communication Phase and Action Phase. In the Communication Phase, the agents take turns communicating by broadcasting or selecting receivers to send distinct messages. The agents can also choose to keep silent. Comm: Communication; PO: Partial Observation.

To enable organized multi-agent communication:

  • We impose an organizational structure on the agent team via prompting, i.e., by including a textual description of the structure as part of the prompt for each round of communication;
  • Each time step consists of two phases: the communication phase and the action phase;
  • During communication, agents take turns communicating. An agent can choose to broadcast a message, send a message to a single recipient, send distinct messages to multiple recipients, or remain silent (see the sketch after this list).
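
The following is a minimal sketch of one time step under this protocol, assuming hypothetical query_llm and choose_action helpers that wrap the LLM calls; none of these names come from the paper.

from dataclasses import dataclass

@dataclass
class Message:
    sender: str
    receivers: list           # all teammates for a broadcast, a subset otherwise
    content: str

def communication_phase(agents, query_llm):
    # Agents take turns; each may broadcast, message chosen receivers
    # (possibly with distinct contents), or keep silent.
    messages = []
    for agent in agents:                        # agents: list of agent names
        decision = query_llm(agent, messages)   # LLM sees earlier turns this round
        if decision is None:                    # the agent keeps silent
            continue
        for receivers, content in decision:     # one entry per distinct message
            messages.append(Message(agent, receivers, content))
    return messages

def time_step(agents, env, query_llm, choose_action):
    # Phase 1: communication; Phase 2: each agent chooses and executes an action.
    messages = communication_phase(agents, query_llm)
    for agent in agents:
        env.step(agent, choose_action(agent, messages))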


Criticize-Reflect framework

We introduce a dual-LLM framework that allows the multi-LLM-agent system to reflect on and improve its organizational structure.

  • LLM critic: The critic takes as input the dialogue and action history of one episode. It analyzes this input, extracting and summarizing the key steps believed to influence performance in the episode. The critic also provides a textual evaluation of the agents' behaviors and a ranking of their leadership.
  • LLM coordinator: The coordinator takes as input the critic's outputs as well as cost metrics of previous episodes from the environment. It reflects on these data and generates a new organization prompt based on its analysis of past episodes (sketched below).
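
The loop can be summarized with the following sketch, where run_episode, critic_llm, and coordinator_llm are hypothetical stand-ins for the environment rollout and the two LLMs; the actual prompting details are in the paper.

def criticize_reflect(org_prompt, num_iterations,
                      run_episode, critic_llm, coordinator_llm):
    # Iteratively refine the organization prompt from episode feedback.
    for _ in range(num_iterations):
        # Roll out one episode under the current organization prompt.
        trajectory, costs = run_episode(org_prompt)  # dialogue/action history + cost metrics
        # Critic: summarize key steps, evaluate behaviors, rank leadership.
        critique = critic_llm(trajectory)
        # Coordinator: reflect on the critique and the environment costs, then
        # propose a new prompt (topology, role assignment, and rules).
        org_prompt = coordinator_llm(critique, costs)
    return org_prompt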

Criticize-Reflect architecture for improving organizational structure. The red agent represents the leader in an organization. The Critic evaluates the trajectories and analyzes the agents' performance. Together with the external costs from the environment, the Coordinator proposes a new organization prompt to improve the team efficiency. The new organization prompt contains the topology, role assignment, and rules of the organization.

Visualization

Communication patterns and the corresponding organizational prompts for different team structures. (a) Team without organization prompts. (b) Team with a leader. (c) Team in a chain structure. (d) Dual-leader team. (e) Team with dynamic leadership. Structures (c), (d), and (e) are proposed by the LLM via reflection. Red robot nodes mark the lead agents; the other nodes are followers. Edges mark the accumulated communication cost between two nodes (darker edges indicate higher token costs).
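
For concreteness, organization prompts for structures (b) and (c) might look roughly like the following; these strings are illustrative paraphrases, not the paper's exact prompts.

# Hypothetical organization prompts (illustrative paraphrases, not the paper's exact text).
ORG_PROMPTS = {
    "leader": "Agent_1 is the leader. The leader assigns tasks to Agent_2 and "
              "Agent_3; the followers report their progress to the leader.",
    "chain": "The team is organized in a chain: Agent_1 communicates only with "
             "Agent_2, and Agent_2 communicates only with Agent_3.",
}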

Examples

Communication styles for different LLM types

Examples of communication messages when there is a designated leader. Left: messages from lead agents; Right: messages from non-lead agents. GPT-4 (upper), GPT-3.5-turbo (center), and Llama2-70B (lower) demonstrated different communication styles.

Examples of election

Examples of the election of a new leader. In this case, it takes two steps of voting and negotiation to determine the new leader. Note that Agent_3 chooses not to send a message since the election is complete and there is no more information to share. All messages in the figure are broadcasts.

Examples of correction

Examples of correction dialogues and the corresponding thoughts. The prompt includes: "If the leader's instructions are not right, you can correct the leader."

Examples of human-AI collaboration

Examples of human-AI collaboration in which the human player leads two GPT-4 agents (Agent_2 and Agent_3).

BibTeX

@article{guo2024embodied,
  title={Embodied LLM Agents Learn to Cooperate in Organized Teams},
  author={Guo, Xudong and Huang, Kaixuan and Liu, Jiale and Fan, Wenhui and V{\'e}lez, Natalia and Wu, Qingyun and Wang, Huazheng and Griffiths, Thomas L and Wang, Mengdi},
  journal={arXiv preprint arXiv:2403.12482},
  year={2024}
}