Here's a holistic view of the provided content on multi-agent systems.

## Large Language Models (LLMs) and Their Limitations

Large Language Models (LLMs) were initially developed on vast amounts of internet data, enabling them to perform tasks like math and reasoning effectively. However, a significant limitation of these models is their inability to access and utilize private or business-specific local data. They also tend to exhibit biases present in their training data, such as favoring certain cultural contexts (e.g., "flour" over "rice" when discussing bags of goods). This means that while LLMs are good at general knowledge and logic, they can "hallucinate" or provide irrelevant answers when specific, proprietary information is required.

## Retrieval Augmented Generation (RAG)

Retrieval Augmented Generation (RAG) addresses these limitations by adding a retrieval step before generation. Instead of relying solely on its pre-trained knowledge, the LLM first retrieves relevant information from a specific, user-controlled data store, often a vector database. The retrieved "context" (your data, not internet data) is then combined with the user's query and specific instructions, and all of it is sent to the LLM together. Grounding the answer in accurate, relevant data this way reduces hallucinations and makes the model suitable for specific applications like a local grocery store's pricing.

## Transitioning from RAG to Agents

The evolution from RAG to an "agent" involves a shift from simply providing instructions to defining roles and enabling decision-making within the LLM. While RAG uses explicit instructions (e.g., "be brief"), an agent is given a "role" or "persona" (e.g., "eager seller") along with a backstory, encouraging it to engage more advanced skills and knowledge.
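The RAG flow just described can be sketched end to end. This is a minimal illustration, not a real implementation: `retrieve` is a toy word-overlap ranking standing in for a vector-database query, `build_prompt` combines context, query, and instructions as described above, and the final LLM call is left as a comment.

```python
# Minimal RAG sketch. `retrieve` is a toy stand-in for a vector-database
# query; in a real system it would use embeddings and similarity search.

def retrieve(query: str, store: dict[str, str], top_k: int = 2) -> list[str]:
    """Toy retrieval: rank stored documents by word overlap with the query."""
    words = set(query.lower().replace("?", "").split())
    scored = sorted(store.items(),
                    key=lambda kv: len(words & set(kv[1].lower().split())),
                    reverse=True)
    return [text for _, text in scored[:top_k]]

def build_prompt(query: str, context: list[str], instructions: str) -> str:
    """Combine retrieved context, the user's query, and instructions."""
    return (f"Instructions: {instructions}\n"
            "Context:\n" + "\n".join(f"- {c}" for c in context) +
            f"\nQuestion: {query}")

# Your data, not internet data (the grocery-store example from the text):
store = {
    "rice":  "5 kg bag of rice costs 12 dollars",
    "flour": "2 kg bag of flour costs 4 dollars",
}
query = "How much is a bag of rice?"
prompt = build_prompt(query, retrieve(query, store),
                      "Be brief. Answer only from the context.")
# `prompt` would now be sent to the LLM, e.g. llm_complete(prompt).
```

Swapping the toy ranking for an embedding-based vector search is the only structural change a production pipeline would need; the combine-and-send step stays the same.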
This allows the agent not just to follow direct commands but to make intelligent decisions and perform actions based on its assigned role, such as proactively offering an upsell rather than just stating a price.

## Multi-Agent Systems (MAS)

Multi-Agent Systems involve multiple AI agents collaborating to achieve a complex goal that a single agent might struggle with. Each agent typically has a defined role and specific tools or functions it can utilize, and it interacts with other agents following a structured process. This distributed approach allows specialized tasks to be handled by the appropriate agents, improving efficiency and accuracy in elaborate scenarios like handling customer refund requests.

## Key Components of Multi-Agent Systems

Multi-Agent Systems are characterized by several core components that define their structure and operation.

### Agents

The individual AI entities, each defined by a specific role (e.g., "helpful receptionist," "generous manager") and equipped with a set of functions or tools they can use to perform tasks. The role provides context, while the tools allow interaction with external systems like databases or other agents.

### Processes

The organizational flow of how control and tasks are transferred among agents:

- **Sequential**: Agents operate one after another, like a list or loop, with one agent completing its task and handing off to the next. This is the most basic and easiest pattern to understand.
- **Hierarchical**: Modeled after company organizations, a central "manager" agent delegates tasks to other agents and consolidates their results. The manager decides which agents to engage and when, creating a tree-like structure.
- **Graph**: The most complex process, where agents can transfer control to any other agent, or even back to previous agents, forming arbitrary, non-linear workflows based on predefined business logic.

### Communication Mechanisms

These define how agents exchange information and context.
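Before turning to the individual mechanisms, the sequential and hierarchical processes can be sketched as plain control flow. The `Agent` class here is a hypothetical stand-in, not the API of any particular framework.

```python
# Sketch of sequential vs. hierarchical control flow between agents.
# The Agent class and its single `run` method are illustrative only.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    role: str
    work: Callable[[str], str]  # what this agent does with incoming context

    def run(self, context: str) -> str:
        return self.work(context)

def run_sequential(agents: list[Agent], context: str) -> str:
    """Sequential: each agent hands its full output to the next."""
    for agent in agents:
        context = agent.run(context)
    return context

def run_hierarchical(manager: Agent, workers: list[Agent], task: str) -> str:
    """Hierarchical: a manager delegates to workers, then consolidates."""
    results = [w.run(task) for w in workers]
    return manager.run("\n".join(results))

greeter = Agent("receptionist", lambda c: c + " -> greeted")
pricer  = Agent("seller",       lambda c: c + " -> priced")
print(run_sequential([greeter, pricer], "customer query"))
# customer query -> greeted -> priced
```

A graph process would generalize this further by letting each agent's output name the next agent to run, including agents that already ran.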
- **Context Hand-off**: In sequential processes, agents simply pass all accumulated context and data to the next agent in the chain. This acts like a "stateful" transfer, where all prior information is always available to the succeeding agent.
- **Question Answering (Q&A)**: Agents can directly ask questions of other agents and expect a specific answer in return, similar to function calls in programming or emailing a colleague.
- **Shared Message Queue**: Agents can push messages to a shared queue, and other agents can "listen" for and retrieve the messages relevant to their tasks.

### Memory Systems

Agents can store and retrieve past interactions and learnings to improve their performance over time.

- **Short-Term Memory**: Stores immediate transactional data or raw text, typically in a vector database, for quick recall within a single interaction or a short sequence of tasks.
- **Long-Term Memory**: Stores evaluated reflections, quality scores, and suggestions for improvement derived from past interactions. This allows agents to "learn" and evolve, becoming more polite or efficient in future interactions; such memory is often kept in structured formats like SQL tables.

## Frameworks for Multi-Agent Systems

- **Swarm**: Described as an educational tool, Swarm offers a simplistic approach to defining agents, roles, and functions, primarily supporting sequential processes. It's useful for understanding basic multi-agent design.
- **Crew AI**: A more advanced framework that supports complex agent definitions, including roles, goals, and backstories, as well as hierarchical and graph-based processes. It also facilitates sophisticated communication mechanisms like question answering and message queues, along with memory systems.

## Examples and Use Cases

**Software Engineering (Open Hands)**: A multi-agent system designed to act as a group of software engineers, handling various aspects of the software development process, from chat interfaces to code editing and report generation.
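A shared message queue of the kind described above can be sketched with Python's standard `queue` module; the topics, agents, and messages are invented for illustration.

```python
# Sketch of a shared message queue between agents, using the standard
# library's queue module. Topics and message contents are illustrative.
import queue

bus: "queue.Queue[tuple[str, str]]" = queue.Queue()  # shared by all agents

def publish(topic: str, payload: str) -> None:
    """Any agent pushes a message tagged with a topic."""
    bus.put((topic, payload))

def drain_for(topic: str) -> list[str]:
    """An agent retrieves only its topic's messages, leaving the rest queued."""
    mine, rest = [], []
    while not bus.empty():
        item = bus.get()
        (mine if item[0] == topic else rest).append(item)
    for item in rest:          # put back messages meant for other agents
        bus.put(item)
    return [payload for _, payload in mine]

publish("refund",  "order 42: customer requests refund")
publish("pricing", "update rice price to 11 dollars")
print(drain_for("refund"))   # ['order 42: customer requests refund']
```

The decoupling is the point: the publisher never needs to know which agent, if any, will consume the message.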
It employs a hierarchical process with a controller and manager, uses event streams for communication, and includes both short-term (in-memory) and long-term (file storage) memory systems.

**Issue Resolution**: Another example involves agents working together to resolve GitHub issues, often using a graph-based process in which agents write reports back to a manager, similar to Q&A communication.

**Legal Knowledge Graphs**: Multiple specialized knowledge graphs, combined with various agents (search, supervisor, router, retrieval, indexing), can process complex legal documents. This system typically uses a graph-based process, navigating and communicating by passing sub-graphs, with memory stored directly within the graph database itself.

## Comparison of Communication Mechanisms

| Feature | Context Hand-off (e.g., Swarm) | Question Answering (e.g., Crew AI) | Message Queue (e.g., Crew AI) |
|---|---|---|---|
| Mechanism | Passes all accumulated context/data to the next agent | Agent asks a direct question to another agent and expects a specific answer | Agents push messages to a shared queue; other agents listen and retrieve |
| Data flow | Linear, sequential; knowledge accumulates downstream | Point-to-point request/response flow | Decoupled, asynchronous; messages can be processed by multiple listeners |
| Nature | Stateful; the entire context is carried forward | Request-driven, with the expectation of an immediate answer | Event-driven; broadcast-like or targeted messages |
| Analogy | Passing a detailed patient chart/notebook between shifts | Making a function call or emailing a colleague for specific info | Publishing to a bulletin board where interested parties subscribe |
| Advantage | Easy to reason about; no memory needed by individual agents | Familiar to developers (function calls); intuitive for human interaction | Decoupling, scalability, flexible routing of information |
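As a final illustration, the Q&A mechanism from the table really does behave like a direct function call between agents. The `PricingAgent` and `SellerAgent` classes below are hypothetical, invented for this sketch.

```python
# Sketch of the Q&A communication mechanism: one agent asks another a
# direct question and expects a specific answer, like a function call.
class PricingAgent:
    PRICES = {"rice": 12, "flour": 4}

    def ask(self, item: str) -> str:
        """Answer a specific question from another agent."""
        price = self.PRICES.get(item)
        return f"{item} costs {price} dollars" if price else f"no price for {item}"

class SellerAgent:
    def __init__(self, pricing: PricingAgent):
        self.pricing = pricing  # direct reference, like a colleague's email address

    def quote(self, item: str) -> str:
        answer = self.pricing.ask(item)   # point-to-point request/response
        return f"Sure! {answer}. Want a bigger bag?"  # role-driven upsell

seller = SellerAgent(PricingAgent())
print(seller.quote("rice"))  # Sure! rice costs 12 dollars. Want a bigger bag?
```

Replacing the direct reference with a shared queue would turn this same exchange into the decoupled, event-driven pattern from the last column of the table.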