This video explains the shift in AI from monolithic models to compound AI systems and then to AI agents. Compound systems use multiple components (models, databases, programs) to solve problems, improving adaptability over tuning single models. AI agents take this further by using large language models to control the logic of the compound system, enabling more complex problem-solving through reasoning, action (using external tools), and memory. The video uses the example of calculating sunscreen needs for a vacation to illustrate the power and flexibility of AI agents, contrasting them with simpler, more programmed systems.

So, compound AI systems: we said most of them have programmatic control logic, and that control logic is something I defined myself, as the human. Now, where do agents come in? One other way of controlling the logic of a compound AI system is to put a large language model in charge, and this is only possible because we're seeing tremendous improvements in the reasoning capabilities of large language models. You can feed large language models complex problems, and you can prompt them to break the problems down and come up with a plan for tackling them. Another way to think about it: on one end of the spectrum, I'm telling my system to think fast, act as programmed, and not deviate from the instructions I've given it. On the other end of the spectrum, I'm designing my system to think slow: create a plan, attack each part of the plan, see where you get stuck, and see if you need to readjust the plan. If I give you a complex question and you just give me the first answer that pops into your head, that answer is very likely wrong; you have higher chances of success if you break the problem down, understand where you need external help to solve some parts of it, and maybe take an afternoon to solve it.
And when we put an LLM in charge of the logic, that's when we're talking about an agentic approach. So let's break down the components of LLM agents. The first capability is the ability to reason, which we talked about: this is putting the model at the core of how the problem gets solved.

Let's take a concrete example to illustrate this point. I want to plan a vacation for this summer, and I want to know how many vacation days are at my disposal. What I can do is take my query and feed it into a model that can generate a response. I think we can all expect that this answer will be incorrect, because the model doesn't know who I am and doesn't have access to this sensitive information about me. So models on their own can be useful for a number of tasks, as we've seen in other videos: they can help with summarizing documents, and they can help me create first drafts for emails and different reports. But the magic gets unlocked when I start building systems around the model and actually integrate the model into the existing processes I have.

The second capability is the ability to act, by calling external tools, and there are so many possibilities for what can go here: these can be APIs, basically any piece of external program you want to give your model access to.

The third capability is the ability to access memory, and the term "memory" can mean a couple of things. We talked about the model thinking through the problem, kind of how you think out loud when you're trying to solve a problem; those inner logs can be stored and are useful to retrieve at different points in time. But it could also be the history of conversations that you, as a human, had when interacting with the agent, which allows the experience to be much more personalized.

As for configuring agents, there are many ways to approach it.
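The three capabilities just described (reason, act via tools, access memory) can be sketched as a minimal Python skeleton. This is an illustrative assumption, not any real framework: the class, the tool names, and the keyword-matching "reasoning" step (which stands in for an actual LLM) are all made up for the sketch.

```python
# Minimal sketch of an LLM agent's three capabilities: reason, act, remember.
# The "reasoning" step is faked with keyword matching; in a real system an
# LLM would produce the plan. All names here are illustrative.

from typing import Callable, Dict, List


class Agent:
    def __init__(self, tools: Dict[str, Callable[[str], str]]):
        self.tools = tools           # act: external programs the agent may call
        self.memory: List[str] = []  # memory: inner logs + conversation history

    def reason(self, query: str) -> List[str]:
        # Stand-in for an LLM breaking a problem into tool calls.
        return [name for name in self.tools if name in query]

    def run(self, query: str) -> List[str]:
        self.memory.append(f"user: {query}")
        results = []
        for tool_name in self.reason(query):
            observation = self.tools[tool_name](query)
            self.memory.append(f"{tool_name}: {observation}")  # store inner log
            results.append(observation)
        return results


# Hypothetical tools for the vacation example: a vacation-days lookup
# and a weather forecast, both returning canned answers.
agent = Agent(tools={
    "vacation_days": lambda q: "10 days available",
    "weather": lambda q: "8 sun hours/day expected",
})
print(agent.run("How many vacation_days do I have, and what's the weather?"))
```

The point of the sketch is the separation of concerns: the model decides which tools to call, the tools do the external work, and the memory keeps a trace that later queries can draw on.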
One of the most popular ways of going about it is through something called ReAct, which, as you can tell by the name, combines the reasoning and acting components of LLM agents. So let's make this very concrete: what happens when I configure a ReAct agent? You have your user query, which gets fed into a model, an LLM, and the LLM is given a prompt with instructions on how to plan.

One: how many vacation days am I planning to take? Maybe that is information the system can retrieve from its memory, because I asked that question before. Two: how many hours do I plan to be in the sun? I said I plan to be in the sun a lot, so maybe that means looking at the weather forecast for next month in Florida and seeing what the average expected sun hours are. Three: maybe going to a public health website to understand what the recommended dosage of sunscreen per hour in the sun is. And then four: doing some math to determine how much of that sunscreen fits into two-ounce bottles.

That's quite complicated, but what's really powerful here is that there are so many paths that can be explored to solve a problem. This makes the system quite modular, and I can hit it with much more complex problems.

Going back to the concept of compound AI systems: compound AI systems are here to stay. What we're going to observe this year is that they're going to become more agentic. The way I like to think about it is that you have a sliding scale of AI autonomy, and the person defining the system would examine what trade-offs they want in terms of autonomy in the system. For certain problems, especially problems that are narrow and well-defined (you don't expect someone to ask about the weather when they need to ask about vacations), you can define a narrow system like this one, and it's more efficient to go the programmatic route, because every single query will be answered the same way.
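The math in step four of the sunscreen plan above can be made concrete. All the input numbers below are assumed example values (the dosage figure, sun hours, and vacation days are placeholders, not real recommendations); only the two-ounce bottle size comes from the example itself.

```python
import math

# Worked version of the sunscreen math from the plan above.
# Every input except the bottle size is an assumed example value.
sun_hours_per_day = 8        # from a (hypothetical) weather forecast
vacation_days = 10           # from the agent's (hypothetical) memory
ounces_per_sun_hour = 0.5    # assumed public-health dosage figure
bottle_size_oz = 2.0         # the two-ounce bottles from the example

total_ounces = sun_hours_per_day * vacation_days * ounces_per_sun_hour
bottles_needed = math.ceil(total_ounces / bottle_size_oz)  # round up: partial bottles still count

print(f"{total_ounces} oz of sunscreen -> {bottles_needed} two-ounce bottles")
```

With these assumed numbers, 8 × 10 × 0.5 = 40 ounces, which fills 20 two-ounce bottles; the agent's job is to fetch each of those inputs from the right source before doing this final calculation.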
If I were to apply the agentic approach here, there might be unnecessary looping and iteration. So for narrow problems, a programmatic approach can be more efficient than going the agentic route. But if I expect a system to accomplish very complex tasks, like, say, trying to solve GitHub issues independently, and to handle a variety of queries, a whole spectrum of queries, this is where an agentic route can be helpful, because it would take too much effort to configure every single path in the system. And we're still in the early days of what that means.

By the term "system", you can understand that there are multiple components, so systems are inherently modular. I can have a model (I can choose between tuned models, large language models, image generation models), but I also have programmatic components that can come around it. I can have output verifiers; I can have programs that take a query and break it down to increase the chances of the answer being correct. I can combine that with searching databases, and I can combine that with different tools. So when we talk about a system approach, I can break down what I desire my program to do and pick the right components to solve it. And this is inherently easier than tuning a model, which makes the system much faster and quicker to adapt.

Okay, so in the example, you can define whether you want to use external tools to help come up with the solution. Once you call a tool, you get an answer; maybe the tool gave you the wrong answer, or it came up with an error. You can observe that: the LLM would observe the answer and determine whether it answers the question at hand, or whether it needs to iterate on the plan and tackle it differently, up until I get to a final answer. So let's go back and
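The observe-and-iterate loop just described can be sketched as a short runnable example. Everything here is illustrative: `simulated_llm` is a rule-based stand-in for the model's reasoning step, and `flaky_search` is a made-up tool that fails once so the loop has something to recover from.

```python
# Sketch of the ReAct-style loop described above: reason, act, observe,
# and iterate until a final answer. The "LLM" is simulated with a simple
# rule so the loop is runnable; every name here is illustrative.

def flaky_search(query: str) -> str:
    # A tool that errors on the first call, so the agent must re-plan.
    flaky_search.calls = getattr(flaky_search, "calls", 0) + 1
    if flaky_search.calls == 1:
        return "ERROR: service unavailable"
    return "42 is the answer"

def simulated_llm(observations: list[str]) -> str:
    # Stand-in for the reasoning step: decide whether the latest
    # observation answers the question or the plan needs another try.
    if observations and "ERROR" not in observations[-1]:
        return f"final: {observations[-1]}"
    return "retry"

def react_loop(query: str, tool, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        decision = simulated_llm(observations)
        if decision.startswith("final:"):
            return decision.removeprefix("final: ")
        observations.append(tool(query))  # act, then observe the result
    return "gave up"

print(react_loop("what is the answer?", flaky_search))
```

The first tool call errors, the simulated model observes that and iterates, the second call succeeds, and the loop returns the final answer. The `max_steps` cap is the design choice that prevents the "unnecessary looping" mentioned above from running forever.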