Hugging Face's smolagents library offers a simplified framework for building agents, addressing shortcomings of previous attempts. It emphasizes ease of use with open-source models, supports code agents interacting with tools (including custom ones), and allows for flexible agency levels. While initially limited to Hugging Face models, it now supports others like OpenAI's, showcasing improved code execution and error handling, though limitations remain. The video demonstrates its capabilities through examples and encourages viewers to explore its potential.

So one of the big advantages that Hugging Face has, which they're now really taking advantage of here, is that they've got all these open-source models on the Hugging Face Hub, and this framework allows you to use any of those models. They've actually set it up so that you can call the models on the Hub directly. Now, you will be limited a little bit in which models you can access, depending on whether you're a Pro user (as in a paid user) of Hugging Face or not, but out of the box it seems that they're using the Qwen 2.5 Coder 32B model. That's definitely a good model to get started with, although I imagine a lot of people are going to want to use things like DeepSeek Coder and similar models. Now, you can use proprietary models in here, and I will talk about that as well: if you want to use OpenAI, if you want to use Anthropic's Claude, etc., all of those are possible. So let's jump in and have a look.

First off, they start by talking about what agents are, and the whole idea of agency, the ability to take action. I would say this is one of the key things we've been seeing: different frameworks approach this idea of agency in different ways. So if we go right back to things like BabyAGI and AutoGPT, etc.,
the challenge with those was that they had too much agency: they decided everything themselves, they often didn't have tools at the start, and even when they did have tools, they were almost randomly going out there and taking actions. That just meant a huge bill for tokens, etc. Then on the other hand, we have frameworks like LangGraph, which really try to restrict the agency of an agent so that it can only do things that are on rails. In 2024 that showed itself to be the much safer, much more reliable way to build agents. We're now moving into the next phase: as the models have gotten a lot better over the last year, people are starting to think, okay, maybe we can give some of that agency back to the agents we're building.

So they've got a nice description here of the agency levels, what each level means and how it's called: whether you're giving the model tool calls, whether you've got multi-step agents running in a loop, or whether you've got multiple agents going on. And you can imagine, with things like multiple agents, we've seen that being done with things like CrewAI and a bunch of other frameworks that are out there as well. I think the big debate at the moment around agents is really: do you even need any sort of sophisticated framework? You can just use something very light, like Pydantic AI, to do your calls, and write all your tools and functions in pure Python. It does seem that smolagents is trying to place itself in the middle here, where you can go both ways.

Now, they do have a graphic here of an agent running in a loop, executing different tools or different actions at each step, and this is at the upper end of how much agency you want to give an agent.
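To make that multi-step loop concrete, here's a minimal sketch of the pattern in plain Python. The `fake_model` function and the tool names are stand-ins I've made up purely for illustration; in a real agent (as in smolagents), an LLM call sits where `fake_model` does.

```python
# Minimal sketch of a multi-step agent loop: the "model" proposes an
# action, the loop executes the matching tool and feeds the observation
# back, until the model emits a final answer. `fake_model` is a
# hard-coded stand-in for a real LLM call.

def calculator(expression: str) -> str:
    """A toy tool: evaluate an arithmetic expression."""
    return str(eval(expression))

TOOLS = {"calculator": calculator}

def fake_model(history: list) -> dict:
    """Stand-in for an LLM: first call a tool, then give the answer."""
    if not history:
        return {"action": "calculator", "input": "21 * 2"}
    return {"action": "final_answer", "input": history[-1]["observation"]}

def run_agent(task: str, max_steps: int = 5) -> str:
    history = []
    for _ in range(max_steps):
        step = fake_model(history)
        if step["action"] == "final_answer":
            return step["input"]
        # execute the chosen tool and record the observation
        observation = TOOLS[step["action"]](step["input"])
        history.append({**step, "observation": observation})
    raise RuntimeError("agent did not finish within max_steps")

print(run_agent("What is 21 * 2?"))  # → 42
```

The point of the sketch is just the shape of the loop: act, observe, feed the observation back, repeat.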
And you can see that a lot of the patterns they use in here are very similar to things like the ReAct pattern, which I think I made a video about almost two years ago; a lot of those sorts of things haven't changed. They have a whole section in their blog post about when to use agents and when to avoid agents, and I would certainly concur with most of what they've got in here: for a lot of things, people do not need agents. People are making a big deal about things that can just be done as a simple workflow. If you haven't seen the recent Anthropic blog post all about that, I recommend you go and read it; I've put a video about it, with some code examples, on my Patreon for people who are more interested in that. Really, the only time you want to use agents is when you've got some kind of dynamic decision-making happening in the flow, where the flow can change direction and go off at different angles.

So the big thing that sets smolagents apart from other frameworks out there is this whole idea of code agents: they're really going back to the idea of getting the agent to communicate with itself in code and work things out in code. Now, they mention some research papers to justify this. One that I didn't see them include is actually PAL, the Program-Aided Language Models paper. That goes back to 2022; the latest version of the paper is, I think, early 2023. I did a video about this as well, again coming up on close to two years ago: the idea of getting the LLM to write code, and then use that to help it. Now, that has been advanced a lot in papers like the CodeAct paper, the Executable Code Actions paper. And what that paper really showed us was that it's often going to be better to get the model to use something like Python to execute code, and act on the feedback from that.
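As a rough illustration of the PAL/CodeAct idea (not smolagents' actual implementation), here's a self-contained sketch: a canned "model completion" is a snippet of Python rather than a text answer, we execute it, and the computed value becomes the answer. The question and variable names are invented for the example.

```python
# Sketch of the program-aided idea: instead of answering in text, the
# model writes a small Python program, we execute it, and read the
# result back. The completion below is hard-coded to stand in for a
# real LLM call.

model_completion = """
# Q: A shop packs 12 apples per crate. How many apples in 7 crates?
apples_per_crate = 12
crates = 7
answer = apples_per_crate * crates
"""

namespace = {}
exec(model_completion, namespace)  # run the model-written program
print(namespace["answer"])  # → 84
```

The win is that the arithmetic is done by the interpreter, not by the model's token-by-token guessing, which is exactly the argument those papers make.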
So this doesn't totally eliminate text and JSON, etc. You can still put those things in there, but it also allows you to pass back other objects, and it lets you take advantage of all these models that are now being trained for coding use. So I think this is a really interesting direction that Hugging Face, and the smolagents team, have gone in here: they've actually got both code agents and tool-calling agents in here.

Okay, some other things quickly: they really emphasize this idea of first-class support for code agents, agents that are thinking in, or writing their actions in, code. You pass in the tools, then pass something in, and you get your response back. The agent will decide whether it needs to do multiple steps, it will decide which tool to call based on the tools you've put in there, and so on. So, to get this going, you really just need to pass in a model and some tools.
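In smolagents itself that wiring is roughly "construct an agent with `tools=[...]` and `model=...`, then call `agent.run(...)`" (check the library's docs for the exact class names, since they have evolved). As a self-contained stand-in that runs without any API key, here's what the "model plus tools" shape looks like; `FakeModel`, `Agent`, and `shout` are all illustrative names, not smolagents' actual classes.

```python
# Self-contained stand-in for the "pass in a model and some tools"
# pattern. Illustrative only -- these are not smolagents' real classes.

class FakeModel:
    """Stands in for an LLM; always routes the task to the first tool."""
    def choose(self, task: str, tools: dict):
        name = next(iter(tools))
        return name, task

class Agent:
    def __init__(self, model, tools):
        self.model = model
        # index tools by function name so the model can pick one
        self.tools = {t.__name__: t for t in tools}

    def run(self, task: str) -> str:
        tool_name, tool_input = self.model.choose(task, self.tools)
        return self.tools[tool_name](tool_input)

def shout(text: str) -> str:
    """A trivial example tool."""
    return text.upper()

agent = Agent(model=FakeModel(), tools=[shout])
print(agent.run("hello agents"))  # → HELLO AGENTS
```

The real library adds the multi-step loop, code execution, and error handling on top, but the entry point is exactly this: a model, a list of tools, and a `run` call.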