This Lex Fridman podcast features Andrej Karpathy, discussing neural networks, their surprising capabilities, and the future of AI. Karpathy downplays the brain analogy for neural nets, emphasizing their emergent behavior from simple mathematical expressions and effective optimization. He speculates on the prevalence of extraterrestrial life and the challenges of interstellar travel. The conversation also touches on the future of AI, including the potential for AGI to solve the "puzzle" of the universe, the ethical implications of conscious AI, and the role of data in training increasingly sophisticated models. He highlights the importance of the transformer architecture and discusses Tesla's approach to autonomous driving, emphasizing data-driven development and the transition to "Software 2.0."

...come out, especially when they have the source code available. That's my favorite place to go.

So, like I said, you're one of the greatest teachers of machine learning and AI ever, from CS231n to today. What advice would you give to beginners interested in getting into machine learning?

Beginners are often focused on what to do, and I think the focus should be more on how much you do. So I'm kind of a believer, on a high level, in this 10,000 hours concept, where you just have to pick the things you can spend time on, that you care about and are interested in, and you literally have to put in 10,000 hours of work. It doesn't even matter as much where you put it; you'll iterate and you'll improve and you'll waste some time. I don't know if there's a better way. You need to put in 10,000 hours, but I think it's actually really nice, because I feel like there's some sense of determinism about being an expert at a thing. You can literally pick an arbitrary thing, and I think if you spend 10,000 hours of deliberate effort and work, you actually will become an expert at it. So I think it's kind of a nice thought. So basically I would focus more on, like, are you spending 10,000 hours? That's what I focus on. And then think about what kinds of mechanisms maximize your likelihood of getting to 10,000 hours, which for us silly humans probably means forming a daily habit of, every single day, actually doing the thing, whatever helps you. So I do think, to a large extent, it's a psychological problem for yourself. One other thing that I think is helpful for the psychology of it: many times people compare themselves to others in the area. I think this is very harmful. Only compare yourself to you from some time ago, like say a year ago. Are you better than you were a year ago? This is the only way to think, and then you can see your progress, and it's very motivating.

That's so interesting, that focus on the quantity of hours, because I think a lot of people, in the beginner stage but actually throughout, get paralyzed by the choice. Like, which one do I pick, this path or this path? They'll literally get paralyzed by, like, which IDE to use.

Well, they're worried. Yeah, they're worried about all these things, but the thing is, you will waste time doing something wrong. Yes, you will eventually figure out it's not right.
You will accumulate scar tissue, and next time you'll grow stronger, because next time you'll have the scar tissue and you'll learn from it, and the next time you come into a similar situation, you'll be like, all right, I messed up. I've spent a lot of time working on things that never materialized into anything, and I have all that scar tissue, and I have some intuitions about what was useful, what wasn't useful, how things turned out. So all those mistakes were not dead work, you know? So I just think you should just focus on working. What have you done? What have you done last week?

That's a good question, actually, to ask about a lot of things, not just machine learning. It's a good way to cut the, I forgot what term you would use, but the fluff, the blubber, whatever the inefficiencies in life are. What do you love about teaching? You seem to find yourself often drawn to teaching. You're very good at it, but you're also drawn to it.

I mean, I don't think I love teaching. I love happy humans, and happy humans like when I teach. I wouldn't say I hate teaching; I tolerate teaching. But it's not the act of teaching that I like, it's that, you know, I have something I'm actually okay at. I'm okay at teaching, and people appreciate it a lot. So I'm just happy to try to be helpful, and teaching itself is not like the most... I mean, it can be really annoying, frustrating. I was working on a bunch of lectures just now, and I was reminded back to my days of 231n and just how much work it is to create some of these materials and make them good: the amount of iteration and thought, and you go down blind alleys, and just how much you change it. So creating something good, in terms of educational value, is really hard, and it's not fun. It's difficult.

So people should definitely go watch the new stuff you put out. There are lectures where you're actually building the thing from, like you said, the code as the source of truth. So discussing backpropagation by building it, by looking through it, and just the whole thing. So how difficult is that to prepare for? I think that's a really powerful way to teach. How did you have to prepare for that? Or are you just live thinking through it?

I will typically do, say, three takes, and then I take the better take. So I do multiple takes, and I take some of the better takes, and then I just build out a lecture that way. Sometimes I have to delete 30 minutes of content because it just went down a blind alley that I didn't like too much. So there's a bunch of iteration, and it probably takes me, you know, somewhere around 10 hours of work to create one hour of content.

It's interesting. I mean, is it difficult to go back to the basics? Do you draw a lot of wisdom from going back to the basics?

Yeah, going back to backpropagation, loss functions, where they come from. And one thing I like about teaching a lot, honestly, is that it definitely strengthens your understanding. So it's not a purely altruistic activity. It's a way to learn. If you have to explain something to someone, you realize you have gaps in knowledge. And so I even surprised myself in those lectures: like, oh, the result will obviously look like this, and then the result doesn't look like that, and I'm like, okay, I thought I understood this.

Yeah, but that's why it's really cool to literally code it: you run it in a notebook and it gives you a result, and you're like, oh wow. And, like, actual numbers, actual inputs, you know, actual code.

Yeah, it's not mathematical symbols, et cetera. The source of truth is the code, it's not slides. It's just like, let's build it.

It's beautiful. You're a rare human in that sense.
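To give a flavor of what "building backpropagation and looking through it in a notebook" can mean, here is a minimal from-scratch autograd sketch. The tiny Value class and its names are illustrative assumptions for this example, not code taken from Karpathy's actual lectures.

```python
# Minimal scalar autograd sketch: build a tiny expression graph, then
# backpropagate gradients through it by applying the chain rule node by node.
import math

class Value:
    """A scalar that remembers how it was computed, so we can backprop through it."""
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None   # how to route out.grad into the children
        self._prev = set(children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad        # d(a+b)/da = 1
            other.grad += out.grad       # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1.0 - t * t) * out.grad   # d(tanh x)/dx = 1 - tanh(x)^2
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply each node's chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()

# A single neuron, out = tanh(w1*x1 + w2*x2 + b): run it, then read off the gradients.
x1, x2 = Value(2.0), Value(0.0)
w1, w2, b = Value(-3.0), Value(1.0), Value(1.5)
out = (x1 * w1 + x2 * w2 + b).tanh()
out.backward()
print(out.data, x1.grad, w1.grad)   # actual numbers, not symbols
```

Running something like this in a notebook is exactly the "source of truth is the code" point: the gradients come out as concrete numbers you can sanity-check, rather than symbols on a slide.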
What advice would you give to researchers trying to develop and publish ideas that have a big impact in the world of AI? So maybe undergrads, maybe early graduate students.

Yep. I mean, I would say they definitely have to be a little bit more strategic than I had to be as a PhD student, because of the way AI is evolving. It's going the way of physics, where, you know, in physics you used to be able to do experiments on your benchtop and everything was great and you could make progress, and now you have to work at something like the LHC, like CERN. And so AI is going in that direction as well. So there are certain kinds of things that are just not possible to do on the benchtop anymore, and I think that didn't used to be the case at the time.

Do you still think that there are, like, GAN-type papers to be written, where a very simple idea requires just one computer to illustrate a simple example?

I mean, one example that's been very influential recently is diffusion models. Diffusion models are amazing. Diffusion models are six years old; for the longest time, people were kind of ignoring them, as far as I can tell, and they're an amazing generative model, especially in images. And so Stable Diffusion and so on, it's all diffusion-based. Diffusion is new, it was not there, and it came from, well, it came from Google, but a researcher could have come up with it. In fact, some of the first... actually, no, those came from Google as well, but a researcher could come up with that in an academic institution.

Yeah. What do you find most fascinating about diffusion models? From the societal impact to the technical architecture?

What I like about diffusion is that it works so well.

Is that surprising to you? The amount of, the variety, almost the novelty of the synthetic data it's generating?

Yeah, so the Stable Diffusion images are incredible. The speed of improvement in generating images has been insane. We went very quickly from generating, like, tiny digits to tiny faces, and it all looked messed up, and now we have Stable Diffusion, and that happened very quickly. There's a lot that academia can still contribute. For example, FlashAttention, a very efficient kernel for running the attention operation inside the transformer, came from an academic environment. It's a very clever way to structure the kernel that does the calculation, so it doesn't materialize the attention matrix. And so I think there are still lots of things to contribute, but you have to be just more strategic.
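To make the "doesn't materialize the attention matrix" point concrete, here is a rough NumPy sketch of the running-softmax, block-at-a-time idea behind kernels like FlashAttention. It is a simplified illustration under assumed shapes (single head, no masking, plain NumPy rather than a fused GPU kernel), not the actual FlashAttention implementation.

```python
# Sketch: compute softmax(Q K^T / sqrt(d)) V one key/value block at a time,
# keeping only running statistics instead of the full N x N attention matrix.
import numpy as np

def naive_attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            # materializes N x N
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def blockwise_attention(Q, K, V, block=64):
    n, d = Q.shape
    out = np.zeros_like(Q)
    m = np.full(n, -np.inf)          # running max of scores per query
    l = np.zeros(n)                  # running softmax denominator per query
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        s = Q @ Kb.T / np.sqrt(d)                       # only N x block at a time
        m_new = np.maximum(m, s.max(axis=-1))
        p = np.exp(s - m_new[:, None])
        scale = np.exp(m - m_new)                       # rescale old stats to the new max
        l = l * scale + p.sum(axis=-1)
        out = out * scale[:, None] + p @ Vb
        m = m_new
    return out / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
print(np.allclose(naive_attention(Q, K, V), blockwise_attention(Q, K, V)))  # True
```

The real kernel also tiles over the query dimension and fuses everything into one pass through fast on-chip memory, but the running max and denominator trick above is the core reason the full attention matrix never has to be stored.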
Do you think neural networks could be made to reason?

Yes.

Do you think they already reason?

Yes.

What's your definition of reasoning?

Information processing, in the way that humans think through a problem and come up with novel ideas. It feels like reasoning.

Yeah. So the novelty, I don't want to say, but out-of-distribution ideas, you think that's possible?

Yes, and I think we're seeing that already in the current neural nets. You're able to remix the training set information into true generalization, in some sense that doesn't appear... it doesn't matter, like you're doing something interesting algorithmically: you're manipulating, you know, some symbols, and you're coming up with some correct, unique answer.