Andrej Karpathy: Software Is Changing (Again)

TL;DR: Software is undergoing a fundamental shift to "Software 3.0," with Large Language Models (LLMs) acting as new programmable operating systems, ushering in natural-language programming and demanding a re-evaluation of application design for effective human-AI collaboration.

The Gist:
Who: Andrej Karpathy, former Director of AI at Tesla.
Core Concept: Software is experiencing a profound, rapid transformation, moving through three distinct paradigms:
Software 1.0: Explicitly written code (e.g., C++, Python).
Software 2.0: Neural networks "programmed" by data and optimizers (e.g., AlexNet, Tesla Autopilot's growing neural-net stack).
Software 3.0: Large Language Models (LLMs) programmed through natural-language prompts, acting as a new type of programmable computer.

How it works / LLM Characteristics & Analogies:
LLMs as Fabs: They involve significant capex and deep, rapidly evolving tech trees, centralizing R&D.
LLMs as Operating Systems: They represent complex software ecosystems with closed-source (e.g., OpenAI, Gemini) and open-source (e.g., Llama) alternatives, where context windows act as memory. They are currently centralized in the cloud, akin to 1960s time-sharing computers.
LLM Psychology:
Superpowers: Encyclopedic knowledge and near-perfect memory, far exceeding any individual human's capacity (the Rain Man analogy).
Cognitive Deficits: Prone to hallucinations; exhibit "jagged intelligence" (superhuman in some areas, basic errors in others); suffer from "anterograde amnesia" (they do not natively learn or consolidate knowledge over time; context windows are temporary working memory).
Security Risks: They are gullible and susceptible to prompt injection and data leakage.

Key Learnings / Takeaways / Advice:
Fluency Across Paradigms: Professionals should be fluent in Software 1.0, 2.0, and 3.0, as each has distinct advantages.
Partial Autonomy Apps: Integrate LLMs into traditional interfaces (e.g., Cursor for coding, Perplexity for search). LLM apps manage context, orchestrate multiple LLM calls, and, critically, feature application-specific GUIs for human auditing and faster verification. They incorporate an "autonomy slider" allowing users to control the level of AI assistance.
Optimizing Human-AI Cooperation: AI excels at generation; humans excel at verification. Focus on speeding up the verification loop, primarily through intuitive GUIs and visual representations. Keep the AI on a "leash" by managing output size and scope (e.g., small, incremental diffs) to ensure human auditability. Use concrete, specific prompts to increase the likelihood of successful AI output and reduce verification cycles. Create auditable intermediate artifacts (e.g., structured course syllabi for AI teachers) to maintain control and consistency.
Natural Language as a Programming Interface ("Vibe Coding"): English becoming a programming language is unprecedented, dramatically lowering the barrier to entry for software development. It enables rapid prototyping of custom applications without deep domain-specific-language knowledge (e.g., building an iOS app or MenuGen). It also highlights the current bottleneck in traditional DevOps and infrastructure setup, which remains a human-driven, time-consuming process.
Building for Agents: Digital agents are a new class of consumer and manipulator of digital information. Software infrastructure needs to adapt to agents; tools that transform human-centric data (e.g., GitHub repos) into LLM-friendly formats (e.g., gitingest, DeepWiki) are crucial.
Long-term Outlook: The full realization of AI agents will be a "decade of agents," not just a single year, requiring careful human-in-the-loop development.
The current focus should be on building "Iron Man suits" (augmentations and partial-autonomy products) rather than fully autonomous "Iron Man robots," with a gradual increase in autonomy over time via the "autonomy slider."

Key Topics Covered:
Introduction to Software 3.0
LLMs as Utilities, Fabs, and Operating Systems
LLM Psychology and Limitations
Opportunities: Partial Autonomy Apps
Importance of GUIs and the Autonomy Slider
Human-AI Cooperation and Keeping AI on a Leash
Natural Language as a Programming Interface
Building for Agents and LLM-Friendly Infrastructure
Long-term Vision: Iron Man Suit Analogy

And I made the observation at the time that there was a ton of C++ code in the Autopilot, which was the Software 1.0 code, and then there were some neural nets in there doing image recognition. I observed that over time, as we made the Autopilot better, the neural network grew in capability and size, and in addition all the C++ code was being deleted. A lot of the capabilities and functionality that were originally written in 1.0 were migrated to 2.0. As an example, a lot of the stitching up of information across images from the different cameras and across time was done by a neural network, and we were able to delete a lot of code. So the Software 2.0 stack quite literally ate through the software stack of the Autopilot. I thought this was really remarkable at the time, and I think we're seeing the same thing again: we have a new kind of software, and it's eating through the stack. We have three completely different programming paradigms, and if you're entering the industry, it's a very good idea to be fluent in all of them, because they all have slight pros and cons, and you may want to program some functionality in 1.0, 2.0, or 3.0.
Are you going to train a neural net? Are you going to just prompt an LLM? Should this be a piece of explicit code? We all have to make these decisions, and potentially fluidly transition between these paradigms. So in the first part I want to talk about LLMs: how to think about this new paradigm and the ecosystem around it. What is this new computer? What does it look like? And what does the ecosystem look like? I was struck by a quote from Andrew Ng from many years ago now (I think Andrew is speaking right after me). He said at the time that AI is the new electricity, and I do think it captures something very interesting: LLMs certainly feel like they have properties of utilities right now. LLM labs like OpenAI, Gemini, Anthropic, etc. spend capex to train the LLMs, which is kind of equivalent to building out a grid, and then there's opex to serve that intelligence over APIs to all of us. This is done through metered access, where we pay per million tokens or something like that, and we have a lot of very utility-like demands of this API: we demand low latency, high uptime, consistent quality, etc. In electricity you would have a transfer switch, so you can transfer your electricity source between grid, solar, battery, or generator. With LLMs we have, say, OpenRouter, to easily switch between the different types of LLMs that exist. Because LLMs are software, they don't compete for physical space, so it's okay to have basically six electricity providers, and you can switch between them, because they don't compete in such a direct way. And I think what's also a little fascinating is that in the last few days a lot of the LLMs actually went down, and people were kind of stuck and unable to work.
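The "transfer switch" analogy can be made concrete with a small failover sketch. This is a minimal illustration, not a real client: the provider names and the `call(prompt)` interface are hypothetical stand-ins for whatever SDK (or a router like OpenRouter) you actually use.

```python
# Sketch of a "transfer switch" for LLM providers: try each provider in
# priority order and fall over to the next one if it is down. The provider
# functions here are stubs; in practice each would wrap a real API client.

class ProviderDown(Exception):
    """Raised when a provider fails to serve a request."""

def complete(prompt, providers):
    """Return (provider_name, response) from the first provider that works."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderDown as exc:
            errors.append((name, str(exc)))  # record the outage, move on
    raise RuntimeError(f"all providers down: {errors}")

# Stub providers: the primary is "down", the backup answers.
def primary(prompt):
    raise ProviderDown("503 service unavailable")

def backup(prompt):
    return f"echo: {prompt}"

name, answer = complete("hello", [("primary", primary), ("backup", backup)])
```

In a real setup you would also want timeouts, retry budgets, and per-provider cost accounting (since access is metered per million tokens), but the switching logic stays this simple.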
And I think it's fascinating that when the state-of-the-art LLMs go down, it's actually kind of an intelligence brownout in the world, like when the voltage is unreliable in the grid. The planet just gets dumber the more reliance we have on these models, which is already really dramatic and I think will continue to grow. But LLMs don't only have properties of utilities; I think it's also fair to say that they have some properties of fabs, because the capex required for building an LLM is actually quite large. It's not just building a power station or something like that; you're investing a huge amount of money, and the tech tree for the technology is growing quite rapidly. So we're in a world with deep tech trees and R&D secrets centralizing inside a few labs. In a sense we're redoing computing all over again, and LLMs are currently available via time-sharing and distributed like a utility. What is new and unprecedented is that they're not in the hands of a few governments and corporations; they're in the hands of all of us, because we all have a computer and it's all just software. ChatGPT was beamed down to our computers, to billions of people, basically instantly and overnight. This is insane, and it's kind of insane to me that now it is our time to enter the industry and program these computers. So I think this is quite remarkable. Before we program LLMs, we have to spend some time thinking about what these things are, and I especially like to talk about their psychology. The way I like to think about LLMs is that they're kind of like people spirits: they are stochastic simulations of people, and the simulator in this case happens to be an autoregressive Transformer. A Transformer is a neural net, and it operates on the level of tokens.
It goes chunk, chunk, chunk, with an almost equal amount of compute for every single chunk. This simulator is basically some weights, and we fit it to all of the text we have on the internet and so on. You end up with this kind of simulator, and because it is trained on humans, it has an emergent psychology that is humanlike. The first thing you'll notice is that LLMs have encyclopedic knowledge and memory: they can remember lots of things, a lot more than any single individual human can, because they have read so many things. It actually reminds me of the movie Rain Man, which I really recommend people watch; it's an amazing movie, and I love it. Dustin Hoffman plays an autistic savant with almost perfect memory: he can read a phone book and remember all of the names and phone numbers. I feel like LLMs are very similar: they can remember SHA hashes and lots of different kinds of things very easily. So they certainly have superpowers in some respects. But they also have a bunch of cognitive deficits. They hallucinate quite a bit; they make up stuff and don't have a very good internal model of self-knowledge, not sufficient at least. This has gotten better, but it's not perfect. They display jagged intelligence: they're superhuman in some problem-solving domains, and then they make mistakes that basically no human would make.
For example, they will insist that 9.11 is greater than 9.9, or that there are two r's in "strawberry." These are famous examples, but basically there are rough edges that you can trip on, which I think is also unique. They also suffer from anterograde amnesia. What I'm alluding to is that a coworker who joins your organization will, over time, learn the organization, understand and gain a huge amount of context on it, go home, sleep, consolidate knowledge, and develop expertise. LLMs don't natively do this, and it is not something that has really been solved in the R&D of LLMs. Context windows are really working memory, and you have to program that working memory quite directly, because the models don't just get smarter by default. I think a lot of people get tripped up by the analogies in this way. In popular culture, I recommend two movies: Memento and 50 First Dates. In both, the protagonists' weights are fixed and their context windows get wiped every single morning, and it's really problematic to go to work or have relationships when this happens; for LLMs, this happens all the time. One more thing I would point to is security-related limitations of using LLMs. For example, LLMs are quite gullible: they are susceptible to prompt-injection risks, they might leak your data, and there are many other security-related considerations. So, long story short, you have to simultaneously think of this as a superhuman thing that also has a bunch of cognitive deficits and issues.
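"Programming the working memory" can be sketched very simply: keep a rolling window of conversation turns under a fixed token budget, dropping the oldest turns first. This is a toy illustration; the word-count approximation of tokens and the `fit_context` helper are my own assumptions, and a real system would use the model's actual tokenizer.

```python
# Sketch of a rolling context window: context windows are working memory,
# so when the budget is exceeded, the oldest turns are the first to go.
# Token counting is approximated here by whitespace word count.

def fit_context(turns, budget):
    """Return the most recent turns whose total (approximate) token
    count fits within `budget`, preserving chronological order."""
    kept, used = [], 0
    for turn in reversed(turns):          # walk newest-first
        cost = len(turn.split())
        if used + cost > budget:
            break                         # everything older is dropped
        kept.append(turn)
        used += cost
    return list(reversed(kept))           # restore chronological order

turns = ["hi there", "how can I help", "please summarize this very long document"]
window = fit_context(turns, budget=10)
```

This also makes the "anterograde amnesia" point concrete: nothing trimmed out of the window survives; any knowledge consolidation has to happen outside the model, in your application.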
And yet they are extremely useful. So how do we program them, work around their deficits, and enjoy their superhuman powers? What I want to switch to now is the opportunities: how do we use these models, and what are some of the biggest opportunities? This is not a comprehensive list, just some of the things I thought were interesting for this talk. The first thing I'm excited about is what I would call partial-autonomy apps. Let's work with the example of coding. You can certainly go to ChatGPT directly and start copy-pasting code and bug reports around, but why would you do that? Why would you go directly to the operating system? It makes a lot more sense to have an app dedicated for this. I think many of you use Cursor; I do as well. Cursor is the kind of thing you want instead of going directly to the ChatGPT app, and I think it's a very good example of an early LLM app with a bunch of properties that are useful across all LLM apps. In particular, you will notice a traditional interface that allows a human to go in and do all the work manually just as before, but in addition we now have this LLM integration that allows us to go in bigger chunks. Some of the properties of LLM apps that I think are shared and useful to point out: number one, the LLMs basically do a ton of the context management. Number two, they orchestrate multiple calls to LLMs; in the case of Cursor, there are under the hood embedding models for all your files, the actual chat models, and models that apply diffs to the code, and this is all orchestrated for you. A really big one that I think is not always fully appreciated is the application-specific GUI and its importance.
You don't just want to talk to the operating system directly in text. Text is very hard to read, interpret, and understand, and you don't want to take some of these actions natively in text. It's much better to see a diff as red and green changes, so you can see what's being added and subtracted, and it's much easier to hit Cmd+Y to accept or Cmd+N to reject; I shouldn't have to type it in text. A GUI allows a human to audit the work of these fallible systems and to go faster. I'm going to come back to this point a little later as well. The last feature I want to point out is what I call the autonomy slider. For example, in Cursor you can just do tab completion, where you're mostly in charge; you can select a chunk of code and use Cmd+K to change just that chunk; or you can use Cmd+L to change the entire file. The Autopilot was similar: the instrument panel showed me what the neural network sees, and so on, and we had an autonomy slider where, over the course of my tenure there, we did more and more autonomous tasks for the user. Maybe the story I wanted to tell very briefly: the first time I drove a self-driving vehicle was in 2013. I had a friend who worked at Waymo, and he offered to give me a drive around Palo Alto. I took a picture using Google Glass at the time; many of you are so young that you might not even know what that is, but it was all the rage back then. We got into this car and went for about a 30-minute drive around Palo Alto, highways, streets, and so on, and this drive was perfect: there were zero interventions. And this was 2013, which is now 12 years ago. It struck me, because after this perfect drive, this perfect demo, I felt like, wow, self-driving is imminent, because this just worked. This is incredible. But here we are 12 years later, and we are still working on autonomy.
We are still working on driving agents, and even now we haven't really solved the problem. You may see Waymos going around, and they look driverless, but there's still a lot of teleoperation and human-in-the-loop in a lot of this driving. We still haven't declared success, though I think it's definitely going to succeed at this point; it just took a long time. So I think this kind of software is really tricky, in the same way that driving is tricky. When I see things like "2025 is the year of agents," I get very concerned. I kind of feel like this is the decade of agents, and this is going to take quite some time. We need humans in the loop; we need to do this carefully. This is software; let's be serious here. One more analogy I always think through is the Iron Man suit. I always love Iron Man; I think it's correct in a bunch of ways with respect to technology and how it will play out. What I love about the Iron Man suit is that it's both an augmentation that Tony Stark can drive and an agent: in some of the movies, the suit is quite autonomous and can fly around and find Tony and so on. That's the autonomy slider: we can build augmentations or we can build agents, and we want to do a bit of both. But at this stage, working with fallible LLMs, I would say it's less Iron Man robots and more Iron Man suits that you want to build. It's less about building flashy demos of autonomous agents and more about building partial-autonomy products. These products have custom GUIs and UI/UX, designed so that the generation-verification loop of the human is very, very fast, while not losing sight of the fact that it is in principle possible to automate this work.
There should be an autonomy slider in your product, and you should be thinking about how you can slide it to make your product more autonomous over time. That's how I think about it; there are lots of opportunities in these kinds of products.

The Cursor application is an early example of an LLM-integrated tool built specifically for coding, aiming to provide a more efficient, dedicated experience than using general chat interfaces directly. It blends traditional human interaction with large-language-model capabilities.

How Cursor Utilizes LLMs for Coding:
Integrated Workflows: Cursor allows users to work in "bigger chunks" by integrating LLM capabilities directly into the coding environment. Traditional manual work is still possible, but the LLM assists with more substantial tasks.
Automated Context Management: A key function of LLMs within Cursor is handling a significant portion of context management, streamlining how the application understands and responds to coding requests.
Orchestration of Multiple Models: Under the hood, Cursor orchestrates various LLM calls, including embedding models for processing user files, chat models for interactive conversations, and specialized models for applying code diffs. This multi-model approach allows for comprehensive coding assistance.

Features Offered by Cursor:
Traditional and LLM-Integrated Interface: It provides a familiar interface for manual coding while simultaneously offering LLM integration for more complex tasks.
Application-Specific GUI: Cursor features a graphical user interface tailored for coding. This dedicated GUI is crucial, allowing more effective interaction with the "operating system" (the LLM) than a generic text-based chat.
Versatile LLM Compatibility: The application is designed to be flexible, running on different LLM series, such as GPT, Claude, or Gemini, similar to how traditional software can run on various operating systems.
The design of Cursor emphasizes creating a specialized application for coding with LLMs, which is more practical and powerful than merely copy-pasting code into a general chat interface.
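The multi-model orchestration described above can be sketched as a tiny pipeline: an embedding step retrieves the relevant file, a chat step drafts an edit, and a diff step applies it. All three "models" below are stand-in functions of my own invention; Cursor's actual internals are not public, so this only illustrates the shape of the pipeline, not its implementation.

```python
# Toy sketch of an LLM-app orchestration pipeline:
# retrieve (embedding model) -> draft (chat model) -> apply (diff model).
# Each stage is a stub standing in for a real model call.

def embed_and_retrieve(query, files):
    """Stand-in for embedding retrieval: rank files by word overlap
    with the query and return the best match's name."""
    score = lambda text: len(set(query.split()) & set(text.split()))
    return max(files, key=lambda name: score(files[name]))

def chat_model(query, context):
    """Stand-in for the chat model: propose an edited version of the file."""
    return context.replace("TODO", "DONE")

def apply_diff(files, name, new_text):
    """Stand-in for the diff-applying model: write the edit back,
    returning a new file mapping so the caller can audit the change."""
    return {**files, name: new_text}

files = {"a.py": "TODO fix greeting", "b.py": "unrelated math code"}
target = embed_and_retrieve("fix greeting", files)   # picks "a.py"
edited = chat_model("fix greeting", files[target])
files = apply_diff(files, target, edited)
```

The point of the structure is the audit surface: each intermediate result (`target`, `edited`) is something a GUI can show the human for fast verification before the diff is accepted.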