So, LLM labs like OpenAI, Gemini, Anthropic, etc. spend capex to train the LLMs, which is kind of equivalent to building out a grid, and then there's opex to serve that intelligence over APIs to all of us. This is done through metered access, where we pay per million tokens or something like that, and we have a lot of very utility-like demands of this API: we demand low latency, high uptime, consistent quality, etc. In electricity, you would have a transfer switch, so you can transfer your electricity source between, say, grid, solar, battery, or generator. For LLMs, we have maybe OpenRouter, to easily switch between the different types of LLMs that exist. Because the LLMs are software, they don't compete for physical space, so it's okay to have basically six electricity providers and switch between them, right? They don't compete in such a direct way. And I think what's also a little fascinating, and we saw this in the last few days actually: a lot of the LLMs went down and people were kind of stuck, unable to work. It's fascinating to me that when the state-of-the-art LLMs go down, it's actually kind of like an intelligence brownout in the world. It's like when the voltage is unreliable in the grid: the planet just gets dumber, the more reliance we have on these models, which already is really dramatic and I think will continue to grow. But LLMs don't only have properties of utilities; I think it's also fair to say that they have some properties of fabs. The reason for this is that the capex required for building LLMs is actually quite large. It's not just like building some power station or something like that, right? You're investing a huge amount of money, and the tech tree for the technology is growing quite rapidly.
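The transfer-switch idea above can be sketched in code: a minimal, hypothetical fallback router across providers. Everything here is a stand-in I made up to illustrate the concept — the provider functions are stubs, not a real OpenRouter or vendor API.

```python
# Hypothetical sketch of a "transfer switch" for intelligence providers.
# The provider functions are made-up stubs, not real APIs.

def provider_a(prompt: str) -> str:
    raise TimeoutError("provider A is down")  # simulate a brownout

def provider_b(prompt: str) -> str:
    return f"answer from B to: {prompt}"

def transfer_switch(prompt: str, providers) -> str:
    """Try each provider in order, falling over on failure,
    like switching from grid power to a backup generator."""
    for call in providers:
        try:
            return call(prompt)
        except Exception:
            continue  # this source is down; switch to the next one
    raise RuntimeError("total intelligence blackout: all providers down")

result = transfer_switch("hello", [provider_a, provider_b])
```

Because the "sources" are just software endpoints, the switch costs nothing to build, which is exactly why six providers can coexist in a way six physical grids could not.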
So we're in a world with deep tech trees, research and development secrets that are centralizing inside the LLM labs. But I think the analogy muddies a little bit here too, because as I mentioned, this is software, and software is a bit less defensible because it is so malleable. There are many analogies you can make: a 4-nanometer process node, maybe, is something like a cluster with a certain max flops. When you're using NVIDIA GPUs and you're only doing the software, not the hardware, that's kind of like the fabless model. But if you're actually also building your own hardware, and you're training on TPUs if you're Google, that's kind of like the Intel model, where you own your fab. So I think there are some analogies here that make sense. But actually, the analogy that makes the most sense, perhaps, is that in my mind LLMs have very strong analogies to operating systems. This is not just electricity or water; it's not something that comes out of the tap as a commodity. These are now increasingly complex software ecosystems, not simple commodities like electricity. And it's interesting to me that the ecosystem is shaping up in a very similar way, where you have a few closed-source providers, like Windows or macOS, and then you have an open-source alternative, like Linux. For LLMs as well, we have a few competing closed-source providers, and then the Llama ecosystem is currently maybe a close approximation to something that may grow into something like Linux. Again, I think it's still very early, because these are just simple LLMs, but we're starting to see that these are going to get a lot more complicated. It's not just about the LLM itself.
It's about all the tool use and the multimodalities and how all of that works. So when I had this realization a while back, I tried to sketch it out, and it seemed to me like LLMs are kind of like a new operating system, right? The LLM is a new kind of computer. It's kind of like the CPU equivalent, the context windows are kind of like the memory, and the LLM is orchestrating memory and compute for problem solving, using all of these capabilities. So it definitely looks very much like an operating system from that perspective. A few more analogies: for example, if you want to download an app, say I go to VS Code and I go to download, you can download VS Code and run it on Windows, Linux, or Mac. In the same way, you can take an LLM app like Cursor and run it on GPT or Claude or the Gemini series, right? It's just a dropdown. So it's similar in that way as well. Another analogy that strikes me is that we're in this 1960s-ish era, where LLM compute is still very expensive for this new kind of computer. That forces the LLMs to be centralized in the cloud, and we're all just thin clients that interact with them over the network. None of us has full utilization of these computers, and therefore it makes sense to use time sharing, where we're all just, you know, a dimension of the batch when they're running the computer in the cloud. This is very much what computers used to look like during this time: the operating systems were in the cloud, everything was streamed around, and there was batching. So the personal computing revolution hasn't happened yet, because it's just not economical; it doesn't make sense. But I think some people are trying.
And it turns out that Mac minis, for example, are a very good fit for some of the LLMs, because if you're doing batch-one inference, it's all super memory-bound, so this actually works. I think these are some early indications, maybe, of personal computing, but this hasn't really happened yet. It's not clear what it looks like; maybe some of you get to invent what this is, or how it works, or what it should be. Maybe one more analogy I'll mention: whenever I talk to ChatGPT or some LLM directly in text, I feel like I'm talking to an operating system through the terminal. It's just text, direct access to the operating system. And I think a GUI hasn't really been invented yet in a general way. Should ChatGPT have a GUI, different from just text bubbles? Certainly some of the apps that we're going to go into in a bit have GUIs, but there's no GUI across all the tasks, if that makes sense. Cursor is kind of like the thing you want instead; you don't want to just go directly to the ChatGPT app. And I think Cursor is a very good example of an early LLM app that has a bunch of properties that are useful across all the LLM apps. In particular, you will notice that we have a traditional interface that allows a human to go in and do all the work manually, just as before. But in addition to that, we now have this LLM integration that allows us to go in bigger chunks. So, some of the properties of LLM apps that I think are shared and useful to point out: number one, the LLMs basically do a ton of the context management. Number two, they orchestrate multiple calls to LLMs, right? In the case of Cursor, there are, under the hood, embedding models for all your files, the actual chat models, models that apply diffs to the code, and this is all orchestrated for you.
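The kind of orchestration just described can be sketched as a toy pipeline: an embedding step, a chat step, and a diff-apply step behind one call. Every "model" here is a made-up stub for illustration, not how Cursor actually works.

```python
# Toy sketch of an LLM-app pipeline like the one described for Cursor:
# context management, multiple "model" calls, and diff application,
# all orchestrated behind one function. Every model is a fake stub.

def embed_files(files: dict) -> dict:
    # stand-in for an embedding model: index each file by its words
    return {name: set(text.split()) for name, text in files.items()}

def retrieve_context(index: dict, query: str, k: int = 1) -> list:
    # stand-in for retrieval: rank files by naive word overlap
    words = set(query.split())
    ranked = sorted(index, key=lambda n: len(index[n] & words), reverse=True)
    return ranked[:k]

def chat_model(query: str, context_files: list) -> str:
    # stand-in for the chat model: propose an edit for the query
    return f"# edited for: {query}"

def apply_diff(files: dict, target: str, new_line: str) -> dict:
    # stand-in for the diff-apply model: append the proposed edit
    files = dict(files)
    files[target] = files[target] + "\n" + new_line
    return files

def run_app(files: dict, query: str) -> dict:
    index = embed_files(files)                 # context management
    targets = retrieve_context(index, query)   # which files matter
    edit = chat_model(query, targets)          # one of several LLM calls
    return apply_diff(files, targets[0], edit)

files = {"main.py": "print hello", "util.py": "helper code"}
out = run_app(files, "fix print hello")
```

The point of the sketch is the shape, not the stubs: the user makes one request, and the app fans it out across several specialized models and reassembles the result.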
A really big one, which I think is maybe not always fully appreciated, is the application-specific GUI and the importance of it. You don't just want to talk to the operating system directly in text. Text is very hard to read, interpret, and understand, and you also don't want to take some of these actions natively in text. It's much better to see a diff as a red-and-green change, so you can see what's being added and subtracted. It's much easier to do Command+Y to accept or Command+N to reject; I shouldn't have to type it in text, right? So a GUI allows a human to audit the work of these fallible systems and to go faster. I'm going to come back to this point a little later as well. And the last feature I want to point out is what I call the autonomy slider. For example, in Cursor you can just do tab completion, where you're mostly in charge. You can select a chunk of code and use Command+K to change just that chunk. You can use Command+L to change the entire file. Or you can use Command+I, which just, you know, lets it rip, doing whatever it wants in the entire repo; that's the full-autonomy, agentic version. So you are in charge of the autonomy slider, and depending on the complexity of the task at hand, you can tune the amount of autonomy that you're willing to give up for that task. To show one more example of a fairly successful LLM app: Perplexity. It has very similar features to what I've just pointed out in Cursor. It packages up a lot of the information, it orchestrates multiple LLMs, and it's got a GUI that allows you to audit some of its work. For example, it will cite sources, and you can imagine inspecting them. And it's got an autonomy slider: you can do a quick search, or you can do research, or you can do deep research and come back 10 minutes later. These are all just varying levels of autonomy that you give up to the tool.
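The autonomy slider can be made concrete as a small lookup from level to granted scope. The levels mirror the Cursor keybindings mentioned above, but the table itself is my own illustrative mapping, not anything from a real product.

```python
# Sketch of an "autonomy slider": each level grants the LLM a wider
# scope of action. Levels mirror the Cursor examples (tab completion
# through whole-repo agent); the mapping itself is invented.

AUTONOMY_LEVELS = [
    ("tab",   "complete a few tokens; human mostly in charge"),
    ("cmd_k", "rewrite one selected chunk of code"),
    ("cmd_l", "rewrite the entire current file"),
    ("cmd_i", "agentic: let it rip across the whole repo"),
]

def allowed_scope(level: str) -> str:
    """Return the scope of action granted at a given autonomy level."""
    for name, scope in AUTONOMY_LEVELS:
        if name == level:
            return scope
    raise ValueError(f"unknown autonomy level: {level}")
```

The useful property is that the human, not the model, picks the row: the harder the task is to verify, the lower on the list you stay.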
So I guess my question is: I feel like a lot of software will become partially autonomous, and I'm trying to think through what that looks like. For many of you who maintain products and services: how are you going to make your products and services partially autonomous? Can an LLM see everything that a human can see? Can an LLM act in all the ways that a human could act? And can humans supervise and stay in the loop of this activity? Because again, these are fallible systems that aren't yet perfect. What does a diff look like in Photoshop, or something like that? Also, a lot of traditional software right now has all these switches and all this kind of stuff, all designed for humans; all of this has to change and become accessible to LLMs. One thing I want to stress about a lot of these LLM apps, which I'm not sure gets as much attention as it should, is that we're now cooperating with AIs. Usually they are doing the generation and we as humans are doing the verification. It is in our interest to make this loop go as fast as possible, so we're getting a lot of work done. There are two major ways I think this can be done. Number one, you can speed up verification a lot. I think GUIs, for example, are extremely important to this, because a GUI utilizes the computer-vision GPU in all of our heads. Reading text is effortful and not fun, but looking at stuff is fun; it's kind of a highway to your brain. So I think GUIs are very useful for auditing systems, and visual representations in general. And number two, I would say, is that we have to keep the AI on the leash. I think a lot of people are getting way too excited about AI agents, and it's not useful to me to get a diff of 10,000 lines of code to my repo. I'm still the bottleneck, right? Even though those 10,000 lines come out instantly, I have to make sure that this thing is not introducing bugs.
I have to make sure it's doing the correct thing, right? And that there are no security issues, and so on. So basically, it's in our interest to make the flow of these two go very, very fast, and we have to somehow keep the AI on the leash, because it gets way too overreactive. This is how I feel when I do AI-assisted coding: if I'm just vibe coding, everything is nice and great, but if I'm actually trying to get work done, it's not so great to have an overreactive agent doing all this kind of stuff. So this slide is not very good, I'm sorry, but I guess I'm trying to develop, like many of you, some ways of utilizing these agents in my coding workflow, to do AI-assisted coding. In my own work, I'm always scared of way-too-big diffs; I always go in small incremental chunks. I want to make sure that everything is good. I want to spin this loop very, very fast, and I work on small chunks of a single concrete thing. So I think many of you are probably developing similar ways of working with LLMs. I also saw a number of blog posts that try to develop best practices for working with LLMs, and here's one that I read recently and thought was quite good. It discussed some techniques, and some of them have to do with how you keep the AI on the leash. As an example, if your prompt is vague, then the AI might not do exactly what you wanted, and in that case verification will fail. You're going to ask for something else, and if verification fails again, you're going to start spinning. So it makes a lot more sense to spend a bit more time being more concrete in your prompts, which increases the probability of successful verification, so you can move forward. And so I think a lot of us are going to end up finding techniques like this.
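The "small incremental chunks" discipline can be operationalized as a simple guard on the generation-verification loop: bound how much work arrives per verification step. This is a minimal sketch; the 200-line budget is an arbitrary illustrative threshold of my own, not a recommendation from the talk.

```python
# Sketch: keep the AI on the leash by capping how much generated work
# arrives per verification step. The 200-line budget is an arbitrary,
# illustrative threshold.

MAX_REVIEWABLE_LINES = 200

def accept_diff(diff_lines: list) -> bool:
    """The human verifier is the bottleneck: a 10,000-line diff is
    rejected and must come back in smaller incremental chunks."""
    return len(diff_lines) <= MAX_REVIEWABLE_LINES
```

The point is that even though the model can emit 10,000 lines instantly, the loop only moves as fast as the human can verify, so capping chunk size speeds up the loop overall.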
In my own work, I'm currently interested in what education looks like now that we have AI and LLMs. A large amount of thought for me goes into how we keep the AI on the leash. I don't think it works to just go to ChatGPT and be like, "Hey, teach me physics." I don't think this works, because the AI gets lost in the woods. So for me, this is actually two separate apps: there's an app for a teacher that creates courses, and then there's an app that takes courses and serves them to students. In both cases, we now have this intermediate artifact of a course that is auditable: we can make sure it's good, we can make sure it's consistent. And the AI is kept on the leash with respect to a certain syllabus, a certain progression of projects, and so on. So this is one way of keeping the AI on the leash, and I think it has a much higher likelihood of working, with the AI not getting lost in the woods. One more analogy I wanted to allude to: I'm no stranger to partial autonomy; I worked on this, I think, for five years at Tesla. Autopilot is also a partial-autonomy product, and it shares a lot of these features. For example, right there in the instrument panel is the GUI of the Autopilot, showing me what the neural network sees, and so on. And we had the autonomy slider, where over the course of my tenure there, we did more and more autonomous tasks for the user. Maybe the story I wanted to tell very briefly: the first time I drove a self-driving vehicle was in 2013. I had a friend who worked at Waymo, and he offered to give me a drive around Palo Alto. I took this picture using Google Glass at the time; many of you are so young that you might not even know what that is, but it was all the rage at the time.
We got into this car and went for about a 30-minute drive around Palo Alto: highways, streets, and so on. And this drive was perfect; there were zero interventions. And this was 2013, which is now 12 years ago. It struck me, because at the time, when I had this perfect drive, this perfect demo, I felt like, wow, self-driving is imminent, because this just worked; this is incredible. But here we are, 12 years later, and we are still working on autonomy. We are still working on driving agents, and even now we haven't actually really solved the problem. You may see Waymos going around and they look driverless, but, you know, there's still a lot of teleoperation and a lot of human-in-the-loop in a lot of this driving. So we still haven't even declared success, but I think it's definitely going to succeed at this point; it just took a long time. And so I think this software is really tricky, in the same way that driving is tricky. So when I see things like "oh, 2025 is the year of agents," I get very concerned, and I kind of feel like...
Please welcome former Director of AI at Tesla, Andrej Karpathy.