An AI expert discusses the field's evolution from rule-based systems to modern machine learning, focusing on neural networks and self-supervised learning's role in advances like chatbots. Future AI will leverage video and image data and architectures like JEPA for world modeling and reasoning, emphasizing open-source development and societal impact.

LeCun identifies a fundamental cause behind many global issues, arguing that a lack of knowledge and intelligence among humans leads to mistakes and missed opportunities for solutions. He uses climate change as an example, highlighting the need for smarter approaches to finding solutions.

LeCun shares his personal journey, revealing his long-standing obsession with uncovering the mysteries of intelligence. He explains his approach to tackling this challenge by building intelligent machines, emphasizing both the scientific understanding of intelligence and the practical consequences of creating such machines for humanity.

LeCun uses the analogy of the blind men and the elephant to illustrate the multifaceted nature of intelligence and AI. He explains how different approaches in AI history have focused on specific aspects of intelligence while neglecting others, leading to an incomplete understanding of the field.

LeCun explains two distinct branches of AI that emerged in the 1950s: one focused on problem-solving through logical reasoning and search, and the other focused on learning and mimicking biological intelligence. He highlights the limitations of the problem-solving approach and the initial setbacks of the learning approach.

This segment discusses the contrasting approaches in early AI, focusing on Marvin Minsky's shift from neural nets to logic-based approaches. It explains how the limitations of perceptrons led to neural-net research being renamed "statistical pattern recognition" and "adaptive filter theory," and emphasizes the continued relevance of these concepts in modern applications like finance.

LeCun describes the perceptron, a foundational AI model developed in 1957, which aimed to mimic the learning mechanism of the brain by modifying the strength of connections between simulated neurons. He explains the perceptron's simple yet innovative approach to learning by adjusting weights based on input and output, illustrating its early application in recognizing simple shapes.

This segment details how such a network distinguishes between two inputs (C and D) by assigning positive weights to pixels unique to C and negative weights to pixels unique to D, effectively discriminating between them (a minimal sketch of this learning rule appears at the end of this overview). The historical context of AI researchers mimicking biological processes in the 1950s and 60s is also highlighted.

This segment explains the principle of supervised learning, where a system adjusts its coefficients to improve its output. It contrasts this with the limitations of the perceptron, which couldn't handle complex functions like image recognition, setting the stage for the advancements brought by neural nets and deep learning in the 1980s.

This segment clarifies the hierarchical relationship between Artificial Intelligence (AI), Good Old-Fashioned AI (GOFAI), Machine Learning, and Deep Learning. It defines GOFAI as a rule-based approach using logic and search, contrasting it with data-driven machine learning and its subfield, deep learning.

This segment explains the breakthrough of stacking multiple layers of neurons in neural networks and the role of backpropagation in training these networks.
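To make the perceptron-style learning rule described above concrete, here is a minimal sketch in Python/NumPy. It is not the original 1957 setup: the 5x5 "C" and "D" pixel patterns, the learning rate, and the epoch count are illustrative assumptions. The rule nudges weights toward the correct class only when the thresholded weighted sum is wrong, which is how positive weights accumulate on pixels unique to C and negative weights on pixels unique to D.

```python
import numpy as np

# Toy 5x5 binary images (illustrative, not from the interview).
C = np.array([[0,1,1,1,0],
              [1,0,0,0,0],
              [1,0,0,0,0],
              [1,0,0,0,0],
              [0,1,1,1,0]], dtype=float)

D = np.array([[1,1,1,0,0],
              [1,0,0,1,0],
              [1,0,0,0,1],
              [1,0,0,1,0],
              [1,1,1,0,0]], dtype=float)

X = np.stack([C.ravel(), D.ravel()])   # two training inputs, flattened to 25 pixels
y = np.array([+1.0, -1.0])             # target: +1 for "C", -1 for "D"

w = np.zeros(25)                       # one weight per pixel
b = 0.0
lr = 0.1                               # learning rate (assumed)

for epoch in range(20):
    for x, target in zip(X, y):
        out = 1.0 if (w @ x + b) >= 0 else -1.0   # thresholded weighted sum
        if out != target:                         # adjust weights only on mistakes
            w += lr * target * x                  # push weights toward the correct class
            b += lr * target

print("C classified as:", 1.0 if (w @ C.ravel() + b) >= 0 else -1.0)   # expected +1
print("D classified as:", 1.0 if (w @ D.ravel() + b) >= 0 else -1.0)   # expected -1
```

After a few passes the weights end up positive on pixels that appear in C but not D, and negative on pixels that appear in D but not C, which is exactly the discrimination pattern described in the segment above.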
That segment also highlights the limitations of single-layer perceptrons and the development of convolutional neural networks (convnets), inspired by the visual cortex, to overcome these limitations and improve image recognition.

This segment discusses the limitations of current LLMs and the future direction of AI research. The speaker argues that LLMs, while impressive in their language-manipulation capabilities, lack the understanding of the physical world necessary for true intelligence. The segment proposes that the next major advancement in AI will involve developing systems that can learn from videos and images, leading to embodied intelligence capable of planning and interacting with the environment. The importance of persistent memory and a better understanding of the world is highlighted.

This segment focuses on the rise of Large Language Models (LLMs) and their impressive capabilities. The speaker discusses how training LLMs on massive datasets enables them to generate human-quality text, understand grammar and syntax across multiple languages, and even answer questions. However, it also emphasizes the limitations of LLMs, particularly their lack of true understanding of the physical world and their tendency to make factual errors despite their fluency.

This segment contrasts Convolutional Nets (ConvNets) and Transformers, highlighting their distinct architectural components and the crucial concept of equivariance. ConvNets exhibit translational equivariance, meaning that shifting the input shifts the output correspondingly, while Transformers exhibit permutation equivariance, meaning that permuting the inputs permutes the outputs in the same way. The speaker explains how these properties are achieved and how combining such components in neural networks yields the desired behavior.

This segment provides a clear and concise explanation of the "neuron" concept in neural networks. It clarifies that these artificial neurons are not biological neurons but computational units that compute a weighted sum of their inputs and apply a threshold (nonlinear activation) function to produce an output. The explanation differentiates between this basic neuron and the variations found in architectures like Transformers, emphasizing the core idea of weighted sums followed by a nonlinearity.

This segment traces the evolution of language models, starting with Claude Shannon's work on information theory and n-gram models. It explains how n-gram models, while conceptually simple, become computationally intractable as context length grows, because the number of possible contexts explodes combinatorially. The segment then introduces the shift toward using neural networks to predict the next word in a sequence, highlighting the advantages of this approach in handling larger contexts and generating more coherent text.

This segment draws a parallel between human cognitive processes (System 1 and System 2 thinking) and current AI limitations. It highlights the lack of a separate memory system in LLMs, unlike the interaction between the hippocampus and the cortex in the human brain, and explains the need for more sophisticated reasoning and planning capabilities in AI.

The speaker introduces JEPA (Joint Embedding Predictive Architecture), a new approach to training AI models that learn from video data. Instead of predicting every pixel, JEPA uses encoders to create abstract representations of the video, enabling more efficient and longer-term predictions.
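The pixel-level versus representation-level distinction can be made concrete with a small sketch. This is a toy illustration, not the actual JEPA implementation: the encoder and predictor are plain linear maps with made-up dimensions, the "video frames" are random arrays, and the machinery real systems need to keep the encoder from collapsing to a constant representation is omitted. The only point is where the prediction error is measured: a generative model compares predicted and actual pixels, while a JEPA-style model encodes both frames and compares them in a much smaller embedding space, so it never has to predict unpredictable pixel-level detail.

```python
import numpy as np

rng = np.random.default_rng(0)

PIXELS, EMBED = 32 * 32, 16          # toy sizes (assumed)
frame_t  = rng.random(PIXELS)        # current "video frame"
frame_t1 = rng.random(PIXELS)        # next frame we would like to anticipate

# --- Generative / pixel-level approach ---------------------------------
# Predict every pixel of the next frame; measure error in pixel space.
W_pix = rng.normal(scale=0.01, size=(PIXELS, PIXELS))
pred_pixels = W_pix @ frame_t
pixel_loss = np.mean((pred_pixels - frame_t1) ** 2)     # 1024 values to get right

# --- JEPA-style / representation-level approach ------------------------
# Encode both frames into abstract representations; predict in that space.
W_enc  = rng.normal(scale=0.01, size=(EMBED, PIXELS))   # shared encoder
W_pred = rng.normal(scale=0.01, size=(EMBED, EMBED))    # predictor in embedding space

z_t  = W_enc @ frame_t               # abstract representation of current frame
z_t1 = W_enc @ frame_t1              # abstract representation of next frame
pred_z = W_pred @ z_t
embed_loss = np.mean((pred_z - z_t1) ** 2)              # only 16 abstract features to get right

print(f"pixel-space target size: {PIXELS}, embedding-space target size: {EMBED}")
```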
The explanation clarifies the difference between pixel-level prediction and prediction in an abstract representation space.

This segment explores the potential long-term implications of AI capable of predicting the future. The discussion covers the possibility of creating AI with human-level intelligence within a decade, acknowledging both optimistic and more cautious timelines, and emphasizing the need for new architectures beyond simply scaling existing LLMs.

This segment explores how AI will reshape human intelligence, shifting the focus from task execution to strategic decision-making and creative problem-solving. The speaker defines intelligence as a combination of existing skills, rapid learning, and the ability to solve novel problems ("zero-shot learning"). He argues that AI will automate many current tasks, freeing humans to concentrate on higher-level thinking and innovation, ultimately leading to increased productivity and creativity. The discussion also touches on the accessibility of AI tools, suggesting that everyone will benefit from AI assistance and that the future of work will involve humans guiding and directing AI systems rather than performing tasks manually.

This segment discusses the future of AI development, emphasizing the need for more comprehensive and less biased datasets. It argues that AI will become a shared global infrastructure, requiring collaboration and distributed training to overcome the limitations of current, primarily English-language-centric datasets.

This segment focuses on the crucial role of local computing infrastructure in the future of AI, particularly in developing countries like India. It highlights the need for low-cost access to AI inference and the potential for innovation in this area, contrasting it with the current dominance of Nvidia in training infrastructure.

On memory, LeCun explains that an LLM has only two types of memory. The first is in the parameters, the coefficients adjusted during training: the model learns something, but it is not really storing discrete pieces of information. If you train an LLM on a collection of novels, it cannot regurgitate them, but it will remember something about the statistics of the words and may be able to answer general questions about the stories, much as a human who has read a novel cannot recall every word without considerable effort. The second type is the context, the prompt that you type: since the tokens the system generates are injected back into its input, it can use them as a sort of working memory, but this is a very limited form of memory (a toy sketch illustrating these two kinds of memory appears at the end of this section).

Yann describes the difference between engineers and scientists as follows: scientists strive to understand the world, while engineers aim to create new things. He notes that these fields are interconnected, with scientific progress often relying on technological advances that enable data collection (e.g., telescopes leading to astronomical discoveries). His own work in AI combines both aspects: the scientific pursuit of understanding intelligence and the engineering task of building an intelligent machine. He believes that building an intelligent machine is the only way to truly uncover the mysteries of intelligence.
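To make the two kinds of memory concrete, here is a deliberately tiny sketch. It is not an LLM: the "model" is a bigram table built by counting word pairs in a short training text, an assumption chosen to echo the Shannon-style n-gram statistics mentioned earlier. The counts play the role of the frozen parameters learned during training, while the growing prompt list is the only thing that changes at generation time, acting as a limited working memory into which each generated word is fed back.

```python
import random
from collections import defaultdict

# --- "Training": statistics baked into parameters ----------------------
# A toy corpus stands in for the training data (illustrative only).
corpus = ("the cat sat on the mat and the dog sat on the rug "
          "and the cat saw the dog").split()

# Bigram counts: our stand-in for the model's frozen parameters.
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word(context):
    """Sample the next word given the last word of the context.

    Only the counts (the "parameters") and the tail of the context are used;
    nothing learned at training time is modified here.
    """
    followers = counts.get(context[-1])
    if not followers:                      # unseen word: fall back to any word
        return random.choice(corpus)
    words, freqs = zip(*followers.items())
    return random.choices(words, weights=freqs)[0]

# --- "Inference": the prompt as working memory --------------------------
random.seed(0)
prompt = ["the", "cat"]                    # user-typed context
for _ in range(8):
    word = next_word(prompt)
    prompt.append(word)                    # generated token injected back into the input

print(" ".join(prompt))
```

The sketch also shows why this working memory is limited: the model only ever "remembers" what currently sits in the prompt list, and everything else has to come from the frozen counts.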
Hi, Yann, good morning. Thank you for doing this. Pleasure. The very first thing we like to do is get to know you a bit more: how you came to be who you are today. Could you tell us a little about where you were born and where you grew up, leading up to today? So, I grew up near Paris.

AI's multifaceted nature: AI is not easily defined, encompassing various aspects of intelligence like reasoning, learning, and perception (the "blind men and the elephant" analogy).

Two main branches of AI: Early AI developed along two paths: (1) heuristic/logic-based systems focusing on problem-solving through search algorithms and rule-based systems; and (2) biologically inspired approaches emphasizing learning through neural networks.

Evolution of neural networks: Early neural networks (perceptrons) had limitations, but the development of multilayer networks and backpropagation in the 1980s overcame some of these, leading to a resurgence of interest. However, limitations in data and computing power initially hampered progress.

Deep learning and self-supervised learning: Deep learning, a subset of machine learning, uses deep neural networks with multiple layers to extract complex features from data. Self-supervised learning is a crucial technique in which models learn from data without explicit labels, by predicting missing parts of their inputs or transformed versions of them. This has been vital for advances in natural language processing.

Architectural components of neural networks: Convolutional Neural Networks (CNNs) excel at processing spatial data like images and audio, exploiting the local correlation of the data. Transformers process sequential data by considering all inputs simultaneously, making them well suited to language models.

Language models: Language models, initially based on statistical methods, predict the probability of the next word in a sequence. Modern large language models (LLMs) use neural networks and self-supervised learning to achieve remarkable performance, learning from vast amounts of text data to generate human-like text.

Reinforcement learning: This approach trains AI agents by rewarding desirable behaviors and penalizing undesirable ones. It is effective for training agents to play games but less so for real-world problems that would require enormous amounts of trial and error.

The importance of data and computing power: The success of modern AI depends heavily on the availability of massive datasets and powerful computing resources.

LLMs are powerful but limited: Large language models excel at manipulating language but lack true understanding of the physical world and have limited memory. They are essentially sophisticated pattern-matching systems, not reasoning engines.

The next challenge (understanding the physical world): The next major leap in AI involves creating systems that learn from videos and images, enabling them to understand and interact with the physical world. This requires architectures different from those used in LLMs.

Limitations of autoregressive architectures: Autoregressive architectures, which work well for text prediction, are not suitable for predicting continuous data like video frames, because doing so is computationally intractable.

JEPA, a new architecture for world modeling: Joint Embedding Predictive Architecture (JEPA) offers a solution by predicting abstract representations of video data rather than individual pixels, enabling more efficient learning and long-term prediction.
The path to human-level intelligence: The path to human-level AI involves developing architectures that enable world modeling (predicting the consequences of actions), hierarchical planning (breaking down complex tasks), and persistent memory (similar to the human hippocampus); a minimal planning sketch follows this list.

Open-source platforms and vertical applications: The future of AI is likely to be dominated by open-source platforms, creating opportunities for startups to build successful businesses by fine-tuning these models for specific vertical applications (e.g., legal, finance, healthcare, education).

The future of work and human intelligence: AI will automate many tasks, freeing humans to focus on higher-level, more creative, and strategic work. Human intelligence will shift toward problem-solving, decision-making, and abstract thinking.

Investing in AI: Investment opportunities lie in supporting open-source platforms and companies that develop vertical applications using these platforms. The focus should be on democratizing access to AI and building robust, low-cost inference capabilities.
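To illustrate what world modeling buys an agent, here is a minimal planning sketch. Everything in it is an assumption chosen for illustration: a one-dimensional world, a hand-written dynamics function standing in for a learned world model, and brute-force search over short action sequences instead of a real planner. The pattern it shows is the one described above: the agent imagines the consequences of candidate action sequences using its world model and picks the sequence whose predicted outcome best satisfies the objective, instead of learning by trial and error in the real world.

```python
from itertools import product

# Toy 1-D world: the state is a position on a line; the goal is position 7.
# world_model is a stand-in for a *learned* predictor of "what happens if I act".
ACTIONS = {"left": -1, "stay": 0, "right": +1}
GOAL = 7

def world_model(state, action):
    """Predict the next state given the current state and an action."""
    return state + ACTIONS[action]

def cost(state):
    """Objective: how far the predicted final state is from the goal."""
    return abs(state - GOAL)

def plan(state, horizon=4):
    """Score each candidate action sequence by its imagined outcome; keep the best."""
    best_seq, best_cost = None, float("inf")
    for seq in product(ACTIONS, repeat=horizon):
        s = state
        for a in seq:                   # roll the world model forward in imagination
            s = world_model(s, a)
        if cost(s) < best_cost:
            best_seq, best_cost = seq, cost(s)
    return best_seq

state = 3
for step in range(6):
    action = plan(state)[0]             # execute only the first action, then replan
    state = world_model(state, action)  # (here the model also plays the role of the real world)
    print(f"step {step}: took {action!r}, now at {state}")
```

In a real system the world model would itself be learned (for example with a JEPA-style objective), the brute-force search would be replaced by a more efficient optimizer, and hierarchical planning would break the task into subgoals at several levels; the imagine-then-act loop is the same.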