How to AI (Almost) Anything | Highlights and Annotations by Gistr.

Lecture 1 – Course Introduction (MIT How to AI Almost Anything, Spring 2025)

TL;DR: This MIT Media Lab course, "How to AI (Almost) Anything," focuses on designing and building AI systems that interact with the real world through multiple senses to enhance human experiences. It covers multi-sensory AI, large models, generative AI, and interactive agents.

The Gist:
Topic: An introductory course on designing and building AI systems for the real world, emphasizing multi-sensory understanding and human-AI interaction.
Core Concept: The course aims to give students the principles needed to apply AI across many modalities, especially new ones, and to build robust, safe, human-centric AI systems that go beyond mimicking humans to improve collective outcomes.

Key Learnings/Course Modules:
AI for the World (Multi-Sensory AI): Explores AI systems that process and interact with modalities beyond typed text, including speech and vocal expressions, facial gestures, body language, robotic tactile information, and even smell (e.g., for allergy detection). Examples include AI in healthcare (X-rays, medical sensors) and robotics.
Large Models: Covers the core elements of modern AI, including the pre-training, scaling, and fine-tuning of large language models (LLMs), as well as large multimodal models (e.g., vision-language and video-language models).
Modern Generative AI: Focuses on systems that generate new data across modalities such as text, images, sensor data, video, music, and art, often by translating between different input and output types.
Interactive AI: Covers building AI agents capable of multi-step reasoning that solve complex problems and interact with humans through sequences of actions, including embodied/tangible AI and human-AI interaction (e.g., web agents, agents that control software). It also addresses safety and ethical considerations.
Grading and Assignments:
40% Reading Assignments: Students read assigned papers, summarize them, relate them to broader research, and take part in group discussions. Students rotate through roles such as "Reading Lead" (presenting summaries) and "Synopsis Lead" (reporting discussion outcomes).
60% Research Project: Students are expected to complete a high-quality research project, often in teams, exploring new ideas in AI. The project has several checkpoints: a pre-proposal, a detailed proposal, an initial implementation, a midterm report, and a final report and presentation.

Research Project Directions:
New Modalities: Building AI systems for data types beyond language, vision, and audio, such as time series, physiological sensors, tabular data, taste, art, music, smell, and tangible body systems.
Reasoning & Interactive Agents: Developing AI systems that reason robustly across multiple steps and act as interactive agents.
Embodied & Tangible AI: Researching AI systems that run quickly on physical devices and interact with the real world.
Socially Intelligent AI: Building AI that understands human social interactions, relationships, and non-verbal cues.
Human-AI Interaction: Creating new mediums for human-AI interaction, conveying uncertainty, and defining new collaborative tasks that go beyond imitation.
Ethics & Safety: Addressing the limits of AI controllability and unsafe outputs, and developing new data or training objectives for safe deployment.

Key Topics: Multi-sensory AI, Robotics & AI, AI for Smell, Generative AI, AI for Health & Wellness, Interactive AI Agents, Large Language Models (LLMs), Multimodal Models, Course Grading, Reading Assignments, Research Project, Project Timeline, AI Ethics & Safety.