GISTR generated Core Concepts

Standardization and Interoperability are Crucial: There's a strong push for open, universal protocols like NLIP and "Agency" to enable seamless communication and collaboration between diverse AI agents, much like HTTP did for the internet (26:25, 1:16:10).

Multi-Agent Systems are the Future: Complex problems are best solved by multiple AI agents working together, sharing information, and refining each other's work, rather than by relying on a single large model (59:12, 1:17:15). This also improves reasoning and reduces hallucinations.

Enterprise Adoption Faces Challenges: Integrating AI agents into businesses requires addressing issues beyond technology alone, including cost management, security, legal compliance, and upskilling human teams (44:15, 52:44).

Evaluation and Safety are Paramount: As agents become more autonomous, rigorous evaluation methods and robust safety measures (especially in sensitive domains like healthcare) are essential to ensure reliability, prevent misuse, and build trust (1:53:46, 2:59:01).

AI Agents are Expanding Capabilities: Agents are being developed to automate highly specialized and complex tasks, from scientific research and data analysis to customer service and healthcare applications (2:01:16, 2:36:53).

GISTR generated summary

This Agentic AI Summit features discussions on the advancements and future of AI agents. Here's a summary of the key topics:

Natural Language Interaction Protocol (NLIP) (25:06): Ranjan Sinha from IBM Research introduces NLIP, a universal application-level protocol based on natural language, designed to enable AI agents to communicate with each other and with humans more flexibly, similar to how HTTP unified the web.
The BAR Theorem for Enterprise AI Systems (42:19): Pushkar Nadkarni from Nutanix AI discusses the "BAR theorem" (Budget, Authenticity, Reasoning), a conjecture for designing distributed agentic systems in enterprise settings, highlighting trade-offs and the importance of consistent architecture.

Multi-Agent Scaling (MG) (57:26): Chuan Li from Google presents MG, a new open-source project for scaling multi-agent systems. It allows diverse agents to run in parallel, share intelligence, and refine each other's work to achieve consensus.

Open Infrastructure for the Internet of Agents (1:15:45): PV Melo from Cisco discusses "Agency," an open-source initiative to build an interoperable internet of agents with common standards for agent description, discovery, communication, and identity management.

GPU-Accelerated Agent Workflows (1:27:19): Jay from Nvidia talks about the Nemo Agent Toolkit, an open-source framework for building multi-agent workflows that leverages Nvidia GPUs for real-time performance in areas like RAG (Retrieval-Augmented Generation).

Evaluating AI Agents for Healthcare (1:52:41): Tabas from Innovaccer discusses evaluating AI agents for healthcare scenarios, emphasizing the need for robust metrics and stress testing given the high stakes in this domain.

Automating Scientific Research with AI Agents (2:01:16): Samuel Rod from Future House presents their work on automating scientific discovery using AI agents for tasks like literature search, data analysis, hypothesis generation, and experimental design.

Grounding Foundation Models for Physical Science (2:25:06): Rosie Zhao from UC San Diego discusses grounding foundation models with physical reasoning to improve their accuracy and scientific validity, especially for complex tasks in physical sciences like weather forecasting.
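The coordination pattern described for MG above (diverse agents drafting in parallel, sharing their work through a hub, then refining toward consensus) can be sketched in plain Python. This is a toy illustration of the pattern only: the agent functions stand in for real model calls, and none of the names come from the actual project.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-ins for model-backed agents: each produces a draft answer.
def concise_agent(task):
    return f"concise: {task} in one line"

def thorough_agent(task):
    return f"thorough: {task} with full detail"

def refine(own_draft, peer_drafts):
    # Refinement step: each agent revises its draft after reading the hub.
    return f"{own_draft} [revised after reading {len(peer_drafts)} peer draft(s)]"

def run(task, agents):
    # Round 1: all agents draft in parallel.
    with ThreadPoolExecutor() as pool:
        hub = list(pool.map(lambda agent: agent(task), agents))
    # Round 2: each agent refines its own draft using everyone else's work.
    revised = [refine(d, [p for p in hub if p is not d]) for d in hub]
    # "Consensus" here is simply returning all refined drafts;
    # a real system would merge, vote, or rank them.
    return revised

answers = run("summarize the incident report", [concise_agent, thorough_agent])
for a in answers:
    print(a)
```

The point of the structure is that the hub decouples agents from one another: any mix of models and configurations can participate as long as each exposes a draft-and-refine interface.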
Short Talks (Lightning Talks) (2:37:17): Various speakers cover topics such as multi-agent frameworks for text-to-SQL search over complex driving scenarios, building AI support engineers for enterprise IT, debugging agentic workflows, small vision-language models for edge deployments, and multi-agent systems for image super-resolution.

Here are the key takeaways from the summit:

YouTube generated key takeaways

Agentic AI Summit - Frontier Stage, Morning Sessions

TL;DR: The Agentic AI Summit's Frontier Stage brought together industry and research leaders to discuss the latest in multi-agent AI systems, covering new communication protocols, scalable architectures, evaluation methodologies, and diverse applications from scientific discovery to enterprise IT.

The Gist:

Topic: Agentic AI Summit - Frontier Stage

Core Concept: This session featured presentations from leading AI researchers and industry professionals on the cutting edge of Agentic AI. The discussions centered on the challenges of building, deploying, and scaling multi-agent systems, emphasizing the need for robust communication, efficient infrastructure, reliable evaluation, and practical applications across various domains.

Key Approaches & Insights:

Standardizing Agent Communication: Natural Language Interaction Protocol (NLIP) by IBM: Proposed as a universal, open standard for AI agents to communicate using natural language, similar to how HTTP unified the web. It uses generative AI for structure-independent communication, allowing agents to translate between different message formats.

Enterprise Multi-Agent System Design: Nutanix AI's "BAR Theorem": Highlights the trade-offs in designing multi-agent systems across three dimensions: Budget (latency & compute), Authenticity (retrieval accuracy & hallucination), and Reasoning (human-like thought chains).
Emphasizes the need for consistent, cost-effective, and secure enterprise-grade architectures, often leveraging open-source tools like PostgreSQL with pgvector.

Scaling Collaborative AI: MasterGen by Google: An open-source multi-agent scaling system that runs diverse agents in parallel, sharing intelligence and refining each other's work through a shared collaboration hub. It allows mixing different models and configurations to improve quality and reduce latency, mimicking human collaborative reasoning.

Open Infrastructure for Agents: Agency by Cisco Outshift (Linux Foundation Project): A collective building an open, interoperable "Internet of Agents" to enable secure and reliable communication across different agent frameworks (e.g., Lang…).

GPU-Accelerated Agent Workflows: NVIDIA's Nemo Agent Toolkit: An open-source toolkit designed to connect existing agent workflows by converting tools, agents, and memory into universal function calls. It enables GPU-accelerated multi-agent frameworks.

Automating Complex Industrial Processes: Amazon's SOPBench: Focuses on building multi-agent frameworks to automate complex, ambiguous industrial Standard Operating Procedures (SOPs).

Hardware for Agentic AI: SambaNova Systems' SN40L Chip: A full-stack solution designed for serving agentic AI, emphasizing high-speed token generation. Its dataflow architecture and multi-tier memory (HBM, DDR) enable high compute density and efficiency, supporting multiple large models on a single rack with low switching latency and improving memory, power, and compute efficiency.

Declarative Languages for Agents: Agent Declarative Language (ADL) by UCSB: Proposes a declarative language for defining agents and their relationships, particularly for customer service chatbots.
This approach aims to decouple agent definitions from implementation details.

Evaluating Healthcare AI Agents: Innovaccer's Evaluation Framework: Discusses the critical need for evaluating AI agents in healthcare, focusing on three challenges: generating realistic scenarios (using grounded simulations), defining appropriate metrics, and ensuring safety.

Automating Scientific Research: Future House's AI Scientists: Focuses on automating scientific discovery by building AI agents that can assemble datasets (e.g., Labbench for biology), build tools (e.g., Paper QA for literature search, Finch for data analysis), train agents, and validate discoveries.

Gene Editing & Cell Programming Agents: Stanford University's Specialized Agents: Develops highly specialized AI agents for programming genomes and cells, aiming to cure diseases. Their approach includes building biology-aware foundation models, establishing complex scientific benchmarks (e.g., GenomeBench), and designing reinforcement learning approaches (RL Router) to route challenging questions to the best models. They also explore self-evolving agents (Stella) and real-lab integration using VLMs.

Grounding Foundation Models for Physical Science: UC San Diego's Physics-Guided AI: Addresses the challenge of foundation models making "silly mistakes" due to a lack of physical reasoning. Their research integrates physical principles with deep learning to develop trustworthy AI for science, building fast emulators for complex simulations such as climate models.

Lightning Talks (Key Topics Covered):

Zoox: Text-to-SQL for complex driving scenario search using multi-agent collaboration.

Amazon: Edge-extended Agentic AI for enterprise IT support with a multi-agent framework running on edge and cloud devices.

IBM: "Accountable Singularity" and "Verifiable Agency" for AI agents, using decentralized ledgers and zero-knowledge proofs for intrinsic cryptographic self-attestation.
IBM Research: Agentic debugging frameworks for LLMs, analyzing execution traces to reduce hallucinations, token usage, and tool-call errors.

Virginia Tech: The need for small Vision Language Models (VLMs) for data-private or edge deployments, and methods to enhance their capabilities.

Texas A&M: 4K Agent, a computer vision agent using a multi-agent system for generalist image super-resolution, enhancing various image types.

Amazon: A flexible and efficient Reinforcement Learning (RL) framework for LLMs (RLHF), applied to Amazon's AI-powered shopping assistant, Rufus.

Washington University: Leveraging foundational system security principles to provide strong security, privacy, and safety guarantees for AI agents despite their inherent limitations.

AMD Research: Agentic solutions for optimizing code for hardware acceleration (NPUs, GPUs), including the NPU Eval benchmark for evaluating LLM performance in this domain.

Emory University: Research on "speeding up LLMs" by enabling them to learn and reduce reasoning costs for similar tasks, mimicking human cognitive processes through memory mechanisms and adaptive reasoning.

Agentic AI Summit - Frontier Stage, Morning Sessions

Here are the core concepts and their brief explanations from the provided content:

Agentic AI Summit Overview
The Agentic AI Summit is introduced as a gathering featuring industry leaders from top-tier AI companies. The program includes five main sessions (Agent Architecture and Systems, AI for Science, Agent Applications, AI Security and Alignment, Foundations of Agents), along with lightning talks, poster sessions, and an awards ceremony.

Natural Language Interaction Protocol (NLIP)
NLIP is presented as a universal, open standard protocol based on natural language, designed to enable AI agents to communicate with each other and with humans, similar to how HTTP unified the web.
It is structure-independent, using generative AI to translate between different message formats, and supports multi-turn requests to refine meaning.

Enterprise AI System Challenges and the BAR Theorem
This concept addresses the complexities of deploying multi-agent systems in enterprise environments, including managing diverse systems, ensuring security, and controlling costs. The "BAR theorem" is introduced, proposing that any generative AI or multi-agent system can prioritize at most two of three aspects: Budget (latency/compute cost), Authenticity (accuracy/hallucination prevention), and Reasoning (human-like thought chains).

Master Gen: Multi-Agent Scaling System
Master Gen is a new open-source project designed as a multi-agent scaling system. It orchestrates diverse agents in parallel, allowing them to share intelligence and refine each other's work through a shared collaboration hub, aiming to improve quality and integrate multiple ideas.

Agency: Open Infrastructure for the Internet of Agents
Agency (AGNTCY) is an open-source collective building an open, interoperable infrastructure for the "Internet of Agents." It aims to enable secure and reliable communication between non-deterministic, probabilistic AI agents across different frameworks and platforms, addressing challenges of discovery, identity, and observability.

GPU-Accelerated Agent Workflows with Nemo Agent Toolkit
Nemo Agent Toolkit is an open-source framework designed to connect existing agent workflows using "universal descriptors," which convert tools, agents, or memory into function calls. This enables GPU-accelerated multi-agent systems, with components like vector databases, LLMs, and embedding models running on GPUs for real-time performance.

Automating Industrial SOPs with Event Agents and SOP Bench
This concept focuses on building multi-agent frameworks to automate complex, ambiguous industrial Standard Operating Procedures (SOPs).
A new benchmark, SOP Bench, is introduced, consisting of real-world industry use cases with long instructions, mock tool APIs, and human-validated datasets, to evaluate and improve agent performance beyond simple tasks.

SN40L Chip for Agentic AI Acceleration
The SN40L chip is presented as a full-stack solution for serving agentic AI, utilizing a dataflow architecture where operations are mapped onto computing units so that data can flow continuously. This design aims to provide high compute density and efficiency, enabling multiple large models to run on a single rack with low switching latency.

Agent Declarative Language (ADL) for Chatbots
This proposes the development of an Agent Declarative Language (ADL) to simplify the creation of customer service chatbots. ADL would allow developers to describe agent capabilities and relationships in a pure, agent-centric file, separate from implementation details, making development easier, enabling multi-vendor solutions, and simplifying maintenance.

Evaluating AI Agents in Healthcare
This concept addresses the critical need for robust evaluation of AI agents in healthcare scenarios. It highlights challenges in creating realistic scenarios, defining appropriate metrics (moving beyond traditional scores to task-specific rubrics and component-level accuracies), and ensuring safety (preventing PII leakage, tool misuse, and medical misinformation).

AI Agents for Scientific Discovery
This involves automating the scientific research process using AI agents. It includes assembling datasets to measure AI scientist performance (e.g., Labbench), building specialized tools for agents (e.g., Paper QA2 for literature search, Finch for data analysis), training them, and validating their discoveries to accelerate scientific breakthroughs.

AI Agents for Biomedical Innovation
This focuses on building highly specialized AI agents for programming genomes and cells to treat diseases.
The approach involves developing biology-aware models, creating specific benchmarks (e.g., Genome Bench), and building agents (e.g., CRISPR-GPT, Stem Cell GPT) that can interact with human scientists to design experiments and analyze data.

Physics-Guided AI for Scientific Agents
This research aims to ground foundation models for physical science by combining machine learning with physical laws. It involves building fast emulators to speed up complex simulations (e.g., climate models) and developing adaptive agents that learn when and how to use these expensive tools, providing physically consistent and scientifically valid results.

Multi-Agent Text-to-SQL for Driving Scenarios
This describes a multi-agent AI system that translates natural-language queries into SQL code to identify complex driving scenarios from logs. The system breaks queries down into robot- and pedestrian-relevant features, combines them, and generates SQL, aiming for increased developer productivity, executability, correctness, and interpretability.

Edge-Extended Agentic AI for IT Support
This proposes a multi-agent framework that operates on both edge and cloud devices to provide autonomous IT customer support. It features distributed agent registries, diagnostic and remediation tools (via MCP), adaptive memory for personalization, and local/global RAG agents to offer offline support and manage IT ecosystems.

Verifiable Agency for Accountable AI
This concept addresses the "accountable singularity" challenge, where the complexity of AI agents makes traditional oversight unviable. It proposes a paradigm shift to "verifiable agency," based on immutable reasoning (a decentralized ledger), cryptographic proof of compliance (zero-knowledge proofs), and automated consequence (trustless enforcement mechanisms) to ensure accountability.

Agentic Debugging for LLMs
This involves developing frameworks for debugging LLMs, especially in complex agentic tasks.
The "Trajune framework" dynamically analyzes execution traces to identify errors such as hallucinations or incorrect tool calls, and provides revised prompts. It also includes a prompt validator for static error checking, aiming to reduce hallucinations and optimize tool use.

Small Vision Language Models for Agents
This concept advocates for the development of open-source, multimodal, small vision language models (VLMs) for real-world agent applications. Small models are particularly useful for highly data-private domains or edge deployments (robotics, IoT), addressing challenges in tool usage, reasoning, and multi-agent collaboration while being more efficient.

4K Agent: Image Super-Resolution
The 4K Agent is a computer vision agent designed as a generalist system for upscaling and restoring any image to 4K resolution. It uses a multi-agent system with a profile module, a perception agent that analyzes distortions and creates a customized restoration plan, and a restoration agent that executes the plan using various tools.

RLHF Framework for LLMs
This introduces a flexible and efficient Reinforcement Learning from Human Feedback (RLHF) framework for fine-tuning Large Language Models (LLMs). The framework supports various algorithms and architectures, improves training and inference efficiency, and enables features like server-based asynchronous rollout, multi-turn conversation, and tool usage for large-scale models.

System Security Principles for AI Agents
This concept emphasizes leveraging foundational system security principles to ensure strong security, privacy, and safety guarantees for AI agents. It argues that integrating GenAI into conventional software stacks allows for reliable control of success and failure modes, complementing GenAI's inherent robustness limitations and preventing system crashes.

Agentic Code Generation for Hardware Acceleration
This addresses the challenge of having LLMs write optimized code for hardware acceleration (e.g., NPUs, GPUs).
It highlights that LLMs are not inherently good at optimization, requiring agentic solutions. A benchmark called "NPU eval" is introduced to measure and improve how agents write specialized code for hardware.

Speeding Up LLM Reasoning
This research explores whether LLMs can learn to perform similar tasks faster with repeated exposure, mimicking human cognitive processes. It proposes adding memory mechanisms and adaptive reasoning methods for LLMs. Benchmarking shows that LLMs can achieve significant cost reductions when answering more similar questions over time.