This segment uses a relatable analogy of a manager assigning a task to two colleagues (one creating content, the other reviewing it) to explain the roles of encoder and decoder models in generative AI. It clarifies how these models work together, highlighting the strengths and limitations of each type and naming models that fit each role (e.g., BART, T5), which makes complex concepts easy to understand.

This segment explores the diverse applications of generative AI beyond text generation. It introduces embeddings as a way to pass data between models and discusses image generation, editing, and in-painting with models like Stable Diffusion. The examples showcase the versatility and potential of generative AI across creative and practical applications.

This Microsoft/GiCAL Generative AI series introduces foundation models and LLMs. LLMs are a subset of foundation models, differing in architecture, training data, and use cases. Open-source (e.g., Llama) and proprietary (e.g., GPT) models are discussed, along with embeddings and their applications (image generation, text generation). Deployment options include cloud services (Azure), prompt engineering, retrieval-augmented generation, fine-tuning, and training custom models; the optimal approach depends on resources and specific needs.

This segment clearly explains the distinction between foundation models and Large Language Models (LLMs): all LLMs are foundation models, but not all foundation models are LLMs. It uses examples like GPT-3.5 to illustrate the concept and contrasts open-source with proprietary models, providing a foundational understanding for working with these powerful AI tools.

Foundation Models vs. LLMs: Foundation models are large models trained on massive datasets with unsupervised or self-supervised learning, serving as a base for specialized models. LLMs are a type of foundation model focused on language tasks.

Open Source vs. Proprietary LLMs: Open-source models offer flexibility and customization but may use older architectures. Proprietary models are maintained by companies, offering ease of use but potentially limiting customization and analysis.

LLM Components and Applications: LLMs can be used for various tasks including text summarization, translation, image generation (e.g., DALL-E, Stable Diffusion), text and code generation (e.g., GitHub Copilot), and more. Embeddings are used to represent data as vectors for input into other models (surrogate models).

Encoder-Decoder Models: Some LLMs function as encoders (reviewing and analyzing content) while others are decoders (generating content). Combining both allows for creative generation alongside quality review.

Azure Machine Learning for LLM Deployment: Azure provides a platform for managing the entire machine learning lifecycle, including finding, testing, fine-tuning, and deploying LLMs. It offers a catalog of foundation models and tools for model evaluation and deployment (real-time or batch).

Deployment Approaches: Four main approaches for deploying LLMs are presented:
1. Prompt Engineering: Using zero-shot, one-shot, or few-shot learning with varying levels of context in prompts.
2. Retrieval-Augmented Generation (RAG): Enhancing LLM performance by incorporating external data via vector databases.
3. Fine-tuning: Customizing pre-trained models for specific tasks using custom training data.
4. Training from Scratch: Building LLMs from the ground up, requiring significant resources and expertise; suitable only for very specific use cases.

Responsible AI Practices: The importance of responsible AI practices, transparency about model limitations, and ethical considerations in training and deployment is emphasized.
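The embeddings idea above (representing data as vectors so one model's output can feed another) can be sketched in miniature. Real embedding models produce dense vectors with hundreds or thousands of dimensions; the 3-dimensional vectors below are hand-picked toy values, not output from any actual model:

```python
import math

# Toy stand-in for an embedding table: in practice an embedding model
# maps text (or images) to a dense, high-dimensional vector.
embeddings = {
    "cat":   [0.9, 0.1, 0.0],
    "tiger": [0.8, 0.2, 0.1],
    "car":   [0.0, 0.1, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Related concepts sit close together in embedding space, which is what
# lets one model's vector output serve as another model's input.
print(cosine_similarity(embeddings["cat"], embeddings["tiger"]))  # high
print(cosine_similarity(embeddings["cat"], embeddings["car"]))    # low
```

The same distance measure underlies the vector-database lookups used in retrieval-augmented generation.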
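The zero-, one-, and few-shot distinction in the prompt-engineering approach comes down to how many worked examples the prompt carries. A minimal sketch, assuming a made-up sentiment-classification task with hypothetical example texts:

```python
def build_prompt(task, examples=()):
    """Zero-shot when `examples` is empty; one- or few-shot otherwise."""
    lines = [task]
    for text, label in examples:  # each worked example adds context
        lines.append(f"Text: {text}\nSentiment: {label}")
    # The actual input the model should complete comes last.
    lines.append("Text: The catalog made deployment painless.\nSentiment:")
    return "\n\n".join(lines)

task = "Classify the sentiment of the text as positive or negative."

zero_shot = build_prompt(task)
few_shot = build_prompt(task, examples=[
    ("I loved the fine-tuning workflow.", "positive"),
    ("The model kept timing out.", "negative"),
])
print(few_shot)
```

The trade-off is token budget versus guidance: each added example consumes context-window space but gives the model a clearer pattern to imitate.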
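The retrieval step of retrieval-augmented generation can likewise be sketched with a toy "vector database". Real RAG systems embed text with a learned model and query a dedicated vector store; the word-count vectors, sample documents, and prompt template below are stand-ins for illustration only:

```python
import math
import re
from collections import Counter

# Hypothetical document store; a real system would hold many more entries.
documents = [
    "Azure Machine Learning offers a catalog of foundation models.",
    "Fine-tuning customizes a pre-trained model with custom data.",
    "Stable Diffusion generates and edits images from text prompts.",
]

def embed(text):
    """Toy embedding: a bag-of-words count vector instead of a learned model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, docs):
    """Prepend the retrieved context so the model grounds its answer in it."""
    context = "\n".join(retrieve(query, docs))
    return f"Use the context to answer.\nContext:\n{context}\nQuestion: {query}"

print(build_prompt("How do I fine-tune a model?", documents))
```

The point of the pattern is that the external data reaches the model through the prompt at inference time, so the knowledge base can be updated without retraining the model.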