Introduction to GPT-4.5 | Highlights and Annotations by Gistr.

GPT-4.5, OpenAI's largest and most knowledgeable model yet, is released as a research preview. It leverages unsupervised learning for improved accuracy and intuition, outperforming previous models while hallucinating less. While it does not explicitly reason step by step, it demonstrates enhanced contextual understanding and deeper knowledge, making it well suited to writing, coding, and problem-solving. Improved alignment techniques, using data derived from smaller models, resulted in more natural, nuanced, and collaborative interactions. Significant scaling advancements in pre-training and post-training infrastructure were necessary for its development and deployment. GPT-4.5 outperforms previous models on various benchmarks, showcasing its capabilities across diverse tasks. It's being rolled out to ChatGPT Pro users, with wider access planned for Plus, Team, EDU, and Enterprise users.

As you can see, GPT-4.5 recognizes that I'm frustrated and offers me a text that's a little more nuanced and probably a more constructive thing to send to my friend. On the other hand, o1 is still useful. It actually follows my instructions and gives me that angry text, but it fails to pick up on the social cue that I'm probably just frustrated right now and could probably use someone to talk to, and that warning at the end feels a little judgmental for my taste.

For GPT-4.5, we developed new scalable alignment techniques that allowed us to train it using data derived from smaller models. This really unlocked the model's deeper world model.

So here we have the SimpleQA eval. In this eval we measure two things: one is accuracy, one is hallucination rate. You can see GPT-4.5 outperforms the rest of the GPT family in accuracy, and at the same time it has the lowest hallucination rate.

We aligned GPT-4.5 to be a better collaborator, making conversations feel warmer, more intuitive, and emotionally nuanced.
To measure this, we asked human testers to evaluate it against GPT-4o, and GPT-4.5 outperformed it in basically every category. We tested it on prompts that measure accuracy and factuality in everyday queries, including hard prompts that are difficult to get right in professional settings, and finally on a new vibes test set that measures creative intelligence.

Quick question: what does "vibes" mean here?

That's a great question. By vibes, we really mean the model's EQ: how collaborative it feels and how warm its tone is. We measured this by selecting an opinionated set of prompts and screening our trainers for the ones that most align with our vibes.

Overall, GPT-4.5 should be a great model for everyday tasks and knowledge queries. It should be ideal for improving your writing and creative variation. And we're really excited to see how people use it.

Hi, I'm [inaudible]. Training such a big model required a ton of new systems work. Just to give you some examples: we aggressively used low-precision training to get the most out of our GPUs. We also wanted to use more compute than we could get onto one high-bandwidth networking fabric, so we pre-trained this model across multiple data centers at the same time.

I think it's been mentioned here already: this is a big model, and that presented a number of challenges for serving it in ChatGPT. We built new inference systems that let us serve this model in a way that still feels fast and snappy to talk to. Of course, as we've done with all of our previous models, we will continue shipping improvements to make this model even faster after launch.

Okay, so we've been talking about how the models have evolved and how we're scaling them, and we thought it'd be fun to give you all a sense of what it really feels like to talk to these models as they get better.
So we asked every model in the GPT series the same question: why is the ocean salty? We're going to take you through the evolution here. So let's go back in time. It's 2018, and we've just finished training GPT-1. Why is the
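
As a rough illustration of the two numbers quoted in the SimpleQA discussion above (accuracy and hallucination rate), here is a minimal Python sketch. It assumes each answer has already been graded as "correct", "incorrect", or "not_attempted", and it assumes the hallucination rate is the share of all questions answered incorrectly. The transcript does not define either metric, so the function name, the labels, and the sample grades are all hypothetical.

```python
from collections import Counter

# Assumed grade labels; the transcript does not specify a grading scheme.
ALLOWED_GRADES = {"correct", "incorrect", "not_attempted"}


def summarize_grades(grades):
    """Return accuracy and hallucination rate for a list of per-question grades.

    Assumption: accuracy = correct / total and
    hallucination rate = incorrect / total.
    """
    if not grades:
        raise ValueError("grades must not be empty")
    unknown = set(grades) - ALLOWED_GRADES
    if unknown:
        raise ValueError(f"unknown grade labels: {sorted(unknown)}")

    counts = Counter(grades)
    total = len(grades)
    return {
        "accuracy": counts["correct"] / total,
        "hallucination_rate": counts["incorrect"] / total,
    }


# Hypothetical grades for ten questions (not real benchmark data).
example_grades = ["correct"] * 6 + ["incorrect"] * 3 + ["not_attempted"]
summary = summarize_grades(example_grades)
print(summary)
```

Under this definition a model that declines to answer lowers its accuracy without raising its hallucination rate, which is one reason the two numbers are reported separately.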