So say that you think one-dollar coins are tail-biased. And what I mean by that is that maybe they flip more tails than they do heads, for whatever reason. Maybe there's subtly more coin on the heads side, so it weighs it down, or something like that. I don't know; it's a silly example, but at least it gives us something we can visualize. The question is, how would you test this hypothesis scientifically? Now, you might tell me that, firstly, you've got to flip a bunch of coins. Obviously you've got to take some kind of sample or do an experiment; that's true. You then have to get out a pen and paper and note the proportion of tails you have in your sample. But what happens then? How do you know whether, yes, this coin is biased, or no, this coin is not biased? Well, that's where hypothesis testing is going to come in and help you out. It's going to help you solve this riddle you've constructed for yourself. And the question is: if the null hypothesis is true, how will the sampling statistic be distributed? For the moment, let's just say the sampling statistic is theta with a hat on it, which we can get from our sample: our sample's p1 minus our sample's p0. Now, if the null hypothesis is true, how will it be distributed? Well, much like our slide on the intuition behind hypothesis testing, it'll be distributed like a nice bell-shaped curve. So again, the important part of this is that the null hypothesis here is assumed to be true.
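The coin example above can be sketched as a one-sample proportion z-test. This is a minimal sketch, not the lecture's own worked example: the numbers (500 flips, 270 tails) are hypothetical, chosen only for illustration.

```python
import math

def tail_bias_z_test(n_flips, n_tails, p_null=0.5):
    """One-sample z-test for a proportion. H0: the true tail probability is p_null."""
    p_hat = n_tails / n_flips
    # Standard error of the sample proportion, computed under the null hypothesis.
    se = math.sqrt(p_null * (1 - p_null) / n_flips)
    z = (p_hat - p_null) / se
    # Two-sided p-value from the standard normal CDF (expressed via the error function).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical experiment: 500 flips, 270 of which land tails.
z, p = tail_bias_z_test(500, 270)
```

With these made-up numbers the p-value lands above 0.05, so under the usual significance level you would fail to reject the null hypothesis of a fair coin.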
So if, indeed, there is no difference in the population between the two groups, our sample should expect no difference between the two sample values. And when I say expect, I mean that that's the middle of the distribution; due to random variation, of course, there will be some kind of probability distribution around that, some kind of variance around this sample value of zero. But what variance? How can you describe this distribution? Well, the expected value of this distribution, as I just said, is zero. It's the middle of this plot, which makes sense, because if the null hypothesis is true, if you sampled 100 patients that have the operative treatment and 100 patients that have physio only, you would expect them to have the same proportion of successful outcomes. That should be the middle of the distribution. But what is the variance of this distribution? Well, you'll note that we actually have two random variables here, p1 and p0, and this just involves a little recollection of basic probability: the variance of the difference of two uncorrelated random variables is just the variance of one plus the variance of the other. And the variance of a proportion is, in fact, that proportion times one minus the proportion, divided by the number of observations; the same will be true for the physio-only group. Now, if you recall, when we were looking at the contents of the whole video, I noted that this particular subtopic among three subtopics takes the perspective of the null hypothesis: we're assuming the null hypothesis is true. So that means p1 and p0 are actually going to be the same. Because this distribution assumes the null hypothesis is true, I can change this p1 and p0 into just p, which is this sort of grouped, or pooled, proportion. They're actually the same here. And so we can simplify this a little to be p times (1 minus p) times (1 on n1 plus 1 on n0).
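The pooled variance formula just described can be written as a short helper. A minimal sketch, with hypothetical values (a pooled proportion of 0.7 and 100 patients per group) that are not taken from the lecture:

```python
import math

def pooled_variance(p, n1, n0):
    """Variance of the difference in sample proportions under H0 (p1 = p0 = p):
    p * (1 - p) * (1/n1 + 1/n0)."""
    return p * (1 - p) * (1 / n1 + 1 / n0)

# Hypothetical: pooled success proportion 0.7, 100 patients in each group.
var = pooled_variance(0.7, 100, 100)
se = math.sqrt(var)  # standard error of the difference in proportions
```

With these inputs the variance is 0.0042 and the standard error about 0.065, so under the null hypothesis the sample difference would typically wander only a few percentage points away from zero.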
So theta hat, here we go, is distributed normally with a mean of zero and a variance given by this. Now, again, I can see the question someone is surely going to ask me: why on earth is this distributed normally? We have a difference between two proportions; these are essentially binomial distributions. Why is this normal? And this is the lovely thing about statistics: as soon as your sample size is big enough, everything just becomes normal. I feel like every statistician should have a CLT tattoo on them somewhere; without it, we would be much worse off. CLT stands for the central limit theorem, which allows us to put this N in here and say that, in large samples, sampling statistics tend to become normally distributed. All right, so let's go a little bit further. What's going to happen here is that we're going to construct two separate rejection regions, which in combination will sum up to 0.05. Now, why does it sum up to 0.05? Well, that's just our choice.
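The central limit theorem claim here can be checked by simulation: repeatedly draw two samples under the null hypothesis and look at the distribution of the difference in sample proportions. A small sketch, again with hypothetical parameters (pooled proportion 0.7, 100 per group):

```python
import random

def simulate_diff(p, n1, n0, trials, seed=0):
    """Simulate the difference in sample proportions when both groups
    share the same true proportion p (i.e. the null hypothesis holds)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        p1_hat = sum(rng.random() < p for _ in range(n1)) / n1
        p0_hat = sum(rng.random() < p for _ in range(n0)) / n0
        diffs.append(p1_hat - p0_hat)
    return diffs

diffs = simulate_diff(0.7, 100, 100, trials=5000)
mean = sum(diffs) / len(diffs)
var = sum((d - mean) ** 2 for d in diffs) / len(diffs)
# The empirical mean sits near 0 and the variance near
# p*(1-p)*(1/n1 + 1/n0) = 0.0042; a histogram of diffs is roughly bell-shaped.
```

Even though each individual observation is a 0/1 (binomial) outcome, the simulated differences pile up into the familiar bell curve, which is exactly what licenses the normal approximation in the slides.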
We're going to label this thing called alpha as 0.05, and all that means is that this is the probability we're willing to accept, willing to take on board, of rejecting a null hypothesis that might be true. Right, let's just rewind that a little bit. If we get a sample in this yellow region here, we're going to be rejecting the null hypothesis; that's how we've constructed this hypothesis test. But realize that this distribution goes on for infinity. So this black line, the distribution, which is the probability density function where the null hypothesis is true, goes on for infinity. So if our sample difference is 0.15 or 0.2, it's still possible for this null hypothesis to be true, yet our sample just be very extreme. So essentially we create, artificially mind you, this region beyond which we're sure enough, and this yellow area represents the chance we're willing to take on board that we're actually going to be incorrectly rejecting a true null hypothesis. Now, let's go to test statistics. So, as I said, our sample difference is distributed normally with a mean of zero, and this is our variance: p times (1 minus p) times (1 on n1 plus 1 on n0). Now, if that's the case, let's consider this test statistic that we're going to call T, where it's going to be the sample difference divided by the standard error of that sample difference. So it's essentially theta hat divided by the square root of the variance. Now, if theta hat itself is distributed normally with a mean of 0 and a variance of all this junk, then T,
our test statistic, will be distributed normally with a mean of 0 and a variance of 1. It's quite simple to prove: if you take the variance of this expression, it's just going to be the variance of theta hat, which is this, divided by that squared, which is in fact that again, so it'll cancel out and you'll get 0 and 1. And the good thing about doing that is that we've now created a very standardized test statistic, which we can compare to things like normal distribution tables. Another way of thinking about it is that the test statistic is just a scaled version of the sample difference: it's just divided by its standard deviation. So, to drive that point home, this was the original distribution that we saw in the previous bubble in the presentation. Notice that the sample difference here is that difference in proportions in our sample, and that's what we drew on the previous slide as well. But this axis here can be scaled so that it's the test statistic as well. So it too has particular critical values, beyond which we're going to reject the null hypothesis. And because this is standardized, we know what this critical value is: when you have a distribution with mean zero and variance one, the critical value is 1.96. This critical value is the point above which lies 2.5% of the distribution. Why 2.5%? Well, of course, it's split in half: 5%, which is our level of significance, divided by two, is 2.5%.
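Putting the last two steps together, the test statistic is just the sample difference scaled by its pooled standard error, compared against the 1.96 cutoff. A hedged sketch; the sample proportions below (0.76 vs 0.63, pooled 0.695, 100 per group) are invented for illustration and deliberately chosen to land just past the critical value:

```python
import math

def z_statistic(p1_hat, p0_hat, p_pooled, n1, n0):
    """Scaled sample difference: theta_hat divided by its pooled standard error."""
    theta_hat = p1_hat - p0_hat
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n0))
    return theta_hat / se

# Hypothetical sample: 76/100 successes in one group, 63/100 in the other.
t = z_statistic(0.76, 0.63, 0.695, 100, 100)
reject = abs(t) > 1.96  # two-sided test at alpha = 0.05
```

Here `t` comes out just under 2, so the sample falls just inside the rejection region, mirroring the near-boundary case the lecture walks through.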
Now, let's have a little chew on the p-value. The p-value is the proportion of repeated samples, under the null hypothesis, that would be as extreme as the test statistic we generated. Okay, that seems like a mouthful, but let's read it again: the p-value is the proportion of repeated samples, under the null hypothesis, that would be as extreme as the test statistic we generated. So again, like the previous two slides, we assume the null hypothesis is true. So the true difference is zero, meaning that our expected sample difference is also zero; the center of the distribution, in other words, is zero. And of course, when we scale it to this test statistic T, the center of that distribution will also be zero. And our alternate hypothesis is that the difference is non-zero. So here's the probability distribution, again with alpha being 0.05, and it's centered on zero, because we assume the null hypothesis is true. We found a test statistic of 1.99, which was just a shade to the right of this yellow bar here.
So we're just in the rejection region. What a p-value is, is the remaining shaded region beyond our test statistic, counting both tails. What this red section is, is the proportion of repeated samples under the null hypothesis, so assuming the null hypothesis is true, that would have a more extreme test statistic than ours, or I should say, a test statistic which is as extreme or more extreme than ours. So our sample was extreme enough for us to reject the null hypothesis, but there would be some samples that are even more extreme. This red section summarizes all of those possible samples, and if you were to add up that whole red section as a proportion of this distribution, it's going to be 0.047. Now, it's actually quite tedious to calculate that by hand; a computer program can do it, so for our purposes, let's not worry about how it's calculated. But you were probably going to guess that it was going to be something very close to 0.05, right? We only just rejected the null hypothesis from this test statistic here, so the p-value had to be pretty close to 0.05. And, in fact, it had to be just slightly less than 0.05: that shaded red region would be slightly less than the shaded yellow region, which is exactly 0.05. Hopefully you can see
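The 0.047 figure above is straightforward to reproduce. A minimal sketch of the two-sided p-value for a test statistic of 1.99, using the standard normal CDF (written here with the error function so no external library is needed):

```python
import math

def normal_cdf(x):
    """Standard normal CDF, expressed via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def two_sided_p_value(t):
    """Proportion of null-hypothesis samples whose test statistic would be
    as extreme or more extreme than t, counting both tails."""
    return 2 * (1 - normal_cdf(abs(t)))

p = two_sided_p_value(1.99)
```

For t = 1.99 this gives roughly 0.047, just below the 0.05 significance level, which is exactly why the red region in the figure is slightly smaller than the yellow rejection region.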