In general you need to know the distribution in order to calculate p values, though there are statistical methods for deciding - with some confidence level - whether a sample conforms to some distribution.
I did ask ChatGPT 5.2 how to calculate the p value from the two sets of means and variances, set the null hypothesis as the means being the same, then used a pooled t-test. The AI determined that with both samples at more than 13, the p is less than 5%.
The p value seems like a concept with a mathematical description, but then I run into a wall when it comes to how you actually figure out the probability of group A having the values it has, given group B's values. I would need to see how people actually calculate their p values and set up their null hypotheses to get concrete examples.
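For a concrete version of the "group A vs group B" question, here is a minimal sketch of an exact permutation test. It is not the pooled t-test mentioned above; it is a swapped-in technique that needs no distribution assumption. The numbers are made up purely for illustration, and `permutation_p_value` is a hypothetical helper name. The null hypothesis is that the group labels don't matter, so every way of splitting the ten values into two groups of five is equally likely.

```python
from itertools import combinations

# Hypothetical recovery times in days, invented for this illustration only.
group_a = [12, 14, 15, 17, 18]
group_b = [9, 10, 11, 13, 16]


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(a, b):
    """Exact two-sided permutation test on the difference of means."""
    pooled = a + b
    observed = abs(mean(a) - mean(b))
    extreme = 0
    total = 0
    # Try every way of relabelling the pooled values into groups of the original sizes.
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        new_a = [pooled[i] for i in idx]
        new_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so float rounding doesn't drop exact ties.
        if abs(mean(new_a) - mean(new_b)) >= observed - 1e-12:
            extreme += 1
    # Fraction of relabellings at least as extreme as what was observed.
    return extreme / total


print(permutation_p_value(group_a, group_b))
```

With these made-up numbers, 24 of the 252 possible splits give a difference at least as large as the observed 3.4 days, so the p value is about 0.095, which is above 5%.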
I do like how the Wikipedia page shows that a set of 20 coin flips with 14 heads would have a p value above 0.05.
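That coin example can be reproduced in a few lines. This is a sketch assuming a fair coin as the null hypothesis; `binomial_tail` is a name chosen here for illustration, not something from the Wikipedia page.

```python
from math import comb


def binomial_tail(n, k, p=0.5):
    """P(X >= k) when X counts successes in n independent trials with success chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Null hypothesis: the coin is fair. Observed: 14 heads in 20 flips.
one_sided = binomial_tail(20, 14)
# Doubling is valid here because the fair-coin distribution is symmetric.
two_sided = min(1.0, 2 * one_sided)

print(round(one_sided, 4), round(two_sided, 4))  # prints 0.0577 0.1153
```

Both the one-sided and the two-sided value land above 0.05, so 14 heads out of 20 is not enough to reject a fair coin at the 5% level.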
I don’t understand exactly what you did with ChatGPT, but I wouldn’t trust it on this. A textbook or Wikipedia would be a better source.
In practice p-values are often calculated under a normality assumption. That assumption is reasonable in a lot of cases because of the central limit theorem: averages of many independent measurements end up approximately normal, so normal distributions show up very often.
And in practice they’re used as a rule for deciding when a result is “statistically significant”, i.e. to give an idea of how surprising an observed difference would be if nothing real were going on. So if people in a drug trial report feeling ill for two fewer days on average, calculating the p value answers the question “if the drug actually did nothing, what are the chances of seeing a difference at least this big?” It doesn’t directly tell you the chance that there’s actually a difference.
I’d look for more examples - loaded dice examples are usually easy to understand too.
Ironically, with loaded dice I would look at the distribution of results, see that it’s not a uniform distribution after a billion tosses, and say it’s not fair / it’s loaded. I would do that simply to avoid figuring out how to prove the probability of a given set of dice results.
It’s more about the journey to the p value than the calculated p value.
I have seen that one drug recovery time example, and that’s the easiest given that it’s a normal distribution and the result can be put into a region that has less than 5% probability.
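The "region with less than 5% probability" idea can be sketched with the standard library's `NormalDist`. The two-day difference comes from the drug trial example in this thread; the standard error of 0.9 days is a made-up number for illustration, and `two_sided_p_from_z` is a hypothetical helper name.

```python
from statistics import NormalDist


def two_sided_p_from_z(z):
    """Probability of a standard normal value at least as far from 0 as z, in either direction."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


observed_difference = 2.0  # days, from the drug trial example
standard_error = 0.9       # ASSUMPTION: invented for this sketch

# Under the null hypothesis (no real difference) z is roughly standard normal.
z = observed_difference / standard_error
p = two_sided_p_from_z(z)

print(f"z = {z:.2f}, p = {p:.3f}, below 5%: {p < 0.05}")
```

The familiar cutoff of about 1.96 standard errors is just the z value where this two-sided probability drops to 5%.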
Ok but if you’ve done twelve throws, how off does the distribution have to look to cause suspicion? All sixes is obviously off, but what about four sixes? Five? At some point your assessment goes from “probably uniform” to “probably biased”. The p value quantifies that.
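To put numbers on the twelve-throw question, here is a minimal sketch that assumes the null hypothesis is a fair die (each throw shows a six with probability 1/6) and asks how likely it is to see at least k sixes. The function name `p_at_least_sixes` is made up for this example, and it is a one-sided test that only looks for too many sixes.

```python
from math import comb


def p_at_least_sixes(k, throws=12, p_six=1 / 6):
    """P(at least k sixes in `throws` throws) if the die is fair."""
    return sum(
        comb(throws, i) * p_six**i * (1 - p_six) ** (throws - i)
        for i in range(k, throws + 1)
    )


# Print the tail probability for each possible count of sixes.
for k in range(0, 13):
    p = p_at_least_sixes(k)
    flag = "  <- below 5%" if p < 0.05 else ""
    print(f"{k:2d} or more sixes: p = {p:.6f}{flag}")
```

Under these assumptions, four or more sixes has a probability of roughly 0.125, which is not unusual for a fair die, while five or more comes out around 0.036, which is where it first dips under the 5% line.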
I didn’t entirely understand what ChatGPT did.