1. Select Alpha Illustration from the main menu.
2. Select a one- or two-tailed Normal or Student's t distribution, or a one-tailed Chi Square distribution.
3. Choose OK.
You will see a theoretical distribution with a shaded area under the distribution (either all in one tail or divided between two tails). The shape of the distribution and the shading will vary, depending on the parameters that you select.
If you choose Normal, one-tailed: You will see a theoretical normal distribution with the shaded area in the right tail of the distribution (a one-tailed test could also have placed all of the shading in the left tail). If you look at the x axis of this distribution, you will see that its mean is 0 and its standard deviation is 1. This is an idealized z distribution. The shaded area represents the most extreme (positive) z-scores in the distribution, and in area, represents 5% of the total area under the curve.
If you choose Normal, two-tailed: You will see an idealized normal distribution with its shaded areas divided between the right and left tails of the distribution. If you look at the x axis of this distribution, you will see that its mean is 0 and its standard deviation is 1. This is an idealized z distribution. The shaded areas represent the most extreme (both positive and negative) z-scores in the distribution, and in combined area, represent 5% of the total area under the curve. The shaded area at the left represents 2.5% of the total area, and the shaded area at the right represents 2.5% of the total area.
If you choose Student's, one-tailed: You will see a theoretical distribution of Student's t, with a shaded area in the right tail of the distribution (a one-tailed test could also have placed the shading in the left tail). The shaded area represents the most extreme (positive) t-values in the distribution, and in area, represents 5% of the total area under the curve.
If you choose Student's, two-tailed: You will see a theoretical distribution of Student's t with shaded areas divided between the right and left tails of the distribution. The shaded areas represent the most extreme (both positive and negative) t-values in the distribution, and in combined area, represent 5% of the total area under the curve. The shaded area in the left tail represents 2.5% of the total area, and the shaded area in the right tail represents 2.5% of the total area.
If you choose Chi Square, one-tailed: You will see a theoretical distribution of chi-square. Since values of chi-square cannot be negative, only the one-tailed test can be performed, and the shading will always be under the right tail of the distribution, i.e., toward the most extreme positive values. The shaded area represents the most extreme values in the distribution, and in area, represents 5% of the total area under the curve.
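For the two Normal cases described above, the boundaries of the shaded regions can be computed rather than read from the picture. The following is a minimal sketch using only Python's standard library; the variable names are illustrative and are not part of the illustration itself. The Student's t and chi-square boundaries are not computed here because they depend on the degrees of freedom, which the text does not specify.

```python
from statistics import NormalDist

ALPHA = 0.05
z_dist = NormalDist(mu=0.0, sigma=1.0)  # idealized z distribution

# One-tailed (right tail): all 5% of the area lies above this boundary.
one_tailed_cutoff = z_dist.inv_cdf(1 - ALPHA)

# Two-tailed: 2.5% of the area lies beyond the boundary in each tail.
two_tailed_cutoff = z_dist.inv_cdf(1 - ALPHA / 2)

print(f"one-tailed: shaded where z >= {one_tailed_cutoff:.3f}")
print(f"two-tailed: shaded where z <= {-two_tailed_cutoff:.3f} "
      f"or z >= {two_tailed_cutoff:.3f}")
```

Under these assumptions the one-tailed boundary is about 1.645 and the two-tailed boundaries are about -1.960 and +1.960.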
The purpose of the alpha illustration is to graphically depict alpha levels and Type I error. Using the alpha illustration in conjunction with other demonstrations and illustrations will enable us to examine various issues in hypothesis testing.
What are alpha, Type I error, and the null hypothesis?
Psychologists, and other scientists who use statistical techniques, don't like being wrong. They like to set limits on how often they are willing to make an incorrect decision. When they test hypotheses, there are two kinds of incorrect decisions that they can make: they can say that there is a difference between or among groups when there really isn't one (this is called a Type I error), or they can say that there is no difference when there really is one (this is called a Type II error). By convention, a maximum limit on Type I error is set at 5 times in 100, and this limit, called alpha, is set before conducting any research. This translates into a probability of .05 that we will make a Type I error on any one statistical test when there really is no difference. (Type II error is discussed in the Power Illustration.)
In certain circumstances, we may want to place a stricter limit on our probability of making a Type I error. For example, a psychologist who is measuring depression levels in people who have vs. haven't been taking a new anti-depressant drug may want to set alpha at .01, or .001. If there will be a consequence of finding that the people on the drug were less depressed than those who didn't take the drug (for example, it will be recommended that this drug should be taken by all depressed people and not just the participants in the study), and there are unpleasant side effects or dangers associated with taking the drug, then a smaller probability of error might be called for. He or she might want to protect potential patients from the consequences of a false positive decision: in this case, deciding that someone with a depressive disorder should take a particular drug with dangerous side effects, when in fact the drug will not help. The alpha level that we select is more than a statistical decision; it's a decision based on the consequences of the research, although by convention alpha will not exceed .05.
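To see what a stricter alpha means in practice, the sketch below computes the boundary of the critical region of a one-tailed z test for the three alpha levels mentioned above. It assumes a z test only because that is the one case the standard library can handle directly; the names are illustrative.

```python
from statistics import NormalDist

z_dist = NormalDist()  # mean 0, standard deviation 1

# Right-tail boundary of the critical region for each alpha level.
cutoffs = {alpha: z_dist.inv_cdf(1 - alpha) for alpha in (0.05, 0.01, 0.001)}

for alpha, cutoff in cutoffs.items():
    print(f"alpha = {alpha}: reject the null when z >= {cutoff:.3f}")
```

The smaller the alpha, the further out in the tail the boundary sits, so a more extreme sample statistic is needed before the null hypothesis is rejected.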
In each of the distributions that you can select in this illustration, a certain area (or areas) has been shaded in. These are the critical regions for a z test, t test, or chi square test with alpha set at .05. These regions represent the most extreme values in their distributions, and the proportion of the total area of the distribution that is occupied by scores in the shaded area(s) is 0.05. (In practice, we locate in a table the minimum value of z, t or chi square that is at the boundary of this region, so that we can compare our sample statistic with it; values at or above the critical value fall into the shaded region.) The critical region is also the place where we may make a Type I error. How do we know, when our sample statistic falls into the critical region, that there is a real difference, and that we aren't making a Type I error? We don't. We just know what the chances are that we made a Type I error. To discuss this further, let's use this illustration in conjunction with another demonstration.
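The comparison of a sample statistic with the critical value can be written as a small decision rule. This is a sketch for the z test only, and the function name in_critical_region is hypothetical; it is not something the illustration provides.

```python
from statistics import NormalDist


def in_critical_region(z, alpha=0.05, tails=1):
    """Return True if z falls in the shaded region of the z distribution.

    tails=1 shades the right tail only; tails=2 splits alpha between tails.
    """
    z_dist = NormalDist()
    if tails == 1:
        return z >= z_dist.inv_cdf(1 - alpha)
    if tails == 2:
        return abs(z) >= z_dist.inv_cdf(1 - alpha / 2)
    raise ValueError("tails must be 1 or 2")


print(in_critical_region(1.80))           # one-tailed test
print(in_critical_region(1.80, tails=2))  # two-tailed test
```

A z of 1.80 falls in the critical region of the one-tailed test but not of the two-tailed test, because the two-tailed boundary sits further out (about 1.96 instead of about 1.645).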
Remember Prof. D. Mented from the Central Limit Demonstration? She's been teaching intro stats for 45 years, using those weird quizzes that students can actually lose marks on. She has kept records of all these quiz scores, and knows that they are normally distributed with a mu of 0 and a standard deviation of 1. Well, recently the university hired a new faculty member, Prof. D. Ranged, who likes D. Mented's grading scheme, and uses it in his own statistics classes. Even though he's been teaching for less than a year, Prof. D. Ranged thinks that he's a better teacher than D. Mented. He bet Prof. D. Mented $10.00 that he's a better teacher, and she accepted. As they were starting a new term, they decided that D. Ranged would calculate a mean quiz score for his 25 students at the end of the course, and they would use that as their measure of teaching skills. (D. Mented was also quick to insist that a teaching assistant do all of the marking, in case D. Ranged couldn't help being a little generous in his marking.) They set an alpha of .05, because it's only $10.00, and because D. Mented doesn't want to show that she is nervous by demanding a stricter alpha level. They are going to use a z-test to settle their bet.
D. Ranged and D. Mented have two hypotheses: the null and the alternate. The null hypothesis is that their teaching skills are the same; one is neither more nor less skilled than the other. The alternate is that D. Ranged is a better teacher. The hypothesis that they will test is the null hypothesis. Why would they test the null hypothesis? Remember that teaching skills have been operationally defined as the mean quiz score. Because we know what D. Mented's distribution of scores looks like, we can take D. Ranged's mean, and see where it falls relative to where we would expect it to fall if there were no difference between his skills and Prof. D. Mented's skills.
At this point, you should start the Central Limit Demonstration, and build a Distribution of Sample Means with a sample size of 25 from a normal distribution (you can accept the default of 10,000 samples). The resulting distribution of sample means is a distribution of how D. Mented's sample means would be distributed if we randomly sampled 10,000 samples of size 25 from her population of quiz scores, and calculated and plotted the mean for each. Now start the Zentral Limit Demonstration, and select the same parameters. The resulting distribution of z scores is a distribution of how D. Mented's scores would be distributed if we randomly sampled 10,000 samples of size 25 from her population of quiz scores, calculated each sample mean, transformed each mean into a z-score, and then plotted the z-scores in a distribution. Now start the Alpha Illustration, and select Normal, one-tailed. This is an idealized z distribution, with the critical region for a one-tailed test, with an alpha of .05, shaded in.
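If the demonstrations are not at hand, the same two distributions can be approximated with a short simulation. This is only a sketch of the idea, not the demonstrations' actual code: it assumes a normal population with a mean of 0 and a standard deviation of 1, and the random seed is arbitrary.

```python
import random
from statistics import fmean, pstdev

random.seed(1)  # arbitrary seed, only so that reruns give the same numbers

N_SAMPLES = 10_000
SAMPLE_SIZE = 25

# Draw 10,000 samples of 25 quiz scores and keep the mean of each sample.
sample_means = [
    fmean(random.gauss(0.0, 1.0) for _ in range(SAMPLE_SIZE))
    for _ in range(N_SAMPLES)
]

# Transform each mean into a z score using the standard error 1 / sqrt(25).
standard_error = 1.0 / SAMPLE_SIZE ** 0.5
z_scores = [(m - 0.0) / standard_error for m in sample_means]

print(f"mean of sample means: {fmean(sample_means):.3f}")
print(f"sd of sample means:   {pstdev(sample_means):.3f}")
print(f"sd of z scores:       {pstdev(z_scores):.3f}")
```

The standard deviation of the sample means should come out close to 0.2, and the standard deviation of the z scores close to 1, which is why the idealized z distribution in the Alpha Illustration is a fair stand-in for the simulated one.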
Let's now say that D. Ranged has finished his course in stats, and has calculated his sample mean. The mean of his sample of 25 quiz scores is +0.33. This translates into a z score of +1.65. Remember that D. Mented's population of quiz scores has a mean of 0 and a standard deviation of 1. When we randomly sample as many samples as practical of size 25 and calculate and plot the means for each sample, this distribution of means will have a mean of 0, and a standard deviation (standard error) of 1/square root of 25, or 0.2. So the z score for Prof. D. Ranged's mean is ( 0.33 - 0 ) / ( 1 / 5 ) = 1.65. Because this z score is more extreme than 1.64 (our critical value of z; 1.645 to three decimal places), Prof. D. Ranged concludes that he is a better teacher than Prof. D. Mented.
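The arithmetic above can be checked in a few lines. The sketch below uses the numbers from the text; the variable names are illustrative, and the last step (the area under the z distribution beyond D. Ranged's z score) is an addition that the text does not compute.

```python
from math import sqrt
from statistics import NormalDist

sample_mean = 0.33               # D. Ranged's mean quiz score
mu, sigma, n = 0.0, 1.0, 25      # D. Mented's population and the class size

standard_error = sigma / sqrt(n)           # 1 / 5 = 0.2
z = (sample_mean - mu) / standard_error    # (0.33 - 0) / 0.2
area_beyond_z = 1 - NormalDist().cdf(z)    # right-tail area beyond z

print(f"z = {z:.2f}, area beyond z = {area_beyond_z:.4f}")
```

This gives z = 1.65 with about 0.0495 of the area beyond it, just inside the 5% critical region.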
Note, however, that the distributions that you made of D. Mented's means and z scores in the Central and Zentral Limit Demonstrations contain samples with this high a mean (and z) and higher! If we randomly sampled as many random samples as practical of size 25, and calculated and plotted the means for each of these samples, about 5% of them would have a mean of .33 and above. Likewise, about 5% would have a z of 1.64 and higher. So what does a z score of 1.65 mean in terms of who is the better teacher? Does D. Ranged's mean represent a sample that comes from a theoretical distribution of quiz scores that has, like Prof. D. Mented's, a mean of 0? Or does his mean belong to some other population distribution, which has a higher mean? We can't say. We make the statistical decision that Prof. D. Ranged is a better teacher, i.e., we make the statistical decision that D. Ranged's mean is unlikely to have come from a population with a mean of 0, but we acknowledge that, if the null hypothesis is in fact true, a result this extreme would still occur just under 5% of the time, so we may be making a Type I error. The strongest statement we can make with respect to the results of the statistical test is that we have rejected the null hypothesis: we have not proven that the null hypothesis is wrong, nor have we proven that the alternate hypothesis is right.
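The claim that roughly 5% of samples drawn from D. Mented's own population would have a mean of .33 or above can be checked by simulation. This is a sketch under the assumption that her quiz scores are normal with a mean of 0 and a standard deviation of 1; the seed is arbitrary and the exact proportion will vary a little from run to run.

```python
import random
from statistics import fmean

random.seed(2)  # arbitrary seed, only so that reruns give the same numbers

N_SAMPLES = 10_000
SAMPLE_SIZE = 25
OBSERVED_MEAN = 0.33  # D. Ranged's mean

# Sample means drawn from a population in which the null hypothesis is true.
null_means = [
    fmean(random.gauss(0.0, 1.0) for _ in range(SAMPLE_SIZE))
    for _ in range(N_SAMPLES)
]

# Proportion of those means that are at least as high as D. Ranged's.
proportion_as_extreme = sum(m >= OBSERVED_MEAN for m in null_means) / N_SAMPLES

print(f"proportion of null samples with mean >= 0.33: {proportion_as_extreme:.3f}")
```

Every one of these samples comes from a population with a mean of 0, yet about one in twenty still lands in the critical region. Each of those would be a Type I error if we rejected the null hypothesis on its basis.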
Also, rejection of the null hypothesis is a yes or no decision; it does not tell us if the difference in teaching skills is meaningful or important. Prof. D. Mented could argue that the difference is puny, and D. Ranged could argue that it's huge. They could use other statistical tests to determine what the size of the effect is within a certain range, but ultimately, judging a difference as meaningful or important is not a statistical decision.
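One way to put a range on the size of the difference is a confidence interval around D. Ranged's mean. The text does not name this technique, so it is shown here only as an assumed example. The 90% level is also an assumption, chosen because a 90% two-sided interval leaves 5% in each tail and so lines up with the one-tailed test at alpha = .05.

```python
from math import sqrt
from statistics import NormalDist

sample_mean, sigma, n = 0.33, 1.0, 25
standard_error = sigma / sqrt(n)  # 0.2

# Boundary that leaves 5% in each tail, giving a 90% interval.
z_cut = NormalDist().inv_cdf(0.95)

lower = sample_mean - z_cut * standard_error
upper = sample_mean + z_cut * standard_error

print(f"90% interval for D. Ranged's population mean: "
      f"{lower:.3f} to {upper:.3f}")
```

The interval runs from just above 0 to about 0.66, so the data are compatible with both a puny difference and a fairly large one. Which of those would matter is still not a statistical question.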