The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question.
Such measures can be used in statistical hypothesis testing, e.g. to test for normality of residuals, to test whether two samples are drawn from identical distributions, or whether outcome frequencies follow a specified distribution.
In the analysis of variance, one of the components into which the variance is partitioned may be a lack-of-fit sum of squares.
One way to construct a goodness-of-fit statistic, in the case where the variance of the measurement error is known, is to form a weighted sum of squared errors:
$$\chi^2 = \sum \frac{(O - E)^2}{\sigma^2}$$
where $\sigma^2$ is the known variance of the observation, $O$ is the observed data and $E$ is the theoretical data.
This definition is only useful when one has estimates for the error on the measurements, but it leads to a situation where a chi-squared distribution can be used to test goodness of fit, provided that the errors can be assumed to have a normal distribution.
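The weighted sum of squared errors described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `chi_squared` and the sample numbers are hypothetical, and the standard deviations are assumed to be known in advance.

```python
def chi_squared(observed, expected, sigma):
    """Weighted sum of squared errors: sum((O - E)^2 / sigma^2)."""
    if not (len(observed) == len(expected) == len(sigma)):
        raise ValueError("all inputs must have the same length")
    # Each residual is scaled by its known standard deviation before squaring.
    return sum(((o - e) / s) ** 2 for o, e, s in zip(observed, expected, sigma))


# Hypothetical measurements, model values, and known standard deviations.
observed = [2.1, 3.9, 6.2, 7.8]
expected = [2.0, 4.0, 6.0, 8.0]
sigma = [0.1, 0.1, 0.2, 0.2]

chi2 = chi_squared(observed, expected, sigma)
print(chi2)
```

In this made-up data set every residual is exactly one standard deviation, so each of the four terms contributes about 1 to the sum.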
The reduced chi-squared statistic is simply the chi-squared divided by the number of degrees of freedom:
$$\chi_\mathrm{red}^2 = \frac{\chi^2}{\nu} = \frac{1}{\nu} \sum \frac{(O - E)^2}{\sigma^2}$$
where $\nu$ is the number of degrees of freedom, usually given by $N - n - 1$, where $N$ is the number of observations and $n$ is the number of fitted parameters, assuming that the mean value is an additional fitted parameter.
The advantage of the reduced chi-squared is that it already normalizes for the number of data points and model complexity. The reduced chi-squared is also known as the mean square weighted deviation.
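A short sketch of the normalization follows. It assumes the degrees of freedom are counted as the number of observations minus the number of fitted parameters minus one (treating the mean as an additional fitted parameter, as stated above); the function name `reduced_chi_squared` and the data are hypothetical.

```python
def reduced_chi_squared(observed, expected, sigma, n_fitted):
    """Chi-squared divided by the number of degrees of freedom."""
    n_obs = len(observed)
    # Assumption: the mean value counts as one extra fitted parameter.
    dof = n_obs - n_fitted - 1
    if dof <= 0:
        raise ValueError("not enough observations for the number of fitted parameters")
    chi2 = sum(((o - e) / s) ** 2 for o, e, s in zip(observed, expected, sigma))
    return chi2 / dof


# Hypothetical data: six observations compared with a model that has two fitted parameters.
obs = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
model = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
errors = [0.1] * 6

red_chi2 = reduced_chi_squared(obs, model, errors, n_fitted=2)
print(red_chi2)
```

Here the chi-squared is about 12 and there are 6 - 2 - 1 = 3 degrees of freedom, so the reduced value is about 4.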
One example is the chi-square goodness of fit test, which is applied when there is one categorical variable from a single population. It is used to determine whether sample data are consistent with a hypothesized distribution.
The chi-square goodness of fit test is appropriate when the following conditions are met:
- The sampling method is simple random sampling.
- The variable under study is categorical.
- The expected value of the number of sample observations in each level of the variable is at least 5.
This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.
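The expected-count condition listed above can be checked before running the test. The sketch below assumes the hypothesized distribution is given as a list of proportions that sum to 1; the helper names `expected_counts` and `meets_expected_count_rule` are hypothetical.

```python
def expected_counts(proportions, sample_size):
    """Expected number of observations in each level under the hypothesized proportions."""
    return [p * sample_size for p in proportions]


def meets_expected_count_rule(proportions, sample_size, minimum=5):
    """True if every level has an expected count of at least `minimum`."""
    return all(count >= minimum for count in expected_counts(proportions, sample_size))


# Hypothetical distribution over three levels of a categorical variable.
proportions = [0.3, 0.6, 0.1]

ok_large = meets_expected_count_rule(proportions, 100)  # expected counts near 30, 60, 10
ok_small = meets_expected_count_rule(proportions, 40)   # expected counts near 12, 24, 4
print(ok_large, ok_small)
```

With a sample of 40 the rarest level has an expected count of only 4, so the condition fails. The other two conditions (simple random sampling and a categorical variable) concern study design and cannot be verified from the counts alone.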
State the Hypotheses
Every hypothesis test requires the analyst to state a null hypothesis (H0) and an alternative hypothesis (Ha).
The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false; and vice versa.
For a chi-square goodness of fit test, the hypotheses take the following form.
H0: The data are consistent with a specified distribution.
Ha: The data are not consistent with a specified distribution.
Typically, the null hypothesis (H0) specifies the proportion of observations at each level of the categorical variable. The alternative hypothesis (Ha) is that at least one of the specified proportions is not true.
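As an illustration of how such hypotheses might be tested, the sketch below computes Pearson's chi-square statistic, the sum of (observed - expected)^2 / expected over the levels, with degrees of freedom equal to the number of levels minus one. The function names, the 0.05 significance level, and the counts are assumptions made for this example. The upper-tail probability is computed from the closed form available for integer degrees of freedom, so only the standard library is needed.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability P(X >= x) of a chi-squared variable with integer dof."""
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from the closed form for dof = 2 (even) or dof = 1 (odd) ...
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # ... then apply the recurrence Q(a + 1, y) = Q(a, y) + y^a e^(-y) / Gamma(a + 1).
    while a < dof / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def goodness_of_fit(observed_counts, null_proportions, alpha=0.05):
    """Return (statistic, dof, p_value, reject_h0) for a chi-square goodness of fit test."""
    n = sum(observed_counts)
    expected = [p * n for p in null_proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected))
    dof = len(observed_counts) - 1
    p_value = chi2_sf(statistic, dof)
    return statistic, dof, p_value, p_value < alpha


# Hypothetical example. H0: the three levels occur with proportions 0.3, 0.6 and 0.1.
observed_counts = [50, 45, 5]
statistic, dof, p_value, reject_h0 = goodness_of_fit(observed_counts, [0.3, 0.6, 0.1])
print(statistic, dof, p_value, reject_h0)
```

For these made-up counts the statistic is about 19.58 with 2 degrees of freedom, giving a p-value well below 0.05, so H0 would be rejected at that level.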