Histogram: Calculate descriptive statistics
Several statistics are useful for describing and analyzing a histogram. They summarize the area under the curve formed by the histogram's shape. These descriptive statistics can be calculated using software such as SQCpack.
The central location of a set of data points is where (on what value) the middle of the data set is located. Central location is commonly described by the mean, the median, and/or the mode. The mean is the average value of the data points. The median is the middle number in the data set when the data points are arranged from low to high. The mode is the value in the data set that occurs most often.
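As a minimal sketch of the three measures of central location, the following uses Python's standard `statistics` module. The data values are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical measurements, chosen only for illustration
data = [2, 4, 4, 5, 7, 8, 12]

mean = statistics.mean(data)      # sum of the values divided by their count
median = statistics.median(data)  # middle value once the data are sorted
mode = statistics.mode(data)      # value that occurs most often

print(mean, median, mode)
```

In this data set the three measures differ because the single large value (12) pulls the mean upward while leaving the median and mode unchanged.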
Both the range and the standard deviation illustrate data spread. Range is calculated by subtracting the minimum data value from the maximum data value. The standard deviation is a measure that indicates how different the values are from each other and from the mean. There are two methods of calculating standard deviation: using the individual data points, or using the average range of the samples. Both formulas are available here.
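The sketch below illustrates the range and both approaches to standard deviation. It rests on assumptions not stated in the text: the individual-points method is taken to be the sample standard deviation (n - 1 divisor), and the average-range method is taken to be the common SPC estimate of average range divided by the control-chart constant d2. The subgroup data and the names `D2`, `sigma_individual`, and `sigma_rbar` are hypothetical.

```python
import statistics

# Hypothetical subgroups (samples) of five measurements each
subgroups = [
    [10, 12, 11, 13, 14],
    [11, 13, 12, 15, 14],
    [9, 11, 10, 12, 13],
]

# Standard d2 control-chart constants, keyed by subgroup size
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}

values = [x for subgroup in subgroups for x in subgroup]

# Range: maximum data value minus minimum data value
data_range = max(values) - min(values)

# Method 1: sample standard deviation from the individual data points
sigma_individual = statistics.stdev(values)

# Method 2 (assumed form): average subgroup range divided by d2
ranges = [max(sg) - min(sg) for sg in subgroups]
r_bar = sum(ranges) / len(ranges)
sigma_rbar = r_bar / D2[len(subgroups[0])]

print(data_range, sigma_individual, sigma_rbar)
```

The two estimates generally differ, because the average-range method reflects only variation within subgroups while the individual-points method also includes variation between them.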
Skewness is the measure of the asymmetry of a histogram (frequency distribution). A histogram with a normal distribution is symmetrical. In other words, the same amount of data falls on both sides of the mean. A normal distribution will have a skewness of 0. The direction of skewness is “to the tail.” The larger the absolute value of the number, the longer the tail. If skewness is positive, the tail on the right side of the distribution will be longer. If skewness is negative, the tail on the left side will be longer. The formula for skewness is available here.
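To make the sign convention concrete, here is a minimal sketch that assumes the moment-based definition of skewness (third central moment divided by the 1.5 power of the second central moment). The linked formula may use a different, bias-adjusted form, so treat the exact values as illustrative; the data sets are hypothetical.

```python
def skewness(data):
    """Moment-based skewness: m3 / m2**1.5 (an assumed definition)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]             # no tail on either side
right_tail = [1, 1, 2, 2, 3, 10]        # long tail to the right
left_tail = [-10, -3, -2, -2, -1, -1]   # long tail to the left

print(skewness(symmetric), skewness(right_tail), skewness(left_tail))
```

The symmetric data give a skewness of 0, the right-tailed data a positive value, and the left-tailed data a negative value.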
Kurtosis is a measure of the combined weight of the tails in relation to the rest of the distribution. As the tails of a distribution become heavier, the kurtosis value will increase. As the tails become lighter, the kurtosis value will decrease. Kurtosis is reported here as excess kurtosis, so a histogram with a normal distribution has a kurtosis of 0. If the distribution is peaked (tall and skinny), it will have a kurtosis greater than 0 and is said to be leptokurtic. If the distribution is flat, it will have a kurtosis less than 0 and is said to be platykurtic. The formula for kurtosis is available here.
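The following sketch assumes the moment-based definition of excess kurtosis (fourth central moment divided by the square of the second central moment, minus 3, so that a normal distribution scores 0). The linked formula may include sample-size corrections that this version omits. Both data sets are hypothetical.

```python
def excess_kurtosis(data):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (an assumed definition)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]               # evenly spread, light tails
heavy_tails = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]      # mostly central, two extremes

print(excess_kurtosis(flat), excess_kurtosis(heavy_tails))
```

The evenly spread data produce a negative value (platykurtic), while the data with two extreme values produce a positive value (leptokurtic).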
Coefficient of variance
The coefficient of variance, more commonly called the coefficient of variation, is a measure of how much variation exists in relation to the mean. It may also be described as a measure of the significance of the sigma in relation to the mean. The larger the coefficient of variance, the more significant the sigma, relative to the mean. For example, if the standard deviation is 10, what does it mean? If the process average (mean) is 1000, a sigma value of 10 is not very significant. However, if the average is 15, a standard deviation of 10 is very significant. The formula for coefficient of variance is available here.
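The two cases in the example above can be worked through with a short sketch. It assumes the usual form of the ratio, standard deviation divided by the mean, expressed as a percentage; the function name `cv_percent` is hypothetical.

```python
def cv_percent(std_dev, mean):
    """Standard deviation as a percentage of the mean (assumed form)."""
    return std_dev / mean * 100

# Same sigma of 10, two different process averages
cv_large_mean = cv_percent(10, 1000)  # sigma is 1% of the mean
cv_small_mean = cv_percent(10, 15)    # sigma is about 66.7% of the mean

print(cv_large_mean, round(cv_small_mean, 1))
```

A sigma of 10 amounts to 1% of a mean of 1000 but roughly two thirds of a mean of 15, which is why the same standard deviation is far more significant in the second case.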
In SPC, the chi-square statistic is used to determine how well the actual distribution fits the expected distribution. Chi-square compares the number of observations found in each cell in a histogram (actual) to the number of observations that would be found in an expected distribution. If the differences are small, the distribution fits the theoretical distribution. If the differences are large, the distribution probably does not fit the expected distribution.
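As a rough sketch of this cell-by-cell comparison, the code below assumes the standard Pearson form of the statistic (the sum over cells of the squared difference between actual and expected counts, divided by the expected count) and derives expected counts from a normal distribution using `statistics.NormalDist`. The helper names, cell edges, and counts are hypothetical.

```python
from statistics import NormalDist

def chi_square(observed, expected):
    """Sum over cells of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def expected_normal_counts(edges, mean, std_dev, n):
    """Expected count in each histogram cell under a normal distribution."""
    dist = NormalDist(mean, std_dev)
    return [n * (dist.cdf(hi) - dist.cdf(lo))
            for lo, hi in zip(edges, edges[1:])]

# Hypothetical example: 100 observations in four cells, 25 expected in each
statistic = chi_square([18, 22, 30, 30], [25, 25, 25, 25])

# Expected counts for two cells spanning -1 to 0 and 0 to 1 standard deviations
cells = expected_normal_counts([-1, 0, 1], mean=0, std_dev=1, n=100)

print(statistic, cells)
```

Small differences between actual and expected counts keep the statistic near zero; large differences in any cell inflate it quickly because the differences are squared.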
Using Chi-square with the assumption of a normal distribution
- The calculated chi-square is compared to the value in the table of constants for chi-square based on the number of “degrees of freedom.”
- If the calculated chi-square is less than the value in the table, the chi-square test passes: the data are consistent with the assumption that the process has a normal distribution.
- If the chi-square is larger than the value in the table, the chi-square test fails. At this confidence level, you either do not have enough data to judge the process, or you should reject the assumption that the process has a normal distribution.
Note: The theoretical percent outside of specifications is calculated from the assumed normal distribution, so it may be misleading if the chi-square test fails.
The formula for chi-square is available here along with the degrees of freedom table.
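The pass/fail comparison described in the bullets above can be sketched as follows. Two assumptions are made here that the text does not state: the table values are the standard chi-square critical values at the 0.05 significance level, and the degrees of freedom are the number of cells minus 1, minus the two parameters (mean and standard deviation) estimated from the data. The linked table may use a different convention. The names `CHI_SQUARE_CRITICAL_05` and `chi_square_test_passes` are hypothetical.

```python
# Standard chi-square critical values at the 0.05 significance level,
# keyed by degrees of freedom (only a few rows shown)
CHI_SQUARE_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def chi_square_test_passes(chi_square_value, cells, estimated_params=2):
    """Return True if the calculated chi-square is below the table value.

    Assumes degrees of freedom = cells - 1 - estimated_params.
    """
    degrees_of_freedom = cells - 1 - estimated_params
    return chi_square_value < CHI_SQUARE_CRITICAL_05[degrees_of_freedom]

# Hypothetical histogram with six cells, so 3 degrees of freedom
passes = chi_square_test_passes(4.32, cells=6)  # 4.32 < 7.815
fails = chi_square_test_passes(9.00, cells=6)   # 9.00 > 7.815

print(passes, fails)
```

With six cells the table value is 7.815, so a calculated chi-square of 4.32 passes and one of 9.00 fails.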
Follow these steps to interpret histograms.
- Study the shape.
- Calculate descriptive statistics.
- Compare the histogram to the normal distribution.