Populations and Sampling
A population consists of all the possible elements or items associated with a situation; for example, all trout living in a lake. A sample is a portion of those elements or items. Evaluating every member of a population is usually cost prohibitive and, in the case of destructive testing, may be impossible. For these reasons, manufacturers rely on sampling to make cost-effective inferences about the population without measuring every piece.
- Effective sampling plans must be representative of the population being studied.
- In most cases, sampling plans need to be random and unbiased.
- Sampling frequency and subgroup size are also crucial to a successful sampling plan.
Rational Sampling and Subgrouping
Rational samples are taken with regard to the way the process output is measured (i.e., what, where, how, and when it is measured). Samples must be taken frequently enough to monitor any changes in the process. Samples should be selected with the goal of keeping the process stream intact. In the context of manufacturing, a stream consists of a single part, process, and feature combination. Mixing any of these parameters introduces ambiguity into the analysis. Odd sample sizes (3 and 5 are very common) are recommended because they have a natural median.
The correct sampling frequency depends on how fast the process is changing. To be representative of the population, samples must be taken often enough to catch any expected changes in the process, but with sufficient time between samples to display variation. Frequencies are usually defined in measurements of time (e.g., every 30 minutes, hourly, daily) but may also be defined using counts (e.g., every 100th product).
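As a rough illustration of the two ways to define frequency, the sketch below builds a count-based plan (every Nth unit) and a time-based plan (every fixed interval). The function names, the 450-unit run, and the shift start time are hypothetical values chosen only for the example.

```python
from datetime import datetime, timedelta


def count_based_indices(total_units, every_nth):
    """Return the 1-based positions of the units to sample (every Nth unit)."""
    return list(range(every_nth, total_units + 1, every_nth))


def time_based_schedule(start, end, interval):
    """Return sampling times from start through end, spaced by interval."""
    times = []
    current = start
    while current <= end:
        times.append(current)
        current += interval
    return times


# Hypothetical run of 450 units, sampled at every 100th unit.
indices = count_based_indices(total_units=450, every_nth=100)

# Hypothetical two-hour window, sampled every 30 minutes.
shift_start = datetime(2024, 1, 1, 8, 0)
times = time_based_schedule(
    shift_start, shift_start + timedelta(hours=2), timedelta(minutes=30)
)

print(indices)
print([t.strftime("%H:%M") for t in times])
```

Which of the two definitions fits better depends on whether the process changes with elapsed time or with the number of units produced.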
After the data have been sampled rationally, they must be subgrouped rationally as well. A rational subgroup contains parts that can be produced without any process adjustments, typically consecutively produced parts. Such a subgroup has little possibility of assignable cause variation within the subgroup. If only common cause variation exists within the samples, then any abnormal differences within or between the subgroups are attributable to assignable cause variation. Process streams should not be mixed within a subgroup. If the subgroup includes output of two or more process streams and each stream cannot be identified, then the sampling is not rational.
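One way to picture rational subgrouping is sketched below, assuming each measurement is recorded in production order together with an identifier for its process stream. The function name, the stream labels "A" and "B", and the values are hypothetical. Measurements are first separated by stream and only then cut into subgroups of consecutive parts, so no subgroup mixes streams.

```python
from collections import defaultdict


def rational_subgroups(measurements, subgroup_size):
    """Split (stream_id, value) records into per-stream subgroups.

    Records are assumed to be in production order. Values left over at the
    end of a stream that do not fill a whole subgroup are not returned.
    """
    by_stream = defaultdict(list)
    for stream_id, value in measurements:
        by_stream[stream_id].append(value)

    subgroups = {}
    for stream_id, values in by_stream.items():
        full_length = len(values) - len(values) % subgroup_size
        subgroups[stream_id] = [
            values[i:i + subgroup_size]
            for i in range(0, full_length, subgroup_size)
        ]
    return subgroups


# Hypothetical interleaved output from two process streams.
records = [
    ("A", 10.1), ("B", 9.8), ("A", 10.0), ("B", 9.9),
    ("A", 10.2), ("B", 9.7), ("A", 10.3),
]
groups = rational_subgroups(records, subgroup_size=3)
print(groups)
```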
The subgroup size determines the sensitivity of a chart. As the sample size increases, the plotted statistic becomes more sensitive. That is, charts can detect smaller process shifts as the sample size increases.
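The effect of subgroup size can be sketched numerically, assuming an X-bar chart with 3-sigma limits and a known process standard deviation (neither assumption comes from the text above). Under those assumptions the control limits sit at the process mean plus or minus 3σ/√n, so the band narrows as the subgroup size n grows and smaller shifts in the mean fall outside it.

```python
import math


def xbar_limit_half_width(sigma, n, k=3.0):
    """Distance from the center line to a control limit on an X-bar chart.

    Assumes a known process standard deviation sigma and k-sigma limits.
    """
    return k * sigma / math.sqrt(n)


# Illustrative process with sigma = 1.0 and several subgroup sizes.
for n in (1, 3, 5, 9):
    print(n, round(xbar_limit_half_width(1.0, n), 3))
```

In practice sigma is usually estimated from the data rather than known, so this is only a picture of the trend, not a procedure for computing limits.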
Data must sometimes be grouped in subgroups of one. Subgroup size should be one when process adjustments or raw material changes must be made with each part or when only one value represents the monitored condition (e.g., daily yield, past week’s overtime). Subgroup size should also be one when sampling a known homogeneous batch.
In Advanced Topics in Statistical Process Control, Donald Wheeler suggests the following subgrouping principles:
- Never knowingly subgroup unlike things together.
- Minimize the variation within each subgroup.
- Maximize the opportunity for variation between the subgroups.
- Average across noise, not across signals.
- Treat the chart in accordance with the use of the data.
- Establish standard sampling procedures.
Random vs. Biased Sampling
The purpose of a sample is to accurately represent the population. Statistical formulas that are used to estimate populations are based on the premise that the samples are random. In a random sample, every item in the population has an equal chance of being selected. A sample has bias when some of the items in a population have a greater chance of being sampled than others.
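The difference can be illustrated with a minimal sketch that uses Python's standard `random` module. The function names and the 100-item population are hypothetical. The first function gives every item the same chance of selection; the second always takes the first items produced, so later items can never be chosen.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw a sample in which every item is equally likely to be chosen."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return rng.sample(population, sample_size)


def first_n_sample(population, sample_size):
    """A biased plan: only the first items can ever be selected."""
    return population[:sample_size]


# Hypothetical population of 100 serial numbers.
population = list(range(1, 101))
random_pick = simple_random_sample(population, 5, seed=42)
biased_pick = first_n_sample(population, 5)

print(random_pick)
print(biased_pick)
```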
Example: Sampling Pies
Suppose you are a taster in a pie factory. If a day’s production is one pie, then that pie is the population. To evaluate the population, you would need to eat the entire pie. However, you’d then be left with no pie to sell. A more effective option, assuming a uniform crust and homogeneous filling, would be to slice the pie into 12 equal sections and eat only one slice. By eating this sample slice, you can evaluate the quality of the entire pie and still be left with slices to sell.
If production increases to several pies per day, you might continue to eat one slice from each sampled pie without sampling every pie. If you add a second shift or a second variety of pie, you would need to collect subgroups from these new sources of variation.
Imagine that you always take the sample slice from the same location on each pie. Depending on how the pie moves through the oven, that location might be perfectly cooked while the opposite side of the pie is slightly undercooked. This is another source of variation that the sampling plan needs to consider. A truly random sample would be taken from a different, randomly chosen location on each sampled pie.
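For the pie example, randomizing the slice location might look like the sketch below. It assumes the 12 slices are numbered 0 through 11 by position; the function name and the pie labels are hypothetical.

```python
import random

SLICES_PER_PIE = 12  # from the example: each pie is cut into 12 equal slices


def pick_slice_positions(pie_ids, seed=None):
    """Choose one slice position at random for each sampled pie."""
    rng = random.Random(seed)  # a seed only makes the choice repeatable
    return {pie_id: rng.randrange(SLICES_PER_PIE) for pie_id in pie_ids}


# Hypothetical labels for the pies selected for tasting.
positions = pick_slice_positions(["pie-1", "pie-2", "pie-3"], seed=7)
print(positions)
```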
5 Ws and 2 Hs of Sampling
Who will be collecting the data? Evaluate the abilities of the operator who collects the data. How much time does the operator have? Does the operator have adequate resources to collect the data?
What is to be measured? Focus on important characteristics. Remember that it costs money to sample, so you should focus on the characteristics that are critical to controlling the process or key features that measure product conformity.
Where or at what point in the process will the sample be taken? The sample should be taken early enough in the process that the data can be used for process control.
When will the process be sampled? Samples must be taken often enough to reflect shifts in the process. A good rule of thumb is to sample two subgroups between process shifts.
Why is this sample being taken? Will the data be used for product control or process control? What question(s) are you trying to answer with the data?
How will the data be collected? Will samples be measured or evaluated manually, or will the data be retrieved from an automated measurement source?
How many samples will be taken? The sample quantity should be adequate for control without being too large.
The discussion so far has centered on the benefits of measuring variables data. But in many situations, there is no measurement value, only a pass/fail rating or a defect count. Even so, attribute data can also be plotted on control charts and be vital to understanding process control. There are two distinct types of attribute data: defects and defectives.
Defects data, also known as counts data, are used to describe data collection situations in which the number of occurrences within a given unit is counted. An occurrence may be a defect, observation, or an event. A unit is an opportunity region to find defects, sometimes called the area of opportunity. A unit may be a batch of parts, a given surface area or distance, a window of time, or any domain of observation.
For example, suppose the number of weave flaws is counted on a bolt of fabric. The bolt represents a unit, and the weave flaws represent occurrences. There might be an unlimited number of types of flaws on a given bolt of fabric. Some flaws might be more severe than others. A flaw might or might not cause the bolt to be scrapped. Consecutively produced bolts might or might not be of uniform size.
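When units are not of uniform size, raw counts are hard to compare from one unit to the next. One common way to handle this, sketched below, is to express each count as a rate per unit of size. The function name, the flaw counts, and the bolt lengths in meters are hypothetical values for illustration.

```python
def defects_per_unit_size(defect_counts, unit_sizes):
    """Convert raw defect counts into rates per unit of size."""
    if len(defect_counts) != len(unit_sizes):
        raise ValueError("each count needs a matching unit size")
    return [count / size for count, size in zip(defect_counts, unit_sizes)]


# Hypothetical data: weave flaws counted on three bolts of different lengths.
flaws = [4, 6, 3]
bolt_lengths_m = [50, 100, 50]
rates = defects_per_unit_size(flaws, bolt_lengths_m)
print(rates)  # flaws per meter for each bolt
```

Here the second bolt has the most flaws in total but, being twice as long, a lower flaw rate than the first.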
Defectives data, also known as go/no-go or pass/fail data, are used to describe data collection situations in which the unit either does or does not conform.
For example, light bulbs are tested in lots of 100. If a bulb lights up, it conforms and is accepted. If the bulb does not light, it is nonconforming. Or consider a filling operation. If a container is filled below the minimum weight, it is defective. Any container at or above the minimum weight is accepted. Either the fill weight meets the minimum requirement, or it does not.
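Both examples reduce to a yes/no decision per unit, which can then be summarized as a fraction defective per lot. The sketch below is illustrative only: the function names, the 500 g minimum weight, and the lot with three failed bulbs are hypothetical.

```python
def is_defective(fill_weight, minimum_weight):
    """A container is defective only if it is below the minimum weight."""
    return fill_weight < minimum_weight


def fraction_defective(conforms):
    """Share of nonconforming units in a lot of pass/fail results."""
    nonconforming = sum(1 for ok in conforms if not ok)
    return nonconforming / len(conforms)


# Hypothetical lot of 100 bulbs in which 3 fail to light.
bulb_results = [True] * 97 + [False] * 3
lot_fraction = fraction_defective(bulb_results)

# Hypothetical fill weights in grams against a 500 g minimum.
fill_flags = [is_defective(w, 500.0) for w in (499.2, 500.0, 503.1)]

print(lot_fraction)
print(fill_flags)
```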