Quality Advisor


A resource for data collection tools, including how to collect data, how much to collect, and how frequently to collect it.

What is it?

Sampling is a tool that is used to indicate how much data to collect and how often it should be collected. This tool defines the samples to take in order to quantify a system, process, issue, or problem.

To illustrate sampling, consider a loaf of bread. How good is the bread? To find out, is it necessary to eat the whole loaf? No, of course not. To make a judgment about the entire loaf, it is necessary only to taste a sample of the loaf, such as a slice. In this case the loaf of bread being studied is known as the population of the study. The sample, the slice of bread, is a subset or a part of the population.

Now consider a whole bakery. The population of interest is no longer a loaf, but all the bread that has been made today. A sample size of one slice from one loaf is clearly inadequate for this larger population. The sample collected will now become several loaves of bread taken at set times throughout the day. Since the population is larger, the sample will also be larger. The larger the population, the larger the sample required.

In the bakery example, bread is made in an ongoing process. That is, bread was made yesterday, throughout today, and will be made tomorrow. For an ongoing process, samples need to be taken to identify how the process is changing over time. Studying how the samples are changing with control charts will show where and how to improve the process, and allow prediction of future performance.

For example, the bakery is interested in the weight of the loaves. The bakery does not want to weigh every single loaf, as this would be too expensive, too time consuming, and no more accurate than sampling some of the loaves. Sampling for improvement and monitoring is a matter of taking small samples frequently over time. The questions now become:

  • How many loaves to weigh each time a sample is taken?
  • How often to collect a sample?

These two questions, “how much?” and “how often?” are at the heart of sampling.

When is it used?

  • Sampling is used any time data is to be gathered.
    Data cannot be collected until the sample size (how much) and sample frequency (how often) have been determined.
  • Sampling should be periodically reviewed.
    When data is being collected on a regular basis to monitor a system or process, the frequency and size of the sample should be reviewed periodically to ensure that they are still appropriate.

How is it done?

  1. What questions are being asked of the data?
    Before collecting any data, it is essential to define clearly what information is required. It is easy to waste time and resources by collecting the wrong data, or by not recording enough information at the time of data collection. Try to anticipate questions that will be asked when analyzing the data. What additional information would be desirable? When collecting data, it is easy to record additional information; trying to track information down later is far more difficult, and may not be possible.
  2. Determine the frequency of sampling.
    The frequency of sampling refers to how often a sample should be taken. A sample should be taken at least as often as the process is expected to change. Examine all factors that are expected to cause change, and identify the one that changes most frequently. Sampling must occur at least as often as the most frequently changing factor in the process. For example, if a process has exhibited the behavior shown in the diagram below, how often should sampling occur in order to get an accurate picture of the process?
    Factors to consider might be changes of personnel, equipment, or materials. The questions identified in step 1 may give guidance to this step.

    Common frequencies of sampling are hourly, daily, weekly, or monthly. Although frequency is usually stated in time, it can also be stated in number: every tenth part, every fifth purchase order, every other invoice, for example. If it is not clear how frequently the process changes, collect data frequently, examine the results, and then set the frequency accordingly.
  3. Determine the actual frequency times.
    The purpose of this step is to state the actual time at which samples are taken. For instance, if the frequency is daily, at what time of day should the sample be taken: in the morning at 8:00 am, around midday, or late in the day around 5:00 pm? This matters because inconsistent intervals between samples make the data unreliable for further analysis. For example, if a daily sample is taken at 8:00 am on one day, at 5:00 pm the next day, and at midday on the following day, the intervals between samples are inconsistent, and the collected data will be inconsistent as well. The data will exhibit unusual patterns and will be less meaningful. Stating the time at which the sample is to be taken reduces this type of error. Choose a time as close as possible to any expected changes in the process, and one at which taking a sample is convenient. Avoid difficult times, such as during a shift change or lunch break.
  4. Select the subgroup (sample) size.
    A subgroup (or sample) is the number of items to be examined at the same time. The terms “subgroup” and “sample” may be used interchangeably. When doing calculations, subgroup size is denoted by the letter n. To choose the most appropriate subgroup size, determine first whether the data being collected is “variables data” or “attributes data.”
    For variables data: When measuring variables data, a subgroup size larger than one is preferable because larger subgroup sizes yield greater possibilities for analysis. However, it may not be possible to get a subgroup size larger than one. Some examples of this are electricity usage per month, profit per month, sales per month, temperature of a room, and the viscosity of a fluid. In situations such as these, when a subgroup size larger than one does not make sense, the subgroup (or sample) size is equal to one.

    If a subgroup size larger than one can be chosen, the size is usually between three and eight. A subgroup size between three and eight has been determined to be statistically efficient. The most commonly used subgroup size is five. When more data is desired, the frequency of taking samples, not the subgroup size, should be increased.

    When a sample is taken, it should be selected to assure that conditions within the sample are similar. If gathering a sample size of five, for example, take all five pieces in a row as they are produced in the process. This is known as a rational subgroup.

    For attributes data: The subgroup size for attributes data depends on the process being sampled. The general rule of thumb is to gather a large enough sample so that all possible characteristics being investigated will appear. That is, the sample is large enough that a sample with zero occurrences of the characteristic is rare.

    Begin by answering the question, “How many items does this process produce during the frequency interval (per hour, per week, etc.)?” When that number is determined, the sample size should be at least the square root of that number. For instance, if a purchasing department processes 100 purchase orders per week, an appropriate sample size would be 10 purchase orders per week (the square root of 100 is 10).
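
The subgroup guidance in step 4 can be sketched in a few lines of Python. This is a minimal illustration and not part of the original chapter. The function names (attribute_sample_size, rational_subgroups, subgroup_summary) and the loaf weights are hypothetical, chosen only for the example. The sketch assumes that "at least the square root" means rounding up to the next whole item, and that a rational subgroup is formed from consecutive items in production order.

```python
import math
from statistics import mean


def attribute_sample_size(items_per_interval):
    """Smallest whole sample size that is at least sqrt(items_per_interval)."""
    if items_per_interval < 0:
        raise ValueError("items_per_interval must be non-negative")
    root = math.isqrt(items_per_interval)
    # isqrt rounds down, so add one when the count is not a perfect square.
    return root if root * root == items_per_interval else root + 1


def rational_subgroups(measurements, n=5):
    """Split measurements, in production order, into consecutive groups of n.

    A trailing partial group is dropped so that every subgroup has size n.
    """
    if n < 1:
        raise ValueError("subgroup size n must be at least 1")
    full = len(measurements) - len(measurements) % n
    return [measurements[i:i + n] for i in range(0, full, n)]


def subgroup_summary(subgroup):
    """Return (mean, range) for one subgroup."""
    return mean(subgroup), max(subgroup) - min(subgroup)


# Purchase-order example from the text: 100 orders per week.
print(attribute_sample_size(100))  # 10

# Made-up loaf weights in grams, listed in the order the loaves were baked.
weights = [802, 798, 805, 800, 795, 810, 801, 799, 803, 797, 800]
for group in rational_subgroups(weights, n=5):
    print(group, subgroup_summary(group))
```

The mean and range of each subgroup are the kind of summary values that are typically plotted over time on a control chart. The eleventh weight is left out because it does not complete a subgroup of five; collecting more data means sampling more often, not enlarging the subgroup.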

The above article is an excerpt from the “Sampling” chapter of Practical Tools for Continuous Improvement: Volume 1 – Statistical Tools. The full chapter provides more details on sampling.