How Undetected Process Changes Can Ruin Product Reliability

August 18, 2023

SPC and Reliability

We often think of Statistical Process Control (SPC) as a tool to help drive product quality by informing us when process changes occur. By systematically detecting (and rectifying) sources of special cause variation upstream in the process, the important process outcomes become predictable. Furthermore, a focus on reducing common cause variation drives higher levels of process capability and more consistent product performance.

So how does process control (or the lack of it) affect product reliability? Reliability is defined as the probability that a material, component, or system will perform its intended function under defined operating conditions for a specified period of time. The importance of reliability methods differs by industry.

An automaker is interested in the reliability of the supplied components since this will determine the overall reliability of a vehicle. A food company will use reliability methods to determine shelf life of their products–the time until a specific characteristic such as taste or texture is affected or until bacteria counts become excessive. Pharmaceutical companies need to know how long the active ingredients will retain effectiveness. And most producers are interested in the reliability of their production equipment since this will greatly affect productivity.

Products with high reliability over their lifetime have a low probability of premature failure. Just as unintended process changes will certainly affect process capability, they may also adversely impact product reliability, and this will negatively affect customers.

Manufacturers design, develop, and warrant (or label) products based on an expected product lifetime. Many manufacturers conduct extensive reliability testing to minimize the risk that products will fail prematurely. To shorten the time required to complete testing, many manufacturers leverage Accelerated Life Testing (ALT) methods. ALT involves testing under stress conditions that will accelerate the failure mode(s) so that failure may be observed quickly. Based on failures observed at multiple stress conditions, the reliability at normal use conditions may be predicted.

Despite these efforts, unexpected failures occur due to design flaws, production process changes, or a misunderstanding of the product use environment. Premature failures alienate customers, significantly impair brand and company reputations, and may result in financial risks from recalls or product liability concerns.

A Case Study – Refrigerator Tubing Failures

A manufacturer of household appliances noticed a spike in the number of warranty claims being received due to poor cooling performance resulting from tubing failures (leaks). The failures resulted in significant costs for required repairs and loss of goodwill.

Upon performing a reliability/warranty analysis, it was clear that two distinct populations were present in the data. Units produced from August 2008 through November 2008 had far worse field performance (reliability) than those units produced prior to August 2008 and after November 2008. The following graph illustrates the estimated reliability curves for each group.

The plot shows time (in months) on the x-axis and Reliability (probability of not failing) on the y-axis. For units produced in months other than August 2008 through November 2008, the reliability slightly decreases as a function of time. At 12 months in service, the reliability is 0.993. However, for units produced between August 2008 and November 2008, the reliability falls dramatically as time increases. At 12 months in service, the reliability is estimated to be 0.719. Thus, there is about a 28% chance that a unit produced between August 2008 and November 2008 will fail by 1 year in service.
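The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration, not part of the original analysis: the two reliability values are the ones quoted above, while the function names and the batch size of 1,000 units are assumptions chosen for the example.

```python
# Reliability at 12 months in service, as quoted in the article.
RELIABILITY_AT_12_MONTHS = {
    "other production months": 0.993,
    "August-November 2008": 0.719,
}

# Hypothetical batch size, used only to make the probabilities concrete.
HYPOTHETICAL_UNITS = 1000


def failure_probability(reliability: float) -> float:
    """Return F(t) = 1 - R(t), the probability of failing by time t."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return 1.0 - reliability


def expected_failures(reliability: float, units: int) -> float:
    """Expected number of units failing by time t out of `units` produced."""
    return units * failure_probability(reliability)


for group, reliability in RELIABILITY_AT_12_MONTHS.items():
    prob = failure_probability(reliability)
    count = expected_failures(reliability, HYPOTHETICAL_UNITS)
    print(f"{group}: P(fail by 12 months) = {prob:.3f}, "
          f"about {count:.0f} failures per {HYPOTHETICAL_UNITS} units")
```

Seen this way, the chance of failing within the first year is roughly 40 times higher for the August through November 2008 units (0.281 versus 0.007).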

Together with the manufacturer, we conducted an investigation and analysis to determine the root cause of the failures. Based on the failure analysis and root cause analysis, we concluded that inadequate weld strength most likely led to failures and the resulting leaks. Unfortunately, although a significant amount of process data was being recorded, it was not being charted or used to monitor the process. SPC had been mostly discontinued in the tube mill operation, so changes in the welding process went undetected and the resulting problems could not be prevented.

We designed and performed an experiment to better understand the factors affecting weld strength. Both Weld Current and Electrode Gap had strong effects on weld strength. Specifically, increases in Weld Current and increases in Electrode Gap caused a significant decrease in the weld strength. When historical weld current data (that was being recorded but not charted) was viewed, an interesting pattern emerged.

During the problematic production periods (represented by the middle block of data points, approximately data point numbers 30-150), the average weld current was about 15 amps higher than in the left and right blocks of data points, which correspond to periods in which few failures occurred. This picture, along with the results of the designed experiment, confirmed that high weld currents caused low weld strengths, which then led to the costly field failures.

A simple control chart on weld current would have detected this significant change in an important process parameter very quickly, so that a process adjustment could have been made. A costly and embarrassing field problem would have been prevented.
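To make the idea concrete, here is a minimal sketch of one common type of control chart, an individuals chart with limits based on the average moving range. It is not the manufacturer's chart or data: the readings below are made-up values in amps, with a stable baseline followed by readings that run about 15 amps higher, and the function names are assumptions for the example.

```python
from statistics import mean

# Hypothetical weld current readings in amps (NOT the manufacturer's data).
BASELINE = [100, 102, 99, 101, 98, 100, 103, 99, 101, 100,
            98, 102, 100, 99, 101, 100, 102, 98, 100, 101]
# Hypothetical later readings after an upward shift of about 15 amps.
NEW_READINGS = [115, 114, 117, 113, 116, 115]


def individuals_limits(baseline):
    """Return (lcl, center, ucl) for an individuals chart.

    Limits are center +/- 2.66 * average moving range, where
    2.66 = 3 / d2 and d2 = 1.128 for moving ranges of two points.
    """
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    center = mean(baseline)
    half_width = 2.66 * mean(moving_ranges)
    return center - half_width, center, center + half_width


def out_of_control(readings, lcl, ucl):
    """Return the indices of readings that fall outside the control limits."""
    return [i for i, x in enumerate(readings) if x < lcl or x > ucl]


lcl, center, ucl = individuals_limits(BASELINE)
signals = out_of_control(BASELINE + NEW_READINGS, lcl, ucl)
print(f"center {center:.1f} amps, limits {lcl:.1f} to {ucl:.1f} amps")
print("out-of-control point indices:", signals)
```

With these made-up numbers, none of the baseline readings fall outside the limits, while the very first reading after the shift does. A shift of this size relative to the routine variation is flagged immediately, which is the kind of early warning the tube mill was missing.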

More Examples

We have worked on countless projects where significant reliability problems have surfaced at a point in time following a previous period of excellent reliability performance. In some of the cases, the change in reliability performance coincided with an intentional change (such as design changes or supplier changes). However, in most of the cases, the reliability degradation resulted from an undetected change in an important characteristic that was not being monitored. Adequate SPC on important characteristics would have prevented these issues from occurring. Some examples of recent product safety recalls are:

  • Bicycle Pedals (safety recall: pedals are breaking and cracking during use, causing riders to lose control)
  • Toasters (safety recall: the heating element can be energized even though the toaster lifter is in the “up” position, which can pose a fire hazard)
  • Golf Cars (safety recall: the threaded end of the rack rod ball joint can break, displacing the ball joint and causing a loss of steering control)
  • Elliptical Exercise Equipment (safety recall: the foot plates detach from the machine during use)
  • Refrigerators (safety recall: the doors detach and pose a serious injury hazard)
  • GMC Trucks, 1999-2003 (current NHTSA defect investigation due to brake line corrosion failure)
  • VW 2011 Jetta TDI (safety recall due to fuel injector line leakage)

These are just seven of the hundreds of safety recalls from the past few months in which a product was performing adequately and then its performance (life/reliability) degraded because of an undetected change in manufacturing operations.


While SPC is typically linked with ensuring adequate process capability, the inability to control key characteristics can also have devastating consequences for product reliability. Many costly failures can be prevented by developing process understanding and establishing statistical process control on key characteristics.

Allise Wachs, PhD
Integral Concepts, Inc.

Integral Concepts provides consulting services and training in the application of quantitative methods to understand, predict, and optimize product designs, manufacturing operations, and product reliability.