A question:
Hi Mark — long time reader, first time sending a question…. I love your recent emphasis on eliminating “red/green” analyses, and moving toward more statistically sound tools like control charts. I’ve seen your slides and webinars on creating them.
But one question has bugged me. When selecting data on which to calculate control limits, you suggest (in one deck I saw) at least 20 data points. Does that implicitly assume that the data represents a “stable system”, or at least a stable time within a larger system? If there’s any undiagnosed special variation in that initial data set, are the resulting upper and lower control limits really valid?
Put another way, what considerations should we use when selecting the initial data set? Thanks so much!
My reply:
Thanks for checking out my work and for your question. Great question.
The Process Behavior Chart, when we create it, will answer the question of “Is this a predictable system over the timeframe of the baseline data?”
It doesn’t make any assumptions about “the larger system” (before or after the baseline period).
Don Wheeler’s work has shown that the limits are still valid even if the baseline is not a predictable system. The presence of signals tells us that we have work to do to eliminate the causes of those signals, in order to create an improved and now-predictable system.
See this article — (myth 4)
“Myth Four: It has been said that the process must be operating in control before you can place the data on a process behavior chart”
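For reference, here is a sketch of the usual calculation, assuming the chart in question is an XmR (individuals and moving range) chart. The symbols are my own shorthand: x_i are the n baseline data points, X-bar is their average, and mR-bar is the average moving range.

```latex
\[
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\overline{mR} = \frac{1}{n-1}\sum_{i=2}^{n} \lvert x_i - x_{i-1} \rvert
\]
\[
\text{Natural Process Limits} = \bar{X} \pm 2.66\,\overline{mR}
\]
```

One way to read this: the width of the limits comes from point-to-point differences rather than from the overall spread of the baseline. A one-time shift in the middle of the baseline inflates only the one moving range that spans it, so the limits tend to stay tight enough to flag that shift as a signal.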
As to considerations for choosing the initial data set… I generally would use the 20 most recent data points to calculate the baseline average and limits. You can experiment with a data set to see if using 15 or 20 or 25 or 30 data points makes a big difference (it might not).
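If you want to try that experiment, here is a minimal Python sketch. It assumes an XmR-style chart with the standard 2.66 scaling factor on the average moving range. The function names (`xmr_limits`, `compare_baselines`) and the data are made up for illustration and don't come from any particular tool.

```python
from statistics import mean


def xmr_limits(values):
    """Return (average, lower limit, upper limit) for an XmR-style chart."""
    if len(values) < 2:
        raise ValueError("need at least two data points")
    avg = mean(values)
    # Moving ranges: absolute differences between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = mean(moving_ranges)
    # 2.66 is the standard scaling factor for individual-value limits.
    return avg, avg - 2.66 * mr_bar, avg + 2.66 * mr_bar


def compare_baselines(values, sizes=(15, 20, 25, 30)):
    """Compute limits from the most recent n points for each baseline size."""
    results = {}
    for n in sizes:
        if n <= len(values):  # skip sizes larger than the data set
            results[n] = xmr_limits(values[-n:])
    return results


# Hypothetical data, oldest first (30 points).
data = [52, 48, 50, 47, 53, 49, 51, 50, 46, 54,
        50, 49, 52, 48, 51, 47, 53, 50, 49, 52,
        51, 48, 50, 53, 47, 49, 52, 50, 48, 51]

for n, (avg, lower, upper) in compare_baselines(data).items():
    print(f"n={n}: average={avg:.2f}, limits=({lower:.2f}, {upper:.2f})")
```

If the averages and limits barely move as the baseline grows from 15 to 30 points, the choice of baseline size probably isn't worth agonizing over for that data set.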
I’d be careful if there’s a known system change that occurred during that time frame (although the PBC might show that, unfortunately, the change didn’t lead to a signal in the chart).
Hope that helps.