You can't always get what you want
But if you try sometimes, well, you might find
You get what you need.
Those words of wisdom from the Rolling Stones could well have been written for a process improvement manual. Getting hold of data for your analysis can pose a problem. At the two extremes are Big Data and Small Data.
Big Data, with its multiple data sources, astronomical volumes and questionable algorithms, does not necessarily equate to Big Information or Big Knowledge, and the majority of projects will probably require something far more manageable. By contrast, Small Data is tempting but can be reduced so much that it hides the level of detail needed to discover a root cause.
So how do you find Goldilocks (of three bears fame) Data? Well, the good news is that you don't need to taste multiple bowls of porridge, unless you really want to, or even be an expert on relational databases. You do, however, need to be very precise about what you're trying to improve and how you will measure success. These two factors will define the data that you need to analyse your problem.
If you're looking at variation, then continuous data such as time, size, weight, or temperature will be of most use, whereas if you're chasing down the source of errors, then attribute data, such as the number and type of defects, will be better.
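To make the distinction concrete, here is a minimal Python sketch of how each kind of data might be summarised. The cycle times and defect labels are invented purely for illustration, and the function names `summarise_variation` and `rank_defects` are hypothetical, not taken from any particular toolkit.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical continuous data: cycle times in minutes (invented for illustration).
cycle_times = [12.0, 14.0, 13.0, 15.0, 11.0]

# Hypothetical attribute data: one label per defect found (invented for illustration).
defects = ["typo", "missing field", "typo", "wrong address", "typo", "missing field"]


def summarise_variation(values):
    """Return the mean and sample standard deviation of continuous measurements."""
    return mean(values), stdev(values)


def rank_defects(labels):
    """Return (defect type, count) pairs, most frequent first, Pareto-style."""
    return Counter(labels).most_common()


if __name__ == "__main__":
    average, spread = summarise_variation(cycle_times)
    print(f"Average cycle time: {average:.1f} min, spread: {spread:.2f} min")
    for defect_type, count in rank_defects(defects):
        print(f"{defect_type}: {count}")
```

With continuous data the interesting number is the spread around the average, while with attribute data it is which type of defect crops up most often.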
If suitable data is not available, then sometimes a proxy can be used, or you may need to take specific samples from an operational system. However you get it, finding Goldilocks is well worth the effort.