We needed to estimate the average number of years after purchase at which various consumer products broke down, using a large dataset containing:
Such estimates are best provided using survival analysis. We outline below how we applied it in this case, and some of the issues that arose.
Survival analysis is the area of statistics that studies the (expected) length of time until some particular “event” occurs. This involves estimating the probability distribution of that length of time, i.e., constructing a function that gives, for each time period, the chance of the event occurring in that period. In our case, the event is the “failure” or “breakdown” of the machine (the end of its lifetime), whilst time is the age of the machine when that event happens.
What we are actually interested in is the cumulative distribution function, i.e., for any age, the probability that the breakdown has occurred at some point between purchase and that age, because the object of interest is the length of time before the event occurs. Indeed, since we are analysing product life-expectancy, it is far more useful to construct what’s known as the survival function S(t): the probability that the event has not happened by age t, i.e., the probability of surviving beyond that age. It is simply one minus the cumulative distribution function.
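To make the relationship between the per-period probability, the cumulative probability and the survival function concrete, here is a minimal Python sketch. The breakdown ages are made up for illustration, the function name `survival_table` is hypothetical, and the sketch assumes every breakdown was actually observed, which is a simplification that real data rarely satisfies.

```python
from collections import Counter

# Hypothetical ages (in whole years) at which ten machines broke down.
# Assumption: every breakdown was observed, so no machine is still running.
failure_ages = [1, 2, 2, 3, 3, 3, 4, 5, 5, 7]


def survival_table(ages):
    """Return one row per age: (age, P(fail at age), P(fail by age), S(age))."""
    n = len(ages)
    counts = Counter(ages)
    rows = []
    failed_so_far = 0
    for age in range(1, max(ages) + 1):
        failed_so_far += counts[age]
        p_at_age = counts[age] / n            # chance of failing in this period
        p_by_age = failed_so_far / n          # cumulative chance of failure
        surviving = (n - failed_so_far) / n   # S(age) = 1 - cumulative chance
        rows.append((age, p_at_age, p_by_age, surviving))
    return rows


table = survival_table(failure_ages)
for age, p_at_age, p_by_age, surviving in table:
    print(f"age {age}: fail at age={p_at_age:.1f}  "
          f"fail by age={p_by_age:.1f}  S={surviving:.1f}")
```

Reading down the last column gives the proportion of machines still working beyond each age; it starts near one and falls to zero once the oldest machine has failed.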