An editorially independent publication supported by the Simons Foundation.

When Data Is Missing, Scientists Guess. Then Guess Again.

Submitted by Style Pass, 2024-10-02 14:30:08


Data is almost always incomplete. Patients drop out of clinical trials and survey respondents skip questions; schools fail to report scores, and governments ignore elements of their economies. When data goes missing, standard statistical tools, like taking averages, are no longer useful.

“We cannot calculate with missing data, just as we can’t divide by zero,” said Stef van Buuren, the professor of statistical analysis of incomplete data at the University of Utrecht.
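In software the analogy is fairly literal. The short Python sketch below uses made-up weekly blood-pressure readings, which are purely illustrative and not from the article, with one week recorded as missing. A single NaN ("not a number") marker turns the whole average into NaN. Python's `None` placeholder is stricter still: adding it to a number raises an error instead of producing an answer.

```python
import math

# Hypothetical weekly blood-pressure readings (mmHg); the third week is missing.
readings = [142.0, 138.0, float("nan"), 135.0]

# A plain average: NaN propagates through the sum, so the result is NaN too.
naive_mean = sum(readings) / len(readings)
print(math.isnan(naive_mean))  # True

# With None as the missing-value marker, the arithmetic fails outright.
try:
    sum([142.0, 138.0, None, 135.0])
except TypeError as err:
    print("cannot add None:", err)
```

Either way, the tool refuses to produce a meaningful number until the analyst decides what to do about the gap.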

Suppose you are testing a new drug to reduce blood pressure. You measure the blood pressure of your study participants every week, but a few get impatient: Their blood pressure hasn’t improved much, so they stop showing up.

You could leave those patients out, keeping only the data of those who completed the study, a method known as complete case analysis. That may seem intuitive, even obvious. It’s also cheating. If you leave out the people who didn’t complete the study, you’re excluding the cases where your drug did the worst, making the treatment look better than it actually is. You’ve biased your results.
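A small simulation can make the bias concrete. The Python sketch below models the blood-pressure scenario under assumptions chosen only for illustration: each participant's true reduction is drawn from a normal distribution with mean 5 mmHg and standard deviation 4 mmHg, and a hypothetical dropout rule removes anyone whose pressure falls by less than 1 mmHg. The function `simulate_trial` and its parameters are likewise hypothetical. The code compares the average over everyone with the complete case average computed only over those who stayed.

```python
import random
import statistics

def simulate_trial(n=10_000, dropout_threshold=1.0, seed=0):
    """Return (true mean, complete case mean, completers) for a hypothetical trial."""
    rng = random.Random(seed)
    everyone = []    # true reduction in blood pressure (mmHg) for every participant
    completers = []  # the reductions left over once dropouts are discarded
    for _ in range(n):
        # Assumed effect: normally distributed, mean 5 mmHg, standard deviation 4 mmHg.
        reduction = rng.gauss(5.0, 4.0)
        everyone.append(reduction)
        # Assumed dropout rule: patients who barely improve stop showing up.
        if reduction >= dropout_threshold:
            completers.append(reduction)
    return statistics.mean(everyone), statistics.mean(completers), len(completers)

true_mean, complete_case_mean, n_completed = simulate_trial()
print(f"true mean reduction: {true_mean:.2f} mmHg")
print(f"complete case mean:  {complete_case_mean:.2f} mmHg ({n_completed} finished)")
```

Under this particular rule, everyone who leaves improved less than everyone who stays, so the complete case mean is guaranteed to come out higher than the true mean. Real dropout is noisier than a hard cutoff, but when the people who leave tend to be the ones the drug helped least, the estimate is pushed in the same flattering direction.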
