Missing data imputation

Submitted by Style Pass, 2022-05-20 18:30:16

In practice, missing data are very common in real-world data processing. They can arise from data entry errors, deliberately withheld information, or fraud. In this article, we discuss the cases in which handling missing data incorrectly with simple methods leads to errors in models and in decision-making.

Data that need processing often contain missing values, so the analyst faces a choice: ignore, discard, or fill in the missing values. Filling in the gaps often, and quite reasonably, seems to be the preferred solution. However, this is not always the case.
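To make the discard-or-fill choice concrete, here is a minimal sketch in Python on a made-up two-column dataset. The data and the helper names `drop_incomplete` and `fill_with_mean` are hypothetical and serve only as illustration; `None` marks a missing value. Discarding removes every row that has any gap, while filling replaces each gap with the mean of the observed values in that column.

```python
from statistics import mean
from typing import Optional

# Hypothetical toy dataset: each row is (age, income); None marks a missing value.
Row = tuple[Optional[float], ...]

rows: list[Row] = [
    (25.0, 40_000.0),
    (None, 52_000.0),
    (47.0, None),
    (33.0, 61_000.0),
    (None, 38_000.0),
]


def drop_incomplete(data: list[Row]) -> list[Row]:
    """Discard: keep only rows that have no missing values (listwise deletion)."""
    return [r for r in data if all(v is not None for v in r)]


def fill_with_mean(data: list[Row]) -> list[Row]:
    """Fill in: replace each missing value with its column's mean over observed values."""
    n_cols = len(data[0])
    col_means = [mean(r[j] for r in data if r[j] is not None) for j in range(n_cols)]
    return [
        tuple(col_means[j] if r[j] is None else r[j] for j in range(n_cols))
        for r in data
    ]


complete = drop_incomplete(rows)
filled = fill_with_mean(rows)
print(len(rows), len(complete))  # 5 rows before deletion, 2 after
print(filled[1])                 # missing age replaced by the observed mean age
```

Even though no row lacks more than one value, deletion keeps only 2 of the 5 rows, because the gaps are spread across different columns. Filling keeps all rows, but the filled-in values are invented rather than observed.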

A poorly chosen method for filling in the gaps can not only fail to improve the results but can actually make them worse. This article discusses simple, widely used methods for handling missing data, along with their advantages and disadvantages.
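One well-known way a simple fill can make results worse is by distorting the spread of the data. Replacing missing values with the mean leaves the mean unchanged, but every filled value contributes zero deviation while the count grows, so the estimated standard deviation shrinks. The sketch below shows this with made-up numbers; it illustrates the general effect and is not presented as the article's own example.

```python
from statistics import mean, stdev

# Hypothetical observations of one variable; None marks a missing value.
values = [12.0, None, 15.0, 9.0, None, 18.0, None, 10.0]

# Statistics computed on the observed values only.
observed = [v for v in values if v is not None]
m = mean(observed)

# Mean imputation: every gap receives the same value m.
imputed = [m if v is None else v for v in values]

# The mean is preserved, but the sample standard deviation drops
# because the sum of squared deviations is unchanged while n grows.
print(mean(observed), mean(imputed))
print(stdev(observed), stdev(imputed))
```

A model trained on the imputed column would therefore see the variable as less variable than it really is, which can bias downstream estimates such as standard errors and correlations.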

Excluding rows with missing values has become the default behavior in some popular software packages, which may give novice analysts the impression that it is the right approach. In addition, there are easy-to-implement, easy-to-use methods for handling missing data, known as ad hoc methods, and their simplicity may be why they are chosen: