A Primer on Missing Data


Real-world datasets are often messy: values are commonly missing or corrupted. Examples include empty cells in spreadsheets, unanswered survey questions, and readings from faulty sensors. Yet despite how often such defects occur, algorithms are rarely designed to be robust to missing values, so many standard algorithms fail on such datasets. This talk briefly discusses the theory of missing data and practical approaches for dealing with missingness in real-world machine learning. Presented at IndabaX South Africa 2019.

Video Slides