Incomplete datasets are those in which one or more attribute values are missing from some data samples. Missing values typically arise from manual data entry errors, incorrect measurements, equipment faults, and similar problems. Such incomplete datasets can degrade the performance of data mining. Two approaches can be considered to address this problem: case deletion and missing value imputation. In this three-year project, the aim of the first-year research is to review and survey work on missing value imputation published from 2000 to 2015 in order to identify the limitations of the related literature. In addition, the applicability of case deletion is examined across different types of missing data (i.e. categorical, numerical, and mixed types) and different missing rates. The second-year research focuses on comparing statistical and supervised learning techniques for missing value imputation. In particular, six different algorithms will be compared. Finally, the aim of the third-year research is to propose a hybrid learning-based imputation method to improve the quality of missing value imputation.
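The two approaches named in the abstract, case deletion and missing value imputation, can be illustrated with a minimal sketch. The toy dataset, the function names, and the choice of column-mean imputation are assumptions made for illustration only; they are not the algorithms compared or proposed in the project.

```python
from statistics import mean

# Hypothetical toy dataset: None marks a missing attribute value.
rows = [
    [1.0, 2.0],
    [None, 4.0],
    [3.0, None],
    [5.0, 6.0],
]

def case_deletion(data):
    """Drop every sample that has at least one missing value."""
    return [row for row in data if all(v is not None for v in row)]

def mean_imputation(data):
    """Replace each missing value with the mean of the observed values in its column."""
    n_cols = len(data[0])
    col_means = []
    for j in range(n_cols):
        # Assumes every column has at least one observed value.
        observed = [row[j] for row in data if row[j] is not None]
        col_means.append(mean(observed))
    return [
        [col_means[j] if v is None else v for j, v in enumerate(row)]
        for row in data
    ]

deleted = case_deletion(rows)    # keeps only the fully observed samples
imputed = mean_imputation(rows)  # keeps all samples, filling the gaps
```

Case deletion here discards half of the samples, while imputation retains all four at the cost of introducing estimated values. This trade-off is why the missing rate matters when deciding between the two.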
Effective start/end date: 1/08/18 → 31/07/19