TY - JOUR
T1 - An investigation of solutions for handling incomplete online review datasets with missing values
AU - Hu, Ya Han
AU - Tsai, Chih Fong
N1 - Publisher Copyright:
© 2021 Informa UK Limited, trading as Taylor & Francis Group.
PY - 2022
Y1 - 2022
N2 - Online review helpfulness prediction is an important research issue in electronic commerce and data mining. However, the collected datasets used for the analysis and prediction of the helpfulness of online reviews often contain some missing attribute values, such as reviewer background and rating information. In related literatures, many studies have either used the case deletion approach to remove the data containing missing values or considered the imputation of missing values by the mean/mode method. However, none of them consider the direct handling approach without missing value imputation for online review datasets by decision tree-related techniques. Therefore, in this paper, we investigate the suitability of different types of approaches to solve the incomplete dataset problem of online reviews. Specifically, for missing value imputation, several supervised learning techniques including MICE, KNN, SVM, and CART are examined. Moreover, for the direct handling approach without missing value imputation, CART is also performed for this task. The experimental results based on the TripAdvisor dataset for review helpfulness prediction show that the approach where incomplete online review datasets are handled directly without imputation by CART significantly outperforms the other approaches, including case deletion and missing value imputation approaches.
AB - Online review helpfulness prediction is an important research issue in electronic commerce and data mining. However, the collected datasets used for the analysis and prediction of the helpfulness of online reviews often contain some missing attribute values, such as reviewer background and rating information. In related literatures, many studies have either used the case deletion approach to remove the data containing missing values or considered the imputation of missing values by the mean/mode method. However, none of them consider the direct handling approach without missing value imputation for online review datasets by decision tree-related techniques. Therefore, in this paper, we investigate the suitability of different types of approaches to solve the incomplete dataset problem of online reviews. Specifically, for missing value imputation, several supervised learning techniques including MICE, KNN, SVM, and CART are examined. Moreover, for the direct handling approach without missing value imputation, CART is also performed for this task. The experimental results based on the TripAdvisor dataset for review helpfulness prediction show that the approach where incomplete online review datasets are handled directly without imputation by CART significantly outperforms the other approaches, including case deletion and missing value imputation approaches.
KW - Online reviews
KW - data mining
KW - incomplete datasets
KW - missing values
UR - http://www.scopus.com/inward/record.url?scp=85109618213&partnerID=8YFLogxK
U2 - 10.1080/0952813X.2021.1948920
DO - 10.1080/0952813X.2021.1948920
M3 - 期刊論文
AN - SCOPUS:85109618213
SN - 0952-813X
VL - 34
SP - 971
EP - 987
JO - Journal of Experimental and Theoretical Artificial Intelligence
JF - Journal of Experimental and Theoretical Artificial Intelligence
IS - 6
ER -