An investigation of solutions for handling incomplete online review datasets with missing values

Research output: Contribution to journal, journal article, peer-reviewed

Abstract

Online review helpfulness prediction is an important research issue in electronic commerce and data mining. However, the datasets collected for the analysis and prediction of online review helpfulness often contain missing attribute values, such as reviewer background and rating information. In the related literature, many studies have either used case deletion to remove records containing missing values or imputed the missing values with the mean/mode method. None of them, however, considers handling incomplete online review datasets directly, without missing value imputation, using decision tree-related techniques. In this paper, we therefore investigate the suitability of different types of approaches for solving the incomplete dataset problem of online reviews. Specifically, for missing value imputation, several supervised learning techniques including MICE, KNN, SVM, and CART are examined. For the direct handling approach without missing value imputation, CART is also applied. The experimental results for review helpfulness prediction on the TripAdvisor dataset show that handling incomplete online review datasets directly with CART, without imputation, significantly outperforms the other approaches, including case deletion and missing value imputation.
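
The three families of approaches named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy records, the field names (`rating`, `reviewer_level`, `helpful`), and the threshold are hypothetical and are not taken from the TripAdvisor dataset or from the paper. In particular, the one-split `stump_predict` function only shows one simple way a tree can route a record with a missing value to a fixed branch; it is a stand-in for direct handling, not the CART procedure evaluated by the authors.

```python
from statistics import mean, mode

# Toy review records (hypothetical fields and values); None marks a missing value.
reviews = [
    {"rating": 5, "reviewer_level": "senior", "helpful": 1},
    {"rating": None, "reviewer_level": "junior", "helpful": 0},
    {"rating": 3, "reviewer_level": None, "helpful": 0},
    {"rating": 4, "reviewer_level": "senior", "helpful": 1},
]
FEATURES = ["rating", "reviewer_level"]


def case_deletion(rows):
    """Drop every record that has at least one missing feature."""
    return [r for r in rows if all(r[f] is not None for f in FEATURES)]


def mean_mode_imputation(rows):
    """Fill numeric gaps with the column mean and categorical gaps with the mode."""
    fill = {}
    for f in FEATURES:
        observed = [r[f] for r in rows if r[f] is not None]
        numeric = all(isinstance(v, (int, float)) for v in observed)
        fill[f] = mean(observed) if numeric else mode(observed)
    # Build new dicts so the original records are left untouched.
    return [{**r, **{f: fill[f] for f in FEATURES if r[f] is None}} for r in rows]


def stump_predict(row, threshold=4, missing_goes_right=False):
    """One-split tree on rating; a missing rating is sent to a fixed branch."""
    if row["rating"] is None:
        return 1 if missing_goes_right else 0
    return 1 if row["rating"] >= threshold else 0


complete_only = case_deletion(reviews)       # smaller dataset, no gaps
imputed = mean_mode_imputation(reviews)      # same size, gaps filled
direct = [stump_predict(r) for r in reviews] # gaps kept, handled at split time
```

Case deletion shrinks the training set, mean/mode imputation keeps every record but replaces gaps with a single summary value, and the direct approach leaves the gaps in place and lets the model decide what to do with them at prediction time.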

Original language: English
Journal: Journal of Experimental and Theoretical Artificial Intelligence
Publication status: Accepted - 2021
