TY - JOUR
T1 - Host-based intrusion detection with multi-datasource and deep learning
AU - Hwang, Ren Hung
AU - Lee, Chieh Lun
AU - Lin, Ying Dar
AU - Lin, Po Chin
AU - Wu, Hsiao Kuang
AU - Lai, Yuan Cheng
AU - Chen, C. K.
N1 - Publisher Copyright: © 2023
PY - 2023/11
Y1 - 2023/11
N2 - Attackers are growing more sophisticated, and both network-based and host-based intrusion detection systems now use machine learning to detect advanced attacks. Most of these systems are trained on a single data source, yet in practice attack features are often scattered across multiple sources, which limits detection effectiveness. This study therefore assesses three host-based data sources: network traffic, system logs, and host statistics. It evaluates and compares their combined detection capabilities across diverse attack stages and types. In the proposed framework, network traffic data are handled by a Convolutional Neural Network (CNN) for improved automatic feature selection. System log data are processed using Long Short-Term Memory (LSTM) and an attention model to better explore temporal relationships. Host statistics are processed by a Deep Neural Network (DNN) to improve classification performance. Experimental results show that the F1-scores reach 1.0 for all considered attacks and attack stages when all three data sources are used in the detection process. Additionally, employing different models according to the data type leads to improved results compared with Lin et al. (2022), which exclusively used XGBoost. The host statistics were found to be highly effective in detecting attacks and were thus investigated further for different attack methods and attack stages. The results showed that the disk usage percentage (DSK), minor memory faults (MINFLT), major memory faults (MAJFLT), total virtual memory growth during the last interval (VGROW), and total resident memory growth during the last interval (RGROW) were the metrics primarily affected by all the attacks in the initial access and command and control stages. By contrast, in the impact attack stage, the affected system resources varied widely depending on the particular attack.
KW - DL-based anomaly detection
KW - HIDS
KW - Host statistics
KW - Multiple data sources
KW - Network traffic
KW - System logs
UR - http://www.scopus.com/inward/record.url?scp=85173896100&partnerID=8YFLogxK
U2 - 10.1016/j.jisa.2023.103625
DO - 10.1016/j.jisa.2023.103625
M3 - Journal article
AN - SCOPUS:85173896100
SN - 2214-2134
VL - 78
JO - Journal of Information Security and Applications
JF - Journal of Information Security and Applications
M1 - 103625
ER -