Host-based intrusion detection with multi-datasource and deep learning

Ren Hung Hwang, Chieh Lun Lee, Ying Dar Lin, Po Chin Lin, Hsiao Kuang Wu, Yuan Cheng Lai, C. K. Chen

Research output: Contribution to journal › Article › peer-review

Abstract

Modern hackers display increasing sophistication. Intrusion detection systems, both network-based and host-based, now use machine learning to improve detection of such advanced attacks. Most of these systems rely on a single data source for training, yet in practical scenarios attack features are often scattered across multiple sources, which limits the systems' detection effectiveness. This study therefore assesses three host-based data sources: network traffic, system logs, and host statistics. It evaluates and compares their combined detection capabilities across diverse attack stages and types. In the proposed framework, network traffic data are handled by a Convolutional Neural Network (CNN) for improved automatic feature selection. System log data are processed using Long Short-Term Memory (LSTM) and an attention model to enhance temporal relationship exploration. Host statistics are processed by a Deep Neural Network (DNN) to improve classification performance. Experimental results show that the F1-scores reach 1.0 for all considered attacks and attack stages when all three data sources are used in the detection process. Additionally, employing different models according to the data type leads to improved results compared with Lin et al. (2022), which used XGBoost exclusively. The host statistics were found to be highly effective in detecting attacks and were thus investigated further for different attack methods and attack stages. The results showed that the disk usage percentage (DSK), minor memory faults (MINFLT), major memory faults (MAJFLT), total virtual memory growth during the last interval (VGROW), and total resident memory growth during the last interval (RGROW) were primarily affected by all the attacks in the initial access and command and control stages. By contrast, in the impact attack stage, the affected system resources varied widely depending on the particular attack.
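
For readers unfamiliar with the metric, the following is a minimal sketch of how an F1-score is computed for a binary attack/benign classifier. It is not the authors' evaluation code; the function name `f1_score` and the label lists are hypothetical and serve only to illustrate that an F1-score of 1.0 requires zero false positives and zero false negatives.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1-score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # No correct positive prediction: define F1 as 0 to avoid division by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels (1 = attack, 0 = benign); not data from the study.
y_true = [1, 0, 1, 1, 0, 0]
perfect = [1, 0, 1, 1, 0, 0]   # every sample classified correctly
one_miss = [1, 0, 0, 1, 0, 0]  # one attack missed (a false negative)

print(f1_score(y_true, perfect))
print(f1_score(y_true, one_miss))
```

With the perfect predictions both precision and recall equal 1, so the F1-score is 1.0; a single missed attack lowers recall to 2/3 and the F1-score to about 0.8.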

Original language: English
Article number: 103625
Journal: Journal of Information Security and Applications
Volume: 78
State: Published - Nov 2023

Keywords

  • DL-based anomaly detection
  • HIDS
  • Host statistics
  • Multiple data sources
  • Network traffic
  • System logs
