Encoding IP Address as a Feature for Network Intrusion Detection

2019-12-03T13:49:08Z (GMT) by Enchun Shao
As machine learning algorithms take on more important roles in various areas of big data analysis, more accurate research is needed. Although machine learning techniques have been applied to network intrusion detection, the encoding of IP addresses as a feature has not been discussed. Because the IP address is strongly relevant to network intrusion detection, it cannot be ignored when predicting a network attack. Therefore, the present study applied three machine learning algorithms, namely random forest, support vector machine (SVM), and decision tree, to examine three IP address encoding methods for NetFlow data: converting the address into four individual numbers, converting it into binary integers, and one-hot encoding. The focus of the study was to analyze the F1, precision, recall, and accuracy scores of the machine learning algorithms and to determine the best method of encoding IP addresses for network intrusion detection. In addition, 21 features of the data set, related to the destination port, packets, destination IP, source port, flow duration, source IP, flags, and labels, were considered, as was speed. The study shows that the best method of encoding an IP address was splitting it into four numbers; the decision tree, random forest, and SVM accuracy scores for this method were 0.9562, 0.9631, and 0.9296, respectively.
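To make the three encodings concrete, the following is a minimal sketch in Python using only the standard library. It is an illustration, not the study's implementation. The function names are hypothetical, IPv4 addresses are assumed, and "binary integers" is interpreted here as 32 separate 0/1 features, which is an assumption, since the abstract does not specify the exact representation.

```python
import ipaddress


def encode_octets(ip: str) -> list:
    """Split a dotted-quad IPv4 address into four integers, each in 0..255."""
    return list(ipaddress.IPv4Address(ip).packed)


def encode_bits(ip: str) -> list:
    """Expand an IPv4 address into 32 binary features, most significant bit first.

    Assumption: this is one plausible reading of "binary integers".
    """
    value = int(ipaddress.IPv4Address(ip))
    return [(value >> shift) & 1 for shift in range(31, -1, -1)]


def encode_one_hot(ips: list) -> tuple:
    """One-hot encode addresses against the distinct addresses seen in `ips`.

    Returns the vocabulary (sorted numerically) and one row per input address.
    """
    vocab = sorted(set(ips), key=lambda a: int(ipaddress.IPv4Address(a)))
    index = {addr: i for i, addr in enumerate(vocab)}
    rows = []
    for addr in ips:
        row = [0] * len(vocab)
        row[index[addr]] = 1
        rows.append(row)
    return vocab, rows


if __name__ == "__main__":
    sample = ["192.168.1.10", "10.0.0.1", "192.168.1.10"]
    print(encode_octets(sample[0]))   # four numeric features
    print(encode_bits(sample[0]))     # thirty-two binary features
    print(encode_one_hot(sample))     # one column per distinct address
```

Note the difference in feature count: the first encoding always yields 4 columns and the second 32, while one-hot encoding yields one column per distinct address in the data, which can grow very large on real NetFlow traces.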