|dc.description.abstract||Malware remains one of the main threats to computer security. Building detection systems that accurately separate normal from malicious behavior through network traffic analysis is difficult because of the nature of both kinds of traffic: normal traffic is complex, diverse, and constantly changing, while malware also changes, migrates, and hides by imitating normal traffic.
In addition, the volume of data to analyze is large, and detection must happen in real time to be useful. An effective mechanism for detecting malware and attacks on the network is therefore needed. Ensembling algorithms address this need by combining the results of several different classifiers into a single final result, exploiting the strengths of each one to achieve greater precision and thus better results. This approach can also be applied to cybersecurity problems, in particular to the detection of malware and attacks through network traffic analysis, which is the challenge addressed in this thesis.
Existing research on ensemble learning for attack detection mainly aims to increase the performance of machine learning algorithms by combining their results. Most studies propose an ensemble learning technique, either existing or created by the authors, to detect one particular type of attack rather than attacks in general. So far, none uses Threat Intelligence (TI) data in ensemble learning algorithms to improve the detection process, and none works as a function of time, that is, taking into account what happens on the network within a limited time interval. The objective of this thesis is to propose a methodology for applying ensembling to the detection of infected hosts that considers both of these aspects.
In line with this objective, ensembling algorithms applicable to network security were investigated and evaluated, and a methodology for detecting infected hosts using ensembling was developed, based on experiments designed and tested with real datasets. The methodology detects infected hosts in three phases, which are repeated periodically.
Each phase applies ensembling with a different objective. The first phase classifies each network flow in the time window as malware or normal. The second phase classifies the traffic between a source and a destination as malicious or normal, indicating whether it is part of an infection. Finally, the third phase classifies each host that originates communications as infected or not infected.
Implementing the methodology in phases makes it possible to address one aspect of the problem in each phase while reusing the predictions of the previous phase, which are combined with the phase's own analysis to achieve better results. It also means that training and testing are carried out in each phase. Because the best model is selected during training, each time training is performed for a given phase the model is adjusted to detect new attacks. This is an advantage over tools based on signatures or static rules, where the malicious behavior must already be known before new rules can be added.||en_US