13 November 2021
POLITECNICO DI TORINO Repository ISTITUZIONALE
Machine Learning and other Computational-Intelligence Techniques for Security Applications / Marcelli, Andrea. - (2019 Sep 11), pp. 1-113.
Original
Machine Learning and other Computational-Intelligence Techniques for Security Applications
Publisher:
Published DOI:
Terms of use:
openAccess
Publisher copyright
(Article begins on next page)
This article is made available under terms and conditions as specified in the corresponding bibliographic description in the repository
Availability:
This version is available at: 11583/2751497 since: 2019-09-13T08:17:31Z
Politecnico di Torino
Politecnico di Torino
Machine Learning and other Computational-Intelligence Techniques for Security Applications
Supervisor:
Prof. Giovanni Squillero
Candidate:
Andrea Marcelli
Abstract
Machine learning and evolutionary computation are powerful tools that achieved in- credible results in the most variegate fields. While the techniques are quite known, their application requires a deep knowledge in the field of usage. This thesis explores the appli- cation of computational intelligence methodologies to open problems in computer security, mainly in the field of malware families detection.
Malware is a big business. With hundreds of thousands of malware delivered every day, manual analysis in not an option. Malicious samples are commonly detected using a combination of techniques, ranging from machine learning to hash-based content. How- ever, the industry mostly relies on signatures, which are patterns extracted from the code or behavior of selected samples. Generating effective signatures, with 0-false positives, and low false negatives rates, is a task that requires a considerable amount of time and resources from skilled experts, while automatically generating them is an open problem.
In this thesis, we propose a semi-supervised methodology for the automatic identi- fication of malware families, used to safely extend experts knowledge on new malicious samples, and to reduce the amount of applications to manually analyze. Then, newly discovered samples are submitted to an automatic signature generation procedure, which produces a formal rule which has a limited risk of detecting false positives in the future, yet it is general enough to catch future threats.
The effectiveness of the approach is assessed running experiments on 1.5 million An- droid applications, the largest dataset ever used in a public research on Android malware.
The procedure has been implemented in two frameworks which have been publicly re- leased: YaYaGen for Android applications, and YaYaGenPE for Windows. Furthermore, since January 2018, part of the proposed approach is in use in Koodous, a collaborative analysis platform for Android, developed by Hispasec.