POLITECNICO DI TORINO Repository ISTITUZIONALE

(1)

13 November 2021

POLITECNICO DI TORINO Repository ISTITUZIONALE

Machine Learning and other Computational-Intelligence Techniques for Security Applications / Marcelli, Andrea. - (2019 Sep 11), pp. 1-113.

Original

Machine Learning and other Computational-Intelligence Techniques for Security Applications

Publisher:

Published DOI:

Terms of use:

openAccess

Publisher copyright

(Article begins on next page)

This article is made available under terms and conditions as specified in the corresponding bibliographic description in the repository

Availability:

This version is available at: 11583/2751497 since: 2019-09-13T08:17:31Z

Politecnico di Torino

(2)

Politecnico di Torino

Machine Learning and other Computational-Intelligence Techniques for Security Applications

Supervisor:

Prof. Giovanni Squillero

Candidate:

Andrea Marcelli

Abstract

Machine learning and evolutionary computation are powerful tools that achieved in- credible results in the most variegate fields. While the techniques are quite known, their application requires a deep knowledge in the field of usage. This thesis explores the application of computational intelligence methodologies to open problems in computer security, mainly in the field of malware families detection.

Malware is a big business. With hundreds of thousands of malware delivered every day, manual analysis in not an option. Malicious samples are commonly detected using a combination of techniques, ranging from machine learning to hash-based content. How- ever, the industry mostly relies on signatures, which are patterns extracted from the code or behavior of selected samples. Generating effective signatures, with 0-false positives, and low false negatives rates, is a task that requires a considerable amount of time and resources from skilled experts, while automatically generating them is an open problem.

In this thesis, we propose a semi-supervised methodology for the automatic identi- fication of malware families, used to safely extend experts knowledge on new malicious samples, and to reduce the amount of applications to manually analyze. Then, newly discovered samples are submitted to an automatic signature generation procedure, which produces a formal rule which has a limited risk of detecting false positives in the future, yet it is general enough to catch future threats.

The effectiveness of the approach is assessed running experiments on 1.5 million An- droid applications, the largest dataset ever used in a public research on Android malware.

The procedure has been implemented in two frameworks which have been publicly re- leased: YaYaGen for Android applications, and YaYaGenPE for Windows. Furthermore, since January 2018, part of the proposed approach is in use in Koodous, a collaborative analysis platform for Android, developed by Hispasec.