Elena Peterson - Pacific Northwest National Laboratory
Flexible and Adaptive Malware Identification Using Techniques from Biology
Aug 19, 2020
Abstract
Cyber security data mimics the behavior of organic systems in many ways. Individuals or groups compete for limited resources using a variety of strategies, and the most effective of these strategies are reused and refined in later 'generations'. Traditionally, this behavior has made malware very difficult to detect because 1) recognition systems are often built on exact matching to a pattern that can only be 'learned' after a malicious entity reveals itself, and 2) the enormous volume and variation of benign code is an overwhelming source of previously unseen entities that often confound detectors. In addition, the sheer number of malware artifacts overwhelms anyone trying to categorize and characterize new additions to the many malware repositories, because so much of that processing is done by hand.
To turn the tables of complexity on the attackers, we have developed a method that maps the sequence of behaviors making up a malicious artifact to strings of text and analyzes these strings using modified bioinformatics algorithms. Bioinformatics algorithms optimize the alignment between text strings even in the presence of mismatches, insertions, or deletions, and they require neither an a priori definition of the patterns being sought nor any type of exact matching. This allows the data itself to suggest meaningful patterns that are conserved between binaries. These patterns can be used to identify zero-day malware and can help automate the curation and characterization of large quantities of suspected malware. I will talk about our MLSTONES capabilities as an innovative and effective way of detecting and characterizing most types of malware artifacts. I'll also discuss how these capabilities can be used on other types of cyber security data.
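The abstract does not say which alignment algorithm or scoring scheme MLSTONES uses. As a rough illustration of the general idea only, the sketch below encodes behavior sequences as strings and scores them with textbook Smith-Waterman local alignment, a standard bioinformatics algorithm that tolerates mismatches, insertions, and deletions. The behavior names, the one-character alphabet, and the scoring parameters (match=2, mismatch=-1, gap=-1) are hypothetical choices, not the authors' method.

```python
# Hypothetical mapping from observed behaviors to single characters
# (illustrative only; not the MLSTONES encoding).
BEHAVIOR_ALPHABET = {
    "open_file": "O",
    "read_file": "R",
    "write_file": "W",
    "connect_network": "N",
    "send_data": "S",
    "create_process": "P",
    "modify_registry": "G",
}


def encode(behaviors):
    """Map a sequence of behavior names to a text string."""
    return "".join(BEHAVIOR_ALPHABET[b] for b in behaviors)


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b.

    Scoring parameters are assumed values for illustration.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # gap in b (deletion from a)
            left = H[i][j - 1] + gap  # gap in a (insertion into a)
            # Clamping at 0 lets an alignment restart anywhere (local alignment)
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    known = encode(["open_file", "read_file", "connect_network",
                    "send_data", "write_file"])
    # Same behavior chain with one extra step inserted
    variant = encode(["open_file", "read_file", "create_process",
                      "connect_network", "send_data", "write_file"])
    unrelated = encode(["modify_registry", "create_process", "modify_registry"])
    print(known, variant, smith_waterman(known, variant))      # ORNSW ORPNSW 9
    print(known, unrelated, smith_waterman(known, unrelated))  # ORNSW GPG 0
```

Here the variant still scores 9 against the known sample (five matches minus one gap penalty), even though an exact-substring matcher would not find "ORNSW" inside "ORPNSW", while the unrelated sequence scores 0.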