Know thy enemy
NSF-funded project aims to mitigate malware and viruses by making them easily understandable
As the software development landscape evolves, new security vulnerabilities are surfacing. Traditionally, a program’s source code could shed light on its vulnerabilities, but acquiring high-quality source code for the purpose of finding weaknesses can be difficult because of “compiling.”
Compiling refers to the process of transforming and optimizing a program’s source code to generate a final executable, which is a file that causes a computer to perform indicated tasks according to the encoded instructions. While an executable runs quickly on computers, it retains little of the original source code: comments, most names and the high-level structure a human would read are typically discarded.
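The information loss described above can be seen in miniature with Python’s built-in bytecode compiler. This is only a loose analogy for the native compilers used with C++, Go and Rust, and the snippet names (`src_a`, `src_b`) are made up for illustration. The sketch compiles two differently written versions of the same statement and checks whether anything distinguishes them afterward.

```python
import marshal

# Two hypothetical source snippets: same logic, different comments and formatting.
src_a = "total = (price + tax)  # add sales tax before shipping\n"
src_b = "total=price+tax\n"

code_a = compile(src_a, "<a>", "exec")
code_b = compile(src_b, "<b>", "exec")

# Both snippets compile to the same instruction bytes.
same_code = code_a.co_code == code_b.co_code

# The comment text does not survive anywhere in the compiled object.
comment_survives = b"sales tax" in marshal.dumps(code_a)

print("identical instructions:", same_code)
print("comment survives compilation:", comment_survives)
```

Because many different source files map to the same compiled output, a decompiler cannot recover the one original; it can only produce some readable source that behaves the same way. Python bytecode still keeps variable names, whereas native executables usually lose far more, which is part of what makes the problem hard.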
Today, more and more software is developed in high-level programming languages, such as C++, Go and Rust, because of their many advantages, including higher development speed and better software engineering practices. Most importantly, programs written in these languages are compiled into machine code, the elemental language of computers, and execute at what is known as native speed, the fastest a computer can run them.
Unfortunately, cybercriminals have also joined the transition to high-level programming, meaning a growing number of computer viruses and malware are programmed using these languages. And existing techniques do not allow security analysts and researchers to uncover malevolent source code with satisfactory quality.
Ruoyu “Fish” Wang, who has been an assistant professor of computer science and engineering in the Ira A. Fulton Schools of Engineering at Arizona State University since 2018, is using a 2022 National Science Foundation Faculty Early Career Development Program (CAREER) Award to address this security concern by developing new techniques for recovering source code, a process known as decompilation.
“My project will develop a set of generic, automated decompilation techniques that transform these viruses and malware samples into accurate, concise and human-readable source code,” Wang says. “As an added benefit, this project will enable software hardening and vulnerability mitigation without accessing the high-level language source code of software, which will help improve the security portfolio in scenarios where legacy software is in use.”
Researchers have worked on binary decompilation for more than 25 years, yet a critical problem that continues to hinder progress is the lack of a clear metric to evaluate the output quality.
“A fundamental problem, as I see it, is that decompilation can lead to many different end goals, such as software behavior analysis, vulnerability discovery, generic hardening, patching and recompilation,” Wang says. “These goals may have vastly different requirements on various aspects of the output.”
Along with his students and colleagues in the School of Computing and Augmented Intelligence, one of the seven Fulton Schools, Wang will first develop a set of objectives under each end goal, then create standardized metrics for evaluating the quality of decompilation output.
“Guided by these metrics, we will develop novel techniques that will transform machine code into a high-level intermediate language known as angr IL, or AIL,” Wang says. “With different end goals, we may have different focuses or make different compromises during code transformation.”
The development of a new decompiler for each high-level programming language can be tedious and expensive. With that in mind, Wang and his team will aim to automatically generate programming-language-specific decompilation transformation rules by using a novel technique called Compiler Transformation Inference and Inversion, or CTII.
“We will use the latest progress in the fields of natural language processing and evolutionary computation to assist with the generation of these transformation rules,” Wang says. “We will open source all research artifacts under this award. The foundation of our research, angr and angr decompiler, are already available on GitHub.”
Wang’s research will take place in ASU’s Laboratory of Security Engineering for Future Computing, known as SEFCOM. Wang credits the strong reputations of his SEFCOM colleagues, Assistant Professor Yan Shoshitaishvili, Associate Professor Adam Doupé and Assistant Professor Tiffany Bao, all of whom are computer science and engineering faculty in the School of Computing and Augmented Intelligence, as one of the reasons his project received NSF funding.
“Our team is well known in the computer security community for conducting open, usable and reproducible research in binary analysis,” Wang says. “I like to work with fun and awesome people who share similar ideologies, and I firmly believe that modern systems research is only possible via a coordinated team effort. My colleagues and I form a great team at SEFCOM and ASU, and I do not see any possibility to enjoy the same level of productivity through teamwork anywhere else.”