By Robert Lemos
10th June 2006
Reverse engineer Thomas Dullien - better known amongst security researchers by
his nom de plume, Halvar Flake - created an automated system for
classifying software into groups, a process for which he believes
machines are much better suited.
Research using the system has underscored the sometimes-arbitrary
decisions humans make in classifying malicious programs, he said.
Among other anomalies, he found that Sasser.D has only a 69 per cent
correlation to previous members of the Sasser family, while two
examples of bot software, Gobot and Ghostbot, are more similar.
"It's like putting donkeys and bunnies in the same class because they
both have long ears," Dullien, the founder and CEO of
reverse-engineering tool maker Sabre Security, said recently.
The current problems with classifying and naming viruses are among the
reasons that automated classification technology has once again become
a focus of research. The plethora of names for specific malicious
programs has caused confusion amongst consumers, despite a project
that seeks to provide guidance, if not to consumers, then at least to
software analysts and incident responders.
In January, when a new computer virus appeared on the internet,
anti-virus companies rushed to issue alerts and inundated consumers
with a confusing array of names: Blackmal, Nyxem, MyWife, KamaSutra,
Blackworm, Tearec and Worm_Grew all describe the same mass-mailing
virus.
Several research projects hope to improve upon that record.
Last month, at the annual conference of the European Institute for
Computer Anti-Virus Research (EICAR), Microsoft released early results
of its development of a system to automate classification of malicious
software based on the actions performed by the code at runtime.
"A significant challenge we have today is the large number of active
malware samples, totaling on the order of tens of thousands, and
increasing rapidly," Microsoft researcher Tony Lee said in a recent
blog posting following the conference. "It has become apparent to us
that the traditional manual analysis process is not adequate in
dealing with malware of this order of magnitude, and that we should
seek automation technologies to aid human analysts."
The researchers modelled a piece of malicious software as the series of
actions that the software takes at the operating system level.
Referred to as "events" in a paper written by Lee and anti-malware
program team manager Jigar Mody, the actions can include data copying,
changing registry keys and opening network connections.
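To make the event model concrete, here is a minimal sketch in Python. It is
not taken from the Microsoft paper: the Event fields, the action names and
the choice of edit distance as a way to compare two traces are all
assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One OS-level action (hypothetical fields, chosen for illustration)."""
    action: str  # e.g. "copy_file", "set_registry_key", "open_connection"
    target: str  # the file, registry key or address acted upon


def edit_distance(a, b):
    """Levenshtein distance between two event sequences.

    Counts the insertions, deletions and substitutions needed to turn
    trace `a` into trace `b`; a smaller value means more similar behaviour.
    """
    prev = list(range(len(b) + 1))
    for i, event_a in enumerate(a, start=1):
        curr = [i]
        for j, event_b in enumerate(b, start=1):
            cost = 0 if event_a == event_b else 1
            curr.append(min(prev[j] + 1,          # delete event_a
                            curr[j - 1] + 1,      # insert event_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Two made-up traces that differ only in the registry key they change.
trace_a = [Event("copy_file", "a.exe"),
           Event("set_registry_key", "Run"),
           Event("open_connection", "smtp")]
trace_b = [Event("copy_file", "a.exe"),
           Event("set_registry_key", "RunOnce"),
           Event("open_connection", "smtp")]
print(edit_distance(trace_a, trace_b))
```

Treating each sample as an ordered sequence, rather than a bag of actions,
keeps the order in which a program acts as part of its fingerprint.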
The researchers then trained a recognition engine using an adaptive
clustering algorithm - similar to self-organising maps - and
classified a previously unseen subset of malware using the trained
system. Using more clusters typically resulted in better
classification. When the software samples were classified based on 100
events, accuracy fell below 80 per cent, while classification based on
500 and 1,000 events typically had accuracy rates above 90 per cent.
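The effect of the event limit can be illustrated with a much simpler scheme
than the adaptive clustering the researchers used. The sketch below assigns
an unseen trace to the family whose representative trace it most resembles,
after truncating both to max_events. The family names, traces and the use of
Python's difflib for similarity are assumptions for illustration only.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Ratio in [0, 1] of matching events between two traces."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def classify(trace, representatives, max_events=500):
    """Return the family whose representative trace is most similar.

    Both traces are cut to the first `max_events` events, mirroring the
    idea of classifying on a fixed event limit.
    """
    head = trace[:max_events]
    return max(representatives,
               key=lambda family: similarity(
                   head, representatives[family][:max_events]))


# Made-up representative traces for two hypothetical families.
representatives = {
    "mailer": ["copy_file", "set_registry_key", "send_mail", "send_mail"],
    "bot": ["copy_file", "open_connection", "join_channel", "wait_command"],
}
unseen = ["copy_file", "set_registry_key", "send_mail"]
print(classify(unseen, representatives))
```

With a limit of one event, both families look identical to the unseen
sample, since every trace here starts by copying a file. That is a toy
version of why a short event window tends to classify less accurately.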
Reverse engineer Dullien takes a different approach. Working with
other researchers at Sabre Security, he used automated tools to
deconstruct the actual code of virus and bot software, removing any
common libraries that the code might use and then comparing the
relationships between functions to characterise the software.
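A rough sketch of that idea follows. It is not Sabre Security's algorithm:
it assumes a call graph is already available as a set of (caller, callee)
pairs, strips edges that touch known library functions, and compares what
is left by each function's in-degree and out-degree, so that the result
does not depend on function names. All names here are hypothetical.

```python
from collections import Counter


def strip_library(call_graph, library_functions):
    """Drop every call edge that touches a known library function."""
    return {(caller, callee) for caller, callee in call_graph
            if caller not in library_functions
            and callee not in library_functions}


def structural_signature(edges):
    """Multiset of (in-degree, out-degree) pairs, one per function."""
    in_deg, out_deg = Counter(), Counter()
    for caller, callee in edges:
        out_deg[caller] += 1
        in_deg[callee] += 1
    functions = set(in_deg) | set(out_deg)
    return Counter((in_deg[f], out_deg[f]) for f in functions)


def graph_similarity(edges_a, edges_b):
    """Multiset Jaccard similarity of two structural signatures."""
    sig_a = structural_signature(edges_a)
    sig_b = structural_signature(edges_b)
    shared = sum((sig_a & sig_b).values())
    total = sum((sig_a | sig_b).values())
    return shared / total if total else 1.0


# Two made-up samples with different names but the same call structure.
library = {"memcpy", "strlen"}
sample_a = {("main", "scan"), ("main", "infect"),
            ("scan", "memcpy"), ("infect", "memcpy")}
sample_b = {("sub_1", "sub_2"), ("sub_1", "sub_3"), ("sub_3", "strlen")}

core_a = strip_library(sample_a, library)
core_b = strip_library(sample_b, library)
print(graph_similarity(core_a, core_b))
```

Comparing structure instead of names matters because malware binaries
rarely ship with symbols, so two builds of the same code may share no
function names at all.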
Using a database of 200 samples of bot software, a test case for the
automated process resulted in two major families of code, three
smaller groups, and several pairs and singletons. The system also
identified variants of bot software not recognised by a
signature-based anti-virus system.
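Grouping of this kind, with large families alongside pairs and singletons,
can be sketched as linking any two samples whose similarity exceeds a
threshold and reading off the connected groups. The threshold, the sample
names and the toy similarity function below are assumptions for
illustration, not details of Dullien's system.

```python
def group_families(samples, similarity, threshold=0.8):
    """Group samples whose pairwise similarity reaches `threshold`.

    Uses a small union-find structure, so samples linked through a chain
    of similar relatives end up in the same family.
    """
    parent = {s: s for s in samples}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for i, a in enumerate(samples):
        for b in samples[i + 1:]:
            if similarity(a, b) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for s in samples:
        groups.setdefault(find(s), []).append(s)
    # Largest families first; singletons end up last.
    return sorted(groups.values(), key=len, reverse=True)


def toy_similarity(a, b):
    """Stand-in score: samples sharing a first letter count as related."""
    return 0.9 if a[0] == b[0] else 0.1


samples = ["a1", "a2", "a3", "b1", "b2", "c1"]
families = group_families(samples, toy_similarity)
print(families)
```

Any real similarity score, such as one derived from call-graph comparison,
could be passed in place of toy_similarity.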
Dullien believes that static analysis is a better approach to malware
classification than Microsoft's runtime analysis. Actions that a
malicious program does not perform right away - known as time-delayed
triggers - can foil runtime analysis, he said. And virus and
attack-tool writers could add a few lines of code to a program to
confuse runtime analysis, he added.
"The approach presented in the paper can be trivially foiled with very
minor high-level-language modifications in the source of the program,"
he stated in a blog entry analysing Microsoft's system.
Microsoft declined to make its researchers available for interviews.
However, in the paper, the authors argued that a combination of both
static analysis and runtime analysis would likely perform best. For
example, static analysis appears to deliver results more quickly;
Microsoft's behavioural classification requires three hours to cluster
400 files at the 1,000 event limit, according to the paper.
In some ways, software classification resembles the state of
biological classification back in the time of Carl Linnaeus. The 18th
century botanist pushed the scientific community of his day into
accepting a hierarchical classification system for plants and animals.
However, early classifications relied on external similarities, much
in the way that many of today's classifications rely on external
attributes of programs rather than their internal processes.
At least one other project hopes to help human analysts do a better
job of classification.
OffensiveComputing.net, a project founded by researchers Val Smith and
Danny Quist, aims to create a database of malware that records a
number of basic attributes of the code, including checksums,
anti-virus scanner results, and what type of packer the malware uses
to compress itself. The project started in response to the increase in
code sharing amongst virus and attack-tool writers, the faster
development of exploits, and the faster incorporation of those exploits
into existing malicious software, OffensiveComputing's Smith said.
"The biggest benefit is more rapid response to complex threats. As the
synergy between viruses, Trojans, worms, rootkits and exploits grows,
waiting for a solution becomes more dangerous."
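A record of the basic attributes described above might look something like
the following sketch. The field names and the make_record helper are
hypothetical and do not reflect OffensiveComputing's actual schema; only the
checksum calls are standard Python library functions.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SampleRecord:
    """Basic attributes of one malware sample (hypothetical schema)."""
    md5: str
    sha256: str
    size: int
    packer: str = "unknown"
    # Maps a scanner name to the name that scanner gave the sample.
    scanner_results: dict = field(default_factory=dict)


def make_record(data, packer="unknown", scanner_results=None):
    """Build a record from the raw bytes of a sample."""
    return SampleRecord(
        md5=hashlib.md5(data).hexdigest(),
        sha256=hashlib.sha256(data).hexdigest(),
        size=len(data),
        packer=packer,
        scanner_results=dict(scanner_results or {}),
    )


# Placeholder bytes and scanner names, not real data.
record = make_record(b"", packer="UPX",
                     scanner_results={"ScannerA": "Nyxem",
                                      "ScannerB": "Blackmal"})
print(record.md5)
```

Keying records on a checksum lets analysts confirm they hold the same file
even when scanners report it under different names.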
OffensiveComputing's database gives incident response workers and
analysts access to meaningful data about malicious software, which is
especially necessary until automated analysis programs, such as
Microsoft's and Dullien's classification systems, mature. The project
strives to be adaptable, involve the community, have measurable
results, and remain open, Smith said.
"There is an arms race going on between analysts and malware authors,
so any solution will have to keep pace with advances on both sides."
This article originally appeared in Security Focus.
Copyright © 2006, SecurityFocus