The following document is copyrighted, 1989, by Tim Sankary -
all rights reserved.  It may be copied and distributed freely as long
as no changes are made and as long as this copyright notice remains
with the document.
	
	I want to preface this document with a personal statement.  I
am aware that Jim Goodwin has published a partial list of his virus
disassemblies and I can imagine the controversy that will result.  I
do not have an inside track to the "truth" of this Distribute/Don't
Distribute issue, and I can frankly see both sides of the argument.  I
find it hard, however, to censure a colleague who has performed such
excellent and dedicated work as Jim has, and I have to admire his
courage in taking such a controversial step.  For those of you who
anticipate writing or designing Identification and Removal programs
(CVIA Class III programs) for viruses, I hope you will find something
of value in the following study that will be useful.  If you have
access to disassemblies, this document may provide some insights into
designing your own disinfectant.
	I would like to thank "Doc" John McAfee for his guidance and
help in developing this paper, and the Computer Virus Industry
Association for the outstanding visual aids that they contributed.
These figures have been referenced in the paper but I have been unable
to create ASCII representations of them for BBS distribution.  If you
obtained this document from an electronic source and would like a copy
of the figures, they can be obtained by sending a stamped,
self-addressed envelope to the CVIA, 4423 Cheeney Street, Santa Clara, CA.
95054.  - Tim Sankary
	    From the Homebase BBS
	    408 988 4004



	DEVELOPING VIRUS IDENTIFICATION PRODUCTS
	

	In January of 1986, the world's first computer virus was
unleashed upon an unsuspecting and largely defenseless population of
IBM personal computer users worldwide.  The virus originated in Lahore,
Pakistan, and spread rapidly from country to country through Europe
and across to the North American Continent.  In less than twelve
months it had infected nearly a half-million computers and was causing
minor havoc in hundreds of universities, corporations and government
agencies.
	This virus, later dubbed the "Pakistani Brain", caught the
user community unawares and the problems resulting from its many
infections demonstrated how unprepared we were for this phenomenon.
The computer systems targeted by the virus contained no specific
hardware or software elements that could prevent or even slow its
spread, and few utilities could even detect its presence after an
infection occurrence.  Fortunately, the virus was not destructive, and
it limited its infections to floppy diskettes, avoiding hard disks
entirely.
	The first defensive procedure developed to counteract this
virus involved a simple visual inspection of a suspected diskette's
volume label.  The virus erased every infected diskette's
volume label and replaced it with the character string - "@BRAIN".
Thus, any inspection of the volume label, such as performing a simple
DIRECTORY command, would indicate the presence or absence of the
virus.  An infected diskette could then be reformatted, or the virus
could be removed by replacing the boot sector.  This manual procedure
is a typical, if somewhat rudimentary, example of the type of
functions performed by a class of antiviral utilities commonly called
Infection Identification products.
	Infection identification products generally employ "passive"
techniques for virus detection.  That is, they work by examining the
virus in its inert state.  This contrasts with active detection
products which look for specific actions employed by a virus.  For
example, looking for a Format instruction within a segment of code on
a disk would be a passive method of detecting a potentially
destructive program.  If we detected the Format attempt during program
execution, however, we would be performing an active detection.
Passive methods concern themselves with the static attributes of
viruses; active methods concern themselves with the results of virus
execution.
	Examples of active indicators are: the attempted erasure of
critical files, destruction of the File Allocation Table (FAT),
re-direction of system interrupt vectors, general slowdown of the
system, or an attempt to
modify an executable program.  These indicators are generic; that is,
they are common to a large class of viruses.  Because so many viruses
perform these common activities, however, they are of little use in
identifying individual virus strains.  It is the passive virus
indicators that prove most useful to a positive identification: the
characteristic text imbedded within the virus, specific flags,
singular filenames or a distinctive sequence of instructions that is
unique to the virus.  These and other similar indicators can best be
ascertained by scanning system storage and examining the program files
and other inert data.
	
History
	Virus identification products have their genesis in the
utility programs first developed in 1982 and 1983 to check public
domain software for bombs or trojans before it was executed.  These
utility programs initially checked for questionable instructions in
the suspect program's object code.  Direct input/output instructions,
interrupt calls, format sequences and like instructions, if found,
were flagged and the user was notified.  Later versions included tests
for imbedded data strings that were typically used by trojan
designers.  Suspect programs were scanned for profanity, for keywords
like "gotcha" or "sucker", and for data strings that had been found in
specific trojan programs.  Some programs also looked for specific
names of files that were frequently used by trojans and bombs.
	These products, however, were seldom able to identify a
specific bomb or trojan.  Rather, they indicated that the suspect
program contained instructions or messages of a questionable nature -
implying that the program might be a generic trojan.  This, however,
is not sufficient for dealing with viruses.
	Viruses create entirely different problems than bombs or
trojans.  Viruses replicate, and can infect hundreds or even thousands
of programs within an installation.  They remain invisible for long
periods of time before they activate and cause damage.  And they are
difficult to remove because they imbed themselves within critical
segments of the system.  It is not sufficient to know that a virus is
present; it is necessary to know which virus is present.  We must know
how it infects, what actions it takes, and, most importantly, what
must be done to de-activate and remove the virus.
	Thus, when the first virus identification products emerged in
1986 they didn't just look for generic code or messages; they looked
for specific indications that could identify the individual virus
strain.  This allowed the user to verify a specific infection
occurrence and take appropriate action.  Later versions of these
products went a step further.  They actually removed the virus when an
infection was identified.
	  
Techniques
	Before we discuss the techniques used by identification
products, we need to look briefly at how viruses insert themselves
into programs.  As shown in Figure 1, viruses actually modify the
structure of the programs that they infect.  Generally, the virus
replaces the program's start-up segment with a routine that passes
control to the main body of the virus.  This main body code may be
inserted within the program in a buffer area, or it may be added to
the beginning or the end of the program.  After execution of the
virus, the program's original start-up sequence is restored and
control is passed to the program.
	When removing a virus from an infected program, it is crucial
to determine exactly how the virus modified the program.  Each virus
differs from other viruses in size, segmentation and technique.  Each
virus chooses a different area for infection, stores the start-up
sequence in a different location, and returns control in a different
manner.  We must know exactly what the virus did during the infection
process in order to reverse the steps for removal.
	Thus, it should be clear that in order to develop an antidote
for a specific virus, we must first obtain a copy of the virus for
analysis.  A thorough analysis of the structure and design of the
virus will provide the answers to all of the above questions.
	When a virus has been disassembled and analyzed, we in theory
know all there is to know about the virus.  We are then able to create
an "attribute file" for the virus.  This file contains all of the
characteristics of the virus that can be uniquely assigned to the
virus.  For example, we may find imbedded data within the virus that
we would not reasonably expect to find in any other program or data
file.  Or we may find an instruction sequence that is sufficiently
unusual that we would not expect any other program to use the exact
same sequence.  Figure 2 shows two virus examples that contain unique
imbedded data.  In the Pakistani Brain example, it is clear that we
would not expect to find the exact same name, address and telephone
number in any other program.
	In addition to "identification" attributes, the attribute file
contains all information necessary to reverse the virus infection
process.  Common elements of an attribute file might be:
		- Executable code signatures
		- Volume label flags
		- Hidden file names
		- Absolute sector address contents
		- Key data at specific file offsets
		- Specific interrupt vector modifications
		- ASCII data content
		- Specific increases in bad sector counts
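	As a rough illustration only, an attribute file of this kind
might be modelled as a small data structure.  The sketch below is
written in Python for readability; the class names, field names and
sample values are all assumptions made for the example and are not
taken from any actual product or disassembly.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Attribute:
    """One identifying trait of a virus (field names are illustrative)."""
    kind: str                     # e.g. "code_signature", "ascii_data"
    pattern: bytes                # the bytes expected in storage
    offset: Optional[int] = None  # fixed file offset, or None for "anywhere"


@dataclass
class AttributeFile:
    """Everything recorded about one virus strain after analysis."""
    virus_name: str
    attributes: list = field(default_factory=list)

    def add(self, kind, pattern, offset=None):
        # Returning self lets several attributes be chained together.
        self.attributes.append(Attribute(kind, pattern, offset))
        return self


# Placeholder values only; they do not describe a real virus.
example = (
    AttributeFile("EXAMPLE-VIRUS")
    .add("ascii_data", b"EXAMPLE-MARKER")
    .add("code_signature", b"\x90\x90\xEB\xFE", offset=0x10)
)
```

	Keeping the offset optional reflects the distinction drawn in
the list above between data expected at a specific file offset and
data that may appear anywhere in a file.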
	When the attribute file has been created, it is supplied to
a program that scans all of the appropriate areas of system storage
looking for combinations of the attributes.  As more attributes are
discovered, the degree of assurance that the virus is present
increases.  For example, the character string "sUMsDOS" is common to
all versions of the Israeli virus.  It is conceivable, however, that
the same string could appear randomly in any text file.  Therefore,
the identification program will look for verification attributes, such
as the file offset where the character string was located, or a
sequence of instructions following the data.
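	The idea of growing assurance can be sketched in a few lines
of Python.  The string "sUMsDOS" is the one mentioned above; the
verification bytes and their positions are placeholders invented for
this example, and the function name is hypothetical.

```python
def assurance(data, primary, verifiers):
    """Return the fraction of attributes found in `data`.

    primary   -- byte string that must appear somewhere in the data
    verifiers -- list of (relative_offset, expected_bytes) pairs, checked
                 relative to where the primary string was found

    Only the first occurrence of the primary string is examined.
    """
    pos = data.find(primary)
    if pos < 0:
        return 0.0
    found = 1
    for rel, expected in verifiers:
        start = pos + rel
        if start >= 0 and data[start:start + len(expected)] == expected:
            found += 1
    return found / (1 + len(verifiers))


# Placeholder verification attributes (not from a real disassembly).
VERIFIERS = [(-2, b"\xEB\x10"), (7, b"\x00\x01")]

memo = b"notes about sUMsDOS in a memo"
suspect = b"\xEB\x10sUMsDOS\x00\x01rest of file"

low = assurance(memo, b"sUMsDOS", VERIFIERS)      # string alone matched
high = assurance(suspect, b"sUMsDOS", VERIFIERS)  # every attribute matched
```

	A text file that merely mentions the string scores low, while
a file in which the surrounding bytes also match scores high.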
	When the virus has been identified, the removal phase begins.
Since the infection attributes of the virus are known, the removal
process is fairly straightforward.  Usually it involves locating the
main body of the virus and all segments of the original program that
had been re-located by the virus.  The virus is erased, and the
program is then re-constructed.
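	A minimal sketch of such a reconstruction follows, in Python.
It assumes one hypothetical infection layout: the virus body was added
to the end of the program, the first few bytes of the program were
overwritten with a jump, and the original bytes were kept at a known
offset inside the virus body.  Real viruses differ, which is exactly
why the attribute file must record these details for each strain.

```python
def disinfect(infected, body_len, saved_offset, head_len=3):
    """Rebuild the original program from an infected image.

    Assumed (hypothetical) layout:
      - the virus appended `body_len` bytes to the program,
      - it overwrote the first `head_len` bytes with a jump,
      - it kept the original bytes at `saved_offset` within its body.
    """
    if len(infected) < body_len + head_len:
        raise ValueError("file too small to hold this infection")
    body_start = len(infected) - body_len
    body = infected[body_start:]
    original_head = body[saved_offset:saved_offset + head_len]
    if len(original_head) != head_len:
        raise ValueError("saved start-up bytes not found")
    # Drop the virus body and put the original start-up bytes back.
    return original_head + infected[head_len:body_start]


# Build a stand-in "infected" image from harmless placeholder bytes.
program = b"ABCDEFGH"
virus_body = b"VV" + program[:3] + b"VVV"
infected = b"\xE9\x05\x00" + program[3:] + virus_body

restored = disinfect(infected, body_len=len(virus_body), saved_offset=2)
```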
	Clearly, multiple attribute files can be used by a single
program.  Thus, single identification products are able to identify
multiple strains of viruses (see Figure 3).

Product Advantages
	Infection identification products have a major advantage over
other types of virus protection products: They are able to determine
whether or not a system is already infected.  This is a serious
concern in many organizations.  Other classes of virus protection
products must assume that a given system is uninfected at the time the
products are installed.  They log the state of the system at the time
they are installed and periodically compare the current state to the
original state.  If a virus has infected the system in the interim,
the change will be detected.  If a virus has already infected the
system before such products are installed, however, the virus will be
logged as part of the original system, and no change will be detected.
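	The blind spot can be shown with a small Python sketch of a
log-and-compare product.  SHA-256 is used here only as a convenient
modern stand-in for whatever checksum such a product might use, and
the file names and contents are placeholders.

```python
import hashlib


def snapshot(files):
    """Log the state of the system: one digest per file."""
    return {name: hashlib.sha256(data).hexdigest()
            for name, data in files.items()}


def changed(baseline, files):
    """Return the names of files that differ from the logged state."""
    current = snapshot(files)
    return sorted(n for n in current if baseline.get(n) != current[n])


# B.COM is already "infected" when the product is installed.
files = {"A.COM": b"clean program", "B.COM": b"clean program+virus"}
baseline = snapshot(files)

before = changed(baseline, files)   # nothing reported, B.COM included

files["A.COM"] = b"clean program+virus"
after = changed(baseline, files)    # only the new infection is seen
```

	The infection that existed at installation time never appears
in either report, because it is part of the logged baseline.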
	Infection identification products, on the other hand, are
specifically designed to look for and identify pre-existing
infections.  This ability to identify an existing infection is in many
cases crucial to the success of implementing antiviral measures.
Since a virus may remain dormant for months or even years before it
activates and damages the system, pre-existing infections could cause
widespread destruction in spite of our best efforts at implementing
protection programs.
	Automatic removal is the second advantage of identification
products.  Virus infections can sometimes involve hundreds or
thousands of programs within an organization.  When the virus is
discovered, the task of tracking down and disinfecting all of the
infected programs can become monumental.  In many cases, multiple
versions of a single program may be infected, or the original source
diskettes may have been lost or misplaced.  In some cases, infected
programs may be overlooked or incorrectly replaced, so that
re-infection becomes a problem.  These and other issues invariably cause
problems.  The identification products, however, automatically find,
identify and remove the infection, normally at a rate of a few seconds
per infected program.  The time savings alone can be enormous.
	A third advantage to identification programs is that they
cannot be circumvented by a known virus.  Other types of products that
use active methods for infection prevention or detection can be
specifically targeted by viruses.  The virus can seek out and destroy
or disable the active element of such products.  For example, if the
product is a filter type product that monitors all system I/O, the
virus can steal the interrupts from the monitor and thus bypass the
program's checking function.  Likewise, if a protection program uses a
checksum or other method to look for change within a program, the
virus can modify the program's checksum routine so that the change
caused by an infection will not be detected.  These and other
techniques have been used by many viruses to avoid interference by
antiviral programs that use active detection methods.
	Identification products, on the other hand, cannot be so
easily circumvented.  Since these products use passive techniques, the
virus has no control over the products' functions.  Keep in mind that
the virus and its resultant system modifications are merely a sequence
of inert bits as far as the identification product is concerned.  Also,
the virus is not active at the time the product is being used (all
such products come with their own boot diskettes, and they run
stand-alone).  Thus, the virus can in no way affect the product's
operation, or even be aware of its presence.
	
Problem areas
	There are some drawbacks to identification products, however.
The first problem is that these products only work for known viruses:
that is, viruses that have been around long enough to be noticed,
isolated, sampled, disassembled and analyzed.  This may take a
considerable time if the virus is unobtrusive and slow to activate.
When the virus has been discovered and analyzed, the identification
product must be designed, implemented, packaged, marketed and
distributed - a process that could take considerably more time.  Thus,
identification utilities will lag new virus developments by months, or
in some cases, even years.  This time lag implies that there will
always be new viruses, and thus new dangers, against which no
identification utility will be effective.
	The second problem with these products is more thorny, and
requires a high level of product sophistication in order to resolve.  At
issue is a phenomenon that might be called the Uncertainty Factor, and
it is caused by the increasing tendency of hackers to collect existing
viruses, modify them and return them to the public domain.  These
modifications sometimes cause viruses to react differently from the ways
in which they were originally designed, yet they may leave key
identification attributes unchanged.
	For example, the Jerusalem virus was originally designed to slow
down the infected machine's processor one-half hour after an infected
program was executed.  This slowdown was a nuisance to the user of the
infected machine, but it severely limited the spread of the virus,
because the virus made itself known early in the infection process and
had limited time to replicate before being removed.  In the summer of
1988, an unknown hacker modified the virus by changing just one
instruction (see Figure 4).  This modification disabled the routine that
caused the system to slow down, and as a result, the virus became many
times more infectious.
	Modifications like this, and other more substantial
modifications, are made almost daily to existing viruses.  The danger
that these modifications pose to identification products is substantial.
If an identification product is attempting to remove a virus that has
infected a program differently than the way in which the product
expects, then the results of the disinfection will be unpredictable.
Damage to the system may result, the program may be destroyed or, in the
worst case, the virus will still be active even though the product
thinks it has removed it.
	In order to minimize the risks posed by this problem,
identification products must be designed to cross reference as many
virus attributes as possible prior to attempting removal.  If any one of
the expected attributes has been changed, or is missing, the product
should notify the user of the potential problem and leave removal to
manual intervention.
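	One way to picture this safeguard is the Python sketch below.
It checks every expected attribute before any removal is attempted and
refuses to proceed if even one is missing.  The attribute names,
offsets and byte values are invented for the example.

```python
def check_before_removal(data, expected):
    """Return the names of expected attributes that do not match.

    expected -- list of (name, offset, pattern) triples, as they might
                be recorded in an attribute file
    """
    mismatches = []
    for name, offset, pattern in expected:
        if data[offset:offset + len(pattern)] != pattern:
            mismatches.append(name)
    return mismatches


def safe_to_remove(data, expected):
    """Allow automatic removal only if every attribute matches."""
    missing = check_before_removal(data, expected)
    if missing:
        print("Possible modified strain; manual intervention required:",
              ", ".join(missing))
        return False
    return True


# Placeholder attributes for a made-up strain.
EXPECTED = [
    ("entry jump", 0, b"\xE9\x10\x00"),
    ("marker text", 8, b"MARK"),
    ("delay routine", 12, b"\xB9\xFF\xFF"),
]

known = b"\xE9\x10\x00....." + b"MARK" + b"\xB9\xFF\xFF"
variant = b"\xE9\x10\x00....." + b"MARK" + b"\x90\x90\x90"  # one change
```

	The variant still carries the marker text that would identify
it, yet it is rejected for automatic removal because one of the
cross-referenced attributes no longer matches.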

Future Prospects
	Identification products clearly must play a major role in the
battle against computer viruses.  As viruses become more widespread and
as infections become more common, the need for utilities able to
identify and help remove viruses will become apparent.  It is probable
that these products will become the dominant form of virus protection in
the future.  A few technical advances, however, would greatly aid their
general acceptance.
	One of the problems facing identification products is the time
required to fully scan attached storage devices when searching for a
virus.  For example, as many as ten or more minutes can be required to
fully scan a 40 megabyte drive while looking for just one virus.
Multiple virus checks require more time.  Because of this, it is
impractical to perform frequent scans of the system.  This is
unfortunate because it would be advantageous to perform a complete
identification check of a system each time the system was booted.  This
would provide a high degree of system security, assuming that the
identification product was kept up to date.  More sophisticated
algorithms for searching attached storage and creative techniques for
multiple virus scans could alleviate the scan time problem.
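	As one possible direction, the Python sketch below combines
several signatures into a single pattern so that storage is read once
for all viruses rather than once per virus.  It is an illustration of
the idea only, not a claim about how any existing product works; the
signature names and bytes are placeholders, and whether this is faster
in practice depends on the matching engine used.

```python
import re


def build_scanner(signatures):
    """Build a function that checks data for many signatures at once.

    signatures -- dict mapping a virus name to its byte pattern
    """
    by_pattern = {pattern: name for name, pattern in signatures.items()}
    # Longer patterns come first so the alternation prefers them.
    ordered = sorted(by_pattern, key=len, reverse=True)
    combined = re.compile(b"|".join(re.escape(p) for p in ordered))

    def scan(data):
        # finditer reports non-overlapping matches only, so signatures
        # that overlap one another in the data could be missed.
        return sorted({by_pattern[m.group(0)]
                       for m in combined.finditer(data)})

    return scan


# Placeholder signatures, not taken from real viruses.
scan = build_scanner({"one": b"AAA", "two": b"BBB", "three": b"CCC"})
hits = scan(b"xxAAAyyBBBzz")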
	A second desirable advance in the technology of these products
would be the development of techniques that could identify variations of
known viruses and still provide the capability to remove the modified
virus.  This advance would remove a major limitation of the current
products and would greatly increase their reliability.  Techniques for
removing variations have already been developed for a few root viruses,
but there currently exists no generic technique that is effective for a
large class of viruses.  I anticipate that this hurdle will be overcome
within a year or two.
	A final enhancement would be the ability to fully or partially
re-structure data that has been corrupted by a virus after it has
activated.  Currently, infection identification products are only useful
if they are used before a virus begins its destructive phase.  When the
destructive phase begins, the virus may destroy critical control tables,
data files, programs or even itself.  At this point all current virus
products have limited usefulness.
	It is possible in some cases, however, to reverse much of the
destruction caused by a virus, provided that: 1) we know the details of
the destruction process, and 2) the destructive phase has not gone on too
long.  For example, one of the common PC viruses scrambles the File
Allocation Table by reversing a number of the entries.  Since we know
the exact way in which the virus scrambles the information, we can
easily unscramble it.  However, after a few days of data scrambling, the
virus initiates a low level format of the hard disk.  At this point, no
recovery is possible.
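	The reason such scrambling can be undone is that the
operation is its own inverse once its details are known.  The Python
sketch below assumes, purely for illustration, that "reversing"
means putting one known run of table entries into reverse order; the
actual behavior of the virus is not specified here, and the table
contents are placeholders.

```python
def reverse_run(entries, start, count):
    """Return a copy of `entries` with one run of entries reversed."""
    out = list(entries)
    out[start:start + count] = out[start:start + count][::-1]
    return out


fat = [0, 1, 2, 3, 4, 5, 6, 7]

scrambled = reverse_run(fat, 2, 4)        # what the assumed virus did
recovered = reverse_run(scrambled, 2, 4)  # the same step undoes it
```

	Recovery of this kind depends on knowing exactly which run was
reversed, which is information that only a full analysis of the virus
can supply.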
	I anticipate that future products will incorporate recovery
capabilities for a large number of virus destructive acts.  This
capability, and others described above, should provide the best virus
protection that we can hope to achieve.