From: jmolini@nasamail.nasa.gov (JAMES E. MOLINI) Date: Wed, 4 Apr 90 14:03 PDT Message-Id: To: virus-l@ibm1.cc.lehigh.edu Cc: rdavis@nasamail.nasa.gov, lsnapp@nasamail.nasa.gov Subject: Universal Virus Detector? X-Lines: 272 I am working with a colleague on defining a robust virus detection utility. The paper below discusses an approach we are investigating. The work was undertaken as part of a research project sponsored by the National Aeronautics and Space Administration at the Johnson Space Center. Please look it over and tell us what you think. We would like to know what type of virus could be written to defeat this type of detector on a large scale. I know it is a rather long document, but you might find it interesting. Thanks in advance. A Universal Virus Detection Model by Chris Ruhl and James Molini Computer Sciences Corp. Email: jmolini@nasamail.arc.nasa.gov PREFACE This paper attempts to define an abstract model which will support the construction of a Universal Virus Detector. Although the restrictions imposed upon the model for 100% accuracy may be too severe to make such an implementation practical, it is quite feasible to achieve near universal virus detection in a user friendly fashion. This paper is distributed with the intent of discovering reasonable vulnerabilities in the concepts, or implementation. Comments are therefore encouraged. We have used an IBM PC Compatible running MS DOS 3.X as the candidate implementation platform for convenience. The paper does not discuss virus identification, which is a separate issue from detection. Although not "absolutely necessary," virus identification mechanisms dramatically reduce the time required to eradicate specific cases of virus infection. If you have questions, or see a flaw in the process, please let us know. We are building a virus detector, which could be placed into the public domain, that uses the techniques below to detect virus infections. Our initial tests have shown encouraging results. Please send comments/suggestions to Virus-L, or the authors at the Email address above. Please do not request code samples, or testing opportunities until we announce availability of the utility. Definitions Before proceeding with our discussion, it is important to define terms. The following definitions are taken (as faithfully as possible) from the most recent discussions about viruses on the various Email networks: VIRUS - A self-replicating program that must attach itself in some way to an existing executable on the target computer system in order to propagate. In doing so, no overt user action is required to further the replication process. TROJAN HORSE - A non-replicating malicious program that misleads the user in order to cause him/her to execute it's malicious code. Although it is malicious code, it is often hidden inside another piece of (apparently innocuous) code in order to escape detection. This type of program does not modify any existing executable files on the system. WORM - A self-replicating program that does not attach itself to other executable code in order to propagate. It relies upon some weakness in a multi-user system, or requires some sort of overt user action in order to operate. The technical feasibility of worms on single user computer systems is debatable. INFECTION - The act of modifying existing executable code in order to propagate a virus. MASKING - The act of preventing discovery by intervening at some point in the scanning process. Typically this effects an indication of a clean system, when, in fact, the environment under review has been modified. Understanding the scope of the virus problem, it is possible to define a circumstance in which a Universal Virus Detector (UVD) may be successful. We further scope the problem by NOT requiring that the UVD prevent an infection. Instead it can identify an infection after it has occurred. This principle is similar to the idea that smoke detectors are not responsible for preventing fires, although they may periodically work toward that end. They are actually responsible only for responding to indications that a fire may be present and warning the user of impending danger. UVD's must be scoped to a similar purpose for them to work. With this in mind, let us begin by defining the physic of computer viruses: A Virus Propagation Model Although a Virus is an abstraction (i.e. Program), the environment it attacks is not (i.e. IBM PC). Regardless of how creative the author is, he/she cannot change certain characteristics of the machine that the Virus inhabits. In order to develop this model the following assumptions are made: a.) The user will begin the detection process (we have proposed a CRC type routine) prior to infection. By doing so, the user has provided an uninfected baseline from which to judge future states. Although virus propagation may still be identifiable on an infected machine, the level of detection for subsequent states becomes indeterminate. b.) The user will avoid the introduction of self modifying code into the system. By doing so, he/she ensures that the target system maintains a given state of integrity. Given the assumptions above, we may now define the circumstances necessary to support a virus infection. Without the adherence to the following rules, it is impossible to define a circumstance in which a virus can propagate. Rule #1: A Virus infection, or propagation occurs when an executable file is altered. Proof: I) An un-altered executable will not be infected since by definition it is not altered. Here we are assuming that the original state of the machine is uninfected. II) An unaltered piece of code that performs malicious acts is a Trojan horse and thus, beyond the scope of this problem. III) A non-executable file, whether altered, or not, cannot further infect the system since by definition it is never executed. An altered non-executable is merely a damaged data file. Thus: Only altered executables can further infect the system. Note: In certain cases, a new executable file can be added to the system, but it still cannot infect the system, unless it is called from a modified file in the system, (violating I above) or unless the user intervenes, in which case the program is not a virus, but a worm, or Trojan Horse. Rule #2: Assuming that the detection mechanism is sufficiently robust, the only possible way to avoid detection is to mask the infection prior to having the detection results presented to the user. Proof: I) An un-altered procedure cannot mask an infected file since by definition its not altered to do so. (Masking requires foreknowledge of the code to be masked. Such a masking scenario would indicate a state of infection prior to installation of the UVD violating a basic assumption that you install it on a clean machine.) II) Masking requires some type of intervention in the file read/result presentation process. Here we assume that the computation of the checksum is sufficiently robust that no 2 different pieces of code can generate the same result. Therefore, since masking requires some type of modification of data in the path from storage to user and since the only 2 feasible parts to that path are either the read, or the delivery, any masking must be completed at one of the 2 ends of the pipe. Thus: Only altered procedures can mask the infection of executables. UVD CONSTRUCTION. >From the above discussion, we can begin defining a UVD with some degree of assurance that it will do the job. If a virus must modify the original state of the system in order to be successful, we can define a process that can detect that modification. (Insert your favorite Checksum/CRC/Cryptographic routine here.) Any program which circumvents the modification of existing code is not a virus. Then, to defeat the detection process, the virus must mask the infection in some way so that this verifiable change detection mechanism cannot present accurate results to the user. Any program which circumvents the detection mechanism must do so by modifying the data delivery process into, or out of the detector. Once again we are talking about code modification. We have recently seen an example of the masking effect. In that case the 4096 virus, masked all infected files prior to releasing the data to any process attempting to read them. Another masking attack would redirect all detector output to NULL on a PC, thereby depriving the user of any notification that the detector may have generated. Another option, which attempts to mask infections, is a directed attack against the utility. In order to prevent successful directed attacks, several methods can be used. The following methods attempt to validate the integrity of the detection code, or stored tables: a. Run the detector from a write-protected, bootable floppy, thus assuring a validated run time environment and providing a constant set of scan pointers. b. Use software to validate the detector prior to operation and validate the fact that the detector is operating with exclusive use of the CPU. Antigen is one example of code which validates the integrity of a program prior to execution. c. Prevent modification of computer checksums by prefixing each file with a set of detector specific state vectors. This approach obtains a set of memory resident pointers, or values that are specific to the region where the detector is run. These pieces of information are then prepended to each executable checked and act as a type of "fingerprint" for the virus detector. They will change from machine to machine and from version to version. As a result, no virus can intercept the data points and compute a substitute checksum. d. Finally, a simple way to hinder directed attacks in a general sense is to change file extensions, or stored text strings, which will make identification of the detector by directed viruses more difficult. Normally, this is only feasible when dealing with non- copyrighted programs. So to put our theoretical UVD into practice, on, for example, an IBM PC, we would do the following: a. Begin by validating the integrity of the detector code. This has been discussed above. b. Validate the integrity of the read process by checking the interrupt table and low memory for changes. This would stop the 4096 and viruses of its species, which place themselves in the memory resident read procedures and mask infections. c. Validate the correctness of the output process by checking screen write interrupt vectors and device paths. It could be done also by generating a direct write procedure to hardware addresses during the installation process. d. Validate the Boot sector of the disk and hidden R/O system files via direct sector reads. Knowing that the read process is unchanged, we can also be sure that the data coming into the CRC routine is correct. This then would defeat the Pakistani Brain and viruses of that sort. which relocate the boot sector and generate offset addresses. e. Once both ends of the pipe and the pipe itself are validated, we can begin checking all executables on the system for modifications. By doing this we have checked the entire data path and found it to be correct. We have also checked the correctness of the change detection procedure. This assumes that no other process has taken over the CPU during the scan, but this is no problem as long as we mask all external interrupts. Then only an actual hardware interrupt can cause the program to pause. And even these interrupts can be masked to a certain extent. Some have said in the past that the human is also a part of this process. We agree, but the user must be a part of any process. This utility must be designed to present the user with a reliable estimation of the integrity of executable files in the target machine. Running the utility in conjunction with the software update process guarantees that the user is aware of changes to the system configuration. Doing this in a controlled fashion will not violate the integrity of the model. Although the detection of authorized modifications may be a nuisance, it is necessary if we are to allow the system owner to make all risk related decisions on his/her system. If the user misses an infection once, it is fairly certain that the infection will be attempted again on the same machine (we saw over 400 infections on one machine). To this end, boot infections and memory infections are always flagged as serious. Beyond that, we can't effectively protect the user from himself.