Monitoring File System Integrity with Tripwire

by Gene Kim and Gene Spafford

Ellen runs a network of 50 Unix computers representing nearly a dozen vendors -- from PCs running Xenix to a Cray running Unicos. This morning, when she logged in to her workstation, Ellen was surprised to see a "lastlog" message indicating that "root" had logged into the system at 3 am. Ellen thought she was the only one with the root password. Needless to say, this was not something Ellen was happy to see.

A bit more investigation revealed that someone -- certainly not Ellen -- had logged in as "root," not only on her machine but also on several other machines in her company. Unfortunately, the intruder deleted all the accounting and audit files just before logging out of each machine. Ellen suspects that the intruder (or intruders) ran the compiler and editor on several of the machines. Being concerned about security, Ellen worries that the intruder may have changed one or more system files, enabling future unauthorized access as well as compromising sensitive information. How can she tell which files have been altered without restoring each system from backups?

Poor Ellen is faced with one of the most tedious and frustrating jobs a system administrator can have -- determining which, if any, files and programs have been altered without authorization. File modifications may occur in a number of ways: an intruder, an authorized user violating local policy or controls, or even the rare piece of malicious code altering system executables as others are run. It might even be the case that some system hardware or software is silently corrupting vital system data. In each of these situations, the problem is not so much knowing that things might have been changed; rather, the problem is verifying exactly which files -- out of tens of thousands of files in dozens of gigabytes of disk on dozens of different architectures -- might have been changed.
Not only is it necessary to examine every one of these files, but it is also necessary to examine directory information as well. Ellen will need to check for deleted or added files, too. With so many different systems and files, how is Ellen going to manage the situation? If Ellen has already installed the "Tripwire" program, produced under the COAST Project at Purdue University, then she can get a detailed list of changes by the time she gets back with a cup of coffee and a danish -- fortification for the follow-up task of restoring the system to normal!

Design issues

Consider for a moment the problem of detecting an unexpected, unauthorized change or set of changes on your system. A well-prepared system administrator may maintain checklists, comparison copies, checksum records, or a long history of backup tapes for this kind of contingency. However, these methods are costly to maintain, prone to error, and sometimes (even more insidious) susceptible to deliberate spoofing by a malicious intruder.

For instance, one common approach taken by system administrators is simply to generate a checklist of system files, perhaps using find(1) or ls(1). This list is then carefully saved, sometimes on auxiliary media. By using diff(1) to compare the original list with one generated later, the administrator can detect files whose modification times, ownership, or sizes have changed. Added or deleted files also stand out. An ambitious administrator can make this scheme more robust by storing each file's checksum, generated by sum(8) or cksum(8), in the checklist.

Perhaps an automated checklisting tool like the one described above would be enough to help Ellen; but perhaps not. Several problems prevent checklists from being completely trustworthy and useful. First, the list of files and associated checksums may be tedious to maintain because of its size and lack of locality (the files are scattered all over the disk).
Second, using timestamps, checksums, and file sizes does not necessarily ensure the integrity of each file. After all, once an intruder gains root privileges, what is to prevent him or her from altering the timestamps, or even the checklist database itself? Furthermore, changes can be made to a file without changing its length. A devious intruder might even know how to make the checksum for a compromised program match the one originally computed by the sum(8) program! And this entire approach presumes that ls(1) and the other programs involved have not themselves been compromised. In the case of a serious attack, a conscientious administrator must not assume that these files have remained unchanged without strong proof. But what proof can be offered that is sufficient for this situation?

A wishlist

A successful checklist scheme requires a high level of automation -- both in generating the output list and in generating the input list of files. If the system is difficult to use, it may not be used often enough -- or worse, used improperly. The automation scheme should include a simple way to describe the portions of the filesystem to be traversed. Additionally, in cases where files are likely to be added, changed, or deleted, it must be easy to update the checklist database. For instance, a file such as /etc/motd may change weekly, or even daily. It should certainly not be necessary to regenerate the entire database every time this single file changes.

Ideally, our hypothetical automated checklisting program (sometimes called an integrity checker) could be run regularly from cron(8) to detect file changes in a timely manner. It should also be possible to run the program manually against a smaller set of files. As the administrator is likely to compare the `base' checklist against the current file list frequently, the program must be easy to invoke and use.
A useful integrity checker must generate output that is easy to scan. A checker generating three hundred lines of output for the system administrator to analyze daily would be self-defeating -- probably far too much to ask of even Ellen, our amazingly dedicated system administrator! Thus, the program must allow the specification of filesystem "exceptions" that can change without being reported, reducing the "noise" in its output. For example, changes in system log file sizes are expected, but a change in a log file's inode number, ownership, or file modes is cause for alarm. For system binaries in /bin, in contrast, a change in any value stored in the inode (except the access timestamp) should be reported. Properly specified, the integrity checker should operate unobtrusively, notifying Ellen only when a file changes outside the specified bounds.

Finally, assuming that Ellen wants to run the integrity checker on every machine in her network, it should allow the reuse and sharing of configuration files wherever possible. For example, if Ellen has twenty identical workstations, they should be able to share a common configuration file that still accommodates machine-specific oddities (e.g., a software package installed on only one machine). The configuration should support reuse, because reuse reduces the chances of operator error.

Signatures and checksums

Detecting an intruder via modifications to the filesystem hinges upon being able to detect the modifications. The database contains the size, timestamps, and checksum for each file. Is this information enough to ensure the integrity of the filesystem? For instance, the contents of a 75 Kbyte file are represented by a two-byte checksum. Is this good enough? Or can a clever intruder cover his or her tracks?

Generically, a "signature function" is any function that takes an arbitrary file as input and yields a fixed-size output, called the "signature."
(Some refer to this as a "message digest" or "secure hash"; some pedants object to the use of the word "signature" unless a cryptographic key and encryption are involved.) If the contents of a file are changed in any way, then the signature should also change. One has "broken" a signature if an identical value can be generated from a different input file. If a file's signature can be easily broken while retaining the original file size, then the purpose of storing signatures in an integrity checking scheme is defeated -- an intruder could modify a changed file so that it has the same signature as the original.

The 16-bit additive checksum is among the simplest signature functions: it adds the values of all the bytes in the file and outputs the low 16 bits of the sum. However, breaking this kind of checksum signature is trivial -- any difference in the checksum introduced by changes to a file can be negated by adjusting some 16-bit field elsewhere to offset the change.

16- and 32-bit CRC functions use the mathematical remainder of a polynomial division, making it somewhat (but not very) difficult to reverse-engineer a desired CRC. In fact, it isn't even difficult to mount a simplistic brute-force search for a duplicate value: there are at most 2^32 possible checksums to examine. For instance, a brute-force search on a Sun SparcStation 1+ can find a duplicate 16-bit CRC signature for a 48 Kbyte file in about ten minutes. A MasPar MP-1, with its 16,384 processors, takes only 0.7 seconds. The 32-bit CRC signatures are harder to break because the search space is the square of the 16-bit one; but with the help of a massively parallel machine, a successful brute-force search still takes only four hours. Of course, a well-equipped attacker need not undertake measures such as this -- algorithmic attacks also exist that can be applied with much less effort.
There are even programs available on some bulletin boards to do exactly this, and for precisely the purpose of fooling existing checksum programs!

Message digest algorithms with roots in cryptographic methods have been developed to address the problem of uniquely identifying a file with a hard-to-spoof signature. These algorithms use "one-way" functions that are difficult to invert and that usually generate a large value, making exhaustive searches for duplicate signatures far more computationally difficult than for 32-bit checksums. Several of these signature algorithms are publicly available. Among them are MD2, MD4, and MD5 (the RSA Data Security, Inc. message digest algorithms), Snefru (the Xerox secure hash function), and SHS (the Secure Hash Standard proposed by NIST). These algorithms all generate signatures larger than 64 bits, making a brute-force attack like the ones used against CRCs computationally unrealistic (probably requiring many thousands of years, even on the MasPar MP-1 with its 16,384 processors). Furthermore, the algorithms involved are very, very difficult to invert; for some of them, it is not clear that they can be inverted at all. Thus, these functions provide signatures in which we can place considerably more confidence.

The one-way nature of message digest signatures opens up many useful possibilities. For example, operating system vendors could make available databases containing signatures of all the files in a given release. This would enable easy comparisons of system files against those shipped by the vendor: the system administrator need only compare the signature of the file residing on the system against the signature shipped with the operating system -- much simpler than doing a binary comparison against a release tape. The availability of a free, highly general tool would provide a basis for vendors to do precisely that -- and such a tool now exists: Tripwire.
Tripwire

Tripwire, our portable integrity checker, is available in C source code form for a wide variety of Unix systems. Given the requirements for an effective integrity checker enumerated above, it almost sounds like a problem that could be solved with a few hundred lines of Perl. However, as the available signature functions are written in C and are computationally intensive, a "Perlian" integrity checker would be less than optimal. Furthermore, for maximum portability (and to provide one less avenue of compromise), it makes more sense to write such a program in a compiled language, like C.

The Tripwire program uses a simple ASCII configuration file to name the directories and files to be monitored. Each entry specifies some combination of file attributes to watch, along with a set of signature functions to maintain. The functions may be chosen from the built-in set of MD2, MD4, MD5, Snefru, CRC-16, and CRC-32. There are also hooks for users to include their own functions, such as a true cryptographic signature routine (e.g., one based on CBC-mode DES that requires a password each time the program is run).

Tripwire includes pre-configured system-specific configuration files for BSD- and System V-derived operating systems. It also includes vendor-specific configuration files for Suns, Convexes, SGIs, Sequents, Pyramids, Crays, HPs, NeXTs, and others. On a SparcStation running SunOS 4.1, the two-hundred-line Tripwire configuration file describes about 1150 system files and directories to be monitored. For this system, configuring and compiling Tripwire takes less than ten minutes. Tuning Tripwire to reflect a specific filesystem is an iterative process, which may take up to a few hours to complete in special cases. Once the initial database is generated (five to fifteen minutes), Tripwire is ready to run as an integrity checker. Tripwire has been designed to run from read-only storage as well as from regular disk.
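The exact configuration grammar is documented with the distribution; the fragment below is only a hypothetical sketch of the flavor of such a file, with illustrative entries and template letters:

```
# Watch every inode value except access time (read-only template).
/bin            R
/etc            R
# Log files are expected to grow; ignore size and timestamp changes.
/usr/adm        L
# This file changes daily; do not report on it at all.
!/etc/motd
```

Per-entry templates like these are what let routine changes (a growing log, a daily message of the day) pass silently while any change to a system binary is reported.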
If the program and its databases are stored on read-only media, an attacker will be unable to alter them, giving the highest possible confidence in the Tripwire output. Of course, Tripwire will also run from a regular disk, although we recommend that read-only media be used.

Tripwire underwent a rigorous six-week beta-testing period, with over 125 testers participating worldwide. Not only did the testers help eliminate bugs and increase portability, they also lobbied for changes in practicality, ease of use, "taste," and efficiency. Because the source code is freely available, many users have added site-specific changes since the initial release; these modifications are especially useful in their local environments. Some of them have been sent back to us for inclusion in later releases of Tripwire, and the current release includes a set of contributed extensions to the Tripwire code, all provided in source form.

Obtaining Tripwire

Tripwire is available in source form at no cost. It has been submitted to comp.sources.unix on the Usenet, and is available via anonymous FTP from cs.purdue.edu in pub/spaf/COAST/Tripwire. For those without Internet access, the sources and patches can be obtained via email: just mail to tripwire-request@cs.purdue.edu with the single word "help" in the message body to get instructions.

A mailing list has been created for the purpose of discussing the future direction of Tripwire. For now, we envision vendors shipping Tripwire databases with their operating systems, allowing system administrators to quickly compare their setups against the "official" baseline database. The list may be joined using the mail server described above.

We regret that we do not have the resources available to make tape or diskette versions of Tripwire for anyone other than COAST Project sponsors. Therefore, we ask that you not send us media for copies -- it will not be returned. Tripwire is copyrighted, but there is no charge for non-commercial use.
Tripwire is the first tool released by the COAST Project, an effort to provide effective tools for computer security. Readers interested in further information about the Tripwire program or the COAST effort should consult the files available on the FTP site or through the mail server.

--------------------

About the authors:

Gene Kim is a senior at Purdue studying computer science. He dreams of graduation, and of researching problems involving massively parallel and distributed computing.

Gene Spafford is on the faculty of the Department of Computer Sciences at Purdue University, and is the director of the COAST Project. He is co-author of the definitive book "Practical Unix Security," published by O'Reilly & Associates, 1991. Spaf is widely known for his work in security, in software engineering, and for his advocacy of responsible use of computing technology.