Monitoring File System Integrity with Tripwire

by Gene Kim and Gene Spafford

Ellen runs a network of 50 Unix computers representing nearly a dozen vendors -- from PCs running Xenix to a Cray running Unicos. This morning, when she logged in to her workstation, Ellen was surprised to see a "lastlog" message indicating that "root" had logged into the system at 3 am. Ellen thought she was the only one with the root password. Needless to say, this was not something Ellen was happy to see.

A bit more investigation revealed that someone -- certainly not Ellen -- had logged in as "root," not only on her machine but also on several other machines in her company. Unfortunately, the intruder deleted all the accounting and audit files just before logging out of each machine. Ellen suspects that the intruder (or intruders) ran the compiler and editor on several of the machines. Being concerned about security, Ellen worries that the intruder may have changed one or more system files, enabling future unauthorized access as well as compromising sensitive information. How can she tell which files have been altered without restoring each system from backups?

Poor Ellen is faced with one of the most tedious and frustrating jobs a system administrator can have -- determining which, if any, files and programs have been altered without authorization. File modifications may occur in a number of ways: an intruder, an authorized user violating local policy or controls, or even the rare piece of malicious code altering system executables as others are run. It might even be the case that some system hardware or software is silently corrupting vital system data. In each of these situations, the problem is not so much knowing that things might have been changed; rather, the problem is verifying exactly which files -- out of tens of thousands of files in dozens of gigabytes of disk on dozens of different architectures -- might have been changed.
Not only is it necessary to examine every one of these files, but it is also necessary to examine directory information as well. Ellen will need to check for deleted or added files, too. With so many different systems and files, how is Ellen going to manage the situation? If Ellen has already installed the "Tripwire" program, produced under the COAST Project at Purdue University, then she can get a detailed list of changes by the time she gets back with a cup of coffee and a danish -- fortification for the follow-up task of restoring the system to normal!

Design issues

Consider for a moment the problem of detecting an unexpected, unauthorized change or set of changes on your system. A well-prepared system administrator may maintain checklists, comparison copies, checksum records, or a long history of backup tapes for this kind of contingency. However, these methods are costly to maintain, prone to error, and sometimes (even more insidious) susceptible to deliberate spoofing by a malicious intruder.

For instance, one common approach taken by system administrators is simply to generate a checklist of system files, perhaps using find(1) or ls(1). This list is then carefully saved, sometimes on auxiliary media. By using diff(1) to compare the original list with one generated later, the administrator can detect files whose modification times, ownership, or sizes have changed. Added or deleted files also stand out. An ambitious administrator can make this scheme more robust by storing each file's checksum, generated by sum(8) or cksum(8), in the checklist.

Perhaps an automated checklisting tool like the one described above would be enough to help Ellen; but perhaps not. Several problems prevent checklists from being completely trustworthy and useful. First, the list of files and associated checksums may be tedious to maintain because of its size and lack of locality (the files are scattered all over the disk).
Second, using timestamps, checksums, and file sizes does not necessarily ensure the integrity of each file. After all, once an intruder gains root privileges, what is to prevent him or her from altering the timestamps, or even the checklist database itself? Furthermore, changes can be made to a file without changing its length. A devious intruder might even know how to make the checksum for a compromised program match the one originally computed by the sum(8) program! And this entire approach presumes that ls(1) and the other programs involved have not themselves been compromised. In the case of a serious attack, a conscientious administrator must not assume that these files have remained unchanged without strong proof. But what proof can be offered that is sufficient for this situation?

A wishlist

A successful checklist scheme requires a high level of automation -- both in generating the output list and in generating the input list of files. If the system is difficult to use, it may not be used often enough -- or worse, used improperly. The automation scheme should include a simple way to describe the portions of the filesystem to be traversed. Additionally, in cases where files are likely to be added, changed, or deleted, it must be easy to update the checklist database. For instance, a file such as /etc/motd may change weekly, or even daily. It should certainly not be necessary to regenerate the entire database every time this single file changes.

Ideally, our hypothetical automated checklisting program (sometimes called an integrity checker) could be run regularly from cron(8) to detect file changes in a timely manner. It should also be possible to run the program manually against a smaller set of files. As the administrator is likely to compare the `base' checklist against the current file list frequently, the program must be easy to invoke and use.
A useful integrity checker must generate output that is easy to scan. A checker generating three hundred lines of output for the system administrator to analyze daily would be self-defeating -- probably far too much to ask of even Ellen, our amazingly dedicated system administrator! Thus, the program must allow the specification of filesystem "exceptions" that can change without being reported, reducing the "noise" in its output. For example, changes in system log file sizes are expected, but a change in a log file's inode number, ownership, or file modes is cause for alarm. For system binaries in /bin, in contrast, a change in any value stored in the inode (except the access timestamp) should be reported. Properly specified, the integrity checker should operate unobtrusively, notifying Ellen only when a file changes outside the specified bounds.

Finally, assuming that Ellen wants to run the integrity checker on every machine in her network, it should allow the reuse and sharing of configuration files wherever possible. For example, if Ellen has twenty identical workstations, they should be able to share a common configuration file that still accommodates machine-specific oddities (e.g., a software package installed on only one machine). The configuration should support reuse, because reuse reduces the chances of operator error.

Signatures and checksums

Detecting an intruder via modifications to the filesystem hinges upon being able to detect the modifications. The database contains the size, timestamps, and checksum for each file. Is this information enough to ensure the integrity of the filesystem? For instance, the contents of a 75 Kbyte file are represented by a two-byte checksum. Is this good enough? Or can a clever intruder cover his or her tracks?

Generically, a "signature function" is any function that takes an arbitrary file as input and yields a fixed-size output, called the "signature."
(Some refer to this as a "message digest" or "secure hash"; some pedants object to the use of the word "signature" unless a cryptographic key and encryption are involved.) If the contents of a file are changed in any way, then the signature should also change. One has "broken" a signature if an identical value can be generated from a different input file. If a file's signature can be easily broken while retaining the original file size, then the purpose of storing signatures in an integrity checking scheme is defeated -- an intruder could modify a changed file so that it has the same signature as the original.

The 16-bit additive checksum is among the simplest signature functions: it adds the values of all the bytes in the file and outputs the low 16 bits of the sum. However, breaking this kind of checksum signature is trivial -- any difference in the checksum introduced by changes to a file can be negated by adjusting some 16-bit field elsewhere to offset the change.

16- and 32-bit CRC functions use the mathematical remainder of a polynomial division, making it somewhat (but not very) difficult to reverse-engineer a desired CRC. In fact, it isn't even difficult to mount a simplistic brute-force search for a duplicate value: there are at most 2^32 possible checksums to examine. For instance, a brute-force search on a Sun SparcStation 1+ can find a duplicate 16-bit CRC signature for a 48 Kbyte file in about ten minutes. A MasPar MP-1, with its 16,384 processors, takes only 0.7 seconds. The 32-bit CRC signatures are harder to break because the search space is the square of the 16-bit one; but with the help of a massively parallel machine, a successful brute-force search still takes only four hours. Of course, a well-equipped attacker need not undertake measures such as this -- algorithmic attacks also exist that can be applied with much less effort.
There are even programs available on some bulletin boards to do exactly this, and for precisely the purpose of fooling existing checksum programs!

Message digest algorithms with roots in cryptographic methods have been developed to address the problem of uniquely identifying a file with a hard-to-spoof signature. These algorithms use "one-way" functions that are difficult to invert and that usually generate a large value, making exhaustive searches for duplicate signatures far more computationally difficult than for 32-bit checksums. Several of these signature algorithms are publicly available. Among them are MD2, MD4, and MD5 (the RSA Data Security, Inc. message digest algorithms), Snefru (the Xerox secure hash function), and SHS (the Secure Hash Standard proposed by NIST). These algorithms all generate signatures larger than 64 bits, making a brute-force attack like the ones used against CRCs computationally unrealistic (probably requiring many thousands of years, even on the MasPar MP-1 with its 16,384 processors). Furthermore, the algorithms involved are very, very difficult to invert; for some of them, it is not clear that they can be inverted at all. Thus, these functions provide signatures in which we can place considerably more confidence.

The one-way nature of message digest signatures opens up many useful possibilities. For example, operating system vendors could make available databases containing signatures of all the files in a given release. This would enable easy comparisons of system files against those shipped by the vendor: the system administrator need only compare the signature of the file residing on the system against the signature shipped with the operating system -- much simpler than doing a binary comparison against a release tape. The availability of a free, highly general tool would provide a basis for vendors to do precisely that -- and such a tool now exists: Tripwire.
Tripwire

Tripwire, our portable integrity checker, is available in C source code form for a wide variety of Unix systems. Given the requirements for an effective integrity checker enumerated above, it almost sounds like a problem that could be solved with a few hundred lines of Perl. However, as the available signature functions are written in C and are computationally intensive, a "Perlian" integrity checker would be less than optimal. Furthermore, for maximum portability (and to provide one less avenue of compromise), it makes more sense to write such a program in a compiled language, like C.

The Tripwire program uses a simple ASCII configuration file to name the directories and files to be monitored. Each entry specifies some combination of file attributes to watch, along with a set of signature functions to maintain. The functions may be chosen from the built-in set of MD2, MD4, MD5, Snefru, CRC-16, and CRC-32. There are also hooks for users to include their own functions, such as a true cryptographic signature routine (e.g., one based on CBC-mode DES that requires a password each time the program is run).

Tripwire includes pre-configured system-specific configuration files for BSD- and System V-derived operating systems. It also includes vendor-specific configuration files for Suns, Convexes, SGIs, Sequents, Pyramids, Crays, HPs, NeXTs, and others. On a SparcStation running SunOS 4.1, the two-hundred-line Tripwire configuration file describes about 1150 system files and directories to be monitored. For this system, configuring and compiling Tripwire takes less than ten minutes. Tuning Tripwire to reflect a specific filesystem is an iterative process, which may take up to a few hours to complete in special cases. Once the initial database is generated (five to fifteen minutes), Tripwire is ready to run as an integrity checker. Tripwire has been designed to run from read-only storage as well as from regular disk.
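The exact configuration grammar is documented with the distribution; the fragment below is only a hypothetical sketch of the flavor of such a file, with illustrative entries and template letters:

```
# Watch every inode value except access time (read-only template).
/bin            R
/etc            R
# Log files are expected to grow; ignore size and timestamp changes.
/usr/adm        L
# This file changes daily; do not report on it at all.
!/etc/motd
```

Per-entry templates like these are what let routine changes (a growing log, a daily message of the day) pass silently while any change to a system binary is reported.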
If the program and its databases are stored on read-only media, an attacker will be unable to alter them, giving the highest possible confidence in the Tripwire output. Of course, Tripwire will also run from a regular disk, although we recommend that read-only media be used.

Tripwire underwent a rigorous six-week beta-testing period, with over 125 testers participating worldwide. Not only did the testers help eliminate bugs and increase portability, they also lobbied for changes in practicality, ease of use, "taste," and efficiency. Because the source code is freely available, many users have added site-specific changes since the initial release; these modifications are especially useful in their local environments. Some of them have been sent back to us for inclusion in later releases of Tripwire, and the current release includes a set of contributed extensions to the Tripwire code, all provided in source form.

Obtaining Tripwire

Tripwire is available in source form at no cost. It has been submitted to comp.sources.unix on the Usenet, and is available via anonymous FTP from cs.purdue.edu in pub/spaf/COAST/Tripwire. For those without Internet access, the sources and patches can be obtained via email: just mail to tripwire-request@cs.purdue.edu with the single word "help" in the message body to get instructions.

A mailing list has been created for the purpose of discussing the future direction of Tripwire. For now, we envision vendors shipping Tripwire databases with their operating systems, allowing system administrators to quickly compare their setups against the "official" baseline database. The list may be joined using the mail server described above.

We regret that we do not have the resources available to make tape or diskette versions of Tripwire for anyone other than COAST Project sponsors. Therefore, we ask that you not send us media for copies -- it will not be returned. Tripwire is copyrighted, but there is no charge for non-commercial use.
Tripwire is the first tool released by the COAST Project, an effort to provide effective tools for computer security. Readers interested in further information about the Tripwire program or the COAST effort should consult the files available on the FTP site or through the mail server.

--------------------

About the authors:

Gene Kim is a senior at Purdue studying computer science. He dreams of graduation, and of researching problems involving massively parallel and distributed computing.

Gene Spafford is on the faculty of the Department of Computer Sciences at Purdue University, and is the director of the COAST Project. He is co-author of the definitive book "Practical Unix Security," published by O'Reilly & Associates, 1991. Spaf is widely known for his work in security, in software engineering, and for his advocacy of responsible use of computing technology.