Reports and Papers Archive - Reports & Papers

Channel Model and Operational Capacity Analysis of Printed Text Documents

CERIAS TR 2007-72

A. K. Mikkilineni, P.-J. Chiang, G. T.-C. Chiu, J. P. Allebach, E. J. Delp

Download: PDF

In todayâ€™s digital world securing different forms of content is very important in terms of protecting copyright and verifying authenticity. One example is watermarking of digital audio and images. We believe that a marking scheme analogous to digital watermarking but for documents is very important. In this paper we describe the use of laser amplitude modulation in electrophotographic printers to embed information in a text document. In particular we describe an embedding and detection process which allows the embedding of between 2 and 8 bits in a single line of text. For a typical 12 point document this translates to between 100 and 400 bits per page. We also perform an operational analysis to compare two decoding methods using different embedding densities.

Added 2007-10-03

Detecting Anomalous Access Patterns in Relational Databases

CERIAS TR 2007-73

Ashish Kamra, Evimaria Terzi, Elisa Bertino

Download: PDF

A considerable effort has been recently devoted to the development of Database Management Systems (DBMS) which guarantee high assurance and security. An important component of any strong security solution is represented by Intrusion Detection (ID) techniques, able to detect anomalous behavior of applications and users. To date, however, there have been few ID mechanisms proposed which are specifically tailored to function within the DBMS. In this paper, we propose such a mechanism. Our approach is based on mining SQL queries stored in database audit log files. The result of the mining process is used to form profiles that can model normal database access behavior and identify intruders. We consider two different scenarios while addressing the problem. In the first case, we assume that the database has a Role Based Access Control (RBAC) model in place. Under a RBAC system permissions are associated with roles, grouping several users, rather than with single users. Our ID system is able to determine role intruders, that is, individuals while holding a specific role, behave differently than expected. An important advantage of providing an ID technique specifically tailored to RBAC databases is that it can help in protecting against insider threats. Furthermore, the existence of roles makes our approach usable even for databases with large user population. In the second scenario, we assume that there are no roles associated with users of the database. In this case, we look directly at the behavior of the users. We employ clustering algorithms to form concise profiles representing normal user behavior. For detection, we either use these clustered profiles as the roles or employ outlier detection techniques to identify behavior that deviates from the profiles. Our preliminary experimental evaluation on both real and synthetic database traces shows that our methods work well in practical situations.

Added 2007-10-03

A Black-box Router Profiler

CERIAS TR 2007-74

Roman Chertov, Sonia Fahmy, Ness B. Shroff

Download: PDF

Simulation, emulation, and wide-area testbeds exhibit differ- ent strengths and weaknesses with respect to fidelity, scalability, and man- ageability. Fidelity is a key concern since simulation or emulation inaccura- cies can lead to a dramatic and qualitative impact on the results. For exam- ple, high-bandwidth denial of service attack floods of the same rates have very different impact on the different platforms, even if the experimental scenario is supposedly identical. This is because many popular simulation and emulation environments fail to account for realistic commercial router behaviors, and incorrect results have been reported based on experiments conducted in these environments. In this paper, we describe the architecture of a black-box router profil- ing tool which integrates the popular ns-2 simulator with the Click mod- ular router and a modified network driver. We use this profiler to collect measurements on a Cisco router. Our preliminary results demonstrate that routers and other forwarding devices cannot be modeled as simple output port queues, even if correct rate limits are observed. We discuss our fu- ture work plans for using our data to create high-fidelity networksimula-tion/emulation models that are not computationally prohibitive.

Added 2007-10-03

International Conference on Software Maintenance

C. Liu, X. Zhang, J. Han, Y. Zhang, and B. Bhargava

Download: PDF

Recent software systems usually feature an automated failure reporting component, with which a huge number of failures are collected from software end-users. With a proper support of failure indexing, which identifies failures due to the same fault, the collected failure data can help developers prioritize failure diagnosis, among other utilities of the failure data. Since crashing failures can be effectively indexed by program crashing venues, current practice has seen great success in prioritizing crashing failures. A recent study of bug characteristics indicates that as excellent memory checking tools are widely adopted, semantic bugs and the resulting noncrashing failures have become dominant. Unfortunately, the problem of how to index noncrashing failures has not been seriously studied before. In previous study, two techniques have been proposed to index noncrashing failures, and they are T-PROXIMITY and R-PROXIMITY. However, as T-PROXIMITY indexes failures by the profile of the entire execution, it is generally not effective because most information in the profile is fault irrelevant. On the other hand, although R-PROXIMITY is more effective than T-PROXIMITY, it relies on a sufficient number of correct executions that may not be available in practice. In this paper, we propose a dynamic slicing-based approach, which does not require any correct executions, and is comparably effective as R-PROXIMITY. A detailed case study with gzip is reported, which clearly demonstrates the benefits of the proposed approach.

Added 2007-09-29

International Conference on Software Maintenance

V. Nagarajan, R. Gupta, X. Zhang, M. Madou, B. De Sutter, and K. De Bosschere

Download: PDF

In many application areas, including piracy detection, software debugging and maintenance, situations arise in which there is a need for comparing two versions of a program that dynamically behave the same even though they statically appear to be different. Recently dynamic matching was proposed by us which uses execution histories to automatically produce mappings between instructions in the two program versions. The mappings then can be used to understand the correspondence between the two versions by a user involved in software piracy detection or a comparison checker involved in debugging of optimized code. However, if a programâ€™s control flow is substantially altered, which usually occurs in obfuscation or even manual transformations, mappings at instruction level are not sufficient to enable a good understanding of the correspondence. In this paper, we present a comprehensive dynamic matching algorithm with the focus on call graph and control flow matching. Our technique works in the presence of aggressive control flow transformations (both interprocedural such as function inlining/outlining and intraprocedural such as control flow flattening) and produces mappings of interprocedural and intraprocedural control flow in addition to mapping between instructions. We evaluated our dynamic matching algorithms by attempting to match original program with versions that were subjected to popular obfuscation and control flow altering transformations. Our experimental results show that the control flow mappings produced are highly accurate and complete, for the programs considered.

Added 2007-09-29

International Symposium on Software Testing and Analysis

B. Xin and X. Zhang

Download: PDF

Capturing dynamic control dependence is critical for many dynamic program analysis such as dynamic slicing, dynamic information flow, and data lineage computation. Existing algorithms are mostly a simple runtime translation of the static definition, which fails to capture certain dynamic properties by its nature, leading to inefficiency. In this paper, we propose a novel online detection technique for dynamic control dependence. The technique is based upon a new definition, which is equivalent to the existing one in the intraprocedural case but it enables an efficient detection algorithm. The new algorithm naturally and efficiently handles interprocedural dynamic control dependence even in presence of irregular control flow. Our evaluation shows that the detection algorithm slows down program execution by a factor of 2.57, which is 2.54 times faster than the existing algorithm that was used in prior work.

Added 2007-09-29

International Symposium on Software Testing and Analysis

S. Tallam, C. Tian, X. Zhang, and R. Gupta

Download: PDF

Debugging long running multithreaded programs is a very challenging problem when using tracing-based analyses. Since such programs are non-deterministic, reproducing the bug is non-trivial and generating and inspecting traces for long running programs can be prohibitively expensive. We propose a framework in which, to overcome the problem of bug reproducibility, a lightweight logging technique is used to log the events during the original execution. When a bug is encountered, it is reproduced using the generated log and during the replay, a fine-grained tracing technique is employed to collect control-flow/dependence traces that are then used to locate the root cause of the bug. In this paper, we address the key challenges resulting due to tracing, that is, the prohibitively high expense of collecting traces and the significant burden on the user who must examine the large amount of trace information to locate the bug in a long-running multithreaded program. These challenges are addressed through execution reduction that realizes a combination of logging and tracing such that traces collected contain only the execution information from those regions of threads that are relevant to the fault. This approach is highly effective because we observe that for long running multithreaded programs, many threads that execute are irrelevant to the fault. Hence, these threads need not be replayed and traced when trying to reproduce the bug. We develop a novel lightweight scheme that identifies such threads by observing all the interthread data dependences and removes their execution footprint in the replay run. In addition, we identify regions of thread executions that need not be replayed or, if they must be replayed, we determine if they need not be traced. Following execution reduction, the replayed execution takes lesser time to run and it produces a much smaller trace than the original execution. Thus, the cost of collecting traces and the effort of examining the traces to locate the fault are greatly reduced.

Added 2007-09-29

Joint 10th European Software Engineering Conference and 13th ACM SIGSOFT Symposium on the Foundations of Software Engineering

Xiangyu Zhang, Rajiv Gupta

Download: PDF

We develop a method for matching dynamic histories of program executions of two program versions. The matches produced can be useful in many applications including software piracy detection and several debugging scenarios. Unlike some static approaches for matching program versions, our approach does not require access to source code of the two program versions because dynamic histories can be collected by running instrumented versions of program binaries. We base our matching algorithm on comparison of rich program execution histories which include: control flow taken, values produced, addresses referenced, as well as data dependences exercised. In developing a matching algorithm we had two goals: producing an accurate match and producing it quickly. By using rich execution history, we are able to compare the program versions across many behavioral dimensions. The result is a fast and highly precise matching algorithm. Our algorithm first uses individual histories of instructions to identify multiple potential matches and then it refines the set of matches by matching the data dependence structure established by the matching instructions. To test our algorithm we attempted matching of execution histories of unoptimized and optimized program versions. Our results show that our algorithm produces highly accurate matches which are highly effective when used in comparison checking approach to debugging optimized code.

Added 2007-09-29

Using Trust to Understand Risk Perceptions and Economic Benefits in Online Environments

CERIAS TR 2007-70

Fariborz Farahmand, Shari Lawrence Pfleeger, Eugene H. Spafford

Added 2007-09-26

Tracing Worm Break-in and Contaminations via Process Coloring: A Provenance-Preserving Approach

CERIAS TR 2007-69

Xuxian Jiang, Florian Buchholz, Aaron Walters, Dongyan Xu, Yi-Min Wang, Eugene H. Spafford

To detect and investigate self-propagating worm attacks against networked servers, the following capabilities are desirable: (1) raising timely alerts to trigger a worm investigation, (2) determining the break-in point of a worm, i.e. the vulnerable service from which the worm infiltrates the victim, and (3) identifying all contaminations inflicted by the worm during its residence in the victim. In this paper, we argue that the worm break-in provenance information has not been exploited in achieving these capabilities and thus propose process coloring, a new approach that preserves worm break-in provenance information and propagates it along operating system level information flows. More specifically, process coloring assigns a “color”, a unique system-wide identifier, to each remotely-accessible server process. The color will be either inherited by spawned child processes or diffused transitively through process actions. Process coloring achieves three new capabilities: color-based worm warning generation, break-in point identification, and log file partitioning. The virtualization-based implementation enables more tamper- resistant log collection, storage, and real-time monitoring. Beyond the overhead introduced by virtualization, process coloring only incurs very small additional system overhead. Experiments with real-world worms demonstrate the advantages of processing coloring over non-provenance-preserving tools.

Added 2007-09-20

Run-Time Label Propagation for Forensic Audit Data

CERIAS TR 2007-68

Florian Buchholz and Eugene H. Spafford

Download: PDF

It is desirable to be able to gather more forensically valuable audit data from computing systems than is currently done or possible. This is useful for the reconstruction of events that took place on the system for the purpose of digital forensic investigations. In this paper, we propose a mechanism that allows arbitrary meta-information bound to principals on a system to be propagated based on causality influenced by information flow. We further discuss how to implement such a mechanism for the FreeBSD operating system and present a proof-of-concept implementation that has little overhead compared to the system without label propagation.

Added 2007-09-20

Some Challenges in Digital Forensics

CERIAS TR 2006-58

Eugene H. Spafford

Added 2007-09-20

Perceptions of Information Security Risks and Implications for Public Policy

CERIAS TR 2007-67

Fariborz Farahmand, Eugene H. Spafford, and Melissa J. Dark

Added 2007-09-20

Categories of Digital Investigation Analysis Techniques Based on the Computer History Model

CERIAS TR 2006-57

Brian D. Carrier, Eugene H. Spafford

Download: PDF

Several digital forensic frameworks have been proposed, yet no conclusions have been reached about which are more appropriate. This is partly because each framework may work well for different types of investigations, but it hasnâ€™t been shown if any are sufficient for all types of investigations. To address this problem, this work uses a model based on the history of a computer to define categories and classes of analysis techniques. The model is more lower-level than existing frameworks and the categories and classes of analysis techniques that are defined support the existing higher-level frameworks. Therefore, they can be used to more clearly compare the frameworks. Proofs can be given to show the completeness of the analysis techniques and therefore the completeness of the frame-works can also be addressed.

Added 2007-09-19

An Exploration of Highly Focused, Coprocessor-based Information System Protection

CERIAS TR 2007-66

Paul Williams and Eugene H. Spafford

Download: PDF

Most past and present intrusion detection systems architectures assume a uniprocessor environment or do not explicitly make use of multiple processors when they exist. Yet, especially in the server world, multiple processor machines are commonplace; and with the advent of technologies such as Intel and ANID’s multi-core or Hyperthreading technologies, commodity computers are likely to have multiple processors.

This research explores how explicitly dividing the system into production and security components and running the components in parallel on different processors can improve the effectiveness of the security system. The production component contains all user tasks and most of the operating system while the security component contains security monitoring and validating tasks and the parts of the O/S that pertain to security. We demonstrate that under some circumstances this architecture allows intrusion detection systems to use monitoring models with higher fidelity, particularly with regard to the timeliness of detection, and will also increase system robustness in the face of some types of attacks.

Empirical results with a prototype co-processing intrusion detection system (Cu-PIDS) architecture support the feasibility of this approach. The construction of the prototype allowed us to demonstrate the implementation costs of the architecture are reasonable. Experimentation using fine-grained protection of real-world applications resulted in about a fifteen percent slowdown white demonstrating CuPIDS’ ability to quickly detect and respond to illegitimate behavior.

Added 2007-09-19