The Center for Education and Research in Information Assurance and Security (CERIAS)

Reports and Papers Archive


Using Relational Knowledge Discovery to Prevent Securities Fraud

CERIAS TR 2007-82
Jennifer Neville, John Komoroske, Kelly Palmer, David Jensen
Download: PDF

We describe an application of relational knowledge discovery to a key regulatory mission of the National Association of Securities Dealers (NASD). NASD is the world’s largest private-sector securities regulator, with responsibility for preventing and discovering misconduct among securities brokers. Our goal was to help focus NASD’s limited regulatory resources on the brokers who are most likely to engage in securities violations. Using statistical relational learning algorithms, we developed models that rank brokers with respect to the probability that they would commit a serious violation of securities regulations in the near future. Our models incorporate organizational relationships among brokers (e.g., past coworker), which domain experts consider important but have not been easily used before now. The learned models were subjected to an extensive evaluation using more than 18 months of data unseen by the model developers and comprising over two person-weeks of effort by NASD staff. Model predictions were found to correlate highly with the subjective evaluations of experienced NASD examiners. Furthermore, in all performance measures, our models performed as well as or better than the handcrafted rules that are currently in use at NASD.
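
To make the relational flavor concrete, here is a minimal sketch of one feature of the kind the abstract mentions: scoring each broker by the share of past coworkers with prior violations. The graph, names, and scoring rule are invented for illustration and are not the paper's learned models.

    # Hypothetical data and scoring rule; the paper's models are learned
    # with statistical relational learning, not this hand-written feature.
    past_coworkers = {
        "broker_a": ["broker_b", "broker_c"],
        "broker_b": ["broker_a"],
        "broker_c": ["broker_a", "broker_d"],
        "broker_d": ["broker_c"],
    }
    has_prior_violation = {"broker_b": True, "broker_d": True}

    def risk_score(broker):
        # share of past coworkers with a prior violation
        peers = past_coworkers.get(broker, [])
        if not peers:
            return 0.0
        flagged = sum(has_prior_violation.get(p, False) for p in peers)
        return flagged / len(peers)

    for b in sorted(past_coworkers, key=risk_score, reverse=True):
        print(b, round(risk_score(b), 2))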

Added 2007-10-05

ODSBR: An On-Demand Secure Byzantine Routing Protocol

CERIAS TR 2007-81
Baruch Awerbuch, Reza Curtmola, David Holmer, Cristina Nita-Rotaru, Herbert Rubens
Download: PDF

A common technique used by routing protocols for ad hoc wireless networks is to establish the routing paths on-demand, as opposed to continually maintaining a complete routing table. Since in an ad hoc network nodes not in direct range communicate via intermediate nodes, a significant concern is the ability to route in the presence of Byzantine failures, which include nodes that drop, fabricate, modify, or mis-route packets in an attempt to disrupt the routing service. We propose the first on-demand routing protocol for ad hoc wireless networks that provides resilience to Byzantine failures caused by individual or colluding nodes. The protocol relies on an adaptive probing technique that detects a malicious link after log n faults have occurred, where n is the length of the path. Problematic links are avoided by using a weight-based mechanism that multiplicatively increases their weights and by using an on-demand route discovery protocol that finds a least weight path to the destination. Our protocol bounds the amount of damage that an attacker or a group of colluding attackers can cause to the network.
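
A toy sketch of the two mechanisms named above: binary-search-style probing that localizes a faulty link in O(log n) probes on a path of length n, followed by a multiplicative weight increase on that link. The probe semantics and fault placement here are hypothetical stand-ins, not the protocol's actual message format.

    def locate_faulty_link(path, probe):
        # probe(i) -> True iff an acknowledgment returns from node path[i]
        lo, hi = 0, len(path) - 1      # last acked node, first silent one
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if probe(mid):
                lo = mid
            else:
                hi = mid
        return (path[lo], path[hi])    # loss localizes to this link

    weights = {}
    def penalize(link, factor=2.0):
        # multiplicative weight increase; route discovery then prefers
        # least-weight paths, steering traffic around the link
        weights[link] = weights.get(link, 1.0) * factor

    path = ["s", "n1", "n2", "n3", "d"]
    bad = ("n2", "n3")                 # hypothetical Byzantine link
    probe = lambda i: i < path.index(bad[1])   # acks stop past the bad link
    link = locate_faulty_link(path, probe)
    penalize(link)
    print(link, weights)               # ('n2', 'n3') {('n2', 'n3'): 2.0}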

Added 2007-10-05

Stealthy Malware Detection Through VMM-Based “Out-of-the-Box” Semantic View Reconstruction

CERIAS TR 2007-80
Xuxian Jiang, Xinyuan Wang, Dongyan Xu
Download: PDF

An alarming trend in malware attacks is that they are armed with stealthy techniques to detect, evade, and subvert malware detection facilities of the victim. On the defensive side, a fundamental limitation of traditional host-based anti-malware systems is that they run inside the very hosts they are protecting (“in the box”), making them vulnerable to counter-detection and subversion by malware. To address this limitation, recent solutions based on virtual machine (VM) technologies advocate placing the malware detection facilities outside of the protected VM (“out of the box”). However, they gain tamper resistance at the cost of losing the native, semantic view of the host which is enjoyed by the “in the box” approach, thus leading to a technical challenge known as the semantic gap. In this paper, we present the design, implementation, and evaluation of VMwatcher – an “out-of-the-box” approach that overcomes the semantic gap challenge. A new technique called guest view casting is developed to systematically reconstruct internal semantic views (e.g., files, processes, and kernel modules) of a VM from the outside in a non-intrusive manner. Specifically, the new technique casts semantic definitions of guest OS data structures and functions on virtual machine monitor (VMM)-level VM states, so that the semantic view can be reconstructed. With the semantic gap bridged, we identify two unique malware detection capabilities: (1) view comparison-based malware detection and its demonstration in rootkit detection and (2) “out-of-the-box” deployment of host-based anti-malware software with improved detection accuracy and tamper-resistance. We have implemented a proof-of-concept prototype on both Linux and Windows platforms and our experimental results with real-world malware, including elusive rootkits, demonstrate its practicality and effectiveness.
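
A minimal sketch of the guest view casting idea: applying a guest data-structure layout to raw VM memory to recover a semantic object from the outside. The 16-byte record layout and the memory image below are invented for illustration; real guest kernels are far more involved.

    import struct

    TASK_FMT = "<I12s"                 # invented layout: pid (u32) + name

    def cast_task(raw, offset):
        # "cast" the guest's structure definition onto raw memory bytes
        pid, comm = struct.unpack_from(TASK_FMT, raw, offset)
        return {"pid": pid, "comm": comm.rstrip(b"\x00").decode()}

    # pretend these bytes were read from guest memory at the VMM level
    guest_mem = struct.pack(TASK_FMT, 1, b"init") + struct.pack(TASK_FMT, 42, b"sshd")
    procs = [cast_task(guest_mem, off) for off in range(0, len(guest_mem), 16)]
    print(procs)   # process list reconstructed entirely from the outside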

Added 2007-10-05

Channel Model and Operational Capacity Analysis of Printed Text Documents

CERIAS TR 2007-72
A. K. Mikkilineni, P.-J. Chiang, G. T.-C. Chiu, J. P. Allebach, E. J. Delp
Download: PDF

In today’s digital world, securing different forms of content is very important in terms of protecting copyright and verifying authenticity. One example is watermarking of digital audio and images. We believe that a marking scheme analogous to digital watermarking but for documents is very important. In this paper we describe the use of laser amplitude modulation in electrophotographic printers to embed information in a text document. In particular we describe an embedding and detection process which allows the embedding of between 2 and 8 bits in a single line of text. For a typical 12-point document this translates to between 100 and 400 bits per page. We also perform an operational analysis to compare two decoding methods using different embedding densities.
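
A quick arithmetic check of the quoted capacity, under the assumption (not stated in the abstract) that a 12-point, single-spaced page holds on the order of 50 text lines:

    lines_per_page = 50                        # assumed line count per page
    for bits_per_line in (2, 8):
        print(bits_per_line, "bits/line ->",
              bits_per_line * lines_per_page, "bits/page")
    # -> 100 and 400 bits/page, matching the range quoted above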

Added 2007-10-03

Detecting Anomalous Access Patterns in Relational Databases

CERIAS TR 2007-73
Ashish Kamra, Evimaria Terzi, Elisa Bertino
Download: PDF

A considerable effort has been recently devoted to the development of Database Management Systems (DBMS) which guarantee high assurance and security. An important component of any strong security solution is represented by Intrusion Detection (ID) techniques, able to detect anomalous behavior of applications and users. To date, however, there have been few ID mechanisms proposed which are specifically tailored to function within the DBMS. In this paper, we propose such a mechanism. Our approach is based on mining SQL queries stored in database audit log files. The result of the mining process is used to form profiles that can model normal database access behavior and identify intruders. We consider two different scenarios while addressing the problem. In the first case, we assume that the database has a Role Based Access Control (RBAC) model in place. Under a RBAC system permissions are associated with roles, grouping several users, rather than with single users. Our ID system is able to determine role intruders, that is, individuals who, while holding a specific role, behave differently than expected. An important advantage of providing an ID technique specifically tailored to RBAC databases is that it can help in protecting against insider threats. Furthermore, the existence of roles makes our approach usable even for databases with large user populations. In the second scenario, we assume that there are no roles associated with users of the database. In this case, we look directly at the behavior of the users. We employ clustering algorithms to form concise profiles representing normal user behavior. For detection, we either use these clustered profiles as the roles or employ outlier detection techniques to identify behavior that deviates from the profiles. Our preliminary experimental evaluation on both real and synthetic database traces shows that our methods work well in practical situations.
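
A sketch of the profiling idea for the RBAC scenario: reduce each logged query to a coarse fingerprint (command plus relations touched), build per-role profiles, and flag queries whose fingerprint was never observed for the issuing role. The log, roles, and fingerprint representation are hypothetical; the paper's representations and detectors are richer.

    import re
    from collections import defaultdict

    def fingerprint(sql):
        cmd = sql.strip().split()[0].upper()
        tables = tuple(sorted(re.findall(r"(?:FROM|INTO|UPDATE)\s+(\w+)", sql, re.I)))
        return (cmd, tables)

    profiles = defaultdict(set)                # role -> fingerprints seen in training
    training_log = [("clerk", "SELECT name FROM customers"),
                    ("clerk", "SELECT total FROM orders"),
                    ("admin", "UPDATE accounts SET bal = 0")]
    for role, q in training_log:
        profiles[role].add(fingerprint(q))

    def is_anomalous(role, sql):
        return fingerprint(sql) not in profiles[role]

    print(is_anomalous("clerk", "SELECT name FROM customers"))  # False
    print(is_anomalous("clerk", "UPDATE accounts SET bal = 0")) # True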

Added 2007-10-03

A Black-box Router Profiler

CERIAS TR 2007-74
Roman Chertov, Sonia Fahmy, Ness B. Shroff
Download: PDF

Simulation, emulation, and wide-area testbeds exhibit different strengths and weaknesses with respect to fidelity, scalability, and manageability. Fidelity is a key concern since simulation or emulation inaccuracies can lead to a dramatic and qualitative impact on the results. For example, high-bandwidth denial of service attack floods of the same rates have very different impact on the different platforms, even if the experimental scenario is supposedly identical. This is because many popular simulation and emulation environments fail to account for realistic commercial router behaviors, and incorrect results have been reported based on experiments conducted in these environments. In this paper, we describe the architecture of a black-box router profiling tool which integrates the popular ns-2 simulator with the Click modular router and a modified network driver. We use this profiler to collect measurements on a Cisco router. Our preliminary results demonstrate that routers and other forwarding devices cannot be modeled as simple output port queues, even if correct rate limits are observed. We discuss our future work plans for using our data to create high-fidelity network simulation/emulation models that are not computationally prohibitive.
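
For contrast, here is the naive abstraction the measurements argue against: modeling a router as a single FIFO output-port queue that drains at line rate. All numbers are illustrative.

    def output_queue_delays(arrival_times, pkt_bits=12000, rate_bps=100e6):
        # FIFO output port draining at line rate: the whole model
        service = pkt_bits / rate_bps          # fixed per-packet service time
        free_at, delays = 0.0, []
        for t in arrival_times:
            start = max(t, free_at)            # wait for the queue to drain
            free_at = start + service
            delays.append(free_at - t)
        return delays

    print(output_queue_delays([0.0, 1e-5, 2e-5]))   # per-packet delays in seconds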

Added 2007-10-03

Emulation versus Simulation: A Case Study of TCP-Targeted Denial of Service Attacks

CERIAS TR 2007-76
R. Chertov, S. Fahmy, N. B. Shroff
Download: PDF

In this paper, we investigate the applicability of simulation and emulation for denial of service (DoS) attack experimentation. As a case study, we consider low-rate TCP-targeted DoS attacks. We design constructs and tools for emulation testbeds to achieve a level of control comparable to simulation tools. Through a careful sensitivity analysis, we expose difficulties in obtaining meaningful measurements from the DETER and Emulab testbeds with default system settings, and find dramatic differences between simulation and emulation results for DoS experiments. Our results also reveal that software routers such as Click provide a flexible experimental platform, but require understanding and manipulation of the underlying network device drivers. We compare simulation and testbed results to a simple analytical model for predicting the average size of the congestion window of a TCP flow under a low-rate TCP-targeted attack, as a function of the DoS attack frequency. We find that the analytical model and ns-2 simulations closely match in typical scenarios. Our results also illustrate that TCP-targeted attacks can be effective even when the attack frequency is not tuned to the retransmission timeout. The router type, router buffer size, attack pulse length, attack packet size, and attacker location have a significant impact on the effectiveness and stealthiness of the attack.
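
A hedged sketch of an analytical model of this kind (a generic AIMD sawtooth, not necessarily the paper's exact model): if each attack pulse, arriving with period T, causes one loss that halves cwnd, and cwnd grows one segment per RTT between pulses, the window oscillates between W/2 and W with W = 2T/RTT, giving an average of 1.5 T/RTT segments.

    def avg_cwnd_segments(attack_period_s, rtt_s):
        # sawtooth between W/2 and W, with W = 2 * T / RTT (see above)
        return 1.5 * attack_period_s / rtt_s

    for period in (0.5, 1.0, 2.0):             # attack period T in seconds
        print(period, "->", avg_cwnd_segments(period, rtt_s=0.1), "segments")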

Added 2007-10-03

On Mutually-Exclusive Roles and Separation of Duty

CERIAS TR 2007-75
Ninghui Li, Ziad Bizri, Mahesh V. Tripunitara
Download: PDF

Separation of Duty (SoD) is widely considered to be a fundamental principle in computer security. A Static SoD (SSoD) policy states that in order to have all permissions necessary to complete a sensitive task, the cooperation of at least a certain number of users is required. In Role-Based Access Control (RBAC), Statically Mutually Exclusive Role (SMER) constraints are used to enforce SSoD policies. In this paper, we pose and answer fundamental questions related to the use of SMER constraints to enforce SSoD policies. We show that directly enforcing SSoD policies is intractable (coNP-complete), while checking whether an RBAC state satisfies a set of SMER constraints is efficient. Also, we show that verifying whether a given set of SMER constraints enforces an SSoD policy is intractable (coNP-complete) and discuss why this intractability result should not lead us to conclude that SMER constraints are not an appropriate mechanism for enforcing SSoD policies. We show also how to generate SMER constraints that are as accurate as possible for enforcing an SSoD policy.
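
The efficient check mentioned above is easy to state concretely: a t-out-of-m SMER constraint over a role set R is satisfied when no user is a member of t or more roles of R. A minimal sketch over hypothetical data:

    def satisfies_smer(user_roles, constraint_roles, t):
        # no user may hold t or more roles from the constrained set
        forbidden = set(constraint_roles)
        return all(len(set(r) & forbidden) < t for r in user_roles.values())

    state = {"alice": {"clerk", "auditor"}, "bob": {"clerk"}}
    print(satisfies_smer(state, {"clerk", "auditor"}, t=2))   # False: alice holds both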

Added 2007-10-03

Towards optimal k-anonymization

CERIAS TR 2007-77
Tiancheng Li, Ninghui Li
Download: PDF

When releasing microdata for research purposes, one needs to preserve the privacy of respondents while maximizing data utility. An approach that has been studied extensively in recent years is to use anonymization techniques such as generalization and suppression to ensure that the released data table satisfies the k-anonymity property. A major thread of research in this area aims at developing more flexible generalization schemes and more efficient searching algorithms to find better anonymizations (i.e., those that have less information loss). This paper presents three new generalization schemes that are more flexible than existing schemes. This flexibility can lead to better anonymizations. We present a taxonomy of generalization schemes and discuss their relationship. We present enumeration algorithms and pruning techniques for finding optimal generalizations in the new schemes. Through experiments on real census data, we show that more-flexible generalization schemes produce higher-quality anonymizations and that the bottom-up approach works better for small k values and small numbers of quasi-identifier attributes than the top-down approach.
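
For reference, the property being optimized, written as a check: a table is k-anonymous when every combination of quasi-identifier values occurs at least k times. The table and quasi-identifiers below are toy data.

    from collections import Counter

    def is_k_anonymous(rows, quasi_ids, k):
        counts = Counter(tuple(r[q] for q in quasi_ids) for r in rows)
        return all(c >= k for c in counts.values())

    rows = [{"zip": "479**", "age": "2*", "disease": "flu"},
            {"zip": "479**", "age": "2*", "disease": "cold"},
            {"zip": "479**", "age": "3*", "disease": "flu"}]
    print(is_k_anonymous(rows, ["zip", "age"], k=2))  # False: ('479**', '3*') is alone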

Added 2007-10-03

t-Closeness: Privacy Beyond k-Anonymity and l-Diversity

CERIAS TR 2007-78
Ninghui Li, Tiancheng Li, Suresh Venkatasubramanian
Download: PDF

The k-anonymity privacy requirement for publishing microdata requires that each equivalence class (i.e., a set of records that are indistinguishable from each other with respect to certain “identifying” attributes) contains at least k records. Recently, several authors have recognized that k-anonymity cannot prevent attribute disclosure. The notion of l-diversity has been proposed to address this; l-diversity requires that each equivalence class has at least l well-represented values for each sensitive attribute.
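
A sketch of the t-closeness check named in the title, for a categorical sensitive attribute: with an equal-distance ground metric, the Earth Mover's Distance used by the paper reduces to total variation distance. That simplification and the data are assumptions of this sketch.

    from collections import Counter

    def distribution(values):
        n, counts = len(values), Counter(values)
        return {v: counts[v] / n for v in counts}

    def tvd(p, q):
        # total variation distance = EMD under an equal-distance metric
        keys = set(p) | set(q)
        return 0.5 * sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys)

    table = ["flu", "flu", "cold", "cancer"]   # sensitive values, whole table
    eq_class = ["flu", "flu"]                  # one equivalence class
    d = tvd(distribution(eq_class), distribution(table))
    print(d, d <= 0.3)                         # 0.5 False: violates t = 0.3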

Added 2007-10-03

Usable Mandatory Integrity Protection for Operating Systems

CERIAS TR 2007-71
Ninghui Li, Ziqing Mao, Hong Chen
Download: PDF

Existing mandatory access control systems for operating systems are difficult to use. We identify several principles for designing usable access control systems and introduce the Usable Mandatory Integrity Protection (UMIP) model that adds usable mandatory access control to operating systems. The UMIP model is designed to preserve system integrity in the face of network-based attacks. The usability goals for UMIP are twofold. First, configuring a UMIP system should not be more difficult than installing and configuring an operating system. Second, existing applications and common usage practices can still be used under UMIP. UMIP has several novel features to achieve these goals. For example, it introduces several concepts for expressing partial trust in programs. Furthermore, it leverages information in the existing discretionary access control mechanism to derive file labels for mandatory integrity protection. We also discuss our implementation of the UMIP model for Linux using the Linux Security Modules framework, and show that it is simple to configure, has low overhead, and effectively defends against a number of network-based attacks.
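
A toy check in the spirit of mandatory integrity protection (a generic low/high-integrity sketch, not UMIP's actual rule set): a process that has received remote input becomes low integrity and is denied writes to high-integrity files, unless it holds a partial-trust exception of the kind the abstract alludes to.

    def may_write(proc, file_label):
        # a process is low integrity once it has received remote input,
        # unless an administrator granted it a partial-trust exception
        low = proc["touched_network"] and not proc["partially_trusted"]
        return not (low and file_label == "high")

    sshd    = {"touched_network": True, "partially_trusted": True}
    exploit = {"touched_network": True, "partially_trusted": False}
    print(may_write(sshd, "high"))     # True
    print(may_write(exploit, "high"))  # False: network-tainted and untrusted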

Added 2007-10-03

International Conference on Software Maintenance

C. Liu, X. Zhang, J. Han, Y. Zhang, and B. Bhargava
Download: PDF

Recent software systems usually feature an automated failure reporting component, with which a huge number of failures are collected from software end-users. With proper support for failure indexing, which identifies failures due to the same fault, the collected failure data can help developers prioritize failure diagnosis, among other uses of the failure data. Since crashing failures can be effectively indexed by program crashing venues, current practice has seen great success in prioritizing crashing failures. A recent study of bug characteristics indicates that, as excellent memory checking tools are widely adopted, semantic bugs and the resulting noncrashing failures have become dominant. Unfortunately, the problem of how to index noncrashing failures has not been seriously studied before. In a previous study, two techniques were proposed to index noncrashing failures: T-PROXIMITY and R-PROXIMITY. However, as T-PROXIMITY indexes failures by the profile of the entire execution, it is generally not effective because most information in the profile is fault-irrelevant. On the other hand, although R-PROXIMITY is more effective than T-PROXIMITY, it relies on a sufficient number of correct executions that may not be available in practice. In this paper, we propose a dynamic slicing-based approach, which does not require any correct executions and is comparable in effectiveness to R-PROXIMITY. A detailed case study with gzip is reported, which clearly demonstrates the benefits of the proposed approach.
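
A sketch of the indexing idea: treat each failing run's dynamic slice as a set of statement IDs and group failures whose slices are similar, here via a Jaccard-similarity threshold. The slices are hand-made stand-ins for real dynamic-slicing output, and the grouping rule is illustrative rather than the paper's.

    def jaccard(a, b):
        return len(a & b) / len(a | b)

    def index_failures(slices, threshold=0.6):
        groups = []                            # each group ~ one suspected fault
        for fid, sl in slices.items():
            for g in groups:
                if jaccard(sl, slices[g[0]]) >= threshold:
                    g.append(fid)
                    break
            else:
                groups.append([fid])
        return groups

    slices = {"f1": {10, 11, 42}, "f2": {10, 11, 42, 43}, "f3": {7, 90}}
    print(index_failures(slices))              # [['f1', 'f2'], ['f3']]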

Added 2007-09-29

International Conference on Software Maintenance

V. Nagarajan, R. Gupta, X. Zhang, M. Madou, B. De Sutter, and K. De Bosschere
Download: PDF

In many application areas, including piracy detection, software debugging, and maintenance, situations arise in which there is a need to compare two versions of a program that dynamically behave the same even though they statically appear to be different. We recently proposed dynamic matching, which uses execution histories to automatically produce mappings between instructions in the two program versions. The mappings can then be used to understand the correspondence between the two versions by a user involved in software piracy detection or by a comparison checker involved in debugging of optimized code. However, if a program’s control flow is substantially altered, as usually occurs in obfuscation or even manual transformations, mappings at the instruction level are not sufficient to enable a good understanding of the correspondence. In this paper, we present a comprehensive dynamic matching algorithm with a focus on call graph and control flow matching. Our technique works in the presence of aggressive control flow transformations (both interprocedural, such as function inlining/outlining, and intraprocedural, such as control flow flattening) and produces mappings of interprocedural and intraprocedural control flow in addition to mappings between instructions. We evaluated our dynamic matching algorithms by attempting to match original programs with versions that were subjected to popular obfuscation and control flow altering transformations. Our experimental results show that the control flow mappings produced are highly accurate and complete for the programs considered.
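
To give the flavor of profile-based matching (this is an illustrative heuristic, not the authors' algorithm): pair functions of two versions whose dynamic call-frequency signatures are closest. Profiles and names are invented.

    def match(prof_v1, prof_v2):
        # greedily pair each v1 function with the v2 function whose
        # observed call count is closest
        return {f1: min(prof_v2, key=lambda f2: abs(prof_v2[f2] - c1))
                for f1, c1 in prof_v1.items()}

    prof_v1 = {"parse": 120, "emit": 30}       # calls observed in version 1
    prof_v2 = {"fn_a": 29, "fn_b": 121}        # obfuscated names in version 2
    print(match(prof_v1, prof_v2))             # {'parse': 'fn_b', 'emit': 'fn_a'}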

Added 2007-09-29

International Symposium on Software Testing and Analysis

B. Xin and X. Zhang
Download: PDF

Capturing dynamic control dependence is critical for many dynamic program analyses, such as dynamic slicing, dynamic information flow, and data lineage computation. Existing algorithms are mostly a simple runtime translation of the static definition, which by its nature fails to capture certain dynamic properties, leading to inefficiency. In this paper, we propose a novel online detection technique for dynamic control dependence. The technique is based upon a new definition, which is equivalent to the existing one in the intraprocedural case but enables an efficient detection algorithm. The new algorithm naturally and efficiently handles interprocedural dynamic control dependence, even in the presence of irregular control flow. Our evaluation shows that the detection algorithm slows down program execution by a factor of 2.57, which is 2.54 times faster than the existing algorithm that was used in prior work.
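
For background, the stack-based flavor of online control-dependence detection in its textbook intraprocedural form (the paper's interprocedural algorithm differs): on a branch, push it with its immediate postdominator; pop entries when their postdominator executes; the stack top is the current dynamic control dependence. The CFG data here is hypothetical.

    ipd = {"b1": "s4", "b2": "s3"}             # branch -> immediate postdominator

    def control_deps(trace, branches):
        stack, deps = [], {}
        for stmt in trace:
            while stack and stack[-1][1] == stmt:
                stack.pop()                    # that branch's region has ended
            deps[stmt] = stack[-1][0] if stack else None
            if stmt in branches:
                stack.append((stmt, ipd[stmt]))
        return deps

    trace = ["b1", "s1", "b2", "s2", "s3", "s4"]
    print(control_deps(trace, {"b1", "b2"}))
    # {'b1': None, 's1': 'b1', 'b2': 'b1', 's2': 'b2', 's3': 'b1', 's4': None}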

Added 2007-09-29

International Symposium on Software Testing and Analysis

S. Tallam, C. Tian, X. Zhang, and R. Gupta
Download: PDF

Debugging long-running multithreaded programs is a very challenging problem when using tracing-based analyses. Since such programs are non-deterministic, reproducing the bug is non-trivial, and generating and inspecting traces for long-running programs can be prohibitively expensive. We propose a framework in which, to overcome the problem of bug reproducibility, a lightweight logging technique is used to log the events during the original execution. When a bug is encountered, it is reproduced using the generated log, and during the replay a fine-grained tracing technique is employed to collect control-flow/dependence traces that are then used to locate the root cause of the bug. In this paper, we address the key challenges resulting from tracing, that is, the prohibitively high expense of collecting traces and the significant burden on the user who must examine the large amount of trace information to locate the bug in a long-running multithreaded program. These challenges are addressed through execution reduction, which realizes a combination of logging and tracing such that the traces collected contain only the execution information from those regions of threads that are relevant to the fault. This approach is highly effective because we observe that for long-running multithreaded programs, many threads that execute are irrelevant to the fault. Hence, these threads need not be replayed and traced when trying to reproduce the bug. We develop a novel lightweight scheme that identifies such threads by observing all the interthread data dependences and removes their execution footprint in the replay run. In addition, we identify regions of thread executions that need not be replayed or, if they must be replayed, we determine whether they need not be traced. Following execution reduction, the replayed execution takes less time to run and produces a much smaller trace than the original execution. Thus, the cost of collecting traces and the effort of examining the traces to locate the fault are greatly reduced.
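
A sketch of the thread-pruning step: from logged interthread data dependences, keep only the threads the failing thread transitively depends on; the rest need not be replayed or traced. The dependence log is invented.

    def relevant_threads(deps, failing):
        # deps: thread -> set of threads it read shared values from
        keep, work = set(), [failing]
        while work:
            t = work.pop()
            if t not in keep:
                keep.add(t)
                work.extend(deps.get(t, ()))
        return keep

    deps = {"t_main": {"t_worker"}, "t_worker": set(), "t_logger": set()}
    print(relevant_threads(deps, "t_main"))    # t_logger is pruned from replay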

Added 2007-09-29