Authors
Joon S. Park, Pratheep Chandramohan, Avinash T. Suresh, and Joseph Giordano
Abstract
As information systems grow larger and more complex, the need for survivability increases. At the same time, as new threats are identified each day, it becomes increasingly difficult to build systems that can identify and recover from them. This is particularly pressing for distributed mission-critical systems, which cannot afford a loss of functionality even when internal components fail or are compromised by malicious code, especially in a component downloaded from an external organization. Therefore, when using such a component, we should verify that its source is trusted and that its code has not been modified in an unauthorized manner since it was created. Furthermore, once we find failures or malicious code in the component, we should fix those problems and recover the original functionality of the component at runtime so that we can support survivability in the mission-critical system. In this paper we define survivability, discuss the survivability challenges of component sharing in a large distributed system, identify static and dynamic survivability models, and discuss their trade-offs. We then propose novel approaches for component survivability at runtime. Finally, we demonstrate the feasibility of our ideas by implementing component recovery against component failures and malicious code.