Dependable Computing Systems Laboratory




Project Title: Adepts (Intrusion Tolerant Distributed Systems)
Description
As distributed systems are deployed to run critical applications, there is an increasing need to make such systems resilient. The distributed applications running on such platforms need continuous uptime, as downtime translates directly to financial losses, loss of prestige, or endangerment of human lives. Examples of such applications abound in banking, finance, the airline industry, and the military. The large impact of failures in distributed systems has led to significant progress in the development of runtime detection systems. These fundamentally operate by matching runtime signatures against either expected behaviors or known failure patterns. In addition, there is a growing number of intrusion prevention systems, which aim to prevent intrusions by taking proactive measures such as updating virus definitions (for anti-virus products) and attack signatures (for network-based intrusion detection systems).
We contend that to build robust distributed systems, it is not enough to provide detection and prevention mechanisms; one also needs to provide runtime diagnosis and response mechanisms in the security infrastructure. Diagnosis identifies the services that have been affected by the failure. Response has two goals: to limit the damage due to the current attack and to increase robustness to future attacks by learning from past observations. Existing work on diagnosis and response for security attacks has often been predicated on determinism and perfect knowledge: it assumes that the effect of a response action is deterministic, that the state of the system is one of a set of pre-computed states, and that all the ways in which an attack may occur are known.
However, in practice, these assumptions are often violated: there are likely to be unknown failure paths, uncertainty in determining the root cause, and imperfect detectors, all of which the distributed response system has to handle. Compared to the problem of detection, automated diagnosis and response have received far less research attention. They have typically been considered the domain of system administrators, who manually “patch” or reconfigure the system in response to detected failures. However, as distributed services become more complex and automated scripted attacks necessitate response in machine time rather than human time, systems that provide diagnosis and response gain importance.
Existing detectors often come equipped with statically configured local responses. For example, Snort can terminate the network connection from the source address when it detects a malicious network packet, and Norton anti-virus can quarantine a file that it detects to be infected. However, such responses may be sub-optimal or untimely in distributed systems facing multi-stage attacks. An intrusion response system (IRS) for a distributed system is very different from one for a stand-alone system. Defending a distributed system means defending not just the individual hosts and services but also the networks that connect them. One needs to consider the interaction effects among the multiple services, both to accurately identify the intrusion patterns relevant to the response process and to assess the effectiveness of the deployed response mechanism. The few distributed IRSs that exist ([41]-[49]) do not take into account the multiplicity of often-conflicting factors that determine a reasonable response, such as the disruption that the response causes to normal users, the effectiveness of the response for the specific attack type, and the confidence that an attack is indeed in progress.
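To make the trade-off among these conflicting factors concrete, the following is a minimal sketch of one way to weigh them. It is not the Adepts algorithm: the `Response` class, the scoring formula, the candidate list, and all numeric values are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """A candidate response action (hypothetical model, not from Adepts)."""
    name: str
    effectiveness: float  # assumed chance in [0, 1] that it stops this attack type
    disruption: float     # assumed cost imposed on legitimate users


def response_score(response, attack_confidence, damage_if_ignored=1.0):
    """Expected damage avoided, weighted by detector confidence, minus disruption."""
    expected_gain = attack_confidence * response.effectiveness * damage_if_ignored
    return expected_gain - response.disruption


def choose_response(candidates, attack_confidence, damage_if_ignored=1.0):
    """Return the best-scoring response, or None if no response is worth its cost."""
    def score(r):
        return response_score(r, attack_confidence, damage_if_ignored)

    best = max(candidates, key=score, default=None)
    if best is None or score(best) <= 0:
        return None
    return best


# Illustrative candidates; the numbers are made up.
CANDIDATES = [
    Response("drop_connection", effectiveness=0.6, disruption=0.1),
    Response("restart_service", effectiveness=0.8, disruption=0.4),
    Response("isolate_host", effectiveness=0.95, disruption=0.7),
]
```

Under these assumed numbers, a high-confidence alert on a low-value asset favors the cheap `drop_connection` response, a low-confidence alert yields no response at all, and raising `damage_if_ignored` shifts the choice toward the more disruptive but more effective `isolate_host`.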
Importantly, response systems have not been proposed with the ability to learn from past sequences of attacks, to deal with imperfect knowledge about the state of the system or the attack path, or to incorporate domain-specific knowledge about, say, the optimized response to a particular attack. Any of these can be a fatal inadequacy for a real-world distributed application.
In this project, we have been designing an IRS, called Adepts, that is capable of providing online and offline response, including containment, under all of the challenging conditions mentioned above. The work has resulted in an IRS integrated with detectors that will be applied to real-world application systems. Our algorithms are aware of the dynamics of distributed application systems (the spread of failures through messages, the legitimate interaction paths between services, and evidence of a failure reported from multiple points) and are therefore uniquely applicable to distributed systems. The basic approach uses two knowledge bases: a representation of the paths by which multi-stage attacks spread through the system, and a representation of the legitimate interactions among the services in the system. Adepts takes as input alerts from detectors embedded in the application system. The algorithms make decisions based on runtime alerts and the pre-configured knowledge bases, both to contain the effect of the current attack and to make the application system resilient to future attacks of a similar kind. Adepts applies techniques from formal learning theory to make greedy, computationally efficient response decisions that maximize the high-level benefit to the owner of the system. The decisions are evaluated as further alerts are observed, and the evaluations result in updates to the knowledge base.
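The interplay of the two knowledge bases and greedy decision making can be illustrated with a small sketch. This is a simplified stand-in, not the actual Adepts algorithm: the service names, the `ATTACK_PATHS` graph, the `LEGIT_COST` and `ASSET_VALUE` tables, and the gain formula are all hypothetical. Given a set of alerted services, it repeatedly blocks the single interaction whose removal most reduces the value of assets the attacker can still reach, net of the assumed cost of disrupting legitimate traffic.

```python
# Hypothetical knowledge base 1: edges along which a multi-stage attack can spread.
ATTACK_PATHS = {
    "web": ["app"],
    "app": ["db", "auth"],
    "auth": ["db"],
    "db": [],
}

# Hypothetical knowledge base 2: legitimate interactions and the assumed cost of blocking each.
LEGIT_COST = {
    ("web", "app"): 3.0,
    ("app", "db"): 2.0,
    ("app", "auth"): 0.5,
    ("auth", "db"): 0.5,
}

# Assumed value of each service to the system owner.
ASSET_VALUE = {"web": 1.0, "app": 2.0, "auth": 4.0, "db": 10.0}


def reachable(alerted, blocked):
    """Services an attacker could reach from the alerted ones, skipping blocked edges."""
    seen = set(alerted)
    stack = list(alerted)
    while stack:
        node = stack.pop()
        for nxt in ATTACK_PATHS.get(node, []):
            if (node, nxt) not in blocked and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def exposure(alerted, blocked):
    """Total value of not-yet-alerted services that the attack could still reach."""
    return sum(ASSET_VALUE[s] for s in reachable(alerted, blocked) - set(alerted))


def greedy_containment(alerted):
    """Greedily block edges while some block reduces exposure by more than its cost."""
    blocked = set()
    while True:
        current = exposure(alerted, blocked)
        best_edge, best_gain = None, 0.0
        for edge, cost in LEGIT_COST.items():
            if edge in blocked:
                continue
            gain = current - exposure(alerted, blocked | {edge}) - cost
            if gain > best_gain:
                best_edge, best_gain = edge, gain
        if best_edge is None:
            return blocked
        blocked.add(best_edge)
```

With these assumed tables, an alert on `web` leads to blocking only the `web` to `app` interaction, while an alert on `app` leads to blocking both of its outgoing edges. A fuller system would also revise the tables as later alerts show whether a response worked, which this sketch omits.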
Our current work is proceeding in three directions:
1. Application of heuristic-based search through the search space of response choices
2. Intrusion response with incomplete knowledge bases
3. Use of Bayesian learning to diagnose affected services from incomplete and imperfect detector alerts
Students involved
Current: Yu-Sung Wu (PhD), Bingrui Foo (PhD), Gaspar Modelo-Howard (PhD), Ratsameetip Wita (Exchange student)
Past: Yu-Chun Mao, Matthew Glause

465 Northwestern Avenue, West Lafayette, IN 47907 | dcsl@ecn.purdue.edu | +1 765 494 3510