Updated 19-Sep-2005
Willow Survivability Architecture
Introduction
The Willow system is designed to support the survivability of
large distributed information systems. As part of its approach, Willow
deals broadly with their faults, applying:
- fault avoidance by disabling vulnerable network elements
intentionally when a threat is detected or predicted,
- fault elimination by replacing system software elements when faults
are discovered, and
- fault tolerance by reconfiguring the system if non-maskable damage
occurs.
The reactive component of Willow supplements the usual information system
fabric with a comprehensive fault-tolerance mechanism referred to as a
survivability architecture or information survivability control system
(paper). The key to the architecture is a powerful reconfiguration
mechanism that is combined with a general control loop structure in which
network state is sensed, analyzed, and required changes effected.
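The control loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class, the method names, the three node states, and the "isolate every damaged node" policy are hypothetical and are not taken from the Willow implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a sense/analyze/effect loop; none of these names come from Willow.
final class ControlLoopSketch {

    enum NodeState { OK, DAMAGED, ISOLATED }

    /** Sense: take a snapshot of the (simulated) network state. */
    static Map<String, NodeState> sense(Map<String, NodeState> network) {
        return new TreeMap<>(network);
    }

    /** Analyze: decide which nodes need a change (assumed policy: isolate damaged nodes). */
    static List<String> analyze(Map<String, NodeState> snapshot) {
        List<String> toIsolate = new ArrayList<>();
        for (Map.Entry<String, NodeState> entry : snapshot.entrySet()) {
            if (entry.getValue() == NodeState.DAMAGED) {
                toIsolate.add(entry.getKey());
            }
        }
        return toIsolate;
    }

    /** Effect: apply the required changes to the network. */
    static void effect(Map<String, NodeState> network, List<String> toIsolate) {
        for (String node : toIsolate) {
            network.put(node, NodeState.ISOLATED);
        }
    }

    /** One pass of the loop; returns the nodes that were reconfigured. */
    static List<String> runOnce(Map<String, NodeState> network) {
        List<String> plan = analyze(sense(network));
        effect(network, plan);
        return plan;
    }

    public static void main(String[] args) {
        Map<String, NodeState> network = new TreeMap<>();
        network.put("node-a", NodeState.OK);
        network.put("node-b", NodeState.DAMAGED);
        System.out.println("reconfigured: " + runOnce(network));
        System.out.println("state after:  " + network);
    }
}
```

Keeping the three stages as separate functions mirrors the structure named in the text; in a real deployment each stage would presumably be distributed rather than a local method call.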
The main challenges Willow overcomes are those of scalability and
complexity. The information systems of interest (1) are composed of
complex arrangements of very large numbers of computing nodes (hundreds of
thousands to millions), (2) employ extensive communication facilities, and
(3) operate across many domains of ownership and responsibility.
As a result, damage to such systems may be complex and widespread,
and it then necessitates extensive, complex error recovery. Our
implementation strategy explicitly addresses these issues of
scalability and complexity in both detection and response.
Willow is based on declarative specification of large-scale
fault-tolerance programming. These specifications of fault detection and
reaction are used directly to generate a specific instantiation of Willow.
Willow provides demonstrably efficient, automated fault tolerance over
large and complex systems with an innovative distributed architecture. The
Willow system is fully implemented, allowing us to investigate its
application and efficiency with respect to survivability problems.
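To make the idea of generating behavior from a declarative specification concrete, here is a small sketch. It is only an illustration under stated assumptions: the rule format (a set of trigger events mapped to a named reaction), the event and reaction names, and the `generate` function are all hypothetical and do not reflect Willow's actual specification languages or generator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch: the rule format and all names are assumptions, not Willow syntax.
final class DeclarativeSpecSketch {

    /** One declarative rule: if all trigger events are observed, request the reaction. */
    static final class Rule {
        final Set<String> triggers;
        final String reaction;

        Rule(Set<String> triggers, String reaction) {
            this.triggers = triggers;
            this.reaction = reaction;
        }
    }

    /** Turns a rule list into an executable policy: observed events in, reactions out. */
    static Function<Set<String>, List<String>> generate(List<Rule> spec) {
        return observed -> {
            List<String> reactions = new ArrayList<>();
            for (Rule rule : spec) {
                if (observed.containsAll(rule.triggers)) {
                    reactions.add(rule.reaction);
                }
            }
            return reactions;
        };
    }

    public static void main(String[] args) {
        // The specification is plain data; changing it changes the generated policy.
        List<Rule> spec = List.of(
            new Rule(Set.of("intrusion-alert", "node-unresponsive"), "isolate-subnet"),
            new Rule(Set.of("software-fault"), "replace-component"));
        Function<Set<String>, List<String>> policy = generate(spec);
        System.out.println(policy.apply(Set.of("software-fault", "heartbeat")));
    }
}
```

The point of the sketch is the separation: detection and reaction are stated as data, and the executable policy is derived from that data rather than hand-coded.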
The major components of Willow are:
- Siena - a content-based networking infrastructure that allows
scalable and efficient event notification services
- Selective Notification - a unified notification paradigm where
notifications can be targeted based on formally specified sender
qualifications, notification contents, or receiver attributes
- Laminarflow - a workflow system and language that allows for both
process and constraint tasks with enumerated, bounded-time conflict
resolution via reasons and intentions
- ANDREA - a command system that uses Selective Notification to
project workflows to appropriate receivers and provide scalable feedback
- SPARTAN - a hierarchical, adaptive control system to detect and
respond to network sensor events
- TEDL - an object-oriented specification language that uses sets of
events to detect abstract events that require response
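As an illustration of the Selective Notification idea in the list above, the following sketch routes a notification using three kinds of selection: sender qualifications, notification contents, and receiver attributes. It is a hypothetical model, not the Willow, Siena, or ANDREA API. The class names, the attribute maps, and the choice to require all three selections at once are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of the addressing idea only; not the Willow, Siena, or ANDREA API.
final class SelectiveNotificationSketch {

    /** A notification carries its sender's qualifications and its own contents. */
    static final class Notification {
        final Map<String, String> sender;
        final Map<String, String> content;
        /** Sender-side selection over receiver attributes. */
        final Predicate<Map<String, String>> targets;

        Notification(Map<String, String> sender, Map<String, String> content,
                     Predicate<Map<String, String>> targets) {
            this.sender = sender;
            this.content = content;
            this.targets = targets;
        }
    }

    /** A receiver advertises attributes and filters on sender and content. */
    static final class Receiver {
        final String name;
        final Map<String, String> attributes;
        final Predicate<Map<String, String>> acceptsSender;
        final Predicate<Map<String, String>> acceptsContent;

        Receiver(String name, Map<String, String> attributes,
                 Predicate<Map<String, String>> acceptsSender,
                 Predicate<Map<String, String>> acceptsContent) {
            this.name = name;
            this.attributes = attributes;
            this.acceptsSender = acceptsSender;
            this.acceptsContent = acceptsContent;
        }
    }

    /** Returns the names of receivers for which all three selections hold (assumed policy). */
    static List<String> route(Notification n, List<Receiver> receivers) {
        List<String> delivered = new ArrayList<>();
        for (Receiver r : receivers) {
            boolean selected = n.targets.test(r.attributes)
                    && r.acceptsSender.test(n.sender)
                    && r.acceptsContent.test(n.content);
            if (selected) {
                delivered.add(r.name);
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        List<Receiver> receivers = List.of(
            new Receiver("router-1", Map.of("role", "router"),
                s -> "admin".equals(s.get("role")),
                c -> "alert".equals(c.get("type"))),
            new Receiver("host-7", Map.of("role", "host"), s -> true, c -> true));
        Notification n = new Notification(
            Map.of("role", "admin"), Map.of("type", "alert"),
            attrs -> "router".equals(attrs.get("role")));
        System.out.println(route(n, receivers));
    }
}
```

In this model neither side names the other directly: the sender describes the receivers it wants and each receiver describes the senders and contents it accepts, which is one way to read "decoupled addressing".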
We are also applying the paradigms of Willow to the domain of emergency response in the STILT project.
Selected Papers
- Rowanhill, Jonathan C., Philip E. Varner and John C. Knight.
Efficient Hierarchic Management For Reconfiguration of Networked
Information Systems
Accepted to DSN 2004
(PDF)
- Hill, Jonathan C., John C. Knight
Selective Notification: Combining Forms of Decoupled Addressing for Internet-Scale Command
and Alert Dissemination
Technical Report CS-2003-14, University of Virginia, Department of Computer
Science (April 2003)
(PDF)
- Varner, Philip E.
Policy Specification for Non-Local Fault Tolerance in Large Distributed Information
Systems
M.S. Thesis, May 2003 (PDF)
- Knight, John C., Dennis Heimbigner, Alexander Wolf, Antonio Carzaniga, Jonathan Hill, Premkumar
Devanbu, Michael Gertz
The Willow Architecture: Comprehensive Survivability for Large-Scale Distributed
Applications
Submitted to DSN-2002, the International Conference on Dependable Systems and Networks,
Washington DC (June 2002)
(PDF)
Software
Willow is fully implemented in Java as a distributed architecture.
If you are interested in the software please contact us.
Related Projects