Autonomia: Autonomic Control and Management Environment

Overview

Computing research has evolved through many generations, from a single process running on a single computer system to multiple processes running on geographically dispersed, heterogeneous computers that can span several continents (e.g., the Grid). The techniques used to design Grid systems and services have also been evolutionary and ad hoc. Initially, designers of such systems were mainly concerned with performance, and focused intensive research on parallel processing and on high-performance computer architectures and applications. As the scale and distribution of computer systems and applications grew to cover critical areas where failures can be catastrophic, the reliability and availability of those systems and applications became a major concern. This, in turn, drove separate research activities focused on reliability and fault-tolerant computing. In a similar manner, research in computing security evolved to protect the integrity and security of computing systems and their services, without regard to other important system attributes such as performance, reliability, and configurability, because security and system assurance were the main objectives of such systems and services.

These ad hoc and mostly isolated research activities have produced specialized computing systems and applications that can efficiently optimize only a few system attributes or functionalities. However, emerging systems and applications are dynamic and complex, and their requirements change continuously within one application or a class of applications. Consequently, their performance, fault tolerance, security, availability, and configurability requirements might change dynamically at runtime. Hence, it is critical for future computing systems and software architectures to be adaptive in all their attributes and functionalities (performance, security, fault tolerance, configurability, maintainability, etc.). There has been very little research on integrating all these techniques into one computing system, and what exists is mainly ad hoc. Moreover, naively integrating these isolated techniques produces a system that is complex, unpredictable, unmanageable, and insecure (see Figure 1); for example, the actions performed by the security technique might cancel the actions taken by the high-performance computing technique. It is a challenging research problem to integrate all these isolated techniques so that all system functionality and attributes, such as performance, fault tolerance, and security, can be maintained simultaneously and efficiently in real time. In fact, we urgently need a new paradigm for designing and developing large-scale, complex, and adaptive systems and applications.


Figure 1: Integration of isolated solutions

This has led researchers to consider alternative programming paradigms and management techniques based on the strategies biological systems use to deal with complexity, dynamism, heterogeneity, and uncertainty, a vision that has been referred to as autonomic computing. Autonomic computing is inspired by the human autonomic nervous system, which handles complexity and uncertainty, and aims at realizing computing systems and applications capable of managing themselves with minimum human intervention. There have been several efforts [1,2] to characterize the main features that make a computing system or an application autonomic. Most of these efforts agree that an autonomic system must support at least the following four features:

(1) Self-Protecting: an autonomic system is equally prone to attacks, and hence it should be capable of detecting and protecting its resources from both internal and external attacks and of maintaining overall system security and integrity.

(2) Self-Optimizing: an autonomic system should be able to detect sub-optimal behaviors and intelligently perform self-optimization functions.

(3) Self-Healing: an autonomic system must be aware of potential problems and should have the ability to reconfigure itself to continue to function smoothly.

(4) Self-Configuring: an autonomic system must have the ability to dynamically adjust its resources based on its state and the state of its execution environment.

It is important to note that an autonomic computing system addresses these issues in an integrated manner rather than treating them in isolation, as is currently done. Consequently, in autonomic computing, the system design paradigm takes a holistic approach that can integrate all these attributes seamlessly and efficiently. An autonomic system can be viewed as a collection of autonomic components (ACs), or building blocks, each of which supports the four properties outlined above. That is, each AC can be dynamically and automatically configured (self-configuration), seamlessly tolerate any component failure (self-healing), automatically detect attacks and protect against them (self-protection), and automatically change its configuration parameters to improve performance once it deteriorates beyond a certain threshold (self-optimization). Once these autonomic components become available, we can dynamically build an autonomic computing system to meet any static or dynamic requirement, such as cost-effective high-performance systems or high-performance and secure systems, as shown in Figure 2.


Figure 2: Holistic Approach: Autonomic Computing System

Our research goal is to investigate a holistic approach that can control and manage networked resources and services. Our autonomic computing environment, Autonomia [3, 4], provides system administrators with all the tools required to specify the appropriate control and management schemes and the services to configure the required software and hardware resources. Our autonomic management is implemented using two software modules: the Component Management Interface (CMI), which enables us to specify the configuration and operational policies associated with each component, whether a hardware resource or a software component; and the Component Runtime Manager (CRM), which monitors the component's operational state through well-defined management interface ports.


Research Summary

1. Autonomic Component

Our approach to autonomizing any software module or network resource is based on augmenting the resource with two entities [6]: a Component Management Interface (CMI) and a Component Runtime Manager (CRM), as shown in Figure 1. This approach extends traditional components (e.g., CORBA and JavaBeans) with provisions that support autonomic features (e.g., self-configuring, self-protecting, and self-healing), so that they behave autonomically. The architecture can be used either to implement autonomic components from scratch or to add autonomizing provisions to existing software modules or resources.


Figure 1: Autonomizing a software system or hardware resource

CMI consists of three ports: the Configuration Port, the Control Port, and the Operation Port. It is a passive module that stores all the control and management policies governing the operation of a component (a hardware resource or a software module) and its interaction with the environment. CRM, in contrast, is the active module that enforces the policies specified in the CMI at runtime. CRM continuously monitors and analyzes the execution of its component and interrupts its operation once it detects that the execution environment of the component cannot meet the desired operational and functional requirements. The CRM's planning module then determines the appropriate corrective actions, which are performed by its execution module.

1.1 Component Management Interface (CMI)

CMI provides three ports to specify the control and management requirements associated with each software component and/or network resource, as briefly described below:

Configuration Port: It defines the configuration attributes required to automate the deployment of the component/resource and to set up its execution environment. A component's configuration attributes include its name, resource requirements, configuration parameters, and dependency specifications.

Control Port: It consists of two modules: the sensor module and the action module. The sensor module defines all the measurement attributes that must be monitored at runtime, while the action module defines all the actions (e.g., stop, migrate, invoke) that can be performed on the component or resource in order to force it to behave autonomically.

Operation Port: It defines the policies for normal/acceptable component/resource operation. These policies are described using two types of rules: Behavior Rules and Interaction Rules. The behavior rules govern the normal operation of the component as a stand-alone entity (e.g., CPU utilization, memory usage, CPU-MEM interactions), while the interaction rules describe how the component should interact with its environment when it is behaving normally (e.g., if a component is compromised and starts attacking its environment, the interaction rules should detect that behavior). Operation port rules can be expressed as:

IF Condition1 AND Condition2 ... AND Conditionm THEN Action1 AND Action2 ... AND Actionn

Each rule is a conjunction of one or more conditions followed by a conjunction of one or more actions. A condition is a logical combination of component/resource state and/or measurement attributes, or an external event triggered by other components or resources. An action can invoke the sensor/actuator functions specified in the control port or send events to other components.
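The rule form above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Rule` class, the `evaluate` function, the attribute names (`cpu`, `mem`), the thresholds, and the `send_event:overload` action are all hypothetical, since the actual CMI rule schema is not shown here. Only the "all conditions must hold, then all actions apply" semantics comes from the rule form itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical representation of component state: attribute name -> measured value.
State = Dict[str, float]


@dataclass
class Rule:
    conditions: List[Callable[[State], bool]]  # Condition1 .. Conditionm
    actions: List[str]                         # Action1 .. Actionn

    def fires(self, state: State) -> bool:
        # A rule fires only when every condition holds (logical AND).
        return all(cond(state) for cond in self.conditions)


def evaluate(rules: List[Rule], state: State) -> List[str]:
    """Collect the actions of every rule whose conditions all hold."""
    triggered: List[str] = []
    for rule in rules:
        if rule.fires(state):
            triggered.extend(rule.actions)
    return triggered


# Hypothetical behavior rule:
# IF cpu > 90 AND mem > 80 THEN migrate AND send_event:overload
overload = Rule(
    conditions=[lambda s: s["cpu"] > 90, lambda s: s["mem"] > 80],
    actions=["migrate", "send_event:overload"],
)

print(evaluate([overload], {"cpu": 95.0, "mem": 85.0}))  # both conditions hold
print(evaluate([overload], {"cpu": 95.0, "mem": 40.0}))  # second condition fails
```

Keeping conditions as plain predicates means a condition on an external event can be added in the same way as a condition on a measurement attribute.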

1.2 Component Runtime Manager (CRM)

CRM is the local control system associated with a component. It continuously monitors the component's operations, analyzes the current state, plans the appropriate corrective actions if needed, and executes these actions to bring the component back to an acceptable normal state of operation. The CRM control and management algorithm is shown in Figure 2.

CRM keeps checking the CMI for any changes to the current management strategies (step 2). If changes are specified, CRM reads and parses the CMI file (steps 2-4). CRM also monitors the state of the component using all the available component sensor functions specified in the CMI sensor module (step 5). The monitored information is then analyzed by the analysis module to determine whether there are any severe deviations from the desired state (step 7). Furthermore, the CRM monitors any events or messages that might have been sent to this component (step 6). If the analysis module determines that the component is violating its operational and functional requirements, the planning module is called to determine the appropriate actions (step 8), which are performed by the execution module to bring the component back to normal operation (steps 9 and 10). If any event is received, the appropriate actions for this event are also performed by the execution module (steps 12-14).


Figure 2: Control and management algorithm of the CRM

The CRM algorithm manages all the component requirements with respect to performance, fault, security, and configuration in an integrated way. For example, if the component state deviation is caused by malicious attacks, the planning module will determine the appropriate protective security actions to mitigate and prevent the current attacks. Similarly, when the component state deviates from normal behavior due to failures, the planning module will determine the appropriate actions to recover from the failures and continue normal operations. Consequently, CRM can handle all the desired component attributes in a holistic way rather than having each attribute handled by a separate mechanism.
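One pass of the monitor, analyze, plan, and execute cycle described above might look like the following minimal sketch. It is not the CRM implementation: every function name, the limit values, and the `playbook` mapping from violated attribute to action are assumptions made for illustration, and the execution step only records actions instead of performing them.

```python
from typing import Callable, Dict, List

# Hypothetical sensor table: attribute name -> function that reads the current value.
Sensors = Dict[str, Callable[[], float]]


def monitor(sensors: Sensors) -> Dict[str, float]:
    """Read every sensor declared for the component."""
    return {name: read() for name, read in sensors.items()}


def analyze(state: Dict[str, float], limits: Dict[str, float]) -> List[str]:
    """Return the attributes whose value exceeds the acceptable limit."""
    return [
        name for name, value in state.items()
        if value > limits.get(name, float("inf"))
    ]


def plan(violations: List[str], playbook: Dict[str, str]) -> List[str]:
    """Map each violated attribute to a corrective action, if one is defined."""
    return [playbook[name] for name in violations if name in playbook]


def execute(actions: List[str], log: List[str]) -> None:
    """Stand-in for the execution module: record actions rather than run them."""
    log.extend(actions)


def crm_cycle(sensors: Sensors, limits: Dict[str, float],
              playbook: Dict[str, str], log: List[str]) -> Dict[str, float]:
    """One pass of monitor -> analyze -> plan -> execute."""
    state = monitor(sensors)
    violations = analyze(state, limits)
    if violations:
        execute(plan(violations, playbook), log)
    return state


# Hypothetical component: CPU is above its limit, memory is not.
sensors = {"cpu": lambda: 97.0, "mem": lambda: 35.0}
limits = {"cpu": 90.0, "mem": 80.0}
playbook = {"cpu": "migrate", "mem": "stop"}
log: List[str] = []

crm_cycle(sensors, limits, playbook, log)
print(log)
```

A real manager would repeat this cycle continuously and would also poll for CMI changes and incoming events, as the algorithm describes; those steps are left out to keep the sketch short.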

2. Autonomia Architecture

Autonomia [5, 6, 7, 8] provides users with all the tools required to specify the appropriate control and management schemes and the services to configure and deploy the required software and network resources and then manage their operations to meet the overall system requirements. The main Autonomia modules include System Management Editor, Autonomic Management Web Services, Component Runtime Manager (CRM) and Compound Component Runtime Manager (CCRM).

The System Management Editor is used to specify the component management policies according to the specified CMI schema. The Autonomic Management Web Services, such as the fault tolerance service and the performance service, can be invoked by the Compound Component Runtime Manager (CCRM). CRM is a runtime manager that monitors the component's behavior and controls its operation in order to maintain the desired component attributes and functionalities. Several autonomic components (e.g., autonomic servers, clusters, and software systems) can be controlled and managed by one autonomic system that we refer to as an Autonomic Compound Component (ACC) (see Figure 3); the corresponding CRM is referred to as the compound CRM (CCRM). In a similar way, larger autonomic systems can be built by composing several autonomic compound components.


Figure 3: Autonomia Architecture

Programmable Autonomia

The Autonomia system served as the base model for developing the programmable Autonomia model. Changes were made to the system so that application components and network systems can be managed more effectively.

Database

In the original Autonomia, sensed values were stored in a text file. Each sensor was associated with one variable, and after each reading the result was written to a text file called the CIB file. Because the file was overwritten every time the sensor sampled the component attribute, it held only the current value of that attribute. This motivated us to store the attributes in a database instead. We can also specify the maximum number of records to keep for each variable: if the maximum is set to x, the database stores the last x records of that variable.
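The "keep only the last x records" behavior can be sketched with Python's built-in `sqlite3` module. The table layout, the function names, and the `cpu_load` variable are assumptions made for this example; the database and schema actually used by Autonomia are not described here.

```python
import sqlite3
from typing import List


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with one row per sensed value (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "variable TEXT NOT NULL, "
        "value REAL NOT NULL)"
    )
    return conn


def record(conn: sqlite3.Connection, variable: str, value: float,
           max_records: int) -> None:
    """Insert a new reading, then trim the variable's history to max_records rows."""
    conn.execute(
        "INSERT INTO readings (variable, value) VALUES (?, ?)",
        (variable, value),
    )
    # Delete every row for this variable that is not among its newest max_records.
    conn.execute(
        "DELETE FROM readings WHERE variable = ? AND id NOT IN ("
        "SELECT id FROM readings WHERE variable = ? ORDER BY id DESC LIMIT ?)",
        (variable, variable, max_records),
    )
    conn.commit()


def history(conn: sqlite3.Connection, variable: str) -> List[float]:
    """Return the stored values for a variable, oldest first."""
    rows = conn.execute(
        "SELECT value FROM readings WHERE variable = ? ORDER BY id",
        (variable,),
    ).fetchall()
    return [row[0] for row in rows]


conn = open_store()
for sample in [10.0, 20.0, 30.0, 40.0, 50.0]:
    record(conn, "cpu_load", sample, max_records=3)

print(history(conn, "cpu_load"))
```

Unlike the overwritten text file, older samples stay available to policies until they fall outside the configured window.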

Local Variables

Programmable Autonomia adds the ability to declare local variables. Each CMI/CRM module can have its own set of local variables, which can be used as counters or to define the state of the CMI/CRM. Local variables are declared in the configuration port under a variables tag.

Sensors

The new Autonomia saves sensed values in a database rather than in a text file. As described above, this makes it possible to associate multiple variables with a sensor and to access older instances of a variable. The new Autonomia can also check the status of multiple variables belonging to different sensors at the same time. In both the behavior policy and the interaction policy, a sensor condition can be checked together with a local variable condition.

Machine Name

The new Autonomia can obtain the machine name by itself, and the behavior and interaction policies can include a check for the machine name. This allows the same CMI to be used for all client modules, with every policy condition carrying an additional machine-name check. As a result, only the actions that correspond to a particular client are executed on that client machine.
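The machine-name check can be illustrated as follows. The policy format, the machine names, and the action names are hypothetical; the only real API used is `socket.gethostname()` from the Python standard library, which returns the name of the local host.

```python
import socket
from typing import Dict, List, Optional

# Hypothetical shared policy list: the same list is shipped to every client,
# and each entry names the machine it applies to.
POLICIES: List[Dict[str, str]] = [
    {"machine": "client-a", "action": "restart_service"},
    {"machine": "client-b", "action": "raise_firewall"},
]


def actions_for(policies: List[Dict[str, str]],
                machine_name: Optional[str] = None) -> List[str]:
    """Return only the actions whose machine-name check matches this host."""
    # Default to the local host name when no name is passed in.
    name = machine_name if machine_name is not None else socket.gethostname()
    return [p["action"] for p in policies if p["machine"] == name]


print(actions_for(POLICIES, "client-a"))  # explicit name, as client-a would see it
print(actions_for(POLICIES))              # local host name; likely matches nothing
```

Passing the name explicitly is only a convenience for trying the sketch on one machine; on a deployed client the default lookup would be used.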

Passing variables

In the new Autonomia, local variable values can be sent to remote machines. This is a powerful capability because it lets the user send the state of one machine to another system.
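How variable values travel between machines is not specified here, so the sketch below covers only the packaging step and leaves the transport out. It assumes a JSON message carrying the sender's name and its local variables; the message layout, function names, and variable names are all hypothetical.

```python
import json
from typing import Dict

# Hypothetical type: local variable name -> integer value (counter or state code).
Variables = Dict[str, int]


def encode_state(sender: str, variables: Variables) -> bytes:
    """Package a machine's local variables into a message for a remote machine."""
    return json.dumps({"sender": sender, "variables": variables}).encode("utf-8")


def apply_state(message: bytes, remote_view: Dict[str, Variables]) -> None:
    """On the receiving side, store the sender's variables under its machine name."""
    payload = json.loads(message.decode("utf-8"))
    remote_view[payload["sender"]] = payload["variables"]


# The receiver keeps a view of other machines' state, keyed by machine name.
view: Dict[str, Variables] = {}
message = encode_state("client-a", {"failed_logins": 4, "state": 1})
apply_state(message, view)

print(view)
```

With the remote state stored under the sender's name, a policy condition on the receiving machine could test it in the same way it tests its own local variables.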


People


Youssif Al-Nashif
website: http://www.ece.arizona.edu/~alnashif

Research Areas and Interests: Network Security, Autonomic Computing & Management, Autonomic Fault Management, Data Mining, AI, Distributed Computing, High Performance Computing, Grid Computing, Scientific Visualization, Simulation and Modeling.


Sankaranarayanan Veeramoni Mythili
website: http://ece.arizona.edu/~sankar

Research Areas and Interests: Computer Architecture, Operating Systems, Distributed Systems, Computer Networks, Autonomic Computing.


Related work


Publications

Journals

1. Salim Hariri, Guangzhi Qu, Huoping Chen, Youssif Al-Nashif, Mazin Yousif, "Autonomic Network Security Management: Design and Evaluation", submitted to ACM Transactions on Autonomous and Adaptive Systems, Special Issue on Adaptive Learning in Autonomic Communication, 2007.

2. S. Hariri, B. Khargharia, H. Chen, Y. Zhang, B. Kim, H. Liu and M. Parashar, "The Autonomic Computing Paradigm", Cluster Computing: The Journal of Networks, Software Tools and Applications, Special Issue on Autonomic Computing, Vol. 9, No. 2, 2006, Springer-Verlag.

3. Salim Hariri, Guangzhi Qu, Modukuri Ramkishore, Huoping Chen, Mazin Yousif, "Quality of Protection (QoP): An Online Monitoring and Self Protection Mechanism", IEEE Journal on Selected Areas in Communications (J-SAC), Special Issue on "Recent Advances in Managing Enterprise Network Services", October 2005.

Conference Proceedings

4. Huoping Chen, Youssif B. Al-Nashif, Guangzhi Qu, and Salim Hariri, "Self-Configuration of Network Security", submitted to IEEE International Conference on Autonomic Computing (ICAC), 2007

5. Youssif B. Al-Nashif, Guangzhi Qu, Huoping Chen, and Salim Hariri, "Autonomic Network Defense (AND) System: Design and Analysis", submitted to IEEE International Conference on Autonomic Computing (ICAC), 2007

6. Huoping Chen, Salim Hariri, "Autonomic Computing Design Methodology", 1st IEEE International Workshop on Modeling Autonomic Communications Environments (MACE 2006), October 25-26

7. Huoping Chen, Salim Hariri, and Fahd Rasal; "An Innovative Self-Configuration Approach for Networked Systems and Applications"; 4th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA), 2006

8. Jingmei Yang, Huoping Chen, Salim Hariri, and Manish Parashar, "Self-Optimization of Large Scale Wildfire Simulations", International Conference on Computational Science (ICCS), 2005

9. Jingmei Yang, Huoping Chen, Salim Hariri, and Manish Parashar, "Autonomic Runtime Manager for Large Scale Adaptive Distributed Applications", IEEE HPDC 2005

10. Huoping Chen, Byoung Kim, Jingmei Yang, Salim Hariri, Manish Parashar; "Autonomic Runtime System for Large Scale Parallel and Distributed Applications"; UPP Workshop (Unconventional Programming Paradigms), Sept 15-17, 2004

11. Huoping Chen, S. Hariri, Byoung Kim, Ming Zhang, Yeliang Zhang, Bithika Khargharia, Manish Parashar; "Self-Deployment and Self-Configuration of Network Centric Service"; IEEE International Conference on Pervasive Computing (IEEE ICPC), 2004

12. Salim Hariri, Lizhi Xue, Huoping Chen, Ming Zhang, S. Pavuluri, Soujanya Rao; "AUTONOMIA: an autonomic computing environment"; Conference Proceedings of the 2003 IEEE IPCCC, 2003.

13. vGrid: A Framework for Development and Execution of Autonomic Grid Applications, B. Khargharia, S. Hariri, B. Kim*, M. Zhang*, P. Vadlamani, and M. Parashar, New Frontiers in High-Performance Computing, Proceedings of the Autonomic Applications Workshop, 10th International Conference on High Performance Computing (HiPC 2003), Hyderabad, India, Elite Publishing, pp 269 - 285, December 2003.


References

[1] Paul Horn, Autonomic Computing: IBM's Perspective on the State of Information Technology. http://researchweb.watson.ibm.com/autonomic/

[2] Autonomic Distributed Computing in Scientific Applications. International Workshop on Future Directions in Distributed Computing. 3-7 June 2002, Bertinoro, Italy.

[3] http://www.research.ibm.com/autonomic.

[4] Gail Kaiser, Phil Gross, Gaurav Kc, Janak Parekh, Giuseppe Valetto: An Approach to Autonomizing Legacy Systems; IBM Almaden Institute Symposium, 4/2002

[5] David Patterson, Aaron Brown, et al, Recovery Oriented Computing (ROC): Motivation, Definition, Techniques, and Case Studies; Computer Science Technical Report UCB//CSD-02-1175, U.C. Berkeley March 15, 2002

[6] Salim Hariri, C.S. Raghavendra, Yonhee Kim, Rinda P. Nellipudi, et al; CATALINA: A Smart Application Control and Management. Active Middleware Services Conference, 2000.

[7] Brown, A. and D. A. Patterson. Embracing Failure: A Case for Recovery-Oriented Computing (ROC). 2001 High Performance Transaction Processing Symposium, Asilomar, CA, October 2001.

[8] S. Hariri, Y. Kim, M. Djunaedi. Design and Analysis of a Proactive Application Management System. Proc. of NOMS2000; April 2000.

[9] R. Koo, S. Toueg. Checkpointing and Rollback-Recovery for Distributed Systems. IEEE Transactions on Software Engineering; Vol. SE-13, No. 1; pp. 23-31; 1987.

[10] M. A. Iverson, F. Ozguner, L. C. Potter. Statistical Prediction of Task Execution Times through Analytic Benchmarking for Scheduling in a Heterogeneous Environment. Eighth Heterogeneous Computing Workshop (HCW'99).

[11] Michael O. Neary, Bernd O. Christiansen, Peter Cappello, and Klaus E. Schauser: Javelin: Parallel Computing on the Internet. Future Generation Computer Systems. October 1999

[12] Hiromitsu Takagi, S. Matsuoka, H. Nakada, et al, Ninflet: A Migratable Parallel Objects Framework using Java, In proc. of the ACM 1998 Workshop on Java for High Performance Network Computing, 1998.

[13] Michael O. Neary, Alan Phipps, Steven Richman, and Peter Cappello, Javelin 2.0: Java-Based Parallel Computing on the Internet, In Euro-Par 2000.

[14] Adam J. Ferrari, Process Introspection: A Checkpoint Mechanism for High Performance Heterogeneous Distributed Systems. Technical Report CS-96-15, Department of Computer Science, University of Virginia, Charlottesville, VA, October 10, 1996.

[15] Nathalie Furmento, Anthony Mayer, Stephen McGough, et al, Optimisation of Task-based Applications within a Grid Environment, SuperComputing 2001.

[16] R. Armstrong, D. Gannon, A. Geist, et al, Toward a Common Component Architecture for High-Performance Scientific Computing. In Proc. of the 8th High Performance Distributed Computing, 1999.

[17] K.A.Hawick and H.A.James, Dynamic Cluster Configuration and Management using JavaSpaces, 2001 IEEE International Conference on Cluster Computing, 2001.

[18] http://java.sun.com/products/jms/

[19] http://java.sun.com/products/jini/

[20] Luis F. G. Sarmenta, Satoshi Hirano, Bayanihan: Building and Studying Web-Based Volunteer Computing Systems Using Java.

[21] F. Cristian, Understanding Fault-Tolerant Distributed Systems, Communications of the ACM, vol. 34, 1991.

[22] A. Baratloo, M. Karaul, Z. Kedem, P. Wyckoff, Charlotte: Metacomputing on the Web. In Proc. of the 9th International Conference on Parallel and Distributed Computing Systems, 1996.

[23] Hiromitsu Takagi, S. Matsuoka, H. Nakada, et al, Ninflet: A Migratable Parallel Objects Framework using Java, In proc. of the ACM 1998 Workshop on Java for High Performance Network Computing, 1998.

[24] Michael O. Neary, Alan Phipps, Steven Richman, and Peter Cappello, Javelin 2.0: Java-Based Parallel Computing on the Internet, In Euro-Par 2000.

[25] A.Beitz, S.Kent, and P.Roe. Optimizing Heterogeneous Component Migration in the Gardens Virtual Cluster Computer. In Heterogeneous Computing Workshop, May 2000.

[26] A. Baratloo, M. Karaul, H. Karl, Zvi M. Kedem. An Infrastructure for Network Computing with Java Applets. In Proceedings of ACM workshop on Java for High Performance Network Computing, February 1998.

[27] H. Topcuoglu, S. Hariri, D. Kim, Y. Kim, X. Bing, B. Ye, I. Ra, J. Valente, "The Design and Evaluation of a Virtual Distributed Computing Environment", The Journal of Networks, Software Tools and Applications (Cluster Computing), 1998.



Phone Number: (520) 621-9915 Room 251, ECE Dept. 1230 E. Speedway Tucson, AZ 85721-0104
ACL - © Copyright 2007, Webmaster: Youssif Al-Nashif
All Rights Reserved