In the research and academic community, scientific computing and engineering simulation provide solutions to problems that cannot be solved by traditional theoretical and experimental methods. These applications are usually computationally intensive and require a huge amount of computing resources. A potential solution is to aggregate the computing power of a collection of computers connected by a network, providing users with a virtual computing environment that meets the requirements of the applications.
In our project, we present a robust infrastructure, based on Jini and mobile agent technology, that facilitates the construction of such a virtual computing environment, manages resources and applications automatically, and transparently provides fault tolerance services to applications.
The infrastructure (JFTI) is situated between the applications and the underlying execution environment. Its position is shown in Figure 1. It isolates the applications from the heterogeneous and dynamic properties of the environment and provides users with a unified virtual computing environment. Any computer or cluster accessible through the network can contribute its computing resources to the environment, regardless of its hardware architecture or operating system platform. The JFTI manages and uses these resources uniformly. Users who want to use the environment do not need to deal with the details of the available resources. All a user needs to do is submit the application and retrieve the result from the environment. Built on Jini technology, the JFTI is not only robust itself but also offers fault tolerance to applications without user intervention. The JFTI builds a hierarchical system to monitor the status of resources and of application execution. In case of a system failure, or when a resource is no longer available, all affected applications are restarted on, or migrated to, other available resources and resume execution from where they left off. This function is implemented via checkpointing and migration mechanisms.
Figure 1. System hierarchical structure
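The checkpoint-and-resume behaviour described above can be illustrated with a minimal sketch. It is not the JFTI implementation: the class names (`CheckpointDemo`, `CheckpointStore`, `State`), the toy summing workload, and the simulated failure are all assumptions made for illustration, and an in-memory map stands in for the shared checkpoint store.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only; all names here are hypothetical, not part of JFTI.
final class CheckpointDemo {

    /** Saved progress of a component: the next step to run and the sum so far. */
    static final class State {
        final long nextStep;
        final long partialSum;

        State(long nextStep, long partialSum) {
            this.nextStep = nextStep;
            this.partialSum = partialSum;
        }
    }

    /** In-memory stand-in for a shared checkpoint store (an assumption). */
    static final class CheckpointStore {
        private final Map<String, State> latest = new ConcurrentHashMap<>();

        void write(String componentId, State state) {
            latest.put(componentId, state);
        }

        Optional<State> readLatest(String componentId) {
            return Optional.ofNullable(latest.get(componentId));
        }
    }

    /**
     * Toy component: sums 1..n, writing a checkpoint every `interval` steps.
     * It starts from the latest checkpoint if one exists. A host failure is
     * simulated by throwing when `failAtStep` is reached (use -1 for none).
     */
    static long run(String id, CheckpointStore store, long n, long interval, long failAtStep) {
        State start = store.readLatest(id).orElse(new State(1, 0));
        long step = start.nextStep;
        long sum = start.partialSum;
        while (step <= n) {
            if (step == failAtStep) {
                throw new IllegalStateException("simulated host failure at step " + step);
            }
            sum += step;
            step++;
            if ((step - 1) % interval == 0) {
                store.write(id, new State(step, sum));
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        CheckpointStore store = new CheckpointStore();
        try {
            run("sum-1", store, 10, 3, 8); // first attempt fails at step 8
        } catch (IllegalStateException e) {
            System.out.println("failure detected: " + e.getMessage());
        }
        // A restart (possibly on another host) resumes from the last checkpoint.
        System.out.println("result after recovery: " + run("sum-1", store, 10, 3, -1));
    }
}
```

In this sketch the first attempt checkpoints after steps 3 and 6 and then fails at step 8; the second call resumes at step 7 instead of step 1 and returns 55. The work done between the last checkpoint and the failure (step 7 here) is repeated, which is the usual trade-off when choosing a checkpoint interval.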
The system architecture and the interaction of its components are shown in Figure 2.
Jini technology provides facilities for delivering services over the network. To enable the automatic discovery of resources and the automatic deployment of applications in our system, both hosts and components are implemented as Jini services. A host registers its resources, together with its system properties, with the Resource Repository. Upon registration, a mobile agent system is installed on the host so that it can run mobile agents. The addition or removal of a resource in the Resource Repository is reported to the ADM. Basic functions that are common in scientific computing are abstracted as components. They are wrapped as Jini services regardless of the language in which they are written, registered with the Component Repository, and used to compose users' applications.
After a composed application is submitted to the ADM, the ADM selects a suitable host from the Resource Repository according to the criteria attached to each component and delegates the deployment task to a mobile agent. Once the agent arrives at the destination host, it downloads the executable code and starts the execution. During execution, the component regularly checkpoints its state and writes the checkpoint to the JavaSpaces. Meanwhile, the JFTI's monitoring system watches the execution status of each component.
Once a failure is detected, the fault handlers are notified and take action according to the recovery strategy and the failure type. The application affected by the error is restarted on, or migrated to, other available resources and resumes its execution from the latest checkpoint. The user is not aware of the changes in the underlying execution environment.
Figure 2. JFTI system architecture
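The host selection step can be sketched as follows. This is a simplified illustration, not the JFTI code or the Jini lookup API: the `Host` fields (`cpuCount`, `freeMemoryMb`), the in-memory `ResourceRepository`, and the tie-breaking rule (prefer the host with the most free memory) are assumptions. The text only states that hosts register their system properties and that each component carries selection criteria.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Illustrative sketch only; all names and fields here are hypothetical.
final class HostSelectionDemo {

    /** A registered host with a few assumed system properties. */
    static final class Host {
        final String name;
        final int cpuCount;
        final long freeMemoryMb;

        Host(String name, int cpuCount, long freeMemoryMb) {
            this.name = name;
            this.cpuCount = cpuCount;
            this.freeMemoryMb = freeMemoryMb;
        }
    }

    /** In-memory stand-in for the Resource Repository. */
    static final class ResourceRepository {
        private final List<Host> hosts = new ArrayList<>();

        void register(Host host) {
            hosts.add(host);
        }

        void unregister(String name) {
            hosts.removeIf(h -> h.name.equals(name));
        }

        /** Returns a host that satisfies the criteria, preferring more free memory. */
        Optional<Host> select(Predicate<Host> criteria) {
            return hosts.stream()
                    .filter(criteria)
                    .max(Comparator.comparingLong((Host h) -> h.freeMemoryMb));
        }
    }

    public static void main(String[] args) {
        ResourceRepository repo = new ResourceRepository();
        repo.register(new Host("A", 2, 4096));
        repo.register(new Host("B", 8, 2048));
        repo.register(new Host("C", 8, 8192));

        // Criteria attached to a component: at least 4 CPUs (an assumed example).
        Predicate<Host> criteria = h -> h.cpuCount >= 4;
        System.out.println(repo.select(criteria).map(h -> h.name).orElse("none"));

        // If the chosen host fails and is removed, selection falls back to another host.
        repo.unregister("C");
        System.out.println(repo.select(criteria).map(h -> h.name).orElse("none"));
    }
}
```

Expressing the criteria as a predicate keeps the repository independent of any particular component. When a failed host is removed, the same selection call yields the next suitable host, which is where a migrated component would be redeployed.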