Nova Publishers
Computation Checkpointing and Migration
Retail Price: $280.00
10% Online Discount

Authors: Vipin Chaudhary, Amherst, NY; Hai Jiang, Arkansas State University; John Paul N. Walters, Detroit, MI 
Book Description:
Computational clusters have long provided a mechanism for accelerating high performance computing (HPC) applications. With today’s supercomputers now exceeding the petaflop scale, however, they are also becoming more heterogeneous. This heterogeneity spans a range of technologies, from multiple operating systems to hardware accelerators and novel architectures. Because of the exceptional acceleration some of these heterogeneous architectures provide, they are being embraced as viable tools for HPC applications. Given the scale of today’s supercomputers, scientists must consider fault tolerance in their applications. This is particularly true as computational clusters with hundreds or thousands of processors become ubiquitous in large-scale scientific computing, leading to lower mean times to failure. Systems must therefore deal effectively with the possibility of arbitrary and unexpected node failure. In this book the authors address the issue of fault tolerance via checkpointing. They discuss the existing strategies for providing rollback recovery to applications, both via MPI at the user level and through application-level techniques. Checkpointing itself has been studied extensively in the literature, including in the authors’ own work. Here they give a general overview of checkpointing and how it is implemented. More importantly, they describe strategies to improve the performance of checkpointing, particularly in distributed systems.
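As a rough illustration of the rollback recovery idea described above, the following is a minimal sketch of application-level checkpointing in Python. It is not taken from the book: the function names (`save_checkpoint`, `load_checkpoint`, `run`), the JSON checkpoint format, and the `fail_at` parameter used to simulate a node failure are all assumptions made for this example. A real HPC checkpointer would also have to capture distributed state consistently across processes.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # A crash before this rename leaves the previous checkpoint intact.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_i": 0, "total": 0}


def run(path, n, interval, fail_at=None):
    """Sum the integers 0..n-1, checkpointing every `interval` iterations.

    `fail_at` (hypothetical, for demonstration) raises an error at that
    iteration to simulate an unexpected node failure.
    """
    state = load_checkpoint(path)  # roll back to the last checkpoint
    for i in range(state["next_i"], n):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated node failure")
        state["total"] += i
        state["next_i"] = i + 1
        if state["next_i"] % interval == 0:
            save_checkpoint(path, state)
    return state["total"]
```

Calling `run` again with the same checkpoint path after a failure resumes from the most recent checkpoint rather than from iteration zero, so only the work done since that checkpoint is repeated. The checkpoint interval trades checkpointing overhead against the amount of work lost on failure.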

Table of Contents:
Chapter 1. Introduction, pp. 1-7
1.1 Introduction to Checkpointing
1.2 Background on Checkpointing
1.2.1. LAM/MPI
1.2.2. Checkpointing Distributed Systems
1.2.3. Distributed State and Consistency

Chapter 2. Application-level Checkpointing/Migration, pp. 9-38
2.1 Thread Migration
2.2 Adaptive DSM Systems
2.2.1. Background
2.2.2. Strings
2.2.3. Thread Scheduling
2.2.4. DSM Migration Policy
2.2.5. Adaptation Points

2.3 Thread States

2.4 Compile-time Support
2.4.1. Function Call Graph
2.4.2. Data Variables
2.4.3. Pointers
2.4.4. Function Parameters
2.4.5. Program Counter
2.4.6. Adaptation Positions
2.4.7. Preprocessor

2.5 Run-time Support
2.5.1. Stacks
2.5.2. Memory Segments in Heaps
2.5.3. Thread State Transfer
2.5.4. State Restoration and Pointer Translation

2.6 Performance Analysis
2.7 Microbenchmarks
2.8 Experimental Results
2.9 Summary

Chapter 3. Migration Safety, pp. 39-60
3.1 Checkpointing/Migration-Unsafe Factors
3.1.1. Pointer Casting
3.1.2. Pointers in Unions
3.1.3. Library Calls
3.1.4. State-Carrying Instructions
3.1.5. Incompatible Data Conversion

3.2 Pointer Representations in C
3.2.1. Data Types in C
3.2.2. Data Updating Operations
3.2.3. Pointer Casting

3.3 Pointer Inference
3.3.1. Pointer Inference Rules
3.3.2. Static Analysis
3.3.3. Dynamic Check
3.3.4. Complexity

3.4 Microbenchmarks
3.5 Experimental Results
3.6 Related Research
3.7 Summary

Chapter 4. Heterogeneity Support, pp. 61-92
4.1 Data Representations in Heterogeneous Environments
4.1.1. Tags
4.1.2. Canonical Intermediate Form
4.1.3. Receiver-Makes-Right (RMR)

4.2 Data Conversion Issues
4.2.1. Endianness
4.2.2. Character Sets
4.2.3. Floating Point Standards
4.2.4. Data Alignment and Padding
4.2.5. Loss of Precision
4.2.6. Pointers

4.3 Coarse-grained Tagged RMR in MigThread
4.3.1. Tagging and Padding Detection
4.3.2. Data Restoration
4.3.3. Data Resizing
4.3.4. Address Resizing
4.3.5. Plug-and-Play

4.4 The Compile Time Support Module
4.5 The Run-time Support Module
4.6 Complexity Analysis
4.7 Microbenchmarks
4.8 Experimental Results
4.9 Related Research
4.10 Summary

Chapter 5. User-Level Checkpointing with LAM, pp. 93-106
5.1 User-Level Checkpoint/Fault Tolerance
5.2 User-Level LAM Checkpointing with Arbitrary Restart Structure
5.2.1. Existing Implementation
5.2.2. Enhancements to LAM's Checkpointing

5.3 Checkpoint Storage, Resilience, and Checkpointing
5.3.1. Dedicated Checkpoint Servers versus Checkpointing to Network Storage
5.3.2. Checkpoint Replication
5.3.3. The Degree of Replication
5.3.4. Restarting Computation
5.3.5. Scalability

Chapter 6. HPC and Virtualization, pp. 107-127
6.1 Virtualization within HPC
6.2 Virtualization Background
6.2.1. Overview of Test Virtualization Implementation

6.3 Performance Results
6.3.1. Network Performance
6.3.2. File System Performance
6.3.3. Single Node Benchmarks
6.3.4. MPI Benchmarks

6.4 Fault Tolerance
6.5 Checkpointing/Restart System Design
6.5.1. System Startup
6.5.2. Checkpointing
6.5.3. Restarting
6.5.4. Data Resiliency to Node Failures

6.6 Checkpoint/Replication Analysis
6.7 Performance Results
6.7.1. Replication Overhead


   Series: Embedded and High Performance Computing (Laurence Tianruo Yang)
   Binding: Hardcover
   Pub. Date: 2010
   ISBN: 978-1-60741-840-5
   Status: AV (Available)
Nova Science Publishers
© Copyright 2004 - 2020
