
Checkpoint question

jason williams
I am learning how to program a system that needs to survive process crashes in a cluster environment. After reading and searching for papers on the internet, I vaguely understand that this requires the program to take checkpoints, so that its state can be saved to stable (replicated) storage and recovered from there later. I understand that achieving fault tolerance also requires other components, e.g. a failure detector, but at the moment I want to gain a better understanding of checkpointing.

However, most of the papers stay at the level of abstractions. For instance, "Design Patterns for Checkpoint-Based Rollback Recovery" explains that communication-induced checkpointing can prevent the domino effect, and it provides diagrams of the interaction between components such as the failure detector and the checkpointer. My problem is more practical: how can I checkpoint to stable storage and recover seamlessly? For instance, if I checkpoint a running program to storage such as Hadoop HDFS, then when I recover the state, how can I ensure that the program resumes and continues executing as if nothing had happened?
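To make the question concrete, here is a minimal sketch of what I understand by application-level checkpointing. It is only an illustration under my own assumptions: the class `CheckpointedSum`, its fields, and the `maxSteps` parameter (used to simulate a crash) are made up for this post, and a local file stands in for replicated storage such as HDFS. The state is written to a temporary file, forced to disk, and then renamed over the previous checkpoint, so a crash during the write should not leave a half-written checkpoint behind.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical example: sums the integers 1..n and checkpoints after every step.
class CheckpointedSum {
    // Everything needed to resume: the next number to add and the running total.
    long next = 1;
    long total = 0;

    // Write the state to a temp file, force it to disk, then rename it over the
    // old checkpoint. Whether an atomic move replaces an existing target is
    // implementation specific; it does on typical POSIX file systems.
    void save(Path file) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        ByteBuffer buf = ByteBuffer.allocate(2 * Long.BYTES);
        buf.putLong(next).putLong(total);
        buf.flip();
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true);
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING,
                StandardCopyOption.ATOMIC_MOVE);
    }

    // Load the last checkpoint, or start from the initial state if none exists.
    static CheckpointedSum load(Path file) throws IOException {
        CheckpointedSum s = new CheckpointedSum();
        if (Files.exists(file)) {
            ByteBuffer buf = ByteBuffer.wrap(Files.readAllBytes(file));
            s.next = buf.getLong();
            s.total = buf.getLong();
        }
        return s;
    }

    // Resume from the checkpoint and run until n is reached, or until maxSteps
    // steps have been executed in this call (a stand-in for a process crash).
    static CheckpointedSum run(Path file, long n, long maxSteps) throws IOException {
        CheckpointedSum s = load(file);
        for (long done = 0; s.next <= n && done < maxSteps; done++) {
            s.total += s.next;
            s.next++;
            s.save(file);
        }
        return s;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempDirectory("ckpt").resolve("sum.ckpt");
        run(file, 10, 4);                                   // "crash" after 4 steps
        CheckpointedSum s = run(file, 10, Long.MAX_VALUE);  // restart and finish
        System.out.println(s.total);                        // prints 55
    }
}
```

In this toy case the restart is seamless because the checkpoint holds the complete state of the loop and each step is saved before the next one starts. What I cannot see is how to get the same guarantee for a real program, where the state is spread over many objects and threads and where work done after the last checkpoint may have had side effects.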

I would appreciate any suggestions.

Many thanks.

 