Fault tolerant computing
As a first step toward writing my own simulation code while doing something useful, a few days ago I started writing a program to explore failure, and recovery from failure, in a distributed computation. By failure I mean here that one of the computation units goes down. My test system is N harmonic oscillators on N nodes (or N processes on a shared-memory machine).
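To make the setup concrete, here is a minimal single-process sketch of what such a test system might look like. It is not the actual code: the "nodes" are just list entries, the failure is injected at a fixed step, and recovery by rolling every node back to the last checkpoint is one assumed strategy among several. The names `step`, `energy`, `run`, `checkpoint_every`, `fail_node`, and `fail_step` are all made up for illustration.

```python
def step(x, v, omega, dt):
    """Advance one harmonic oscillator by one velocity-Verlet step."""
    a = -omega**2 * x
    v_half = v + 0.5 * dt * a
    x_new = x + dt * v_half
    a_new = -omega**2 * x_new
    v_new = v_half + 0.5 * dt * a_new
    return x_new, v_new


def energy(x, v, omega):
    """Total energy of a unit-mass oscillator."""
    return 0.5 * v**2 + 0.5 * omega**2 * x**2


def run(n_nodes=4, n_steps=1000, dt=0.01, omega=1.0,
        checkpoint_every=100, fail_node=2, fail_step=550):
    """Evolve one oscillator per simulated node, with one injected failure.

    Assumption: recovery is a global rollback to the most recent checkpoint.
    Pass fail_step=None to run without any failure.
    """
    # Each node holds the (position, velocity) of its own oscillator.
    state = [(1.0, 0.0) for _ in range(n_nodes)]
    checkpoint = (0, list(state))
    failed = False
    t = 0
    while t < n_steps:
        if not failed and t == fail_step:
            failed = True
            state[fail_node] = None      # the node loses its in-memory state
            t, saved = checkpoint        # roll every node back in time
            state = list(saved)
            continue
        if t % checkpoint_every == 0:
            checkpoint = (t, list(state))
        state = [step(x, v, omega, dt) for x, v in state]
        t += 1
    return state


if __name__ == "__main__":
    for i, (x, v) in enumerate(run()):
        print(f"node {i}: x={x:.6f} v={v:.6f} E={energy(x, v, 1.0):.6f}")
```

Because the update is deterministic, a run that fails and rolls back ends in the same state as a run with no failure, at the cost of recomputing the steps since the last checkpoint. In a real distributed version the interesting part is exactly what this sketch skips: detecting that a node is gone and deciding where its checkpoint lives.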