Fault tolerant computing-II
The basics are now in place: the code fully recovers from 23 failures over 1000 timesteps on 32 processes. For context, see https://cosmicrays.wordpress.com/2011/07/06/fault-tolerant-computing/
Science and productivity by Dr. Christine Corbett Moran
As a first step toward writing my own simulation code while also doing something useful, a few days ago I started writing code to explore failure, and recovery from failure, in a distributed computation. By failure, I mean one of the computation units going down. My test system is N harmonic oscillators on N nodes (or N processes on a shared-memory machine).
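The original code is described in the linked post and is not reproduced here. What follows is only a minimal, single-process sketch of the same idea, built on several assumptions: the oscillators are uncoupled, each simulated "process" owns one oscillator, failures are injected at random, and a failed process recovers by restoring its last checkpoint and replaying the steps it lost. The integrator (velocity Verlet), the checkpoint interval, the failure probability and the names `step` and `simulate` are all hypothetical choices, not taken from the author's code.

```python
import random

# Hypothetical parameters; none of these values come from the original post.
N = 32                 # oscillators, one per simulated "process"
STEPS = 1000           # timesteps
DT = 0.01              # timestep size
OMEGA = 1.0            # angular frequency shared by all oscillators
CHECKPOINT_EVERY = 50  # steps between checkpoints
FAILURE_PROB = 0.001   # chance per process per step that it "goes down"


def step(x, v):
    """Advance one uncoupled harmonic oscillator by DT using velocity Verlet."""
    a = -OMEGA**2 * x
    x_new = x + v * DT + 0.5 * a * DT**2
    a_new = -OMEGA**2 * x_new
    v_new = v + 0.5 * (a + a_new) * DT
    return x_new, v_new


def simulate(inject_failures, seed=0):
    """Run all oscillators; optionally inject failures and recover from them.

    Returns the final (x, v) of every oscillator and the number of failures.
    """
    rng = random.Random(seed)
    # Each process starts from a distinct displacement, at rest.
    state = [(1.0 + 0.1 * i, 0.0) for i in range(N)]
    ckpt_step, ckpt_state = 0, list(state)
    failures = 0
    for t in range(STEPS):
        # Every process advances its own oscillator by one step.
        state = [step(x, v) for x, v in state]
        done = t + 1  # steps completed so far
        if inject_failures:
            for i in range(N):
                if rng.random() < FAILURE_PROB:
                    failures += 1
                    state[i] = None  # process i lost its in-memory state
                    # Recover: restore from the last checkpoint, then replay.
                    x, v = ckpt_state[i]
                    for _ in range(done - ckpt_step):
                        x, v = step(x, v)
                    state[i] = (x, v)
        if done % CHECKPOINT_EVERY == 0:
            ckpt_step, ckpt_state = done, list(state)
    return state, failures


# Compare a failure-free run against a run with injected failures.
clean, _ = simulate(inject_failures=False)
recovered, n_fail = simulate(inject_failures=True, seed=42)
print(f"{n_fail} failures injected; full recovery: {recovered == clean}")
```

Because `step` is deterministic and the replay repeats exactly the same floating-point operations, a recovered run should match the failure-free run exactly. Replaying a single process on its own only works because the oscillators in this sketch are uncoupled; with coupled oscillators, recovery would also need the neighbours' states at every replayed step, which usually means coordinated checkpoints across processes.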
Since last we spoke a couple of months ago, I had a hell of a time personally: I moved house, went from one location to another too often, had a major and very stressful financial crisis and had some rough times with friends which rocked the emotional boat. Although there are many things which didn’t …