AMQP UK, Thursday, 25th August 2011
When we designed our cloud service, we had to overcome several challenges. So we wrote a whitepaper as a pragmatic guide to what we learned and how we did it.
Anybody who has had to manage a large corporate IT network will know how hard it is to keep on top of monitoring. Here at stormmq we had a rather large problem designing a monitoring system that could cope with our diverse infrastructure. Our network is a cloud of many nodes, spread across many locations and serving many different functions.
Running a cloud requires a lot of hardware, software and man hours: servers, operating systems and administrators. Running a cloud profitably requires that those resources are used to maximum effect, so that not a CPU cycle is wasted, no piece of hardware sits in storage too long and each administrator can manage as much hardware as possible. In essence, a cloud needs to run itself with minimal intervention. Adding hardware, changing software and preventative maintenance must be automated, self‐healing and self‐updating: the nirvana of zero configuration!
Normal network monitoring tools such as Nagios and Cacti are not designed for a cloud where resources appear and disappear frequently. That is why we had to design our own system, built in-house on our own AMQP service.
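
To make the idea of monitoring resources that come and go more concrete, here is a minimal sketch of one possible approach: nodes announce themselves by sending heartbeats, and any node that stays silent for too long is dropped. This is an illustration only, not a description of the system in the whitepaper. The class `HeartbeatRegistry`, its methods and the `ttl_seconds` parameter are hypothetical names, and the sketch assumes heartbeats would arrive as AMQP messages, which is left out here so the example stays self-contained.

```python
import time
from typing import Callable, Dict, List


class HeartbeatRegistry:
    """Hypothetical registry: nodes become known only by sending heartbeats.

    Nothing is configured in advance. In a real deployment the heartbeats
    would presumably be consumed from a message queue (an assumption here).
    """

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        self._last_seen: Dict[str, float] = {}

    def record_heartbeat(self, node_id: str) -> bool:
        """Store the heartbeat time; return True if the node is newly seen."""
        is_new = node_id not in self._last_seen
        self._last_seen[node_id] = self._clock()
        return is_new

    def expire(self) -> List[str]:
        """Forget nodes silent for longer than the TTL; return their ids."""
        now = self._clock()
        gone = [node for node, seen in self._last_seen.items()
                if now - seen > self.ttl_seconds]
        for node in gone:
            del self._last_seen[node]
        return sorted(gone)

    def live_nodes(self) -> List[str]:
        """Return the ids of all nodes currently considered alive."""
        return sorted(self._last_seen)
```

A monitor built this way never needs a list of hosts: a new node shows up the first time it sends a heartbeat, and a decommissioned node vanishes once its TTL lapses.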