Everyone loves the cloud. It’s the end of all ills. It’s cheap, the security is good and a service like AWS will never fail you.
OK, we all know that’s not really true, but it’s crazy how frequently people forget to use common sense and caution when it comes to cloud migration and deployment. One thing people tend to forget or ignore: the cloud fails much more often than you would think or hope.
Hosting your app with one or more multi-region providers such as AWS or Microsoft Azure is no guarantee against issues. Take the recent AWS US-East outage. When a whole region goes down like this, even people with multi-cloud deployments suffer the consequences of bottlenecks and throttling measures while replicating data or spinning up DR systems in other regions or with other cloud providers. And just to make it more painful, later on you may find yourself billed for additional professional services or data transfer costs incurred in getting your service back online.
So – what are the key points for reducing risk in a cloud implementation?
- To deploy code, apps and tools that we need to run the business with less risk we need not only virtualization, but cloud. Cloud is about having a 100% disposable infrastructure that you can replicate anywhere anytime with little to no effort.
- Cloud is not about having a strong virtualization platform, such as Xen or VMware, that can create backups and move VMs around when a host fails. We don't care about backups or snapshots; we care about flexibility and speed in creating infrastructure.
- In an extreme implementation of cloud, you would even make the database disposable, but this is a more controversial topic.
- You should be able to create any environment (prod, dev, qa, etc.) in minutes by running one simple command.
- You should also be able to automate tests by spinning up whole production-like environments automatically when new code is pushed into the repository.
- Imagine that we have all our web servers in AWS and it goes down. All regions go down. Well, in the ideal scenario we wouldn’t care. If they cannot offer us a quick recovery but we have the automation, we should be able to spin up the whole infrastructure in a new cloud platform: Azure, Google, Rackspace, etc.
- Don’t put all your eggs in one basket. You can build hybrid-cloud environments where you have x% of your platform with one cloud provider and the rest with a different one, or even on premises using cloud “in a box” solutions like HP Eucalyptus, Stratoscale Symphony, OpenStack, or Azure Cloud Platform System.
- You can even automate the update of the DNS records via API in the main cloud providers if you wish.
- When you change your architecture, update your automation! One of my former clients didn’t do this and was calling me up asking for help in recovering their installation because the automation could not replicate the updated architecture.
- Any multi-region or multi-cloud architecture automatically means much more money spent, but there's no way around it unless you can afford downtime.
- Support cost: I've seen a false sense of comfort when people pay a cloud provider for support. When they have an outage, they can’t use it: they try phoning only to get a recorded message. Don’t bother is my feeling! Cloud services have extensive documentation about their procedures and products, and the forums are full of people who are happy to help you with questions. In a crisis situation, these are likely to be of more value than the paid support line.
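To make the "one command, any provider" idea a bit more concrete, here is a minimal sketch in Python. It is not a real deployment tool: `EnvironmentSpec`, `FakeProvider` and `deploy` are hypothetical names invented for this illustration, the providers are in-memory stand-ins rather than real cloud SDKs, and the `dns` dictionary stands in for whatever DNS API your provider offers. What it shows is the shape of the automation: the environment is described once in provider-neutral terms, built from scratch on the first provider that responds, and the DNS record is repointed as part of the same run.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class EnvironmentSpec:
    """Provider-neutral description of an environment (hypothetical schema)."""
    name: str          # "prod", "dev", "qa", ...
    web_servers: int   # number of identical, disposable web nodes
    dns_name: str      # public name that should point at the environment


class ProviderDown(Exception):
    """Raised when a provider cannot accept provisioning requests."""


@dataclass
class FakeProvider:
    """In-memory stand-in for a real provider SDK or provisioning tool."""
    name: str
    available: bool = True
    servers: List[str] = field(default_factory=list)

    def provision(self, spec: EnvironmentSpec) -> List[str]:
        if not self.available:
            raise ProviderDown(self.name)
        # Build the environment from scratch: nothing is reused or restored.
        self.servers = [
            f"{spec.name}-web-{i}.{self.name}.example"
            for i in range(1, spec.web_servers + 1)
        ]
        return list(self.servers)


def deploy(spec: EnvironmentSpec,
           providers: List[FakeProvider],
           dns: Dict[str, List[str]]) -> str:
    """Create `spec` on the first provider that responds, then repoint DNS."""
    for provider in providers:
        try:
            endpoints = provider.provision(spec)
        except ProviderDown:
            continue  # outage: fall through to the next provider in the list
        dns[spec.dns_name] = endpoints  # stand-in for a real DNS API call
        return provider.name
    raise RuntimeError("no provider available for " + spec.name)


if __name__ == "__main__":
    dns_records: Dict[str, List[str]] = {}
    prod = EnvironmentSpec(name="prod", web_servers=2, dns_name="www.example.com")
    primary = FakeProvider("primary", available=False)  # simulated outage
    secondary = FakeProvider("secondary")
    used = deploy(prod, [primary, secondary], dns_records)
    print(used, dns_records)
```

The point of keeping the spec separate from the providers is that the same description can be replayed anywhere. It also means that when the architecture changes, the spec is the one place that has to change with it.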
Is Cloud wrong?
No, of course not! We need cloud to build better tools for the business. We shouldn’t have to struggle with a 500GB database that is painful to migrate or upgrade, or with development platforms that are hard to manage and replicate, or spend time maintaining services like messaging buses that could be offloaded to a cloud platform. We need cloud to free up the space, but we have to be extremely aware of the consequences of doing so.
The answer is simple really: it’s just not free and it’s not hassle-free. Bad luck folks, sucks to be us! We still need to work on our automation and be able to create our whole infrastructure in 30 minutes elsewhere, and then we can take advantage of everything that a cloud platform offers with less of the risk. A geo-distributed database and lots of automation should allow anyone to feel truly secure in their cloud services.