With cloud SLAs generally being worth what you don’t pay for them, what can you do to protect yourself? Sean Hull, in "AirBNB didn’t have to fail", has some solid advice on how to deal with outages:
- Use Redundancy. Make the database and web server tiers redundant using Multi-AZ deployments or, alternatively, read replicas.
- Have a browsing-only mode. Give users a read-only version of your site. Users may not even notice failures, as they will only see problems when they need to perform a write operation.
- Web Applications need Feature Flags. Build in the ability to turn major parts of your site on and off, and flip the switch when problems arise.
- Consider Netflix’s Simian Army. By randomly causing outages in your application, you can continually test your failover and redundancy infrastructure.
- Use multiple clouds. Use Redundant Arrays of Inexpensive Clouds as a way of surviving outages in any one particular cloud.
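
As a rough illustration of how the browsing-only mode and feature flag ideas fit together, here is a minimal Python sketch. Everything in it is hypothetical: the `FeatureFlags` class, the `handle_request` function, and the flag names `writes` and `search` are made up for this example and are not taken from Sean's article or any particular framework. A real site would likely keep flags in a config service or database rather than in memory.

```python
class FeatureFlags:
    """Hypothetical in-memory flag store, for illustration only."""

    def __init__(self, **flags):
        self._flags = dict(flags)

    def enabled(self, name):
        # Unknown flags default to off, so a typo cannot enable a feature.
        return self._flags.get(name, False)

    def set(self, name, value):
        self._flags[name] = bool(value)


# Assumed flags: "writes" gates every mutating request, "search" gates one feature.
flags = FeatureFlags(writes=True, search=True)

WRITE_METHODS = ("POST", "PUT", "PATCH", "DELETE")


def handle_request(method, path):
    """Return a (status, body) pair for a request, honoring the flags."""
    if method in WRITE_METHODS and not flags.enabled("writes"):
        # Browsing-only mode: reads still work, writes are refused politely.
        return 503, "Site is in browsing-only mode; please try again later."
    if path.startswith("/search") and not flags.enabled("search"):
        return 503, "Search is temporarily unavailable."
    return 200, f"{method} {path} ok"
```

When the primary database fails, an operator would call `flags.set("writes", False)`. Visitors who are only browsing keep getting normal responses, and only requests that need a write see the error message.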
None of these are easy and it’s worth considering that your application may not need them at all. Life will almost always go on anyway.
Sean has many more details in "AirBNB didn’t have to fail".