In our last post we talked about contracts and SLAs. While they are important, not every problem is actually the cloud provider's responsibility. Some of the problems could be in your own network.
Historically, internet access wasn't always considered mission critical, but when your financial system, customer support, sales and storefront are all online, it needs to be. Not all companies have figured that out yet.
The best redundancy is dual carrier. Some carriers will sell you a redundant option, and this is good, but it doesn't always protect you from "logic" errors. Internet routing is complex and, like any complex system, can have problems. These problems are frankly pretty rare, but they can sometimes affect the whole provider network, even if you have redundant physical connections.
Other times the problem could be with the ISP's uplinks and who they interconnect with. This isn't really a problem with the bigger network players, since they have multiple uplinks to all the other providers, but smaller regional providers could have limited redundancy of their own. Also, do not forget your own redundancy. If both circuits go into the same closet and a water pipe breaks and ruins both sets of equipment, that is just as bad as a carrier outage.
It's also possible that something non-technical and silly, like a billing problem, could cause the provider to terminate your service, either intentionally or not. I have seen cases where a $6.24 bill that was billed incorrectly and sent to the wrong address caused the provider to terminate services. Getting circuits turned back on, for some reason, is a lot more difficult than turning them off.
So two carriers are best, but many providers resell other providers' services, so even using two carriers isn't foolproof. You have to ask which paths the circuits take, where they locally terminate and then how they get to a major POP. I have seen cases where two providers were both using the same fiber, and a single fiber cut took down both providers and both of our redundant circuits.
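As a rough first check before asking the carriers those questions, you can compare the router hops each circuit takes toward the same destination. The sketch below is only an illustration: the function name shared_hops and the hop lists are hypothetical, and the addresses are made-up documentation-range values standing in for traceroute output you would collect yourself over each circuit.

```python
def shared_hops(path_a, path_b, ignore=()):
    """Return the hops that appear in both paths, in path_a order."""
    skip = set(ignore)
    in_b = set(path_b) - skip
    seen = set()
    common = []
    for hop in path_a:
        if hop in in_b and hop not in seen:
            seen.add(hop)
            common.append(hop)
    return common


# Hypothetical hop lists, e.g. copied from traceroute output taken over
# each circuit toward the same destination (not real measurements).
carrier_a = ["192.0.2.1", "198.51.100.7", "203.0.113.9", "203.0.113.50"]
carrier_b = ["192.0.2.129", "198.51.100.7", "203.0.113.77", "203.0.113.50"]

# The destination is shared by definition, so leave it out of the comparison.
destination = "203.0.113.50"
overlap = shared_hops(carrier_a, carrier_b, ignore=[destination])
print(overlap)  # ['198.51.100.7']
```

Any hop in the overlap is a router both circuits depend on. An empty overlap does not prove the circuits are diverse, though. Traceroute only sees IP routers, and two different routers can still sit on the same physical fiber or in the same conduit, so you still have to ask the carriers about the physical path.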
Even using a wireless provider for a backup isn't foolproof. In Lawrence, MA, a few weeks ago, a mattress fire took down a major fiber and copper conduit, which took many cellular providers offline as well as the phones and internet for many local businesses. It even took down emergency 911 service, and if anyone takes redundancy seriously it's those guys. Lives really are at stake if 911 isn't working. Luckily they had coverage agreements with other towns to assist.
Getting internet redundancy is hard and not as foolproof as we would like. Make sure to ask questions about where the circuit goes, who the provider interconnects with and what the escalation points are in case something does go wrong. Redundancy is difficult to set up but not impossible. As more services move to the cloud, it is important to make sure you can always get there.