Single Point of Failure

On January 14, 2005, the Intelsat 804 satellite suddenly lost power and began drifting helplessly in space. This satellite carried much, and in some cases all, of the communications traffic for countries from Sri Lanka to Samoa.

The effect of this sudden loss of service was particularly severe on Pacific island nations, because in many places this satellite represented the only communication link to the outside world. As of January 21, some countries were still offline, and others were still experiencing problems.

International communications were badly disrupted. International telephone and fax traffic stopped. Internet access was gone. Banks and credit card companies could not conduct transactions, leaving tourists without cash and resort owners accepting debts on faith alone. Airlines and airports could not communicate easily. Most importantly, disaster early warning systems were severely impaired.

When reviewing the list of affected countries, one thing quickly becomes clear: the countries most affected by the satellite failure were those whose communications systems had a single point of failure.

Single Point of Failure. Every network analyst knows, and fears, this term. It’s simple enough in principle: when planning a communications system, always make sure that there’s no single part whose failure can bring the whole system down.
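The principle can be checked mechanically. What follows is a minimal sketch in Python, not a description of any real network: the node names ("island", "satellite_a", "satellite_b", "world") and the helper functions are hypothetical, invented purely for illustration. It removes each intermediate component in turn and asks whether the island can still reach the outside world.

```python
from collections import deque


def undirected(pairs):
    """Build a two-way link table from a list of (a, b) connections."""
    links = {}
    for a, b in pairs:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    return links


def reachable(links, start, goal, removed=None):
    """Breadth-first search from start to goal, ignoring the removed node."""
    if removed in (start, goal):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbour in links.get(node, ()):
            if neighbour != removed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


def single_points_of_failure(links, start, goal):
    """List every intermediate node whose loss alone cuts start off from goal."""
    nodes = set(links) | {n for ns in links.values() for n in ns}
    return sorted(
        node
        for node in nodes - {start, goal}
        if not reachable(links, start, goal, removed=node)
    )


# Hypothetical island served by one satellite only.
one_link = undirected([("island", "satellite_a"), ("satellite_a", "world")])

# The same island with a second, independent satellite link (an assumption).
two_links = undirected([
    ("island", "satellite_a"), ("satellite_a", "world"),
    ("island", "satellite_b"), ("satellite_b", "world"),
])

print(single_points_of_failure(one_link, "island", "world"))   # ['satellite_a']
print(single_points_of_failure(two_links, "island", "world"))  # []
```

With one link, the lone satellite is reported as a single point of failure; with two independent links, the list is empty. Real networks have many more components (ground stations, power supplies, contracts), but the test is the same: remove one part and see whether the system survives.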

In practice, it’s not as easy as it sounds. The failure of the Intelsat 804 satellite continues to cause significant problems throughout the Pacific region, particularly among the small island nations. This is mostly because the cost of communications makes having back-up satellite access very difficult.

Because they buy so little traffic, Pacific island nations are relatively unimportant to international satellite providers. Technicians working to fix the problem reported spending hours, even days, trying to contact Intelsat staff. They spoke of being given emergency space on an alternative satellite, only to be bumped off by other customers.

The money that a satellite provider makes from a small island country is, relatively speaking, very small. From a business perspective, we’re not very important to them. But for us, international communications are more important than just business.

What if there had been a natural disaster? At the height of the hurricane season, in a region prone to earthquakes, volcanoes and tsunamis, this is not merely idle speculation. In fact, shortly after the outage occurred, there was a strong earthquake in Micronesia. Had it caused even a localised tsunami, the loss of communications could have cost us many lives.

The countries that suffered most were the ones with only one connection to the outside world. Those that coped best had more than one. Several countries had separate contracts for data and voice communications, and when voice communications disappeared, they were able to use their data lines to compensate. In one case, technicians were able to use Voice over IP (VoIP) to enable outbound telephone calls within twelve hours.

What lessons can we take from this incident? It’s clear now that those carriers who relied on a single source for their data and voice communications paid most dearly. Their customers paid dearly too, in terms of lost business. It was pure luck that no lives were lost. Next time, we might not be so lucky.

But what can we do to prevent this happening again? The answer is to remove single points of failure wherever possible. Satellite communication is expensive, and undersea cable even more so. Still, it’s been demonstrated that opening national markets to multiple data carriers usually reduces prices for consumers and increases revenues for the carriers. In New Caledonia, data use has increased by one thousand percent since the country opened its communications market three years ago. Importantly, it was one of the least affected nations when Intelsat 804 failed.

Opening the communications market is not an appropriate answer for every island nation. Some are simply too small to support it. In these cases, using separate providers for voice and data service at the very least ensures that if one is lost, the other is still available.

Single Points of Failure are a liability in every system. International communications is one area where such a liability can cost lives.