  Linux Letter 21

The Linux Letter for November 22, 2000

Hello again! I'm sure that this week's installment of The Letter comes as a huge surprise, especially since there hasn't been one for months. But I finally decided to drag myself over to the keyboard and do something useful for a change.

Plenty is new since the last Letter, more than I can talk about in one column. So hopefully I'll be able to pull myself together and string a few weeks' worth of them together for your reading enjoyment.

This week's topic is high availability computing. It really doesn't have that much of an application to the average Linux user, but it's an interesting subject and one for which Linux is well suited.

A while back, eBay, the Internet auction service, was down for almost a day. The company estimated that it lost about $40 million in business, and the stock price plummeted in response. High availability computing is one way that companies can guard against catastrophic computer failures like that.

In order to understand why high availability systems are important, it's necessary to have an idea about how they work. Designers of the systems analyze all of the possible ways that a computer system or network of computers can fail, and then devise schemes to either prevent the failures or work around them. Other methods of providing high availability include clustering multiple servers together and distributing the network load evenly amongst them, a concept called load balancing.
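To make the load balancing idea concrete, here is a minimal Python sketch of one simple strategy, round robin, where each incoming request goes to the next server in the rotation. The server names and the assign function are made up for illustration; real load balancers usually also weigh server health and current load.

```python
from itertools import cycle

# Hypothetical server names, for illustration only.
servers = ["web1", "web2", "web3"]


def assign(requests, servers):
    """Pair each request with the next server in a repeating rotation."""
    rotation = cycle(servers)
    return [(request, next(rotation)) for request in requests]


# Six numbered requests spread evenly across three servers.
assignments = assign(range(6), servers)
for request, server in assignments:
    print(f"request {request} -> {server}")
```

With six requests and three servers, each server ends up handling two requests.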

System monitoring is critical to high availability computing because if a node in a network goes down, the rest of the network needs to know immediately so that data does not become corrupted. Another potential problem in clustered servers is for a node to fail, then come back to life without going through a proper startup. If the rest of the network does not know the status of the node, data corruption is almost certain.

Functionally, a high availability system is composed of several nodes, with each node usually being a separate computer. Special software monitors the nodes and removes any failed nodes from service, seamlessly allowing the other nodes to continue to function without interruption.
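As a rough illustration of that kind of monitoring, the Python sketch below tracks a "heartbeat" timestamp for each node and removes any node that stays silent for too long. The ClusterMonitor class, the node names and the timeout are all invented for this example and are not taken from any real high availability package. Times are passed in by hand so the example behaves the same way every time it runs.

```python
class ClusterMonitor:
    """Track the last heartbeat from each node and drop nodes that go quiet."""

    def __init__(self, nodes, timeout):
        self.timeout = timeout  # seconds of silence before a node counts as failed
        self.last_seen = {node: 0.0 for node in nodes}
        self.active = set(nodes)

    def heartbeat(self, node, now):
        # Only nodes still in service may refresh their timestamp. A node that
        # was removed has to rejoin explicitly, so it cannot quietly come back
        # to life without a proper startup.
        if node in self.active:
            self.last_seen[node] = now

    def check(self, now):
        """Remove and return every active node whose heartbeat is overdue."""
        failed = {n for n in self.active if now - self.last_seen[n] > self.timeout}
        self.active -= failed
        return failed

    def rejoin(self, node, now):
        """Put a node back in service after it has restarted cleanly."""
        self.last_seen[node] = now
        self.active.add(node)


monitor = ClusterMonitor(["node1", "node2", "node3"], timeout=3.0)
for name in ("node1", "node2", "node3"):
    monitor.heartbeat(name, now=1.0)

# node3 goes silent; the other two keep reporting in.
monitor.heartbeat("node1", now=4.0)
monitor.heartbeat("node2", now=4.0)

failed = monitor.check(now=5.0)
print("failed:", sorted(failed))
print("active:", sorted(monitor.active))

# A late heartbeat from the removed node is ignored until it rejoins.
monitor.heartbeat("node3", now=6.0)
still_out = "node3" not in monitor.active
monitor.rejoin("node3", now=7.0)
```

Ignoring heartbeats from a removed node is one simple way to handle the "came back to life" problem described earlier: the rest of the cluster always knows whether a node is officially in or out.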

High availability is traditionally measured in terms of uptime. 99.9% uptime is the accepted beginning of high availability and translates into about 9 hours of downtime per year. As the number of "nines" increases, so does the complexity of the system. Of course, increased complexity and reliability translate directly into increased cost, so true high availability systems tend to be quite expensive. But, as demonstrated by the eBay case, that cost can be easy to swallow next to the cost of an outage.
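The arithmetic behind the "nines" is easy to check. This short Python sketch converts an uptime percentage into hours of downtime per year, assuming a 365-day year; the function name is just for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(uptime_percent):
    """Return the hours per year a system is down at the given uptime."""
    return HOURS_PER_YEAR * (100.0 - uptime_percent) / 100.0


for pct in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% uptime: {hours:.2f} hours ({hours * 60:.1f} minutes) of downtime per year")
```

Three nines works out to roughly 8.76 hours a year, four nines to under an hour, and five nines to only a little over five minutes.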

Linux is a great candidate for high availability because the operating system's source code is freely available. The High Availability Linux Project is an open source attempt to develop such a system by creating software tools to monitor a cluster of Linux-based computers.

Mission Critical Linux ships a distribution based on some of the components of the High Availability Linux Project. SuSE is porting SGI's FailSafe high availability system to Linux. A high availability HOWTO provides suggestions for hardware and software to limit downtime.

High availability computing for Linux is still very much in the development stage, but with the explosive growth of online services such as banking and e-commerce, you can expect to see quick development of the technology. While high availability computing may not have direct applications to your situation as a home or small business user, the usual trickle-down effects of new technology development are sure to add more performance and reliability to our everyday desktop environment.

High-Availability Linux Project: http://linux-ha.org

Mission Critical Linux: http://www.missioncriticallinux.com/

SuSE Linux: http://www.suse.com

SGI: http://www.sgi.com


Hot Tip of the Week

You boot your system. Everything goes great until you get to the LILO prompt. There's no LILO…or maybe just an L…or LI…you get the picture. You know that something's wrong, but what is it? The "amount" of LILO displayed can give you a rough clue about what's going on in your system:

"L" - /boot/boot.b could not be loaded. Almost any kind of disk error can cause this.

"LI" - /boot/boot.b has been moved, but LILO wasn't reinstalled (by rerunning /sbin/lilo). A disk error is another possibility.

"LIL" - LILO can't load the data it needs from the map file. This is probably some kind of disk error.

"LIL?" - /boot/boot.b has been moved, but LILO wasn't reinstalled. A disk error is another possibility.

"LIL-" - The map file data is invalid, usually because /boot/map has been moved but LILO wasn't reinstalled. A disk error is another possibility.

"LILO" - Congratulations…LILO loaded successfully with no errors!


Happy computing!

Drew Dunn

