High availability refers to system uptime, usually expressed as the percentage of time a system is up over a given period. For example, a server that runs for a full year with absolutely no downtime is considered 100% available. As you might imagine, it is nearly impossible to maintain 100% availability for modern systems. A fairly complex network might block users' access to your server, or the server might need Windows updates that require a reboot, and so on.
As a technology professional, you will find that achieving 100% uptime is often outside your control, yet it is still something you must manage. Usually this is managed by setting user expectations. You shouldn’t promise your users 100% uptime, but you should strive for as much uptime as possible. The commonly accepted maximum uptime that is deemed reasonable is 99.999%, which means your server will be unavailable for less than 6 minutes over a one-year period, or about 26 seconds per month. This is possible, but only if you manage server uptime very closely: scheduling updates and reboots, and refusing to accept that a reboot will fix all known issues.
High Availability Measurements
|Availability %||Downtime per year||Downtime per month|
|90%||36.5 days||72 hours|
|95%||18.25 days||36 hours|
|97%||10.96 days||21.6 hours|
|98%||7.30 days||14.4 hours|
|99%||3.65 days||7.20 hours|
|99.5%||1.83 days||3.60 hours|
|99.8%||17.52 hours||86.23 minutes|
|99.9%||8.76 hours||43.8 minutes|
|99.95%||4.38 hours||21.56 minutes|
|99.99%||52.56 minutes||4.38 minutes|
|99.995%||26.28 minutes||2.16 minutes|
|99.999%||5.26 minutes||25.9 seconds|
|99.9999%||31.5 seconds||2.59 seconds|
|99.99999%||3.15 seconds||262.97 milliseconds|
|99.999999%||315.569 milliseconds||26.297 milliseconds|
|99.9999999%||31.5569 milliseconds||2.6297 milliseconds|
Looking at the table above, what is your expected availability? As a database administrator or network administrator, you have to set user expectations, measure server availability, and work to meet those expectations.
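The downtime figures for any availability target come from simple arithmetic: multiply the length of the period by the fraction of time the system is allowed to be unavailable. Here is a minimal Python sketch of that calculation. The function name `allowed_downtime` and the 365-day year and 30-day month are assumptions made for illustration, so a few results may differ slightly from published tables that assume a different month length.

```python
from datetime import timedelta


def allowed_downtime(availability_percent: float, period: timedelta) -> timedelta:
    """Return the maximum downtime permitted in `period` at the given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    # Fraction of the period during which the system may be down.
    unavailable_fraction = (100 - availability_percent) / 100
    return period * unavailable_fraction


YEAR = timedelta(days=365)   # assumption: non-leap year
MONTH = timedelta(days=30)   # assumption: 30-day month

for pct in (99.0, 99.9, 99.99, 99.999):
    per_year = allowed_downtime(pct, YEAR)
    per_month = allowed_downtime(pct, MONTH)
    print(f"{pct}%: {per_year} per year, {per_month} per month")
```

For 99.999%, this works out to roughly 315 seconds per year, or a little over five minutes.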
If your server availability is less than optimal and you want to improve that metric, you have to start measuring server downtime. Start looking at ways to keep your server available more than it is today. If your server is down for one hour this week, your availability for the week is about 99.4%. Even if that is the only outage all year, your yearly availability is about 99.989%, which already misses the 99.99% level (52.56 minutes of downtime per year).
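The effect of a single one-hour outage can be checked by going the other way: from measured downtime to an availability percentage. Here is a small sketch; the helper name `availability_percent` is hypothetical, and a 365-day year is assumed.

```python
def availability_percent(downtime_hours: float, period_hours: float) -> float:
    """Convert measured downtime within a period into an availability percentage."""
    if period_hours <= 0:
        raise ValueError("period must be positive")
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime must be between 0 and the period length")
    return 100 * (1 - downtime_hours / period_hours)


HOURS_PER_WEEK = 7 * 24      # 168
HOURS_PER_YEAR = 365 * 24    # 8760, assuming a non-leap year

# One hour of downtime, measured against a week and against a year.
weekly = availability_percent(1, HOURS_PER_WEEK)
yearly = availability_percent(1, HOURS_PER_YEAR)
print(f"week: {weekly:.3f}%  year: {yearly:.3f}%")
```

The same hour of downtime looks very different depending on the measurement window, which is why the period should always be stated alongside the percentage.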
Start tracking the availability of your servers, because only by measuring something can you really start making it better.
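One simple way to begin tracking is to keep a log of outage start and end times and compute availability over a reporting window. The sketch below assumes a hypothetical in-memory list of non-overlapping outages; the function name and the sample data are made up for illustration, and a real setup would more likely read these times from monitoring records.

```python
from datetime import datetime, timedelta


def availability_from_outages(outages, window_start, window_end):
    """Return availability (%) over a window, given (start, end) outage pairs.

    Assumes the outages do not overlap one another.
    """
    window = window_end - window_start
    if window <= timedelta(0):
        raise ValueError("window_end must be after window_start")
    downtime = timedelta(0)
    for start, end in outages:
        # Count only the part of each outage that falls inside the window.
        clipped_start = max(start, window_start)
        clipped_end = min(end, window_end)
        if clipped_end > clipped_start:
            downtime += clipped_end - clipped_start
    return 100 * (1 - downtime / window)


# Hypothetical outage log for one month: a 30-minute and a 15-minute outage.
outage_log = [
    (datetime(2024, 1, 9, 2, 0), datetime(2024, 1, 9, 2, 30)),
    (datetime(2024, 1, 21, 14, 0), datetime(2024, 1, 21, 14, 15)),
]
january = availability_from_outages(
    outage_log, datetime(2024, 1, 1), datetime(2024, 2, 1)
)
print(f"January availability: {january:.4f}%")
```

Recording planned reboots in the same log keeps the number honest, since users experience scheduled and unscheduled downtime the same way.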