This is the second installment in my series on fine-tuning your Exchange server for better performance.
In my last installment I looked at evaluating your memory to see whether improvements could be made there. Now that I've brought your memory up to par, let's take a look at the next components of your system that impact performance: disk and CPU usage.
Disk Usage
CPU speed is an often-quoted benchmark of performance ("400MHz," "500MHz," etc.), but as with memory, a heavy CPU load can actually be caused by bottlenecks or inefficiencies in your storage system rather than by a deficiency in your CPU. One term you should get familiar with is "spindle," which refers to a physical hard drive. As you probably know, a hard drive is basically a stack of platters that spin around a vertical axis, roughly similar to a phonograph or CD player. The read/write heads are attached to the end of an arm (or stack of arms), which moves in and out as the platters spin beneath them. The vertical axis is referred to as the "spindle." You can have a spindle partitioned into multiple logical drives (C, D, etc.), and thanks to volume sets and RAID, you can have multiple spindles grouped together as a single logical drive.
This is important because one of the ongoing discussions in the Exchange community revolves around the placement of files. Exchange has several kinds of data files: the pub.edb and priv.edb files, which contain your public folders and private mailboxes, respectively; the dir.edb file, which contains your directory; the log files that contain records of the transactions in your database; and so forth.
The ideal Exchange setup includes having the operating system and pagefile installed on one spindle, a mirror set (RAID 1) of two separate spindles (addressed as a single logical drive) dedicated to the Exchange transaction log files, and a stripe set of three or more spindles (also as a single logical drive) that contains all of the other Exchange components and databases. To do that requires a server with at least six physical drives in it, but it is the optimum configuration for efficiency.
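To see where the "at least six" figure comes from, here is a minimal sketch that simply adds up the spindles in the layout described above. The dictionary keys are my own labels for illustration, not anything Exchange or NT defines.

```python
# Spindles per volume in the ideal layout described above.
# The labels are illustrative only; the counts are the minimums from the text.
ideal_layout = {
    "operating system and pagefile": 1,
    "transaction logs (RAID 1 mirror)": 2,
    "databases and other Exchange files (stripe set)": 3,
}

total_spindles = sum(ideal_layout.values())
print(total_spindles)  # 6
```

A larger stripe set raises the total, which is why six is a floor rather than a target.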
Must you have a server like that? No. You can install and run Exchange on a server with just a single physical drive, but that's far from ideal. You'll suffer performance penalties, and because there's no redundancy, your backups will become your only lifeline when that drive fails.
Why do you want log files on a separate drive? Because they're written sequentially and having them on their own physical drive (or mirror) means that the drive heads don't have to leave the sequence. On their own drive, they will always be positioned properly to write the next piece of data so they won't have to seek. If you have the log files on the same physical drive (even if it's a different partition) as the rest of the Exchange data, then the drive heads have to continually seek back and forth to write log files, read mailbox data, write more log files, write some public folder data, etc. Optimizing the performance of your log files is important because data is written to the log files first and posted to the databases later.
OK, you say, but you've already got your Exchange server up and running, and the log files are on the same drive as your Information store. What can you do about it at this point? Glad you asked. If you rerun the Performance Optimizer tool you'll have the opportunity to specify which hard drives to store the log files on. After it asks you some fairly basic questions about how many users you expect to have and what functions you expect the server to perform, the Optimizer does some analysis of your machine and then presents you with a screen that shows which drives and directories it recommends that you locate the files on. You can accept its recommendations or edit them to put the files in whichever drives and directories you wish.
Checking the Performance Monitor Counters
Now that your files are organized as well as they can be, let's look at the performance of your hardware. In order to monitor the important disk counters with Performance Monitor, you'll need to enable the disk performance statistics driver (DiskPerf). You do this by dropping out to a command (or DOS) prompt and typing DISKPERF -Y at the prompt.
NOTE: If you're using software RAID, you'll need to use the "-YE" switch instead of "-Y" to enable DiskPerf correctly.
After you've done that you'll need to restart your server so that the DiskPerf service can be started. Once that's done, start Performance Monitor. There are a couple of counters that you'll want to keep an eye on.
Under the LogicalDisk object, select the Current Disk Queue Length counter. Be sure to select each of the instances for your various logical drives in order to get a complete picture. This counter measures how many disk operations are waiting to be performed on the logical drive. This value should never average more than 2 under normal circumstances, although the volume that contains your .edb files is a notable exception. Because Exchange caches data to be written to the database until there is enough data to be efficiently saved to disk, you may well see spikes in disk access every 30 seconds or so. The Current Disk Queue Length counter may reach as high as 64 during these spikes, which is normal. The key here is to make sure that the average queue length isn't too high. To calculate the limit, take the number of spindles you have and divide it by two. That means that if you have a six-drive RAID 5 array, the maximum acceptable average queue length is around 3. Anything higher than that would indicate a bottleneck in your storage system.
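If you log the queue length and want to check it against the spindles-divided-by-two rule, a minimal sketch might look like the following. The function names and the sample readings are hypothetical; only the rule itself comes from the discussion above.

```python
from statistics import mean
from typing import Sequence


def max_average_queue_length(spindles: int) -> float:
    """Rule of thumb from the text: number of spindles divided by two."""
    if spindles < 1:
        raise ValueError("an array needs at least one spindle")
    return spindles / 2


def queue_is_bottleneck(samples: Sequence[float], spindles: int) -> bool:
    """True when the average sampled queue length exceeds the limit."""
    return mean(samples) > max_average_queue_length(spindles)


# Hypothetical readings for a six-drive array: mostly quiet, with one
# brief spike while cached data is flushed to the database.
readings = [1] * 28 + [2, 40]

print(max_average_queue_length(6))       # 3.0
print(queue_is_bottleneck(readings, 6))  # False: the average is about 2.3
```

The point of averaging is that a single spike, even a tall one, doesn't condemn the array; a sustained queue does.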
The next counter you should check is the % Disk Time counter. This gives you an idea of how busy your drives are. As a general rule, for databases like Exchange, an average higher than 90 percent indicates that the storage system is too busy and needs either a lighter load or faster hardware. Ideally this value should be under 50 percent, but a system like Exchange that is constantly reading and writing log files and databases can be forgiven a little higher utilization.
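The two thresholds above split the counter into three bands, which can be sketched as a small function. The band names ("ideal," "tolerable," "overloaded") are my own shorthand, not Performance Monitor terminology.

```python
def classify_disk_time(average_percent: float) -> str:
    """Place an average disk-busy percentage into one of three bands.

    Thresholds follow the rule of thumb in the text: under 50 percent
    is ideal, over 90 percent means the storage system is too busy.
    """
    if average_percent > 90:
        return "overloaded"
    if average_percent < 50:
        return "ideal"
    return "tolerable"


print(classify_disk_time(35))  # ideal
print(classify_disk_time(70))  # tolerable
print(classify_disk_time(95))  # overloaded
```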
The next counters to take a look at are the Avg. Disk Bytes/Transfer and Disk Bytes/Sec counters. These tell you a little about how efficiently your system is running. On both of these counters, bigger numbers are better. These are good counters to check periodically (especially if you're noticing a slowdown) to see if they change with time. These are also good counters to check before and after you upgrade your hardware, again to see how that change has affected the performance of your disk system.
The final basic counter that I recommend you look at is the Free Megabytes counter. This counter tells you how much free space each logical drive has. In fact, I advise you to set an alert on this counter so that if it falls below a certain threshold you get a message warning you that action needs to be taken. An Exchange server that runs out of disk space (either for databases or logs) will shut down unceremoniously, which won't please your users.
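Performance Monitor's own alerts are the natural tool here, but the same check is easy to sketch in a script. This is only an illustration: the function names and the 500MB threshold are assumptions of mine, and it reads free space through Python's standard shutil.disk_usage call rather than through the Performance Monitor counter.

```python
import shutil

BYTES_PER_MEGABYTE = 1024 * 1024


def free_megabytes(path: str) -> int:
    """Free space, in whole megabytes, on the drive that holds `path`."""
    return shutil.disk_usage(path).free // BYTES_PER_MEGABYTE


def low_on_space(free_mb: int, threshold_mb: int) -> bool:
    """True when free space has fallen below the alert threshold."""
    return free_mb < threshold_mb


# Hypothetical threshold; pick one that gives you time to react.
THRESHOLD_MB = 500

free_now = free_megabytes(".")
if low_on_space(free_now, THRESHOLD_MB):
    print(f"WARNING: only {free_now} MB free")
```

You would run a check like this once for each logical drive that holds databases or logs, since either one filling up will stop the server.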
If you don't have a fast system you might want to turn DiskPerf off when you're done monitoring the hard drive, as it does cause a slight performance hit. To turn it off, just drop back out to a command prompt and type DISKPERF -N. It will be turned off after your next reboot. If you have a fast machine or aren't concerned with the very slight load of DiskPerf, you can leave it on to make future monitoring easier.
What can you do if these counters indicate that your storage system is insufficient? Well, there are two basic choices: either reduce the load on the server or upgrade your hardware.
Reducing the Load
To reduce the load on your server, remove any unnecessary services and move any reasonably portable services off to other servers. As an example, if you're using your Exchange server as a WINS and/or DHCP server but the drives are overloaded, you should consider moving those services to a different server. Exchange likes to have a server box all to itself whenever possible, so if you have it sharing a server with SQL, Proxy, or other services, and your monitoring indicates an overload, you should consider either moving the Exchange service to a different server or moving the other services.
Upgrading the Hardware
Hardware upgrades can come in a few different forms. First, you can upgrade your controller card. If you are using software RAID to run your RAID 1 or RAID 5 array, consider getting a controller card that can do it in hardware instead. Caching controllers can also help alleviate some of the load from your server, but be careful with cards that cache writes: a system failure or outage before all of the writes have been committed to disk could cause data loss or corruption. As if a UPS power backup weren't essential already, it's even more so when you have a caching controller on board.
If you have both RAID 5 and RAID 1 arrays on the system, you may find that running each array off its own, separate controller card relieves some bottleneck.
If you're running RAID 5, you may find some performance improvements by adding more physical drives (spindles) to the array.
Regardless of the flavor of RAID you're using, if your drives are older you may find some performance improvements by replacing them with newer, faster drives featuring lower seek times (especially for your database volumes) and higher data transfer rates.
CPU Usage
If you have enough memory and your storage system is performing well, you may yet find that your system is CPU-bound; in other words, it just doesn't have the processor horsepower to perform optimally. If that's the case, you have basically two options: reduce the load on the server or upgrade your hardware.
First, I'll show you what you should be checking to see if your CPU is overloaded; then I'll mention the possible remedies.
The one crucial Performance Monitor counter to monitor is the Processor: % Processor Time counter. This indicates the basic load on your processor, that is, the percentage of the time it spends executing non-idle threads. As you might expect, this tends to be a bursty counter: activity comes and goes, spiking the counter as high as 100 percent on occasion and then dropping it back to 0 percent moments later. The important statistic to monitor here is the average. Conventional wisdom says that anything over 75 percent is a bottleneck; in the quest to get faster you should take reasonable steps to reduce this usage as much as possible.
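Because the counter is so bursty, it helps to see why the average matters more than the peaks. Here is a minimal sketch using two made-up sets of readings; the function name and the sample values are hypothetical, and only the 75 percent figure comes from the rule of thumb above.

```python
from statistics import mean
from typing import Sequence

CPU_BOTTLENECK_PERCENT = 75.0  # conventional-wisdom threshold from the text


def cpu_is_bottleneck(samples: Sequence[float],
                      threshold: float = CPU_BOTTLENECK_PERCENT) -> bool:
    """True when the average processor time exceeds the threshold."""
    return mean(samples) > threshold


# Hypothetical readings: both series hit 100 percent at some point.
bursty = [5, 100, 10, 0, 95, 20, 15, 100, 0, 30]
sustained = [80, 90, 85, 100, 70, 95, 88, 92]

print(mean(bursty), cpu_is_bottleneck(bursty))        # 37.5 False
print(mean(sustained), cpu_is_bottleneck(sustained))  # 87.5 True
```

Both series peg the processor now and then, but only the second one points to a server that is genuinely short on CPU.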
If you do see abnormally high processor usage, it may be helpful to try and track down which process is hogging the CPU. It may well be that a process has gone awry and needs to be fixed or restarted, or it might just be a CPU-intensive process that really doesn't need to be run, or that can be moved to a less-loaded server. One technique for finding out which process is eating your CPU cycles is to open the Process Object and look at the %Processor Time counters for each instance (except Idle) to see which one is running high.
NOTE: Don't be concerned if you look at the "_Total" instance and it shows usage at or near 100 percent. That's because _Total includes the Idle process, which marks when the CPU isn't doing anything at all.
However, an easier way to find out which process is eating your CPU cycles is to hit Ctrl+Alt+Del to pop up the NT Security dialog box and select Task Manager. On the Processes tab, scan down the CPU column to see which processes are using the CPU heavily.
TIP: If you happen to have a multiprocessor system under NT 3.x or NT 4, you may want to make one Registry change to improve the way that Performance Monitor displays the processor counters. Change the value under the HKEY_CURRENT_USER\Software\Microsoft\PerfMon key from 1 to 0. When that value is set to 1, the CPU usage counter is capped at 100. On a multiprocessor system the usage of the CPUs is summed, which means that if you've got four processors and each is loaded 50 percent, you're going to see the counter pegged at 100 percent, which isn't very informative. By changing the value to 0 you allow the counter to rise to the necessary level, which means that four processors, each loaded 50 percent, register as 200 percent usage in Performance Monitor. Now you know that you effectively have two processors' worth of load.
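The arithmetic behind that tip can be sketched in a few lines. This only models the summing and capping described above; it does not read the Registry or Performance Monitor, and the function and parameter names are my own.

```python
from typing import Sequence


def displayed_usage(per_cpu_percent: Sequence[float], cap_at_100: bool) -> float:
    """Sum the per-processor loads, optionally capping the result at 100."""
    total = float(sum(per_cpu_percent))
    return min(total, 100.0) if cap_at_100 else total


four_cpus_half_loaded = [50, 50, 50, 50]

print(displayed_usage(four_cpus_half_loaded, cap_at_100=True))   # 100.0
print(displayed_usage(four_cpus_half_loaded, cap_at_100=False))  # 200.0

# Dividing the uncapped figure by 100 gives the load in "whole processors."
print(displayed_usage(four_cpus_half_loaded, cap_at_100=False) / 100)  # 2.0
```

With the cap in place, a system loaded at 25 percent per processor and one loaded at 100 percent per processor would both read 100, which is exactly why the capped display is so uninformative.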
The second counter you should be looking at is the System: Processor Queue Length counter. This shows you how many threads are piled up, waiting to be executed on the system. As with the disk queue, it's generally wise to keep the average of this value under 2.
If these counters indicate that your server is too heavily loaded, your options are basically the same as your options for an overloaded disk system: either reduce the load on the server or upgrade your hardware.
We've already discussed reducing the load. It involves shutting down unnecessary services and offloading necessary, but portable, services to other servers wherever possible. If you've already done that, you may just have too much usage for this Exchange server to handle all by itself. You might consider adding a second Exchange server and offloading some of the tasks (connectors and/or mailboxes) to that other server.
Upgrading the hardware is another option. You can replace your existing CPU with a faster one (although depending upon your system, you may have to replace the entire motherboard), add one or more additional CPUs to assist the first one, or just replace the entire Exchange server with a more powerful model.
Note that if you choose to add additional CPUs it's not quite as simple as plugging in the CPU and booting the server. You have to tell NT about that second CPU by using the UPTOMP.EXE utility. For more information on that process, I encourage you to step next door and visit my colleague L.J. Johnson, the Windows NT/2000 Pro.
In this installment I've examined your storage subsystem and your processor; in Part III I'll take a look at your network and start to examine some of the Exchange core components for how they can be fine-tuned.