So the CPU is spiking and it's time to upgrade your web server. Or is the problem really the CPU?
In many cases, pinpointing a particular performance issue with a website can be complicated. A web application can have several layers. For example:
- An HTTP request comes in for an ASP.NET web page.
- Logic runs and checks for a cached file on disk.
- If there is no cached copy, the application goes over the network to make a request to the SQL Server.
- SQL Server checks the database on disk for the data and returns it back over the network.
- The ASP.NET page gets the response back from SQL Server, writes it to a local cache, and responds to the requesting client.
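The steps above can be sketched as a cache-then-database lookup. This is a minimal illustration, not the actual application: the cache directory and the SQL stand-in function are hypothetical names, and a real handler would run a parameterized query over the network instead of returning canned data.

```python
import json
import os

CACHE_DIR = "cache"  # hypothetical local cache directory


def fetch_from_sql(key):
    # Stand-in for the network round trip to SQL Server; a real
    # handler would execute a parameterized query here.
    return {"key": key, "value": "data from SQL Server"}


def handle_request(key):
    cache_path = os.path.join(CACHE_DIR, f"{key}.json")
    # 1. Check for a cached file on disk.
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)
    # 2. No cache: go over the network to SQL Server.
    data = fetch_from_sql(key)
    # 3. Write the result to the local cache, then answer the client.
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(cache_path, "w") as f:
        json.dump(data, f)
    return data
```

Even in this toy form you can see why the layers matter: one request can touch local disk (cache read/write), the network, and the remote database's disk before the client sees a byte.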
It gets even more complex in high-availability environments. At that point you need to isolate the load-balanced servers and check each one individually, to see whether the problem lies not with the overall application but with one of the servers.
Now throw in a web service layer, objects that connect to another database, maybe Active Directory, and things get even more complex.
So, for our example above, how are resources being impacted?
- Network I/O, for both the HTTP traffic and the SQL Server connection.
- Disk I/O for writing to the cache, reading the ASP.NET page, and even IIS writing its W3C-format logs.
- Disk I/O on the SQL Server.
- And, of course, memory and CPU.
Beyond the obvious logic issues, which many developers get caught up in, some may see CPU utilization spike and think the machine needs an upgrade. But is that always the case?
Let's talk about a real-world example I encountered while running a small online business. One of my challenges was handling the huge amount of traffic we were generating on a limited budget. In this scenario, CPU utilization was spiking. If I was going to throw money at something, I had to be sure I was throwing it at the right resource.
The first thing to do was gather some performance metrics; Performance Monitor is your friend. I needed to find out what was going on at every level of the transaction. We ran counters to gauge the actual traffic: with the Current Anonymous Users counter on that web instance, I could see how many users were connecting to the machine at a given time. In addition to that counter, we also tracked CPU utilization, disk I/O, and page faults (for memory issues). Then we ran the same counters on the back-end SQL box.
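The sampling approach can be sketched as a small polling loop. This is an illustration only: on the real servers these readings came from Performance Monitor counters such as Current Anonymous Users and % Processor Time, so the counter functions below are generic stand-ins you would replace with actual counter queries.

```python
import statistics
import time


def sample_counters(counters, samples=5, interval=0.01):
    """Poll each named counter repeatedly and summarize the readings.

    `counters` maps a counter name to a zero-argument function that
    returns the current reading (a stand-in for a PerfMon query).
    """
    readings = {name: [] for name in counters}
    for _ in range(samples):
        for name, read in counters.items():
            readings[name].append(read())
        time.sleep(interval)
    # Report min/mean/max so a sustained spike stands out from a blip.
    return {
        name: {
            "min": min(values),
            "mean": statistics.mean(values),
            "max": max(values),
        }
        for name, values in readings.items()
    }
```

Collecting the same set of counters on both the web box and the SQL box, over the same window, is what makes the later comparison meaningful.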
It appeared there weren't many connections to the SQL box, yet CPU utilization continued to spike. After evaluating the counters, we determined that disk utilization on the SQL box was pegged, nearly flat-lined at its maximum the entire time. It turned out all that CPU spiking was caused by SQL Server waiting on the disk to respond to requests.
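The diagnosis above boils down to a simple cross-check of counters. A rough triage rule can be sketched as follows; the function name and threshold defaults are illustrative, not from the original incident, though a sustained disk queue above about 2 per spindle is a commonly cited PerfMon rule of thumb for a disk bottleneck.

```python
def likely_bottleneck(cpu_pct, disk_queue_len, connections,
                      cpu_hot=80.0, queue_hot=2.0, conns_busy=100):
    """Rough triage: which resource should we suspect first?

    Thresholds are illustrative defaults, not measured values.
    """
    if disk_queue_len > queue_hot:
        # CPU can also read high while threads spin waiting on I/O,
        # which is exactly what masked the real cause in our case.
        return "disk"
    if cpu_pct > cpu_hot and connections > conns_busy:
        return "cpu"
    return "none"
```

The key point is the ordering: check the disk queue before trusting the CPU number, because a saturated disk can make the CPU look like the culprit.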
After replacing the disks with faster drives and splitting the database across multiple spindles, everything screamed and responded the way it should. The money went where it was needed, instead of being blindly thrown at the symptom rather than the cause.
What does this example teach us? Before treating the symptom, an administrator needs to do some research and collect data first, before bringing it to their manager, purchasing resources, or whatever your first response may be.