People have been asking me recently, “what is the best enterprise database monitoring software?”
Of course, for Oracle there is OEM, but what if OEM doesn’t find the problem? What if OEM breaks? (I’ve blogged before on how OEM can break and all the king’s men don’t seem to be able to make him work again.) What if one wants to access the performance data but the database is down? (Grid Control only has an anemic, modified and transformed extract from AWR.) What if only DBAs have access to OEM and the developers want access to performance information, and not only access but safe access in a user-friendly and manager-friendly interface? (I’m a strong believer in giving OEM access to developers, but unfortunately the OEM interface is set up as a DBA interface with potentially risky access to database actions and is not a safe, read-only browsing interface for developers.) What if one wants to monitor multiple database platforms in the same interface?
In enterprise database monitors I look for a dashboard based on wait time and CPU statistics. I want to see CPU and wait statistics correlated in a way that lets me easily see the load on the database and the breakdown of the time by wait class and CPU. The main databases, such as Oracle, Sybase, SQL Server and DB2, are instrumented with wait events (DB2 wait interface support arrived in 9.7; before 9.7 the statistics were less useful).
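To make "load broken down by wait class and CPU" concrete, here is a minimal sketch in Python. It assumes you already have periodic samples of active sessions, each tagged either "CPU" or with a wait class; the function name, the tuple layout and the bucket sizes are my own illustration, not anything taken from a particular product. It turns the samples into average active sessions per class for each time bucket, which is the number a load chart stacks.

```python
from collections import Counter, defaultdict


def load_by_class(samples, bucket_seconds=60, sample_interval=1):
    """Return average active sessions per wait class for each time bucket.

    samples: iterable of (epoch_second, wait_class), one entry for every
             active session seen at every sample time. Sessions on CPU are
             assumed to be tagged with the wait_class "CPU".
    """
    samples_per_bucket = bucket_seconds / sample_interval
    buckets = defaultdict(Counter)
    for ts, wait_class in samples:
        bucket_start = ts - ts % bucket_seconds
        buckets[bucket_start][wait_class] += 1
    # Dividing the sample count by the number of sample points in the bucket
    # gives the average number of sessions active in that class.
    return {
        start: {cls: count / samples_per_bucket for cls, count in counts.items()}
        for start, counts in sorted(buckets.items())
    }


if __name__ == "__main__":
    demo = [(0, "CPU"), (0, "User I/O"), (1, "CPU"), (61, "CPU")]
    for start, load in load_by_class(demo, bucket_seconds=2).items():
        print(start, load)
```

A bucket whose total is above the number of CPU cores, or where one wait class dominates, is where I would start drilling down.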
I only know of 3 cross platform enterprise database monitoring products that follow such a methodology: Confio Ignite, Quest Performance Analyzer and Precise Indepth/I3.
The Precise Indepth/I3 product has fallen off my radar. I haven’t heard much from them, and the last time I talked to them they said that installing the product required a consultant. That’s a show stopper for me. Quest’s product is falling off my radar as well, not for lack of technology but more for a lack of focus by the company. The product “Performance Analyzer” doesn’t even show up in the top 10 hits on Google. My guess is that they have rolled the product under the hood of Foglight and sell it as a Foglight option, which means more money and more complexity, both of which are drawbacks. As far as technology goes, “Performance Analyzer” was pretty cool and had nice dashboards, but probably the biggest drawback was that the product required binaries to be installed on each and every target (at least for Oracle), which can turn into a maintenance nightmare. Well, who else is out there? The other contender is Confio. One of Confio’s great advantages is that they only do monitoring (at least for most of their history). There is clear focus and enthusiasm. It’s such a refreshing change from Quest and Precise (and, if you are Oracle only, from OEM).
So let’s look at Confio. One of the newest and most exciting things at Confio is the ability to monitor VMware statistics. VMware has hundreds of statistics (I once ran esxtop in batch mode and got 24,000 columns, yes 24,000). The statistics are shown in nice graphs in ESX vSphere and vCenter, but the statistics in the graphs have to be chosen from lists, and the number of lines in the graphs can become overwhelming. The worst part is the lack of correlation between statistics of different types, such as CPU, I/O and network, which are on different graphs. Finally, there has been no way to correlate the VMware statistics with the Oracle database, until now with Confio.
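To get a feel for how wide esxtop batch output is, here is a small Python sketch that counts the columns in the header row and groups them by object. It assumes the output of "esxtop -b" is a CSV whose header uses perfmon-style counter paths of the form \\host\Object(instance)\Counter; the function name and the sample header are made up for illustration, so check them against your own capture.

```python
import csv
import io
from collections import Counter


def count_counters_by_object(header_line):
    """Return (total_columns, Counter of columns per object) for a header row.

    Assumes perfmon-style paths such as
    \\\\host\\Group Cpu(1:vm1)\\% Ready ; columns that do not start with
    two backslashes (for example the timestamp column) are counted in the
    total but not assigned to an object.
    """
    columns = next(csv.reader(io.StringIO(header_line)))
    per_object = Counter()
    for col in columns:
        if not col.startswith("\\\\"):
            continue
        parts = col.split("\\")  # ['', '', host, object(instance), counter]
        if len(parts) < 5:
            continue
        object_name = parts[3].split("(", 1)[0]
        per_object[object_name] += 1
    return len(columns), per_object


if __name__ == "__main__":
    sample = (
        r'"(PDH-CSV 4.0) (UTC)(0)",'
        r'"\\esx1\Group Cpu(1:vm1)\% Ready",'
        r'"\\esx1\Group Cpu(1:vm1)\% Used",'
        r'"\\esx1\Memory\Free MBytes"'
    )
    total, by_object = count_counters_by_object(sample)
    print(total, dict(by_object))
```

Run against a real capture, a summary like this makes it obvious why nobody reads the raw file and why a curated handful of statistics is more useful.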
Here is a Confio enterprise dashboard where I can see my databases. The databases are grouped in this image by those on VMware and those not:
If I drill down into a database running on VMware I get not only the classic load chart on the database but also correlated graphs from the OS and VMware:
The top graph is the load on the Oracle database grouped by top SQL based on active time in the database (which includes wait time and CPU). Below this graph are three other graphs:
DB Instance: Signal Waits Percent, Instance CPU Utilization
VM/OS: VM CPU Usage, VM CPU Ready Time, O/S CPU Queue Length
Physical Host: Host CPU Usage
Now with these graphs on the same page I can easily make correlations. I can see a spike in my SQL load at noon.
I can correlate this spike in database load with the three graphs.
1. The “DB Instance” graph shows a spike in the CPU used by Oracle. It also shows “Signal Waits Percent”, which is a fancy way of saying Oracle is waiting for CPU, i.e. there is CPU contention.
2. The “VM/OS” graph shows CPU usage going up and “CPU Ready Time” going up. “CPU Ready Time” is an important statistic for VMware, yet it’s not well documented. It shows how much time the VM wanted CPU from the host but couldn’t get it.
3. The “Physical Host” graph shows that there was a spike in CPU used at the host level across all the VMs on that host.
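Since “CPU Ready Time” is usually reported as milliseconds accumulated over a sampling interval, it helps to convert it to a percentage before comparing it with CPU usage. The sketch below uses the commonly cited conversion (ready milliseconds divided by the interval length in milliseconds, times 100) and assumes the 20-second interval of vCenter real-time charts as the default; the function name is mine, and how a given monitoring tool normalizes the value may differ.

```python
def cpu_ready_percent(ready_ms, interval_seconds=20):
    """Convert a CPU ready summation (milliseconds) into a percentage.

    ready_ms:         time the VM was ready to run but not scheduled,
                      summed over the sampling interval
    interval_seconds: length of the sampling interval (20 is assumed here,
                      matching vCenter real-time charts)
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return ready_ms * 100.0 / (interval_seconds * 1000.0)


if __name__ == "__main__":
    # 1,000 ms of ready time in a 20-second interval
    print(cpu_ready_percent(1000))
```

As a rough reading, a VM that spends more than a few percent of each interval in the ready state is being held back by the host, which is exactly the pattern the noon spike shows.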
Additionally there are event notifications of changes on the system such as adding a new VM to the host. Note the grey circles with arrows. Pass your mouse over the event icon to get information about the event.
Grey circles are events on other VMs, blue circles are events on this VM.
I find it invaluable to be able to see all the layers in one view, see only the important statistics and correlate all of them.
On top of the additional VMware monitoring option seen above, Confio offers the classic view of the database load through different aggregation groupings:
SQL
Waits
Programs
Machines
DB Users
O/S Users
Files
Plans
Objects
Modules
Actions
Clicking the Objects tab gives a different perspective:
The above charts are at a coarse granularity, but one can zoom down to intervals as small as 10 seconds:
Ignite also notifies when it finds performance issues:
Drilling down on alerts will point out such useful things as a SQL statement that has had an execution plan change for the worse:
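The idea behind a "plan changed for the worse" alert can be sketched in a few lines of Python. This is only my illustration of the concept, not how Ignite implements it: the PlanStats fields, the function name and the 2x threshold are all assumptions. For one SQL statement it compares the average elapsed time per execution of the most recent plan with that of the plan before it.

```python
from dataclasses import dataclass


@dataclass
class PlanStats:
    plan_hash: int          # identifier of the execution plan
    first_seen: int         # epoch seconds when the plan first appeared
    executions: int         # executions observed under this plan
    elapsed_seconds: float  # total elapsed time under this plan


def find_plan_regression(plans, threshold=2.0):
    """Return (old_hash, new_hash, slowdown) if the newest plan is slower.

    Compares average elapsed time per execution of the two most recent
    plans; returns None when there is no earlier plan or no regression.
    """
    usable = sorted((p for p in plans if p.executions > 0),
                    key=lambda p: p.first_seen)
    if len(usable) < 2:
        return None
    previous, latest = usable[-2], usable[-1]
    previous_avg = previous.elapsed_seconds / previous.executions
    latest_avg = latest.elapsed_seconds / latest.executions
    if previous_avg > 0 and latest_avg / previous_avg >= threshold:
        return previous.plan_hash, latest.plan_hash, latest_avg / previous_avg
    return None


if __name__ == "__main__":
    history = [
        PlanStats(plan_hash=111, first_seen=100, executions=100, elapsed_seconds=10.0),
        PlanStats(plan_hash=222, first_seen=200, executions=50, elapsed_seconds=50.0),
    ]
    print(find_plan_regression(history))
```

Comparing per-execution averages rather than totals matters here, because a new plan usually has far fewer executions than the one it replaced.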
Summary
The above are a few of my first impressions of Confio’s Ignite. Ignite seems to fill a clear need in the industry for enterprise cross-platform database monitoring, with the unique additional ability to monitor VMware.
If you are on Oracle only, then it is a cheaper alternative to OEM and if you have OEM already then Ignite is a good complement. One attractive feature of Ignite is that all the data is collected into a centralized database allowing one to easily run custom queries and query across multiple databases. Most importantly Ignite gives safe access to database data to managers and developers – the people who should actually be seeing and understanding database performance.
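As an illustration of why a centralized repository is attractive, here is a Python sketch of a cross-database query. The table and column names (sql_wait_samples, db_name, sql_hash, wait_class, wait_seconds) are hypothetical stand-ins, not Ignite's actual schema, and an in-memory SQLite database plays the part of the repository so the example runs on its own.

```python
import sqlite3


def build_demo_repository():
    """Create an in-memory stand-in for a central monitoring repository."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sql_wait_samples ("
        "db_name TEXT, sql_hash TEXT, wait_class TEXT, wait_seconds REAL)"
    )
    conn.executemany(
        "INSERT INTO sql_wait_samples VALUES (?, ?, ?, ?)",
        [
            ("prod1", "a1", "CPU", 30.0),
            ("prod1", "a1", "User I/O", 20.0),
            ("prod2", "b2", "CPU", 40.0),
            ("dev1", "c3", "CPU", 5.0),
        ],
    )
    return conn


def top_sql_across_databases(conn, limit=5):
    """Rank SQL statements by total wait time across all monitored databases."""
    return conn.execute(
        """
        SELECT db_name, sql_hash, SUM(wait_seconds) AS total_wait
        FROM sql_wait_samples
        GROUP BY db_name, sql_hash
        ORDER BY total_wait DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


if __name__ == "__main__":
    for row in top_sql_across_databases(build_demo_repository(), limit=2):
        print(row)
```

The point is that one query ranks statements from every monitored instance at once, something that is awkward when each database keeps its own performance history.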
You can take Confio Ignite for a spin at:
login as demo, demo
PS: please share any experiences you have had with the product in the comments. Thanks