Datadog – the marriage of ASH, Query Stats and UI
I’m so happy to be working at Datadog (and we are hiring, so ping me if interested. We are looking for Python developers who are interested in databases. Next up will be work on deep Oracle monitoring).
Datadog has brought together my favorite parts of database performance monitoring.
Average Active Sessions
I’ve long been an evangelist of the Average Active Sessions (AAS) view of database performance, but I’ve wanted the AAS approach married with rich SQL execution metrics like one would see in VividCortex (now part of SolarWinds).
I also love rich, powerful graphics like Datadog’s. I recall seeing Datadog presentations years ago at conferences and thinking “wow, I’d like to work on database performance visualizations with their graphical interfaces”. At the time Datadog only had cursory database monitoring capabilities, but a little over a year ago they kicked off a project to invest heavily in deep database monitoring.
Now Datadog is bringing it all together – great graphics and active-session-driven analysis supplemented with rich SQL execution metrics.
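For readers new to AAS, here is a minimal sketch of the idea, not Datadog's implementation. It assumes active sessions are sampled at regular instants, ASH style, and that AAS over a window is the number of active-session observations divided by the number of sampling instants. The function name, the tuple layout and the sample data are all hypothetical.

```python
from collections import Counter


def average_active_sessions(active_samples, num_sample_times):
    """Return (total AAS, AAS broken down by wait class).

    active_samples: one (sample_time, session_id, wait_class) tuple per
    session that was active at a sampling instant (hypothetical layout).
    num_sample_times: how many sampling instants the window contains,
    including instants at which no session was active.
    """
    if num_sample_times <= 0:
        raise ValueError("num_sample_times must be positive")
    # Count observations per wait class, then normalize by the number of samples.
    by_class = Counter(wait_class for _, _, wait_class in active_samples)
    breakdown = {cls: n / num_sample_times for cls, n in by_class.items()}
    return sum(by_class.values()) / num_sample_times, breakdown


# Hypothetical data: 4 one-second samples, 6 active-session observations.
samples = [
    (0, 101, "CPU"), (0, 102, "IO"),
    (1, 101, "CPU"),
    (2, 101, "CPU"), (2, 103, "Lock"),
    (3, 102, "IO"),
]
total, breakdown = average_active_sessions(samples, num_sample_times=4)
print(total)      # 1.5
print(breakdown)  # {'CPU': 0.75, 'IO': 0.5, 'Lock': 0.25}
```

The per-wait-class breakdown is what a stacked load chart plots: each band is that class's share of the average active sessions.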
First, let’s look at the instances listing page. I love that this page has the little Database Load chart as a column – kind of a spark chart. It gives a quick idea of the differences between instances in the fleet.
There is also a flyover on the load chart widget.
This is really cool – in our staging environment we have lots of instances being built, loaded, then scrapped. Now I can see the old instances that have been shut down (those with the bars on the left) and the new instances with load starting (bars on the right).
If I see an instance of interest I’d just click on that instance and go to the instance detail page:
Here is the instance detail page.
The top two graphs show queries and latency, with the averages over the last hour compared with the same hour yesterday and the same hour over the last week. Pretty cool.
Then there is the ubiquitous load chart.
Then what’s really cool is that the top SQL has not only the load but also detailed SQL metrics. It also shows how many users are currently executing the top SQL and how long those current executions have been running.
If I see a query of interest I’d just click on that query and go to the query detail page.
One super cool thing is Datadog tracks the execution plans for a statement over time, so I can see if the plan changes.
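To make the idea of spotting a plan change concrete, here is a small sketch under my own assumptions, not how Datadog does it. It assumes each observation of a statement is recorded as a timestamp plus some plan identifier, and it reports every point where the identifier differs from the previous observation. The function name and the sample history are hypothetical.

```python
def plan_changes(observations):
    """Return (timestamp, old_plan, new_plan) for every plan switch.

    observations: iterable of (timestamp, plan_id) pairs, assumed to be
    sorted by timestamp (hypothetical layout).
    """
    changes = []
    previous = None
    for ts, plan_id in observations:
        # A change is any observation whose plan differs from the one before it.
        if previous is not None and plan_id != previous:
            changes.append((ts, previous, plan_id))
        previous = plan_id
    return changes


# Hypothetical history for one statement.
history = [(1, "plan_a"), (2, "plan_a"), (3, "plan_b"), (4, "plan_b"), (5, "plan_a")]
print(plan_changes(history))
# [(3, 'plan_a', 'plan_b'), (5, 'plan_b', 'plan_a')]
```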
The top three charts compare the metrics of this query to the total for all queries on the system. Clicking on “all other queries” removes the other queries’ metrics, so the chart shows only the metrics for the query on the page.
The same query can be run on more than one instance, and “Host Running this Query” shows the other instances running this query.
Finally, I can click on the “metrics” tab in the middle of the screen to get access to a full set of query metrics.