Kyle Hailey

Portfolio: visualization of quantitative data


Almost all my professional connections know me as a database person, but my real core passion lies at the intersection of the visual and the quantitative. I like visualizing quantitative data in a way that makes the data easy, and ideally fun, to understand.



Importance of rethinking data visualization


Here is a short (15-minute) presentation I gave on pitfalls and solutions for displaying quantitative data effectively.


I go over some of the challenges of visualizing performance data, walk through Edward Tufte's visual analysis of the Space Shuttle Challenger disaster from his second book, Visual Explanations, and present a dashboard solution for a complex system (Oracle database performance).




Growth of ChatGPT


When OpenAI released ChatGPT at the end of 2022, its growth over the next month was insane. But the graphics I saw about that growth all failed to show how insane and novel ChatGPT's growth rate was, for example:



So I redesigned a chart to emphasize the growth, and this visualization went viral on Twitter:




Database load


What I'm best known for is database performance, and that is the area where I've made the most impact in the industry. I redesigned Oracle's performance monitoring dashboard from a confusing, overwhelming array of hundreds of pages and charts into one page with one graph and two supporting charts.

The main graph immediately answers the question: "What is the load on the database? Is it idle, doing well, or bottlenecked?" If the chart is empty, the database is idle. If the load is above the dashed line, the database is bottlenecked, and the higher above the line, the more bottlenecked it is. Anywhere in between, it's doing fine.
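The reading rules above can be sketched in a few lines of Python. This is a minimal sketch, not Oracle's implementation: it assumes the load is measured as average active sessions (AAS) computed from per-second counts of active sessions, and it treats the dashed line as an input parameter (in Oracle's chart it typically represents the number of CPUs, but treat that as an assumption here). The function names and sample format are hypothetical.

```python
from typing import Sequence


def average_active_sessions(samples: Sequence[int]) -> float:
    """Average number of active sessions over a window of per-second samples.

    Each sample is the count of sessions active (on CPU or waiting) at that
    instant; this input format is a hypothetical simplification.
    """
    if not samples:
        return 0.0
    return sum(samples) / len(samples)


def classify_load(aas: float, dashed_line: float) -> str:
    """Reading rules from the text: empty = idle, above the line = bottlenecked."""
    if aas == 0:
        return "idle"
    if aas > dashed_line:
        # The further above the line, the more bottlenecked the database is.
        return f"bottlenecked ({aas / dashed_line:.1f}x the line)"
    return "doing fine"
```

For example, `classify_load(average_active_sessions([8, 10, 12]), 4)` averages to 10 active sessions against a line at 4 and reports the database as bottlenecked at 2.5 times the line.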

The simplicity and power of this chart made it the de facto visualization of database performance, and it went on to be used in Amazon RDS and copied for Google Cloud SQL.

The colors show the type of load: CPU is green, I/O is blue, and locks are red.

The two supporting charts show which commands (SQL statements) and which users are causing the load.
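A similar sketch covers the colors and the two supporting charts, assuming one sample per active session per second, where each sample records the SQL statement, the user, and the type of load. The `Sample` record, its field names, and collapsing all load into a three-way CPU/IO/Lock split are hypothetical simplifications; only the colors come from the text.

```python
from collections import Counter
from typing import NamedTuple

# Colors from the text. Reducing all load to these three types is a
# simplification for illustration.
LOAD_COLORS = {"CPU": "green", "IO": "blue", "Lock": "red"}


class Sample(NamedTuple):
    """Hypothetical record: one active session observed at one sample time."""
    sql_id: str
    user: str
    load_type: str  # a key of LOAD_COLORS


def breakdown(samples: list[Sample], seconds: int, top_n: int = 3) -> dict:
    """Split sampled load by type (main graph), by SQL, and by user (side charts)."""
    by_type = Counter(s.load_type for s in samples)
    by_sql = Counter(s.sql_id for s in samples)
    by_user = Counter(s.user for s in samples)
    return {
        # Average active sessions per load type, keyed by display color.
        "graph": {LOAD_COLORS[t]: n / seconds for t, n in by_type.items()},
        # Largest contributors first, for the two supporting charts.
        "top_sql": by_sql.most_common(top_n),
        "top_users": by_user.most_common(top_n),
    }
```

The same samples feed all three charts, which is what lets one page answer both "how much load, of what type?" and "who is causing it?".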

This interface monitors millions of databases worldwide.





Visual SQL Tuning (VST)


The most challenging problem in databases is tuning a complex SQL statement. People can spend days or weeks analyzing a core business SQL statement.

I designed a visualization of SQL statements that lets a user find the optimal join path of a SQL statement in seconds. The join path is the most important part of optimizing a SQL statement. The chart also shows whether the SQL is even viable, i.e., whether it could ever finish in a reasonable time, and it visually points out errors such as Cartesian joins.
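The VST method itself is beyond a short example, but two of the ideas above can be sketched under simplifying assumptions: a Cartesian join shows up as tables with no join path to the rest of the query, and a good join order tends to start at the most selective filter and follow join edges. The code below is a hypothetical greedy heuristic, not the VST algorithm; `filter_ratio` (the fraction of a table's rows its filter keeps, 1.0 if unfiltered) and the function names are assumptions.

```python
from collections import defaultdict


def join_groups(tables, joins):
    """Group tables that are connected through join predicates.

    More than one group means some tables are not joined to the rest of
    the query, i.e. the statement contains a Cartesian join.
    """
    graph = defaultdict(set)
    for a, b in joins:
        graph[a].add(b)
        graph[b].add(a)
    seen, groups = set(), []
    for start in tables:
        if start in seen:
            continue
        # Depth-first search collects every table reachable from `start`.
        stack, group = [start], set()
        while stack:
            table = stack.pop()
            if table not in group:
                group.add(table)
                stack.extend(graph[table] - group)
        seen |= group
        groups.append(group)
    return groups, graph


def greedy_join_order(tables, joins, filter_ratio):
    """Hypothetical heuristic: start at the most selective filter, then
    repeatedly join the adjacent table whose filter keeps the smallest
    fraction of rows."""
    groups, graph = join_groups(tables, joins)
    if len(groups) > 1:
        raise ValueError(f"Cartesian join between table groups: {groups}")

    def ratio(table):
        return filter_ratio.get(table, 1.0)  # 1.0 = no filter

    order = [min(tables, key=ratio)]
    while len(order) < len(tables):
        # Only tables joined to something already in the plan are candidates,
        # so the order never introduces a Cartesian product.
        candidates = {n for t in order for n in graph[t]} - set(order)
        order.append(min(sorted(candidates), key=ratio))
    return order
```

With tables `orders`, `customers`, `items`, and `products`, where `customers` has the most selective filter, the heuristic starts at `customers` and walks the join edges outward; dropping the `items`-`products` join raises the Cartesian-join error instead.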


Here is a video where I explain the theory, application, and usage of the VST chart:








I/O latency: graphing histograms over time


Finally, one area I always wanted to do more work on is showing histogram data over time. Brendan Gregg popularized latency heat maps, which are essentially histogram data displayed over time:
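The data behind a latency heat map is a grid: one histogram per time interval, with time on one axis, the latency bucket on the other, and color encoding the count. A minimal sketch, assuming events arrive as hypothetical `(timestamp_s, latency_us)` pairs and using power-of-two latency buckets (a common choice, assumed here rather than taken from Gregg's tools):

```python
from collections import Counter, defaultdict


def latency_bucket(latency_us: float) -> int:
    """Power-of-two bucket: k >= 1 holds [2**k, 2**(k+1)) µs; 0 holds [0, 2) µs."""
    return max(0, int(latency_us).bit_length() - 1)


def histograms_over_time(events, interval_s: float) -> dict:
    """Build one latency histogram per time interval.

    events: iterable of (timestamp_s, latency_us) pairs.
    Returns {interval_index: Counter({bucket: count})}. In a heat map each
    interval is a column, each bucket a row, and the count sets the color.
    """
    grid = defaultdict(Counter)
    for timestamp_s, latency_us in events:
        grid[int(timestamp_s // interval_s)][latency_bucket(latency_us)] += 1
    return dict(grid)
```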



I was never comfortable reading heat maps. Sure, they looked excitingly cool, but I always had trouble explaining what they were showing.

At one point I was tasked with analyzing the performance of I/O arrays. Part of the analysis was looking at latency under different workloads, such as an increasing number of users (or I/O threads), different I/O read sizes, and reads only, writes only, or a mix of reads and writes.


I wanted to see the latency histogram at each load level. Latency under 100 µs generally came from the local cache on the machine using the I/O array. Latency under 1 ms generally came from the cache on the disk array. Latency over 1 ms generally meant the I/O came from disk. This information mattered depending on what we were testing. For example, any test with a high share of I/O served from the host's own cache was pointless, because those I/Os never reached the disk array, so we would be testing the host's memory instead of the array.


So my idea was to show the histograms vertically instead of horizontally, overlay the average I/O time as a line, and color-code the histogram buckets so that blue meant I/O served from memory, green was "good" I/O latency, and yellow, orange, and red were increasingly bad I/O latency.
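The data for one vertical histogram can be sketched as follows, assuming latencies in microseconds and a fixed, ascending list of bucket upper edges. Only the 100 µs cutoff for blue (memory) comes from the text; the green, yellow, orange, and red boundaries in `COLOR_BANDS` are hypothetical placeholders to tune for the array under test.

```python
import bisect

# (upper bound in µs, color). Only the 100 µs "memory" cutoff is from the
# text; the other bounds are hypothetical placeholders.
COLOR_BANDS = [(100, "blue"), (10_000, "green"), (20_000, "yellow"),
               (50_000, "orange"), (float("inf"), "red")]


def bucket_color(bucket_upper_us: float) -> str:
    """Color of a histogram bucket, chosen by the bucket's upper edge."""
    for upper, color in COLOR_BANDS:
        if bucket_upper_us <= upper:
            return color
    return "red"


def vertical_histogram(latencies_us, bucket_edges_us):
    """One column of the chart: (edge, count, color) bars from bottom to top,
    plus the mean latency to draw as the overlaid line.

    bucket_edges_us must be ascending upper edges; each latency lands in the
    first bucket whose edge is >= the latency, and anything above the last
    edge is counted in the last bucket.
    """
    if not latencies_us:
        raise ValueError("no latencies for this load level")
    counts = [0] * len(bucket_edges_us)
    for lat in latencies_us:
        i = bisect.bisect_left(bucket_edges_us, lat)
        counts[min(i, len(counts) - 1)] += 1
    bars = [(edge, n, bucket_color(edge))
            for edge, n in zip(bucket_edges_us, counts)]
    mean = sum(latencies_us) / len(latencies_us)
    return bars, mean
```

Calling `vertical_histogram` once per load level (for example, per number of I/O threads) gives one column per level: stack each column's bars bottom to top in their colors and draw a line through the means.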






