AWR: I/O latency fluctuations
One thing I’ve been wishing was more prominent in OEM or other tools is the displays of the fluctuation in I/O times for Oracle. I/O times can happen because of change in load or hardware issues. Changes in I/O latency are not shown clearly in OEM. Luckily the data is easily to query from the AWR repository. Starting in 11g there is a statistics “Average Synchronous Single-Block Read Latency” that can be queried in v$sysmetric and dba_hist_sysmetric_summary as in
select to_char(begin_time,'DD-MON-YY HH24:MI'), average, maxval from dba_hist_sysmetric_summary where metric_name='Average Synchronous Single-Block Read Latency' order by begin_time;
but for 10g Here is a query that gets the hourly average I/O latency for all User I/O and System I/O:
set pagesize 100 col event_name format a30 col avg_ms format 99999.99 select btime, event_name, (time_ms_end-time_ms_beg)/nullif(count_end-count_beg,0) avg_ms, (count_end-count_beg) ct from ( select e.event_name, to_char(s.BEGIN_INTERVAL_TIME,'DD-MON-YY HH24:MI') btime, total_waits count_end, time_waited_micro/1000 time_ms_end, Lag (e.time_waited_micro/1000) OVER( PARTITION BY e.event_name ORDER BY s.snap_id) time_ms_beg, Lag (e.total_waits) OVER( PARTITION BY e.event_name ORDER BY s.snap_id) count_beg from DBA_HIST_SYSTEM_EVENT e, DBA_HIST_SNAPSHOT s where s.snap_id=e.snap_id and s.begin_interval_time > sysdate -2 and e.wait_class in ( 'User I/O', 'System I/O') -- and s.dbid=2 -- and s.dbid=e.dbid -- and s.begin_interval_time > to_date('07-NOV-11 13:00','DD-MON-YY HH24:MI') -- and s.begin_interval_time < to_date('07-NOV-11 15:00','DD-MON-YY HH24:MI') order by e.event_name, begin_interval_time ) where (count_end-count_beg) > 0 order by event_name,btime /
18-APR-11 18:00 db file scattered read 8.24 4404 18-APR-11 19:00 db file scattered read 7.85 3256 18-APR-11 20:00 db file scattered read 8.22 3511 18-APR-11 21:00 db file scattered read 7.40 5025 18-APR-11 22:00 db file scattered read 6.01 290640 18-APR-11 23:00 db file scattered read 8.99 6775 19-APR-11 00:00 db file scattered read 22.08 30126 19-APR-11 01:00 db file scattered read 8.63 105273 19-APR-11 02:00 db file scattered read 5.62 24064 19-APR-11 03:00 db file scattered read 8.32 3368 19-APR-11 04:00 db file scattered read 9.51 3218 19-APR-11 05:00 db file scattered read 5.64 28113 19-APR-11 06:00 db file scattered read 7.75 11014 19-APR-11 07:00 db file scattered read 8.80 18040 19-APR-11 08:00 db file scattered read 8.00 11443 19-APR-11 09:00 db file scattered read 5.51 11543 19-APR-11 10:00 db file scattered read 5.48 19038
It would be nice to see these I/O times broken down by histograms. Something to look into with the wait histograms.
Another thing to keep in mind is the IOPs of your database. If the IOPs start to max out the underlying disk storage subsystem then the latencies will suffer. Oracle has these statistics as well
select begin_time, metric_name, average, maxval from dba_hist_sysmetric_summary where metric_name in ('Physical Read Total IO Requests Per Sec','Physical Write Total IO Requests Per Sec') order by begin_time;
A typical rule of thumb is around 100 IOPs per spindle. And if the underlying I/O subsystem is RAID 5 or a variation then shoot for multiply the write IOPs from Oracle by 4 to take into account the extra overhead of RAID 5 (and ilk).