AWR AND ADDM ANALYSIS



                        AWR REPORT ANALYSING                      BY ANAND GUPTA
1.1.Purpose of Document
AWR (Automatic Workload Repository) Reports are very handy during performance tuning of the database. These reports provide clear picture of database side activities. Various useful data is nicely documented in different sections in readable format. It becomes necessity to understand these reports for teams involved in activities like database tuning,performance/load testing or analyzing any performance issue.
Purpose of the document is to enable the reader to understand basic AWR terminology and to use these details for understanding the database performance issues.
1.2.Scope
This document describes how to utilize AWR reports to tune,

·         Physical database instance
·         Identify SQLs which are performing poorly
1.3.Intended audience
This document is intended for:

·         Performance/ Load testing team
·         Oracle DBAs
·         Development team

2.AWR Walk through
AWR reports are divided into various sections. Each section provides specific information about database health. We will go through each of the important sections and will try to explore information captured in these sections.
2.1.Instance information
DB Name
DB Id
Instance
Inst num
Release
RAC
Host
Orcl
11882829
Orcl
1
10.2.0.4.0
YES
XXXXXX

Provides information about the database,
1.     Name of the database
2.     Database ID
3.     Instance Name and Instance Number
4.     Oracle Database Server release version
5.     Whether database is part of RAC?
6.     Machine name hosting the database instance


Snap Id
Snap Time
Sessions
Cursors/Session
Begin Snap:
605
31-Jan-11 17:00:07
433
7.4
End Snap:
606
31-Jan-11 18:00:04
438
7.5
Elapsed:

59.96 (mins)


DB Time:

820.16 (mins)



Before we go through this table, we will see,
2.1.1.How AWR reports are generated?
Starting from Oracle 10g onwards, at every 60 mins database server takes snapshot of a database. Snapshot gathers database health related information at that point of time. This information is stored in Oracle data dictionary and servives database restarts.
Each snapshot is identified by a snapshot ID. In order to generate AWR reports, 2 snapshot IDs are needed. Latest of these snapshots is compared with old snapshot and based on the comparison an AWR report gets generated.
 Above table provides information about,
1.Begin Snapshot ID and End Snapshot ID along with date and time details for the snapshot
2.The Sessions column indicates total no. of database sessions active when the snapshot was taken.
3.Elapsed Time column indicates the duration in mins between 2 snapshots. In this example, the duration is approx. 60 mins
4.DB Time columns represents, total time spent by foreground processes. It includes CPU Time, IO Time and Wait time and excludes Idle wait time and time taken by background processes.
(DB time) / (Elapsed time) gives total no. of database active sessions during that period. In case of above scenario total no. of active sessions are, 820.16 (DB Time) / 59.96 (Elapsed Time), which is 13.67 sessions.
2.2.Cache Sizes

Begin
End


Buffer Cache:
740M
724M
Std Block Size:
8K
Shared Pool Size:
264M
280M
Log Buffer:
6,148K

The section gives information about size details for memory componets
1.     Buffer Cache
2.     Shared Pool
3.     Standard Block Size
4.     Log Buffer
From Oracle 10g onwards, database server does Automatic Memory Management (AMM) for PGA and SGA components. Based on load, database server keeps on allocating or deallocating memory assigned to different components of SGA and PGA. Due to this reason, we can observe different sizes for Buffer Cache and Shared Pool, at the beginning or end of AWR snapshot period.
2.3.Load Profile


Per Second
Per Transaction
Redo size:
1,638,781.89
31,189.97
Logical reads:
309,768.35
5,895.64
Block changes:
9,284.70
176.71
Physical reads:
1,830.10
34.83
Physical writes:
237.61
4.52
User calls:
531.06
10.11
Parses:
481.48
9.16
Hard parses:
47.43
0.90
Sorts:
596.32
11.35
Logons:
0.03
0.00
Executes:
2,546.83
48.47
Transactions:
52.54


This section provides information Per Second and Per Transaction basis. For example it shows,
1.How many Hard Parse are happening
2.How much Redo is generated
3.How many Physical Reads are happening etc
4.How many Sorts are happening
This information is proportionate to the amount of load experienced by the database server during that period.
2.3.1.How load profile information is helpful during load testing?
Ans: Load testing for application release is done to assess the release performance. Load test is first done on baseline version and then performed on the release version. It is recommended to generate AWR reports for baseline as well as for the release load test.
In order to assess performance of release test against baseline test, we can compare the load profile section of AWR for both the tests. This comparison will give slight idea about the release performance. If there are significant differences in the count of baseline and release load profile figures, then that could give some pointers for investigation in case the release load test is bad in terms of performance.
2.3.2.Is it ok to keep the snapshot frequency to default i.e. every 60 minutes?
To analyse the performance issues in detail, it is recommended to generate AWR reports with interval less than 60 min. May be one can generate AWR reports at every 10 minutes only during the load test interval, and later change the frequency to default one.

The following list includes detailed descriptions for particular statistics:
Redo size: The amount of redo generated during this report.
Logical Reads: Calculated as (Consistent Gets + DB Block Gets = Logical Reads).
Block changes: The number of blocks modified during the sample interval.
Physical Reads: The number of requests for a block that caused a physical I/O operation.
Physical Writes: Number of physical writes performed.
User Calls: Number of user queries generated.
Parses: The total of all parses; both hard and soft.
Hard Parses: The parses requiring a completely new parse of the SQL statement. These consume both latches and shared pool area.
Soft Parses: Soft parses are not listed but derived by subtracting the hard parses from parses. A soft parse reuses a previous hard parse; hence it consumes far fewer resources.
Sorts, Logons, Executes and Transactions: All self-explanatory.
Parse activity statistics should be checked carefully because they can immediately indicate a problem within the application. For example, a database has been running several days with a fixed set of applications, it should, within a course of time, parse most SQLs issued by the applications, and these statistics should be near zero.
If there are high values of Soft Parses  or especially Hard Parses statistics, such values should be taken as an indication that the applications make little use of bind variables and produce large numbers of unique SQLs. However, if the database serves developmental purposes, high vales of these statistics are not bad.
The following information is also available in the workload section:
  % Blocks changed per Read:    4.85    Recursive Call %:    89.89
 Rollback per transaction %:    8.56       Rows per Sort:    13.39
The % Blocks changed per Read statistic indicates that only 4.85 percent of all blocks are retrieved for update, and in this example, the Recursive Call % statistic is extremely high with about 90 percent. However, this fact does not mean that nearly all SQL statements executed by the database are caused by parsing activity, data dictionary management, space management, and so on.
Remember, Oracle considers all SQL statements executed within PL/SQL programs to be recursive.  If there are applications making use of a large number of stored PL/SQL programs, this is good for performance. However, applications that do not widely use PL/SQL may indicate the need to further investigate the cause of this high recursive activity.
It is also useful to check the value of the Rollback per transaction % statistic. This statistic reports the percent of transactions rolled back. In a production system, this value should be low. If the output indicates a high percentage of transactions rolled back, the database expends a considerable amount of work to roll back changes made.  This should be further investigated in order to see why the applications roll back so often.

2.4.Instance Efficiency Percentages (Target 100%)
Buffer Nowait %:
99.29
Redo NoWait %:
100.00
Buffer Hit %:
99.41
In-memory Sort %:
100.00
Library Hit %:
78.19
Soft Parse %:
90.15
Execute to Parse %:
81.10
Latch Hit %:
99.44
Parse CPU to Parse Elapsd %:
9.09
% Non-Parse CPU:
98.03

The section provides information about hit ratios for different memory components. These ratios, tells how often a particular data is found in a respecitive memory structure. Details about hit ratios are as mentioned delow,
Buffer Hit: Shows % of times particular block was found in buffer cache, instead of reading it from disk.
Buffer Nowait: Indicates % of times data buffers were accessed directly without any wait time.
Library Hit%: Shows % of times SQL and PL/SQL found in shared pool.
In-Memory Sort %: Shows %of times Sorting operations happened in memory than in the disk (temporary tablespace).
Soft parse %: Shows % of times the SQL in shared pool is used.
Latch Hit %: Shows % of time latches are acquired without having to wait.
As per Oracle AWR report the target should be 100% for these ratios. But in reality this is not possible always. Hence the ratio above 80% is always healthy.

2.5.Shared Pool Statistics:

Begin
End
Memory Usage %:
91.36
90.18
% SQL with executions>1:
85.32
78.53
% Memory for SQL w/exec>1:
95.71
97.93

% SQL with executions>1: Shows % of SQLs executed more than 1 time. The % should be very near to value 100.
% memory for SQL w/exec>1: From the memory space allocated to cursors, shows which % has been used by cursors more than 1.
The ratio above 80% is always healthy.

2.6.Top 5 Timed Events:
Event
Waits
Time(s)
Avg Wait(ms)
% Total Call Time
Wait Class
cursor: pin S wait on X
963,939
10,958
11
22.3
Concurrency
CPU time

10,932

22.2

db file sequential read
6,301,396
9,923
2
20.2
User I/O
read by other session
7,022,258
5,819
1
11.8
User I/O
row cache lock
1,881,861
3,899
2
7.9
Concurrency

This section shows TOP 5 wait events the processes were waiting on during the snapshot time period. These events are helpful during analysis of any database related performance bottlenecks. The

·         Wait Class, column helps in classifying whether the issue is related to application or infrastructure.
·         Waits, column provides information about no. of wait happens.
·         Time(s), column provides information about total CPU time in seconds spent on the wait.
Important wait events and their causes are explained in detail below,
2.6.1.db file scattered read:
This event indicates wait due to full table scans or index fast full scans.
To avoid this event, identify all the tables on which FTS is happening and create proper indexes so that oracle will do Index scans instead of FTS. The index scan will help in reducing no. of IO operations.
To get an idea about tables on which FTS is happening please refer to “Segment Statistics” -> “Segments By Physical Read” section of AWR report. This section lists down both Tables and Indexes on which Physical Reads are happening. Please note that physical reads doesn’t necessarily means FTS but a possibility of FTS.
2.6.2.db file sequential read:
The event indicates that index scan is happening while reading data from table. High no. of such event may be a cause of unselective indexes i.e. oracle optimizer is not selecting proper indexes from set of available indexes. This will result in extra IO activity and will contribute to delay in SQL execution.
Generally high no. is possible for properly tuned application having high transaction activity.
2.6.3.buffer busy wait:
Indicates that particular block is being used by more than one processes at the same. When first process is reading the block the other processes goes in a wait as the block is in unshared more. Typical scenario for this event to occur is, when we have batch process which is continuously polling database by executing particular SQL repeatedly and there are more than one parallel instances running for the process. All the instances of the process will try to access same memory blocks as the SQL they are executing is the same. This is one of the situation in which we experience this event.
2.6.4. enq: TX - row lock contention:
Oracle maintenance data consistency with the help of locking mechanism. When a particular row is being modified by the process, either through Update/ Delete or Insert operation, oracle tries to acquire lock on that row. Only when the process has acquired lock the process can modify the row otherwise the process waits for the lock. This wait situation triggers this event. The lock is released whenever a COMMIT is issued by the process which has acquired lock for the row. Once the lock is released, processes waiting on this event can acquire lock on the row and perform DML operation.

2.7.Time Model Statistics:
Statistic Name
Time (s)
% of DB Time
sql execute elapsed time
45,168.56
91.79
DB CPU
10,932.03
22.22
parse time elapsed
8,626.43
17.53
hard parse elapsed time
7,616.90
15.48
hard parse (sharing criteria) elapsed time
929.82
1.89
hard parse (bind mismatch) elapsed time
929.20
1.89
PL/SQL compilation elapsed time
731.51
1.49
PL/SQL execution elapsed time
440.62
0.90
sequence load elapsed time
409.34
0.83
repeated bind elapsed time
26.35
0.05
connection management call elapsed time
6.02
0.01
DB time
49,209.52

background elapsed time
1,774.14

background cpu time
448.11


Important statistics here is the DB Time. The statistic represents total time spent in database calls. It is calculated by aggregating the CPU time and wait time of all sessions not waiting on idle event (non-idle user sessions). Since this timing is cumulative time for all non-idle sessions, it is possible that the time will exceed the actual wall clock time.
Objective of tuning oracle database is to reduce the amount of time users spend in performing some action on the database.
This time represents time taken by foreground sessions and not background sessions.
In above example, DB Time is 49,209.52 seconds. 91% of the time is being spent on “SQL execution elapsed time” i.e for SQL execution.
DB CPU represents time spent on CPU resource by foreground user processes. This time doesn’t include waiting time for CPU. DB CPU is contributing to 22% of total DB time.
Important thing to note here is, the actual wall clock time is around 3600 seconds (difference between 2 snapshots) but the DB CPU shown here is 10,932 seconds.
If the server machine (on which database server is running) has more than 1 CPUs, it is possible to have DB CPU greater than actual wall clock time. In this example the database server is hosted on machine with 8 CPUs. Hence a 1 second is divided into 8 parts. Hence DB CPU 10,932 seconds means, 10,932 (DB CPU) / 8 (CPU) = 1366 wall clock seconds.
The “Parse time elapsed” and “Hard parse elapsed time” has taken around 17% and 15% of the total DB time. “Parse time elapsed” represents time spent for Syntax and Semantic checks. Whereas “Hard parse include time” represents time spent for Syntax and Semantic checks PLUS time spent for optimizing the SQL and generating optimizer plan.
2.8.Wait Class:
Wait Class
Waits
%Time -outs
Total Wait Time (s)
Avg wait (ms)
Waits /txn
Concurrency
3,371,424
28.51
16,439
5
17.83
User I/O
13,397,731
0.00
15,930
1
70.87
Commit
235,762
0.00
2,988
13
1.25
Cluster
1,307,191
0.00
1,323
1
6.91
System I/O
208,454
0.00
911
4
1.10
Other
2,018,321
14.36
475
0
10.68
Configuration
26,428
17.80
309
12
0.14
Application
1,643
0.00
34
21
0.01
Network
2,770,033
0.00
21
0
14.65

The events are sorted in descending order of “Total Wait Time (s)” column.
There are over 800 distinct wait events. Oracle has grouped these wait events in 12 wait classes. These wait classes are further divided in 2 categories, Administrative Wait Class and Application Wait Class.
These wait classes gives overall information about whether the waits happening for Application or for System events.
In the example above first 2 rows show that total wait time is higher for Concurrency and “User I/O” wait classes. Though we won’t have much control on reducing concurrency, but we could aim at reducing the User IO.
High User IO means,

·         From the pool of available indexes proper indexes are not being used
·         FTS is happening on big tables with millions of rows
This could give pointers for further tuning.
2.9.Wait Events:
Event
Waits
%Time -outs
Total Wait Time (s)
Avg wait (ms)
Waits /txn
cursor: pin S wait on X
963,939
99.70
10,958
11
5.10
db file sequential read
6,301,396
0.00
9,923
2
33.33
read by other session
7,022,258
0.00
5,819
1
37.15
row cache lock
1,881,861
0.00
3,899
2
9.95
log file sync
235,762
0.00
2,988
13
1.25
gc buffer busy
835,453
0.00
752
1
4.42
log file parallel write
145,027
0.00
716
5
0.77
library cache load lock
17,095
0.00
420
25
0.09
enq: TX - index contention
24,548
0.00
378
15
0.13
latch: cache buffers chains
172,466
0.00
292
2
0.91
kksfbc child completion
5,527
98.72
276
50
0.03
gc current grant 2-way
175,219
0.00
257
1
0.93
library cache lock
78,169
0.00
195
2
0.41
enq: SQ - contention
17,467
0.00
194
11
0.09
gc cr grant 2-way
141,816
0.00
148
1
0.75
gc current block 2-way
103,694
0.00
115
1
0.55
enq: HW - contention
4,142
0.00
112
27
0.02
db file parallel write
44,214
0.00
92
2
0.23
db file scattered read
31,617
0.00
89
3
0.17
latch: shared pool
81,395
0.00
87
1
0.43
log file sequential read
3,652
0.00
80
22
0.02
latch: row cache objects
40,058
0.00
76
2
0.21
latch: library cache
19,479
0.00
65
3
0.10
direct path write
18,866
0.00
53
3
0.10
gc cr multi block request
49,879
0.00
50
1
0.26
buffer busy waits
25,287
0.00
42
2
0.13
latch: session allocation
26,429
0.00
39
1
0.14
direct path read
21,708
0.00
39
2
0.11
enq: TX - row lock contention
914
0.00
34
37
0.00
ksxr poll remote instances
1,570,668
13.71
23
0
8.31
reliable message
286
5.59
22
78
0.00

The wait events are sorted on “Total Wait Time (s)” column in descending order. The idle events are listed down in the end. First 10 to 15 events should be looked into because rest of the events are idle events and can be ignore. These events are related to foreground processes.

2.10.Background Wait Events:
Event
Waits
%Time -outs
Total Wait Time (s)
Avg wait (ms)
Waits /txn
log file parallel write
145,047
0.00
716
5
0.77
db file parallel write
44,226
0.00
92
2
0.23
log file sequential read
3,659
0.00
81
22
0.02
events in waitclass Other
1,651,179
17.00
71
0
8.73
row cache lock
10,726
0.00
35
3
0.06
control file sequential read
9,724
0.00
14
1
0.05
latch: row cache objects
9,547
0.00
13
1
0.05
control file parallel write
1,497
0.00
6
4
0.01
gc current block 2-way
3,159
0.00
5
2
0.02
library cache lock
1,464
0.00
4
2
0.01
latch: shared pool
3,523
0.00
4
1
0.02
Log archive I/O
3,642
0.00
3
1
0.02
db file sequential read
862
0.00
3
4
0.00
cursor: pin S wait on X
137
100.00
2
12
0.00
library cache pin
47
0.00
1
12
0.00
os thread startup
8
0.00
0
62
0.00
latch: cache buffers chains
70
0.00
0
7
0.00
gc current grant 2-way
120
0.00
0
3
0.00
gc cr grant 2-way
323
0.00
0
1
0.00

The wait events are sorted on “Total Wait Time (s)” column in descending order. The idle events are listed down in the end. First 10 to 15 events should be looked into because rest of the events are idle events and can be ignore. These events are related to background processes.
2.11.SQL Statistics:
This is a very important section. This section lists down TOP SQL details based on,

·         Elapsed Time
·         CPU Time
·         No. of Gets
·         No. of Reads
·         No. of Executions etc.
2.11.1.SQL ordered by Elapsed Time:

Elapsed Time (s)
CPU Time (s)
Executions
Elap per Exec (s)
% Total DB Time
SQL Id
SQL Module
SQL Text
7,678
503
1,004
7.65
15.60
XXXXXXXX
select MOD(NVL(translate(B.XXX...
1,202
97
1
1202.04
2.44
XXXXXXXX
declare XXX_XXXX XXX_XXX...
905
89
70
12.93
1.84
XXXXXXXX
select ID , TO_CHAR(XXXX,...
88
11
41,246
0.00
0.18
 
select l.col#, l.intcol#, l....
59
11


0.12
 
select /*+ INDEX(ol$ ol$signat...
55
2
2
27.37
0.11
XXXXXXXX
select xxxx.xxxxxxx...
54
16
57,158
0.00
0.11
 
select owner#, name, namespace...
43
5


0.09
 
select col#, intcol#, reftyp, ...
37
10
20,391
0.00
0.08
 
select /*+ index(idl_ub2$ i_id...
35
2
1
35.01
0.07
XXXXXXX
SELECT DISTINCT XX.XXXXXX...

The section provides TOP SQLs sorted in descending order by, Elapsed Time(s).
Elapse per Exec (s): Elapse time in seconds for per execution of the SQL.
Executions: Total no. of executions for the SQL during the two snapshot period.
Elapsed Time: This is calculated by multiplying Elapse per Exec (s) by Executions.
% Total DB Time: % of DB time utilized by the SQL.
SQL Module: Provides module detail which is executing the SQL. Process name at the OS level is displayed as SQL Module name. If the module name starts with any of the names given below, then don’t consider these SQLs for tuning purpose as these SQLs are oracle internal SQLs,

1.      DBMS
2.      sqlplusw
3.      TOAD
4.      rman
5.      SQL
6.      Enterprise Manager
7.      ORACLE
8.      MMON_SLAVE
9.      emagent etc…
This section is very important in terms of tuning perspective as it provides information about SQLs which need tuning.
During load testing one should compare this section for Release load test and with baseline load test. This comparison will give difference about which SQLs performed badly during baseline and release load test.
Looking at the column, Elapse per Exec (s), one can understand which SQLs need tuning. Normally SQLs taking time above 1 second are candidate for tuning.
For other sections like, SQL ordered by Gets, SQL ordered by CPU Time etc, the SQLs appearing under SQL ordered by Elapsed Time are listed most of the time. Hence this section is worth investigating.
Below is the brief description about each section,
·         SQL ordered by CPU Time: SQLs are listed based on CPU time taken and are ordered by, CPU Time (s) column.
·         SQL ordered by Gets: SQLs are listed based on Buffer Reads (Buffer Gets) for the SQL and are ordered by, Buffer Gets, column.
·         SQL ordered by Reads: SQLs are listed based on Physical Reads (Disk Reads) for the SQL and are ordered by, Physical Reads, column.
·         SQL ordered by Executions: SQLs are listed based on total no. of executions for the SQL and are ordered by, Executions, column.
For each of these sections, to get the complete SQL Text details, click on the SQL ID listed under SQL ID column.
2.12.Instance Activity Stats - Thread Activity
Statistic
Total
per Hour
log switches (derived)
2
2.00

This section provides information about how many Redo Log Switches are happening. The “Total“ column provides information for log switches during the snapshot period and “per hour” column provides information per Hour basis.
2 log switches per hour is optimal value as per oracle standards.



Comments

  1. Hey Anand! This is very informative blog. I now have got a clearing understanding of AWR Reports after reading your blog.

    Please keep writing more.

    ReplyDelete

Post a Comment

Popular posts from this blog

Query for Tablespace usage with Autoextend

How to configure multiple Oracle listeners

sheel script - automatic tablespace addition in oracle 11g