AWR AND ADDM ANALYSIS

AWR REPORT ANALYSING BY ANAND GUPTA

1.1.Purpose of Document

AWR (Automatic Workload Repository) Reports are very handy during performance tuning of the database. These reports provide clear picture of database side activities. Various useful data is nicely documented in different sections in readable format. It becomes necessity to understand these reports for teams involved in activities like database tuning,performance/load testing or analyzing any performance issue.

Purpose of the document is to enable the reader to understand basic AWR terminology and to use these details for understanding the database performance issues.

1.2.Scope

This document describes how to utilize AWR reports to tune,

· Physical database instance

· Identify SQLs which are performing poorly

1.3.Intended audience

This document is intended for:

· Performance/ Load testing team

· Oracle DBAs

· Development team

2.AWR Walk through

AWR reports are divided into various sections. Each section provides specific information about database health. We will go through each of the important sections and will try to explore information captured in these sections.

2.1.Instance information

DB Name	DB Id	Instance	Inst num	Release	RAC	Host
Orcl	11882829	Orcl	1	10.2.0.4.0	YES	XXXXXX

Provides information about the database,

1. Name of the database

2. Database ID

3. Instance Name and Instance Number

4. Oracle Database Server release version

5. Whether database is part of RAC?

6. Machine name hosting the database instance

	Snap Id	Snap Time	Sessions	Cursors/Session
Begin Snap:	605	31-Jan-11 17:00:07	433	7.4
End Snap:	606	31-Jan-11 18:00:04	438	7.5
Elapsed:		59.96 (mins)
DB Time:		820.16 (mins)

Before we go through this table, we will see,

2.1.1.How AWR reports are generated?

Starting from Oracle 10g onwards, at every 60 mins database server takes snapshot of a database. Snapshot gathers database health related information at that point of time. This information is stored in Oracle data dictionary and servives database restarts.

Each snapshot is identified by a snapshot ID. In order to generate AWR reports, 2 snapshot IDs are needed. Latest of these snapshots is compared with old snapshot and based on the comparison an AWR report gets generated.

Above table provides information about,

1.Begin Snapshot ID and End Snapshot ID along with date and time details for the snapshot

2.The Sessions column indicates total no. of database sessions active when the snapshot was taken.

3.Elapsed Time column indicates the duration in mins between 2 snapshots. In this example, the duration is approx. 60 mins

4.DB Time columns represents, total time spent by foreground processes. It includes CPU Time, IO Time and Wait time and excludes Idle wait time and time taken by background processes.

(DB time) / (Elapsed time) gives total no. of database active sessions during that period. In case of above scenario total no. of active sessions are, 820.16 (DB Time) / 59.96 (Elapsed Time), which is 13.67 sessions.

2.2.Cache Sizes

	Begin	End
Buffer Cache:	740M	724M	Std Block Size:	8K
Shared Pool Size:	264M	280M	Log Buffer:	6,148K

The section gives information about size details for memory componets

1. Buffer Cache

2. Shared Pool

3. Standard Block Size

4. Log Buffer

From Oracle 10g onwards, database server does Automatic Memory Management (AMM) for PGA and SGA components. Based on load, database server keeps on allocating or deallocating memory assigned to different components of SGA and PGA. Due to this reason, we can observe different sizes for Buffer Cache and Shared Pool, at the beginning or end of AWR snapshot period.

2.3.Load Profile

	Per Second	Per Transaction
Redo size:	1,638,781.89	31,189.97
Logical reads:	309,768.35	5,895.64
Block changes:	9,284.70	176.71
Physical reads:	1,830.10	34.83
Physical writes:	237.61	4.52
User calls:	531.06	10.11
Parses:	481.48	9.16
Hard parses:	47.43	0.90
Sorts:	596.32	11.35
Logons:	0.03	0.00
Executes:	2,546.83	48.47
Transactions:	52.54

This section provides information Per Second and Per Transaction basis. For example it shows,

1.How many Hard Parse are happening

2.How much Redo is generated

3.How many Physical Reads are happening etc

4.How many Sorts are happening

This information is proportionate to the amount of load experienced by the database server during that period.

2.3.1.How load profile information is helpful during load testing?

Ans: Load testing for application release is done to assess the release performance. Load test is first done on baseline version and then performed on the release version. It is recommended to generate AWR reports for baseline as well as for the release load test.

In order to assess performance of release test against baseline test, we can compare the load profile section of AWR for both the tests. This comparison will give slight idea about the release performance. If there are significant differences in the count of baseline and release load profile figures, then that could give some pointers for investigation in case the release load test is bad in terms of performance.

2.3.2.Is it ok to keep the snapshot frequency to default i.e. every 60 minutes?

To analyse the performance issues in detail, it is recommended to generate AWR reports with interval less than 60 min. May be one can generate AWR reports at every 10 minutes only during the load test interval, and later change the frequency to default one.

The following list includes detailed descriptions for particular statistics:

Redo size: The amount of redo generated during this report.

Logical Reads: Calculated as (Consistent Gets + DB Block Gets = Logical Reads).

Block changes: The number of blocks modified during the sample interval.

Physical Reads: The number of requests for a block that caused a physical I/O operation.

Physical Writes: Number of physical writes performed.

User Calls: Number of user queries generated.

Parses: The total of all parses; both hard and soft.

Hard Parses: The parses requiring a completely new parse of the SQL statement. These consume both latches and shared pool area.

Soft Parses: Soft parses are not listed but derived by subtracting the hard parses from parses. A soft parse reuses a previous hard parse; hence it consumes far fewer resources.

Sorts, Logons, Executes and Transactions: All self-explanatory.

Parse activity statistics should be checked carefully because they can immediately indicate a problem within the application. For example, a database has been running several days with a fixed set of applications, it should, within a course of time, parse most SQLs issued by the applications, and these statistics should be near zero.

If there are high values of Soft Parses or especially Hard Parses statistics, such values should be taken as an indication that the applications make little use of bind variables and produce large numbers of unique SQLs. However, if the database serves developmental purposes, high vales of these statistics are not bad.

The following information is also available in the workload section:

% Blocks changed per Read: 4.85 Recursive Call %: 89.89

Rollback per transaction %: 8.56 Rows per Sort: 13.39

The % Blocks changed per Read statistic indicates that only 4.85 percent of all blocks are retrieved for update, and in this example, the Recursive Call % statistic is extremely high with about 90 percent. However, this fact does not mean that nearly all SQL statements executed by the database are caused by parsing activity, data dictionary management, space management, and so on.

Remember, Oracle considers all SQL statements executed within PL/SQL programs to be recursive. If there are applications making use of a large number of stored PL/SQL programs, this is good for performance. However, applications that do not widely use PL/SQL may indicate the need to further investigate the cause of this high recursive activity.

It is also useful to check the value of the Rollback per transaction % statistic. This statistic reports the percent of transactions rolled back. In a production system, this value should be low. If the output indicates a high percentage of transactions rolled back, the database expends a considerable amount of work to roll back changes made. This should be further investigated in order to see why the applications roll back so often.

2.4.Instance Efficiency Percentages (Target 100%)

Buffer Nowait %:	99.29	Redo NoWait %:	100.00
Buffer Hit %:	99.41	In-memory Sort %:	100.00
Library Hit %:	78.19	Soft Parse %:	90.15
Execute to Parse %:	81.10	Latch Hit %:	99.44
Parse CPU to Parse Elapsd %:	9.09	% Non-Parse CPU:	98.03

The section provides information about hit ratios for different memory components. These ratios, tells how often a particular data is found in a respecitive memory structure. Details about hit ratios are as mentioned delow,

Buffer Hit: Shows % of times particular block was found in buffer cache, instead of reading it from disk.

Buffer Nowait: Indicates % of times data buffers were accessed directly without any wait time.

Library Hit%: Shows % of times SQL and PL/SQL found in shared pool.

In-Memory Sort %: Shows %of times Sorting operations happened in memory than in the disk (temporary tablespace).

Soft parse %: Shows % of times the SQL in shared pool is used.

Latch Hit %: Shows % of time latches are acquired without having to wait.

As per Oracle AWR report the target should be 100% for these ratios. But in reality this is not possible always. Hence the ratio above 80% is always healthy.

2.5.Shared Pool Statistics:

	Begin	End
Memory Usage %:	91.36	90.18
% SQL with executions>1:	85.32	78.53
% Memory for SQL w/exec>1:	95.71	97.93

% SQL with executions>1: Shows % of SQLs executed more than 1 time. The % should be very near to value 100.

% memory for SQL w/exec>1: From the memory space allocated to cursors, shows which % has been used by cursors more than 1.

The ratio above 80% is always healthy.

2.6.Top 5 Timed Events:

Event	Waits	Time(s)	Avg Wait(ms)	% Total Call Time	Wait Class
cursor: pin S wait on X	963,939	10,958	11	22.3	Concurrency
CPU time		10,932		22.2
db file sequential read	6,301,396	9,923	2	20.2	User I/O
read by other session	7,022,258	5,819	1	11.8	User I/O
row cache lock	1,881,861	3,899	2	7.9	Concurrency

This section shows TOP 5 wait events the processes were waiting on during the snapshot time period. These events are helpful during analysis of any database related performance bottlenecks. The

· Wait Class, column helps in classifying whether the issue is related to application or infrastructure.

· Waits, column provides information about no. of wait happens.

· Time(s), column provides information about total CPU time in seconds spent on the wait.

Important wait events and their causes are explained in detail below,

2.6.1.db file scattered read:

This event indicates wait due to full table scans or index fast full scans.

To avoid this event, identify all the tables on which FTS is happening and create proper indexes so that oracle will do Index scans instead of FTS. The index scan will help in reducing no. of IO operations.

To get an idea about tables on which FTS is happening please refer to “Segment Statistics” -> “Segments By Physical Read” section of AWR report. This section lists down both Tables and Indexes on which Physical Reads are happening. Please note that physical reads doesn’t necessarily means FTS but a possibility of FTS.

2.6.2.db file sequential read:

The event indicates that index scan is happening while reading data from table. High no. of such event may be a cause of unselective indexes i.e. oracle optimizer is not selecting proper indexes from set of available indexes. This will result in extra IO activity and will contribute to delay in SQL execution.

Generally high no. is possible for properly tuned application having high transaction activity.

2.6.3.buffer busy wait:

Indicates that particular block is being used by more than one processes at the same. When first process is reading the block the other processes goes in a wait as the block is in unshared more. Typical scenario for this event to occur is, when we have batch process which is continuously polling database by executing particular SQL repeatedly and there are more than one parallel instances running for the process. All the instances of the process will try to access same memory blocks as the SQL they are executing is the same. This is one of the situation in which we experience this event.

2.6.4. enq: TX - row lock contention:

Oracle maintenance data consistency with the help of locking mechanism. When a particular row is being modified by the process, either through Update/ Delete or Insert operation, oracle tries to acquire lock on that row. Only when the process has acquired lock the process can modify the row otherwise the process waits for the lock. This wait situation triggers this event. The lock is released whenever a COMMIT is issued by the process which has acquired lock for the row. Once the lock is released, processes waiting on this event can acquire lock on the row and perform DML operation.

2.7.Time Model Statistics:

Statistic Name	Time (s)	% of DB Time
sql execute elapsed time	45,168.56	91.79
DB CPU	10,932.03	22.22
parse time elapsed	8,626.43	17.53
hard parse elapsed time	7,616.90	15.48
hard parse (sharing criteria) elapsed time	929.82	1.89
hard parse (bind mismatch) elapsed time	929.20	1.89
PL/SQL compilation elapsed time	731.51	1.49
PL/SQL execution elapsed time	440.62	0.90
sequence load elapsed time	409.34	0.83
repeated bind elapsed time	26.35	0.05
connection management call elapsed time	6.02	0.01
DB time	49,209.52
background elapsed time	1,774.14
background cpu time	448.11

Important statistics here is the DB Time. The statistic represents total time spent in database calls. It is calculated by aggregating the CPU time and wait time of all sessions not waiting on idle event (non-idle user sessions). Since this timing is cumulative time for all non-idle sessions, it is possible that the time will exceed the actual wall clock time.

Objective of tuning oracle database is to reduce the amount of time users spend in performing some action on the database.

This time represents time taken by foreground sessions and not background sessions.

In above example, DB Time is 49,209.52 seconds. 91% of the time is being spent on “SQL execution elapsed time” i.e for SQL execution.

DB CPU represents time spent on CPU resource by foreground user processes. This time doesn’t include waiting time for CPU. DB CPU is contributing to 22% of total DB time.

Important thing to note here is, the actual wall clock time is around 3600 seconds (difference between 2 snapshots) but the DB CPU shown here is 10,932 seconds.

If the server machine (on which database server is running) has more than 1 CPUs, it is possible to have DB CPU greater than actual wall clock time. In this example the database server is hosted on machine with 8 CPUs. Hence a 1 second is divided into 8 parts. Hence DB CPU 10,932 seconds means, 10,932 (DB CPU) / 8 (CPU) = 1366 wall clock seconds.

The “Parse time elapsed” and “Hard parse elapsed time” has taken around 17% and 15% of the total DB time. “Parse time elapsed” represents time spent for Syntax and Semantic checks. Whereas “Hard parse include time” represents time spent for Syntax and Semantic checks PLUS time spent for optimizing the SQL and generating optimizer plan.

2.8.Wait Class:

Wait Class	Waits	%Time -outs	Total Wait Time (s)	Avg wait (ms)	Waits /txn
Concurrency	3,371,424	28.51	16,439	5	17.83
User I/O	13,397,731	0.00	15,930	1	70.87
Commit	235,762	0.00	2,988	13	1.25
Cluster	1,307,191	0.00	1,323	1	6.91
System I/O	208,454	0.00	911	4	1.10
Other	2,018,321	14.36	475	0	10.68
Configuration	26,428	17.80	309	12	0.14
Application	1,643	0.00	34	21	0.01
Network	2,770,033	0.00	21	0	14.65

The events are sorted in descending order of “Total Wait Time (s)” column.

There are over 800 distinct wait events. Oracle has grouped these wait events in 12 wait classes. These wait classes are further divided in 2 categories, Administrative Wait Class and Application Wait Class.

These wait classes gives overall information about whether the waits happening for Application or for System events.

In the example above first 2 rows show that total wait time is higher for Concurrency and “User I/O” wait classes. Though we won’t have much control on reducing concurrency, but we could aim at reducing the User IO.

High User IO means,

· From the pool of available indexes proper indexes are not being used

· FTS is happening on big tables with millions of rows

This could give pointers for further tuning.

2.9.Wait Events:

Event	Waits	%Time -outs	Total Wait Time (s)	Avg wait (ms)	Waits /txn
cursor: pin S wait on X	963,939	99.70	10,958	11	5.10
db file sequential read	6,301,396	0.00	9,923	2	33.33
read by other session	7,022,258	0.00	5,819	1	37.15
row cache lock	1,881,861	0.00	3,899	2	9.95
log file sync	235,762	0.00	2,988	13	1.25
gc buffer busy	835,453	0.00	752	1	4.42
log file parallel write	145,027	0.00	716	5	0.77
library cache load lock	17,095	0.00	420	25	0.09
enq: TX - index contention	24,548	0.00	378	15	0.13
latch: cache buffers chains	172,466	0.00	292	2	0.91
kksfbc child completion	5,527	98.72	276	50	0.03
gc current grant 2-way	175,219	0.00	257	1	0.93
library cache lock	78,169	0.00	195	2	0.41
enq: SQ - contention	17,467	0.00	194	11	0.09
gc cr grant 2-way	141,816	0.00	148	1	0.75
gc current block 2-way	103,694	0.00	115	1	0.55
enq: HW - contention	4,142	0.00	112	27	0.02
db file parallel write	44,214	0.00	92	2	0.23
db file scattered read	31,617	0.00	89	3	0.17
latch: shared pool	81,395	0.00	87	1	0.43
log file sequential read	3,652	0.00	80	22	0.02
latch: row cache objects	40,058	0.00	76	2	0.21
latch: library cache	19,479	0.00	65	3	0.10
direct path write	18,866	0.00	53	3	0.10
gc cr multi block request	49,879	0.00	50	1	0.26
buffer busy waits	25,287	0.00	42	2	0.13
latch: session allocation	26,429	0.00	39	1	0.14
direct path read	21,708	0.00	39	2	0.11
enq: TX - row lock contention	914	0.00	34	37	0.00
ksxr poll remote instances	1,570,668	13.71	23	0	8.31
reliable message	286	5.59	22	78	0.00

The wait events are sorted on “Total Wait Time (s)” column in descending order. The idle events are listed down in the end. First 10 to 15 events should be looked into because rest of the events are idle events and can be ignore. These events are related to foreground processes.

2.10.Background Wait Events:

Event	Waits	%Time -outs	Total Wait Time (s)	Avg wait (ms)	Waits /txn
log file parallel write	145,047	0.00	716	5	0.77
db file parallel write	44,226	0.00	92	2	0.23
log file sequential read	3,659	0.00	81	22	0.02
events in waitclass Other	1,651,179	17.00	71	0	8.73
row cache lock	10,726	0.00	35	3	0.06
control file sequential read	9,724	0.00	14	1	0.05
latch: row cache objects	9,547	0.00	13	1	0.05
control file parallel write	1,497	0.00	6	4	0.01
gc current block 2-way	3,159	0.00	5	2	0.02
library cache lock	1,464	0.00	4	2	0.01
latch: shared pool	3,523	0.00	4	1	0.02
Log archive I/O	3,642	0.00	3	1	0.02
db file sequential read	862	0.00	3	4	0.00
cursor: pin S wait on X	137	100.00	2	12	0.00
library cache pin	47	0.00	1	12	0.00
os thread startup	8	0.00	0	62	0.00
latch: cache buffers chains	70	0.00	0	7	0.00
gc current grant 2-way	120	0.00	0	3	0.00
gc cr grant 2-way	323	0.00	0	1	0.00

2.11.SQL Statistics:

This is a very important section. This section lists down TOP SQL details based on,

· Elapsed Time

· CPU Time

· No. of Gets

· No. of Reads

· No. of Executions etc.

2.11.1.SQL ordered by Elapsed Time:

Elapsed Time (s)	CPU Time (s)	Executions	Elap per Exec (s)	% Total DB Time	SQL Id	SQL Module	SQL Text
7,678	503	1,004	7.65	15.60	93pczysczy6ut	XXXXXXXX	select MOD(NVL(translate(B.XXX...
1,202	97	1	1202.04	2.44	7pusny3989p7b	XXXXXXXX	declare XXX_XXXX XXX_XXX...
905	89	70	12.93	1.84	ftz3ja1zr82s5	XXXXXXXX	select ID , TO_CHAR(XXXX,...
88	11	41,246	0.00	0.18	6qz82dptj0qr7		select l.col#, l.intcol#, l....
59	11			0.12	2k3bu9k1p7kyz		select /*+ INDEX(ol$ ol$signat...
55	2	2	27.37	0.11	9pwx7hnjcgwyh	XXXXXXXX	select xxxx.xxxxxxx...
54	16	57,158	0.00	0.11	cqgv56fmuj63x		select owner#, name, namespace...
43	5			0.09	9g485acn2n30m		select col#, intcol#, reftyp, ...
37	10	20,391	0.00	0.08	39m4sx9k63ba2		select /*+ index(idl_ub2$ i_id...
35	2	1	35.01	0.07	77zqh2uy696by	XXXXXXX	SELECT DISTINCT XX.XXXXXX...

The section provides TOP SQLs sorted in descending order by, Elapsed Time(s).

Elapse per Exec (s): Elapse time in seconds for per execution of the SQL.

Executions: Total no. of executions for the SQL during the two snapshot period.

Elapsed Time: This is calculated by multiplying Elapse per Exec (s) by Executions.

% Total DB Time: % of DB time utilized by the SQL.

SQL Module: Provides module detail which is executing the SQL. Process name at the OS level is displayed as SQL Module name. If the module name starts with any of the names given below, then don’t consider these SQLs for tuning purpose as these SQLs are oracle internal SQLs,

1. DBMS

2. sqlplusw

3. TOAD

4. rman

5. SQL

6. Enterprise Manager

7. ORACLE

8. MMON_SLAVE

9. emagent etc…

This section is very important in terms of tuning perspective as it provides information about SQLs which need tuning.

During load testing one should compare this section for Release load test and with baseline load test. This comparison will give difference about which SQLs performed badly during baseline and release load test.

Looking at the column, Elapse per Exec (s), one can understand which SQLs need tuning. Normally SQLs taking time above 1 second are candidate for tuning.

For other sections like, SQL ordered by Gets, SQL ordered by CPU Time etc, the SQLs appearing under SQL ordered by Elapsed Time are listed most of the time. Hence this section is worth investigating.

Below is the brief description about each section,

· SQL ordered by CPU Time: SQLs are listed based on CPU time taken and are ordered by, CPU Time (s) column.

· SQL ordered by Gets: SQLs are listed based on Buffer Reads (Buffer Gets) for the SQL and are ordered by, Buffer Gets, column.

· SQL ordered by Reads: SQLs are listed based on Physical Reads (Disk Reads) for the SQL and are ordered by, Physical Reads, column.

· SQL ordered by Executions: SQLs are listed based on total no. of executions for the SQL and are ordered by, Executions, column.

For each of these sections, to get the complete SQL Text details, click on the SQL ID listed under SQL ID column.

2.12.Instance Activity Stats - Thread Activity

Statistic	Total	per Hour
log switches (derived)	2	2.00

This section provides information about how many Redo Log Switches are happening. The “Total“ column provides information for log switches during the snapshot period and “per hour” column provides information per Hour basis.

2 log switches per hour is optimal value as per oracle standards.

Search This Blog

SAP BASIS,HANA AND ORACLE DBA BLOGS

AWR AND ADDM ANALYSIS

Comments

Post a Comment

Popular posts from this blog

Query for Tablespace usage with Autoextend

How to configure multiple Oracle listeners

ORA-12505:USE_SID_AS_SERVICE_LISTENER=on.