环境:
OS: linux
CPU: 8个
Oracle Database: 10.2.0.3.0
接到开发组的一个调优请求,任务是对一张海量表 CS2_CT_MVMT(近 2亿多记录,表大小 48G)进行数据 update,而更新数据来自于另外一张海量的日志表 CS2_TXN_LOG(同样近 2亿,表大小 42G),数据处理的语句如下:
<span style="font-size: 9pt;">UPDATE CS2_CT_MVMT CT</span>
<span style="font-size: 9pt;"> SET CT.LAST_MOD_DT = (SELECT TL.MSG_GMT_DT</span>
<span style="font-size: 9pt;"> FROM CS2_TXN_LOG TL</span>
<span style="font-size: 9pt;"> WHERE TL.MSG_ID > 9000000000000000000</span> <span style="font-size: 9pt;"> AND CT.MSG_ID = TL.MSG_ID</span> </code><code style="font-family: Courier New;"><span style="font-size: 9pt;"> AND TL.MSG_GMT_DT IS NOT NULL)</span>
<span style="font-size: 9pt;"> WHERE EXISTS (SELECT 1</span> <span style="font-size: 9pt;"> FROM CS2_TXN_LOG TL</span> <span style="font-size: 9pt;"> WHERE TL.MSG_ID > 9000000000000000000</span> <span style="font-size: 9pt;"> AND CT.MSG_ID = TL.MSG_ID</span> <span style="font-size: 9pt;"> AND TL.MSG_GMT_DT IS NOT NULL);</span>
根据开发人员对数据的估算,在此逻辑下, CS2_CT_MVMT将会有近一半数据(近 1亿条)要被更新,相对应的,数据来自于 CS2_TXN_LOG中的近 1一亿条数据。开发人员曾经尝试过运行语句,但是 3天都没执行完。
第一步,对语句进行调优
首先,先看下语句的查询计划:
<span style="font-size: 9pt;">SQL> set line 300</span>
<span style="font-size: 9pt;">SQL> set pages 50</span>
<span style="font-size: 9pt;">SQL> explain plan for</span> <span style="font-size: 9pt;"> 2 UPDATE CS2_CT_MVMT CT</span>
<span style="font-size: 9pt;"> 3 SET CT.LAST_MOD_DT = (SELECT TL.MSG_GMT_DT</span>
<span style="font-size: 9pt;"> 4 FROM CS2_TXN_LOG TL</span>
<span style="font-size: 9pt;"> 5 WHERE TL.MSG_ID > 9000000000000000000</span>
<span style="font-size: 9pt;"> 6 AND CT.MSG_ID = TL.MSG_ID</span>
<span style="font-size: 9pt;"> 7 AND TL.MSG_GMT_DT IS NOT NULL)</span>
<span style="font-size: 9pt;"> 8 WHERE EXISTS (SELECT 1</span> <span style="font-size: 9pt;"> 9 FROM CS2_TXN_LOG TL</span>
<span style="font-size: 9pt;"> 10 WHERE TL.MSG_ID > 9000000000000000000</span>
<span style="font-size: 9pt;"> 11 AND CT.MSG_ID = TL.MSG_ID</span> </code>
<code style="font-family: Courier New;"><span style="font-size: 9pt;"> 12 AND TL.MSG_GMT_DT IS NOT NULL);</span>
<span style="font-size: 9pt;">Explained.</span> <span style="font-size: 9pt;">SQL> select * from table(dbms_xplan.display());</span>
<span style="font-size: 9pt;">PLAN_TABLE_OUTPUT</span> <span style="font-size: 9pt;">---------------------------------------------------------------------------</span> </code>
<code style="font-size: 10pt; font-family: Courier New; margin: 0in -9pt 0.0001pt 0in; background: none repeat scroll 0% 0% #ffff99;"><span style="font-size: 9pt;">Plan hash value: 3604468536</span>
<span style="font-size: 9pt;">---------------------------------------------------------------------------</span>
<span style="font-size: 9pt;">| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |</span>
<span style="font-size: 9pt;">---------------------------------------------------------------------------</span>
<span style="font-size: 9pt;">| 0 | UPDATE STATEMENT | | 11M| 432M| | 3580K (1)| 11:56:06 |</span>
<span style="font-size: 9pt;">| 1 | UPDATE | CS2_CT_MVMT | | | | | |</span>
<span style="font-size: 9pt;">|* 2 | HASH JOIN SEMI | | 11M| 432M| 352M| 3580K (1)| 11:56:06 |</span>
<span style="font-size: 9pt;">|* 3 | TABLE ACCESS FULL | CS2_CT_MVMT | 11M| 216M| | 1690K (1)| 05:38:12 |</span>
<span style="font-size: 9pt;">|* 4 | TABLE ACCESS FULL | CS2_TXN_LOG | 214M| 3888M| | 1552K (1)| 05:10:25 |</span>
<span style="font-size: 9pt;">|* 5 | FILTER | | | | | | |</span> <span style="font-size: 9pt;">|* 6 | TABLE ACCESS BY INDEX ROWID| CS2_TXN_LOG | 1 | 19 | | 4 (0)| 00:00:01 |</span> <span style="font-size: 9pt;">|* 7 | INDEX UNIQUE SCAN | CS2_TXN_LOG_PK | 1 | | | 3 (0)| 00:00:01 |</span> <span style="font-size: 9pt;">---------------------------------------------------------------------------</span> <span style="font-size: 9pt;">Predicate Information (identified by operation id):</span> <span style="font-size: 9pt;">---------------------------------------------------</span> </code>
<code style="font-family: Courier New;"><span style="font-size: 9pt;"> 2 - access("CT"."MSG_ID"="TL"."MSG_ID")</span> <span style="font-size: 9pt;"> 3 - filter("CT"."MSG_ID">9000000000000000000)</span> </code>
<code style="font-family: Courier New;"><span style="font-size: 9pt;"> 4 - filter("TL"."MSG_ID">9000000000000000000 AND "TL"."MSG_GMT_DT" IS NOT NULL)</span> </code>
<code style="font-family: Courier New;"><span style="font-size: 9pt;"> 5 - filter(9000000000000000000<:B1)</span> </code>
<code style="font-family: Courier New;"><span style="font-size: 9pt;">PLAN_TABLE_OUTPUT</span>
<span style="font-size: 9pt;">---------------------------------------------------------------------------</span>
<span style="font-size: 9pt;"> 6 - filter("TL"."MSG_GMT_DT" IS NOT NULL)</span>
<span style="font-size: 9pt;"> 7 - access("TL"."MSG_ID"=:B1)</span> <span style="font-size: 9pt;"> filter("TL"."MSG_ID">9000000000000000000)</span> </code>
<code style="font-family: Courier New;"><span style="font-size: 9pt;">25 rows selected.</span>
从查询计划看,两表之间存在一个 hash join。再看看两表之间的关系:
<span style="font-size: 9pt;">SQL> select c.table_name as child_table, c.constraint_name as FK, p.table_name as parent_table, p.constraint_name as PK</span> <span style="font-size: 9pt;"> 2 from all_constraints p, all_constraints c</span> <span style="font-size: 9pt;"> 3 where c.table_name in ('CS2_CT_MVMT','CS2_TXN_LOG')</span> <span style="font-size: 9pt;"> 4 and c.r_constraint_name = p.constraint_name</span> <span style="font-size: 9pt;"> 5 and c.constraint_type = 'R'</span> <span style="font-size: 9pt;"> 6 and p.constraint_type='P';</span>
<span style="font-size: 9pt;"> CHILD_TABLE FK PARENT_TABLE PK</span> <span style="font-size: 9pt;">----------------------------- ------------------------------ -------------</span>
<span style="font-size: 9pt;">CS2_CT_MVMT CS2_CT_MVMT_FK1 CS2_TXN_LOG CS2_TXN_LOG_PK</span> </code>
<code style="font-family: Courier New;"><span style="font-size: 9pt;">SQL> select table_name, column_name from dba_ind_columns</span> <span style="font-size: 9pt;"> 2 where index_name = 'CS2_TXN_LOG_PK';</span> <span style="font-size: 9pt;">ABLE_NAME COLUMN_NAME</span> <span style="font-size: 9pt;">----------------------------- ------------------------------</span>
<span style="font-size: 9pt;">CS2_TXN_LOG MSG_ID</span>
可见, CS2_CT_MVMT是依赖于 CS2_TXN_LOG的子表,其依赖字段 MSG_ID正是 CS2_TXN_LOG的主键唯一字段。我们可以对这条语句的逻辑描述如下:
CS2_CT_MVMT依据主外键关系从父表 CS2_TXN_LOG中更新相应数据,且只从 CS2_TXN_LOG获取满足 (MSG_ID > 9000000000000000000 AND MSG_GMT_DT IS NOT NULL)。我们可以这个逻辑关系对语句进行调整以减少查询计划中的访问路径:
根据业务分析, CS2_CT_MVMT中将近一半的数据满足更新条件,也就是由 hash join只能过滤一半的数据,而在 SET子句中的子查询已经保证了两表数据的完整性关系已经对父表的数据过滤条件。这样的话,这个 join的代价就太大了,它应该从语句中拿掉:
<span style="font-size: 9pt;">UPDATE CS2_CT_MVMT CT</span> <span style="font-size: 9pt;"> SET CT.LAST_MOD_DT = (SELECT TL.MSG_GMT_DT</span> <span style="font-size: 9pt;"> FROM CS2_TXN_LOG TL</span> </code>
<code style="font-family: Courier New;"><span style="font-size: 9pt;"> WHERE TL.MSG_ID > 9000000000000000000</span> <span style="font-size: 9pt;"> AND CT.MSG_ID = TL.MSG_ID</span> <span style="font-size: 9pt;"> AND TL.MSG_GMT_DT IS NOT NULL);</span> </code><span style="font-size: 11pt; font-family: 宋体;"> </span></pre>
<p class="MsoNormal" style="text-indent: 0.25in;"><span style="font-size: 11pt; font-family: 宋体;" lang="ZH-CN">但这样会产生一个问题:子查询的记录数可能为</span> <span style="font-size: 11pt; font-family: 宋体;">1<span lang="ZH-CN">或</span> 0<span lang="ZH-CN">(</span> 1:1<span lang="ZH-CN">),也就是将近</span> 1<span lang="ZH-CN">半为</span> NULL<span lang="ZH-CN">(即对应的</span> CS2_CT_MVMT<span lang="ZH-CN">记录无需更新)。很简单,我们用</span> NVL<span lang="ZH-CN">函数来处理:</span> </span></p>
<pre style="font-size: 10pt; font-family: Courier New; margin: 0in -9pt 0.0001pt 0in; background: none repeat scroll 0% 0% #ffff99;"><code style="font-family: Courier New;"><span style="font-size: 9pt;">UPDATE CS2_CT_MVMT CT</span>
<span style="font-size: 9pt;"> SET CT.LAST_MOD_DT = NVL((SELECT TL.MSG_GMT_DT</span> <span style="font-size: 9pt;"> FROM CS2_TXN_LOG TL</span>
<span style="font-size: 9pt;"> WHERE TL.MSG_ID > 9000000000000000000</span> <span style="font-size: 9pt;"> AND CT.MSG_ID = TL.MSG_ID</span> </code>
<code style="font-family: Courier New;"><span style="font-size: 9pt;"> AND TL.MSG_GMT_DT IS NOT NULL), CT.LAST_MOD_DT);</span> </code><span style="font-size: 11pt; font-family: 宋体;" lang="ZH-CN">从查询计划中看到,</span>
COST<span lang="ZH-CN">将近一半:</span></pre>
<p class="MsoNormal" style="text-indent: 0.25in;"></p>
<p class="MsoPlainText"><span style="font-family: 宋体;"> </span></p>
<pre style="font-size: 10pt; font-family: Courier New; margin: 0in -9pt 0.0001pt 0in; background: none repeat scroll 0% 0% #ffff99;"><code style="font-family: Courier New;"><span style="font-size: 9pt;">---------------------------------------------------------------------------</span> <span style="font-size: 9pt;">| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |</span> <span style="font-size: 9pt;">---------------------------------------------------------------------------</span> <span style="font-size: 9pt;">| 0 | UPDATE STATEMENT | | 119M| 2160M| 1699K (2)| 05:39:57 |</span> <span style="font-size: 9pt;">| 1 | UPDATE | CS2_CT_MVMT | | | | |</span> <span style="font-size: 9pt;">| 2 | TABLE ACCESS FULL | CS2_CT_MVMT | 119M| 2160M| 1699K (2)| 05:39:57 |</span>
<span style="font-size: 9pt;">|* 3 | FILTER | | | | | |</span> </code>
<code style="font-family: Courier New;"><span style="font-size: 9pt;">* 4 | TABLE ACCESS BY INDEX ROWID| CS2_TXN_LOG | 1 | 19 | 4 (0)| 00:00:01 |</span> <span style="font-size: 9pt;">|* 5 | INDEX UNIQUE SCAN | CS2_TXN_LOG_PK | 1 | | 3 (0)| 00:00:01 |</span> <span style="font-size: 9pt;">---------------------------------------------------------------------------</span> </code>
<code style="font-family: Courier New;">
<span style="font-size: 9pt;">Predicate Information (identified by operation id):</span> <span style="font-size: 9pt;">--------------------------------------------------</span> <span style="font-size: 9pt;"> 3 - filter(9000000000000000000<:B1)</span> <span style="font-size: 9pt;"> 4 - filter("TL"."MSG_GMT_DT" IS NOT NULL)</span> <span style="font-size: 9pt;"> 5 - access("TL"."MSG_ID"=:B1)</span>
<span style="font-size: 9pt;"> filter("TL"."MSG_ID">9000000000000000000)</span>
再进一步,由于存在等价关系 CT.MSG_ID = TL.MSG_ID,我们可以将 TL.MSG_ID > 9000000000000000000转换为 CT.MSG_ID > 9000000000000000000,作为对 CS2_CT_MVMT扫描后的 filter,从而大大减少 UPDATE操作,也大大减少子查询的运行次数。
<span style="font-size: 9pt;">UPDATE CS2_CT_MVMT CT</span> <span style="font-size: 9pt;"> SET CT.LAST_MOD_DT = NVL((SELECT TL.MSG_GMT_DT</span> <span style="font-size: 9pt;"> FROM CS2_TXN_LOG TL</span> </code>
<code style="font-family: Courier New;"><span style="font-size: 9pt;"> WHERE CT.MSG_ID = TL.MSG_ID</span> </code>
<code style="font-family: Courier New;"><span style="font-size: 9pt;"> AND TL.MSG_GMT_DT IS NOT NULL), CT.LAST_MOD_DT)</span> <span style="font-size: 9pt;">WHERE CT.MSG_ID > 9000000000000000000;</span>
<span style="font-size: 9pt;">---------------------------------------------------------------------------</span> </code>
<code style="font-family: Courier New;"><span style="font-size: 9pt;">| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |</span> </code><code style="font-family: Courier New;"><span style="font-family: Courier New;">-----------------------------------------------------------</span> <span style="font-size: 9pt;">| 0 | UPDATE STATEMENT | | 11M| 216M| 1690K (1)| 05:38:12 |</span> <span style="font-size: 9pt;">| 1 | UPDATE | CS2_CT_MVMT | | | | |</span> <span style="font-size: 9pt;">* 2 | TABLE ACCESS FULL | CS2_CT_MVMT | 11M| 216M| 1690K (1)| 05:38:12 |</span> <span style="font-size: 9pt;">|* 3 | TABLE ACCESS BY INDEX ROWID| CS2_TXN_LOG | 1 | 19 | 4 (0)| 00:00:01 |</span> <span style="font-size: 9pt;">|* 4 | INDEX UNIQUE SCAN | CS2_TXN_LOG_PK | 1 | | 3 (0)| 00:00:01 |</span> <span style="font-size: 9pt;">---------------------------------------------------------------------------</span>
<span style="font-size: 9pt;"> </span></code>
<span style="font-size: 9pt;">--</span><code style="font-family: Courier New;">
<span style="font-size: 9pt;"> 2 - filter("CT"."MSG_ID">9000000000000000000)</span>
<span style="font-size: 9pt;"> 3 - filter("TL"."MSG_GMT_DT" IS NOT NULL)</span>
<span style="font-size: 9pt;"> 4 - access("TL"."MSG_ID"=:B1)</span>
最后再看子查询部分:对于每条满足条件的 CS2_CT_MVMT中的记录,都将执行一次子查询,由 CS2_TXN_LOG的主键获取其表数据块中的 MSG_GMT_DT数据。子查询包含了 3步操作:
1. 对主键的 INDEX UNIQUE SCAN,获取表记录的 ROWID;
2. 由 ROWID读取表的数据块;
3. 根据条件 MSG_GMT_DT IS NOT NULL进行过滤
可以预计,子查询的执行次数也是巨大的,我们可以考虑通过建立复合字段索引来消除第二步操作。而通过与开发人员确认,实际上 CS2_TXN_LOG中 MSG_GMT_DT为 NULL的数据极少,也就是 MSG_GMT_DT IS NOT NULL过滤的数据极少, NVL函数已经可以处理这些数据,我们就可以将这个条件拿掉。
<span style="font-size: 9pt;">SQL> CREATE UNIQUE INDEX CS2_PARTY_OWNER.CS2_TXN_LOG_TEST ON CS2_TXN_LOG (MSG_ID, MSG_GMT_DT);</span>
<span style="font-size: 9pt; font-family: Arial Unicode MS;" lang="ZH-CN">语及查询计划如下:</span>
<span style="font-size: 9pt;">UPDATE CS2_CT_MVMT CT</span>
<span style="font-size: 9pt;"> SET CT.LAST_MOD_DT = NVL((SELECT TL.MSG_GMT_DT</span>
<span style="font-size: 9pt;"> FROM CS2_TXN_LOG TL</span>
<span style="font-size: 9pt;"> WHERE CT.MSG_ID = TL.MSG_ID), CT.LAST_MOD_DT)</span>
<span style="font-size: 9pt;">WHERE CT.MSG_ID > 9000000000000000000;</span>
<span style="font-size: 9pt;"> </span>
<span style="font-size: 9pt;">---------------------------------------------------------------------------</span>
<span style="font-size: 9pt;">| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |</span> </code>
<code style="font-family: Courier New;"><span style="font-size: 9pt;">---------------------------------------------------------------------------</span>
<span style="font-size: 9pt;">| 0 | UPDATE STATEMENT | | 11M| 216M| 1690K (1)| 05:38:12 |</span>
<span style="font-size: 9pt;">| 1 | UPDATE | CS2_CT_MVMT | | | | |</span>
<span style="font-size: 9pt;">|* 2 | TABLE ACCESS FULL| CS2_CT_MVMT | 11M| 216M| 1690K (1)| 05:38:12 |</span>
<span style="font-size: 9pt;">|* 3 | INDEX RANGE SCAN | CS2_TXN_LOG_TEST | 1 | 19 | 4 (0)| 00:00:01 |</span>
<span style="font-size: 9pt;">---------------------------------------------------------------------------</span>
<span style="font-size: 9pt;"> </span>
<span style="font-size: 9pt;">Predicate Information (identified by operation id):</span>
<span style="font-size: 9pt;">---------------------------------------------------</span>
<span style="font-size: 9pt;"> 2 - filter("CT"."MSG_ID">9000000000000000000)</span>
<span style="font-size: 9pt;"> 3 - access("TL"."MSG_ID"=:B1)</span>
通过逻辑分析和查询计划分析对语句的查询部分优化基本上完成。但是这个结果还远不能满足 1小时的系统 off line时间完成数据更新的目标。
由于这是一次表更新操作,因此我们就还需要考虑一些依赖对象对更新操作的影响。
第二步,消除依赖对象的性能影响
我们看下被更新表上有一些什么依赖对象会影响到更新性能。
<span style="font-size: 9pt;">SQL> SELECT trigger_name FROM DBA_TRIGGERS WHERE table_name='CS2_CT_MVMT' AND triggering_event like '%UPDATE%';</span>
<span style="font-size: 9pt;"> </span>
<span style="font-size: 9pt;">no rows selected</span> <span style="font-size: 9pt;"> </span>
<span style="font-size: 9pt;">SQL> SELECT s.constraint_name, search_condition</span> <span style="font-size: 9pt;"> 2 FROM DBA_CONSTRAINTS S, DBA_CONS_COLUMNS C</span>
<span style="font-size: 9pt;"> 3 WHERE s.table_name='CS2_CT_MVMT'</span>
<span style="font-size: 9pt;"> 4 AND s.constraint_name=c.constraint_name</span>
<span style="font-size: 9pt;"> 5 and s.table_name=c.table_name</span>
<span style="font-size: 9pt;"> 6 and c.column_name='LAST_MOD_DT';</span>
<span style="font-size: 9pt;">CONSTRAINT_NAME SEARCH_CONDITION</span>
<span style="font-size: 9pt;">------------------------------ --------------------------------------------</span>
<span style="font-size: 9pt;">SYS_C0013690 "LAST_MOD_DT" IS NOT NULL</span>
<span style="font-size: 9pt;">SQL> SELECT index_name FROM DBA_IND_COLUMNS</span> </code>
<code style="font-family: Courier New;"><span style="font-size: 9pt;"> 2 WHERE TABLE_NAME='CS2_CT_MVMT'</span> </code>
<code style="font-family: Courier New;"><span style="font-size: 9pt;"> 3 AND column_name='LAST_MOD_DT';</span>
<span style="font-size: 9pt;">INDEX_NAME</span> </code>
<code style="font-family: Courier New;"><span style="font-size: 9pt;">-----------------------------</span>
<span style="font-size: 9pt;">CS2_CT_MVMT_IDX7</span>
在批量更新之前,将这些依赖对象和约束先禁用将有助于提高更新性能。
<span style="font-size: 9pt;">SQL> ALTER TABLE CS2_CT_MVMT MODIFY LAST_MOD_DT NULL;</span>
<span style="font-size: 9pt;"> </span>
<span style="font-size: 9pt;">Table altered.</span>
提示:我们已经通过逻辑保证了 LAST_MOD_DT不被更新为 NULL,这个约束对整体性能的影响很小,可以不用关闭。
<span style="font-size: 9pt;">SQL> drop index CS2_CT_MVMT_IDX7;</span>
<span style="font-size: 9pt;"> </span>
<span style="font-size: 9pt;">Index dropped.</span>
Tips:有人可能会想到将表修改为 NOLOGGING,以禁止写 REDO LOG。但事实上, NOLOGGING只对 Direct Write的 INSERT起作用,对 UPDATE,总是会写 REDO LOG。有兴趣的朋友可以做个简单实验去验证一下。
第三步,将语句并行化
这一步起的作用并不是提高整体性能,而是使过程在更短的时间内占用更大的负载来完成任务。
给语句加上并行化提示
<span style="font-size: 9pt;">UPDATE /*+parallel(CT 8)*/ CS2_CT_MVMT CT</span>
<span style="font-size: 9pt;"> SET CT.LAST_MOD_DT = NVL((SELECT TL.MSG_GMT_DT</span>
<span style="font-size: 9pt;"> FROM CS2_TXN_LOG TL</span>
<span style="font-size: 9pt;"> WHERE CT.MSG_ID = TL.MSG_ID), CT.LAST_MOD_DT)</span>
<span style="font-size: 9pt;">WHERE CT.MSG_ID > 9000000000000000000;</span>
看查询计划:
<span style="font-size: 9pt;">---------------------------------------------------------------------------</span>
<span style="font-size: 9pt;">| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |</span>
<span style="font-size: 9pt;">---------------------------------------------------------------------------</span>
<span style="font-size: 9pt;">| 0 | UPDATE STATEMENT | | 11M| 216M| 1690K (1)| 05:38:12 |</span>
<span style="font-size: 9pt;">| 1 | UPDATE | CS2_CT_MVMT | | | | |</span>
<span style="font-size: 9pt;">|* 2 | TABLE ACCESS FULL| CS2_CT_MVMT | 11M| 216M| 1690K (1)| 05:38:12 |</span>
<span style="font-size: 9pt;">|* 3 | INDEX RANGE SCAN | CS2_TXN_LOG_TEST | 1 | 19 | 4 (0)| 00:00:01 |</span>
<span style="font-size: 9pt;">---------------------------------------------------------------------------</span>
嗯,没其作用。看看系统的 parallel设置:
<span style="font-size: 9pt;">SQL> show parameter parallel</span>
<span style="font-size: 9pt;"> </span>
<span style="font-size: 9pt;">NAME TYPE VALUE</span>
<span style="font-size: 9pt;">------------------------------------ ----------- --------------------------</span>
<span style="font-size: 9pt;">fast_start_parallel_rollback string LOW</span>
<span style="font-size: 9pt;">parallel_adaptive_multi_user boolean TRUE</span>
<span style="font-size: 9pt;">parallel_automatic_tuning boolean FALSE</span>
<span style="font-size: 9pt;">parallel_execution_message_size integer 2152</span>
<span style="font-size: 9pt;">parallel_instance_group string</span>
<span style="font-size: 9pt;">parallel_max_servers integer 0</span>
<span style="font-size: 9pt;">parallel_min_percent integer 0</span>
<span style="font-size: 9pt;">parallel_min_servers integer 0</span>
<span style="font-size: 9pt;">parallel_server boolean TRUE</span>
<span style="font-size: 9pt;">parallel_server_instances integer 3</span>
<span style="font-size: 9pt;">parallel_threads_per_cpu integer 2</span>
<span style="font-size: 9pt;">recovery_parallelism integer 0</span>
注意到, parallel_max_servers被设置为 0了,因此不会起并行进程。修改设置,
<span style="font-size: 9pt;">SQL> alter system set parallel_max_servers=160 scope=memory;</span>
<span style="font-size: 9pt;"> </span>
<span style="font-size: 9pt;">System altered.</span>
<span style="font-size: 9pt;"> </span>
<span style="font-size: 9pt; font-family: Arial Unicode MS;" lang="ZH-CN">再次获取查询计划</span>
<span style="font-size: 9pt;"> </span>
<span style="font-size: 9pt;">---------------------------------------------------------------------------</span>
<span style="font-size: 9pt;">| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |</span>
<span style="font-size: 9pt;">---------------------------------------------------------------------------</span>
<span style="font-size: 9pt;">| 0 | UPDATE STATEMENT | | 11M| 216M| 234K (1)| 00:46:50 | | | |</span>
<span style="font-size: 9pt;">| 1 | UPDATE | CS2_CT_MVMT | | | | | | | |</span>
<span style="font-size: 9pt;">| 2 | PX COORDINATOR | | | | | | | | |</span>
<span style="font-size: 9pt;">| 3 | PX SEND QC (RANDOM)| :TQ10000 | 11M| 216M| 234K (1)| 00:46:50 | Q1,00 | P->S | QC (RAND) |</span>
<span style="font-size: 9pt;">| 4 | PX BLOCK ITERATOR | | 11M| 216M| 234K (1)| 00:46:50 | Q1,00 | PCWC | |</span>
<span style="font-size: 9pt;">|* 5 | TABLE ACCESS FULL| CS2_CT_MVMT | 11M| 216M| 234K (1)| 00:46:50 | Q1,00 | PCWP | |</span>
<span style="font-size: 9pt;">|* 6 | INDEX RANGE SCAN | CS2_TXN_LOG_TEST | 1 | 19 | 4 (0)| 00:00:01 | | | |</span>
<span style="font-size: 9pt;">---------------------------------------------------------------------------</span>
语句已经并行化了。
尝试运行语句,看看效果。
<span style="font-size: 9pt;"> </span>
<span style="font-size: 9pt;">SQL> select s.sid, s.SERIAL#, s.PROGRAM, s.MODULE, sq.SQL_ID, sq.sql_text</span>
<span style="font-size: 9pt;"> 2 from v$session s, v$sqlarea sq</span>
<span style="font-size: 9pt;"> 3 where s.sql_address = sq.address</span>
<span style="font-size: 9pt;"> 4 order by sql_id, program;</span>
<span style="font-size: 9pt;"> </span>
<span style="font-size: 9pt;"> SID SERIAL# PROGRAM MODULE SQL_ID</span>
<span style="font-size: 9pt;">---------- ---------- ------------------------------------------------ ----</span>
<span style="font-size: 9pt;">SQL_TEXT</span>
<span style="font-size: 9pt;">---------------------------------------------------------------------------</span>
<span style="font-size: 9pt;"> 1892 6069 oracle@pmrac01 (P000) SQL*Plus 51b0kx3g5gbfx</span>
<span style="font-size: 9pt;" lang="FR">UPDATE /*+ PARALLEL(CT,8) */ CS2_CT_MVMT CT SET CT.LAST_MOD_DT = NVL((SELECT TL.MSG_GMT_DT </span>
<span style="font-size: 9pt;" lang="FR">FROM CS2_TXN_LOG TL </span>
<span style="font-size: 9pt;" lang="FR">WHERE CT.MSG_ID = TL.MSG_ID),CT.LAST_MOD_DT) WHERE CT.MSG_ID > 9000000000000000000</span>
<span style="font-size: 9pt;" lang="FR"> 1961 23363 oracle@pmrac01 (P001) SQL*Plus 51b0kx3g5gbfx</span>
<span style="font-size: 9pt;" lang="FR">UPDATE /*+ PARALLEL(CT,8) */ CS2_CT_MVMT CT SET CT.LAST_MOD_DT = NVL((SELECT TL.MSG_GMT_DT </span>
<span style="font-size: 9pt;" lang="FR">FROM CS2_TXN_LOG TL </span>
<span style="font-size: 9pt;" lang="FR">WHERE CT.MSG_ID = TL.MSG_ID),CT.LAST_MOD_DT) WHERE CT.MSG_ID > 9000000000000000000</span>
<span style="font-size: 9pt;" lang="FR">... ...</span>
<span style="font-size: 9pt;" lang="FR"> 1898 7240 oracle@pmrac01 (P007) SQL*Plus 51b0kx3g5gbfx</span>
<span style="font-size: 9pt;" lang="FR">UPDATE /*+ PARALLEL(CT,8) */ CS2_CT_MVMT CT SET CT.LAST_MOD_DT = NVL((SELECT TL.MSG_GMT_DT </span>
<span style="font-size: 9pt;" lang="FR">FROM CS2_TXN_LOG TL </span>
<span style="font-size: 9pt;" lang="FR">WHERE CT.MSG_ID = TL.MSG_ID),CT.LAST_MOD_DT) WHERE CT.MSG_ID > 9000000000000000000</span>
<span style="font-size: 9pt;" lang="FR"> 1889 6026 sqlplus@pmrac01 (TNS V1-V3) SQL*Plus 51b0kx3g5gbfx</span>
<span style="font-size: 9pt;" lang="FR">UPDATE /*+ PARALLEL(CT,8) */ CS2_CT_MVMT CT SET CT.LAST_MOD_DT = NVL((SELECT TL.MSG_GMT_DT </span>
<span style="font-size: 9pt;" lang="FR">FROM CS2_TXN_LOG TL </span>
<span style="font-size: 9pt;" lang="FR">WHERE CT.MSG_ID = TL.MSG_ID),CT.LAST_MOD_DT) WHERE CT.MSG_ID > 9000000000000000000</span>
可以注意到,已经有 9个会话在执行该语句( 1个协调进程 (coordinator), 8个并行进程 (slave))。 sql_id是 51b0kx3g5gbfx。再观察这些会话产生的统计数据:
<span style="font-size: 9pt;">SQL> select s.sid, s.SERIAL#, n.NAME, ss.value, s.USERNAME,s.LOGON_TIME,s.MODULE
2 from v$sesstat ss, v$statname n, v$session s, v$px_session px
3 where n.name in ('physical write bytes', 'physical read bytes', 'undo change vector size','redo size')
4 and ss.statistic#=n.STATISTIC#
5 and ss.SID = s.SID
6 and ss.SID = px.SID
7 and not exists (select 1 from v$mystat m where s.sid=m.sid)
order by n.name desc,ss.value desc;</span>
<span style="font-size: 9pt;"> </span>
<span style="font-size: 9pt;"> SID SERIAL# NAME VALUE</span>
<span style="font-size: 9pt;">---------- ---------- -------------------------------------- -----------</span>
<span style="font-size: 9pt;"> 1889 6026 undo change vector size 374128088</span>
<span style="font-size: 9pt;"> 1892 6069 undo change vector size 0</span>
<span style="font-size: 9pt;"> 1893 10815 undo change vector size 0</span>
<span style="font-size: 9pt;"> 1954 2616 undo change vector size 0</span>
<span style="font-size: 9pt;"> 1935 11165 undo change vector size 0</span>
<span style="font-size: 9pt;"> 1945 7426 undo change vector size 0</span>
<span style="font-size: 9pt;"> 1961 23363 undo change vector size 0</span>
<span style="font-size: 9pt;"> 1832 6515 undo change vector size 0</span>
<span style="font-size: 9pt;"> 1898 7240 undo change vector size 0</span>
<span style="font-size: 9pt;"> 1889 6026 redo size 1140869748</span>
<span style="font-size: 9pt;"> 1961 23363 redo size 0</span>
<span style="font-size: 9pt;"> 1832 6515 redo size 0</span>
<span style="font-size: 9pt;"> 1892 6069 redo size 0</span>
<span style="font-size: 9pt;"> 1893 10815 redo size 0</span>
<span style="font-size: 9pt;"> 1898 7240 redo size 0</span>
<span style="font-size: 9pt;"> 1935 11165 redo size 0</span>
<span style="font-size: 9pt;"> 1954 2616 redo size 0</span>
<span style="font-size: 9pt;"> 1945 7426 redo size 0</span>
<span style="font-size: 9pt;"> 1954 2616 physical write bytes 0</span>
<span style="font-size: 9pt;"> 1945 7426 physical write bytes 0</span>
<span style="font-size: 9pt;"> 1935 11165 physical write bytes 0</span>
<span style="font-size: 9pt;"> 1898 7240 physical write bytes 0</span>
<span style="font-size: 9pt;"> 1893 10815 physical write bytes 0</span>
<span style="font-size: 9pt;"> 1892 6069 physical write bytes 0</span>
<span style="font-size: 9pt;"> 1889 6026 physical write bytes 0</span>
<span style="font-size: 9pt;"> 1832 6515 physical write bytes 0</span>
<span style="font-size: 9pt;"> 1961 23363 physical write bytes 0</span>
<span style="font-size: 9pt;"> 1889 6026 physical read bytes 3674587136</span>
<span style="font-size: 9pt;"> 1832 6515 physical read bytes 2610708480</span>
<span style="font-size: 9pt;"> 1935 11165 physical read bytes 2233417728</span>
<span style="font-size: 9pt;"> 1945 7426 physical read bytes 1317404672</span>
<span style="font-size: 9pt;"> 1892 6069 physical read bytes 1291378688</span>
<span style="font-size: 9pt;"> 1893 10815 physical read bytes 1283899392</span>
<span style="font-size: 9pt;"> 1954 2616 physical read bytes 1242357760</span>
<span style="font-size: 9pt;"> 1898 7240 physical read bytes 1230888960</span>
<span style="font-size: 9pt;"> 1961 23363 physical read bytes 1193918464</span>
<span style="font-size: 9pt;"> </span>
<span style="font-size: 9pt;">36 rows selected.</span>
发现问题了:只有一个进程(即主进程)产生了 Redo和 Undo log,所有并行进程都没有 redo和 undo产生,也就是只有一个进程在进行写操作,尽管读已经并行化了。
再看会话的并行属性:
<span style="font-size: 9pt;">SQL> select s.sid, s.SERIAL#, s.PDML_STATUS, s.PDDL_STATUS, s.PQ_STATUS, s.PROGRAM
2 from v$session s, v$px_session px
3 where s.sid = px.sid
4 and not exists (select 1 from v$mystat m where s.sid=m.sid)</span> <span style="font-size: 9pt; font-family: Arial Unicode MS;" lang="ZH-CN">;</span>
<span style="font-size: 9pt;"> </span>
<span style="font-size: 9pt;"> SID SERIAL# PDML_STA PDDL_STA PQ_STATU</span>
<span style="font-size: 9pt;">---------- ---------- -------- -------- --------</span>
<span style="font-size: 9pt;"> 1792 3473 DISABLED ENABLED ENABLED</span>
<span style="font-size: 9pt;"> 1827 12996 DISABLED ENABLED ENABLED</span>
<span style="font-size: 9pt;"> 1872 4173 DISABLED ENABLED ENABLED</span>
<span style="font-size: 9pt;"> 1889 6998 DISABLED ENABLED ENABLED</span>
<span style="font-size: 9pt;"> 1910 16279 DISABLED ENABLED ENABLED</span>
<span style="font-size: 9pt;"> 1916 3073 DISABLED ENABLED ENABLED</span>
<span style="font-size: 9pt;"> 1935 12587 DISABLED ENABLED ENABLED</span>
<span style="font-size: 9pt;"> 1953 3125 DISABLED ENABLED ENABLED</span>
<span style="font-size: 9pt;"> 1958 19949 DISABLED ENABLED ENABLED</span>
<span style="font-size: 9pt;"> </span>
<span style="font-size: 9pt;">9 rows selected.</span>
看到 PDML是 diabled。说明语句只进行了 parallel query,而没有达到 parallel DML( PDML)的目的。为了实现 PDML,还需要在会话中打开 parallel DML的开关
<span style="font-size: 9pt;">SQL> ALTER SESSION FORCE PARALLEL DML;</span>
<span style="font-size: 9pt;"> </span>
<span style="font-size: 9pt;">Session altered.</span>
<span style="font-size: 9pt;"> </span>
<span style="font-size: 9pt; font-family: Arial Unicode MS;" lang="ZH-CN">再次运行语句,观察统计情况:</span>
<span style="font-size: 9pt;"> </span>
<span style="font-size: 9pt;">SQL> select s.sid, s.SERIAL#, n.NAME, ss.value</span>
<span style="font-size: 9pt;"> 2 from v$sesstat ss, v$statname n, v$session s, v$sqlarea sq</span>
<span style="font-size: 9pt;"> 3 where n.name in ('physical write bytes', 'physical read bytes', 'undo change vector size','redo size')</span>
<span style="font-size: 9pt;"> 4 and ss.statistic#=n.STATISTIC#</span>
<span style="font-size: 9pt;"> 5 and s.sql_address = sq.address</span>
<span style="font-size: 9pt;"> 6 and ss.SID = s.SID</span>
<span style="font-size: 9pt;"> 7 and sq.sql_id='51b0kx3g5gbfx'</span>
<span style="font-size: 9pt;"> 8 order by n.name desc,ss.value desc;</span>
<span style="font-size: 9pt;"> </span>
<span style="font-size: 9pt;"> SID SERIAL# NAME VALUE</span>
<span style="font-size: 9pt;">---------- ---------- -------------------------------------- --------------</span>
<span style="font-size: 9pt;"> 1779 524 undo change vector size 10659808</span>
<span style="font-size: 9pt;"> 1806 8301 undo change vector size 10592844</span>
<span style="font-size: 9pt;"> 1935 11332 undo change vector size 9842040</span>
<span style="font-size: 9pt;">... ...</span>
<span style="font-size: 9pt;"> 1838 8657 redo size 22625612</span>
<span style="font-size: 9pt;"> 1806 8301 redo size 22488968</span>
<span style="font-size: 9pt;"> 1935 11332 redo size 20878408</span>
<span style="font-size: 9pt;"> 1892 6131 redo size 15077588</span>
<span style="font-size: 9pt;">... ...</span>
<span style="font-size: 9pt;"> 1935 11332 physical write bytes 0</span>
<span style="font-size: 9pt;"> 1838 8657 physical write bytes 0</span>
<span style="font-size: 9pt;"> 1901 28061 physical write bytes 0</span>
<span style="font-size: 9pt;">... ...</span>
<span style="font-size: 9pt;"> 1838 8657 physical read bytes 631848960</span>
<span style="font-size: 9pt;"> 1935 11332 physical read bytes 625991680</span>
<span style="font-size: 9pt;"> 1872 3007 physical read bytes 559521792</span>
<span style="font-size: 9pt;">... ...</span>
<span style="font-size: 9pt;"> </span>
<span style="font-size: 9pt;">68 rows selected.</span>
会话信息中, PDML已经强制使用
<span style="font-size: 9pt;">SQL> select s.sid, s.SERIAL#, s.PDML_STATUS, s.PDDL_STATUS, s.PQ_STATUS, s.PROGRAM
2 from v$session s, v$px_session px
3 where s.sid = px.sid
4 and not exists (select 1 from v$mystat m where s.sid=m.sid)</span> <span style="font-size: 9pt; font-family: Arial Unicode MS;" lang="ZH-CN">;</span>
<span style="font-size: 9pt;"> </span>
<span style="font-size: 9pt;"> SID SERIAL# PDML_STA PDDL_STA PQ_STATU</span>
<span style="font-size: 9pt;">---------- ---------- -------- -------- --------</span>
<span style="font-size: 9pt;"> 1779 524 FORCED ENABLED ENABLED</span>
<span style="font-size: 9pt;"> 1783 13993 FORCED ENABLED ENABLED</span>
<span style="font-size: 9pt;"> 1792 3477 FORCED ENABLED ENABLED</span>
<span style="font-size: 9pt;"> 1806 8481 FORCED ENABLED ENABLED</span>
<span style="font-size: 9pt;"> 1812 2874 FORCED ENABLED ENABLED</span>
<span style="font-size: 9pt;"> 1819 24796 FORCED ENABLED ENABLED</span>
<span style="font-size: 9pt;"> 1838 8756 FORCED ENABLED ENABLED</span>
<span style="font-size: 9pt;"> 1842 5929 FORCED ENABLED ENABLED</span>
<span style="font-size: 9pt;"> 1872 4177 FORCED ENABLED ENABLED</span>
<span style="font-size: 9pt;"> 1889 7002 FORCED ENABLED ENABLED</span>
<span style="font-size: 9pt;"> 1894 13805 FORCED ENABLED ENABLED</span>
<span style="font-size: 9pt;"> 1910 16366 FORCED ENABLED ENABLED</span>
<span style="font-size: 9pt;"> 1916 3077 FORCED ENABLED ENABLED</span>
<span style="font-size: 9pt;"> 1927 6182 FORCED ENABLED ENABLED</span>
<span style="font-size: 9pt;"> 1935 12591 FORCED ENABLED ENABLED</span>
<span style="font-size: 9pt;"> 1958 19955 FORCED ENABLED ENABLED</span>
<span style="font-size: 9pt;"> 1959 9259 FORCED ENABLED ENABLED</span>
<span style="font-size: 9pt;"> </span>
<span style="font-size: 9pt;">17 rows selected.</span>
细心的朋友可能会留意到在我的 HINT中指定的 Parallel Degree是 8,而在这却出现了 16个 slave进程。我们看看在设置 PDML语句的查询计划变化就知道原因了:
<span style="font-size: 9pt;">PLAN_TABLE_OUTPUT</span>
<span style="font-size: 9pt;">--------------------------------------------------------------------------------------------------------------------------</span>
<span style="font-size: 9pt;">Plan hash value: 4032879153</span>
<span style="font-size: 9pt;"> </span>
<span style="font-size: 9pt;">--------------------------------------------------------------------------------------------------------------------------</span>
<span style="font-size: 9pt;">| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |</span>
<span style="font-size: 9pt;">--------------------------------------------------------------------------------------------------------------------------</span>
<span style="font-size: 9pt;">| 0 | UPDATE STATEMENT | | 11M| 216M| 124K (1)| 00:24:53 | | | |</span>
<span style="font-size: 9pt;">| 1 | PX COORDINATOR | | | | | | | | |</span>
<span style="font-size: 9pt;">| 2 | PX SEND QC (RANDOM) | :TQ10001 | 11M| 216M| 124K (1)| 00:24:53 | <span style="color: red;">Q1,01</span> | P->S | QC (RAND) |</span>
<span style="font-size: 9pt;">| 3 | INDEX MAINTENANCE | CS2_CT_MVMT | | | | | Q1,01 | PCWP | |</span>
<span style="font-size: 9pt;">| 4 | PX RECEIVE | | 11M| 216M| 124K (1)| 00:24:53 | Q1,01 | PCWP | |</span>
<span style="font-size: 9pt;">| 5 | PX SEND RANGE | :TQ10000 | 11M| 216M| 124K (1)| 00:24:53 | <span style="color: blue;">Q1,00</span> | P->P | RANGE |</span>
<span style="font-size: 9pt;">| 6 | UPDATE | CS2_CT_MVMT | | | | | Q1,00 | PCWP | |</span>
<span style="font-size: 9pt;">| 7 | PX BLOCK ITERATOR | | 11M| 216M| 124K (1)| 00:24:53 | Q1,00 | PCWC | |</span>
<span style="font-size: 9pt;">|* 8 | TABLE ACCESS FULL| CS2_CT_MVMT | 11M| 216M| 124K (1)| 00:24:53 | Q1,00 | PCWP | |</span>
<span style="font-size: 9pt;">|* 9 | INDEX RANGE SCAN | CS2_TXN_LOG_TEST | 1 | 19 | 4 (0)| 00:00:01 | | | |</span>
<span style="font-size: 9pt;">--------------------------------------------------------------------------------------------------------------------------</span>
在查询计划中, slave进程集的数量是 2个( TQ字段, Q1,00和 Q1,01),因此,进程总量就变为 2*8=16了。
这样,这条语句就真正实现了 PDML。最终,我们设置该语句的并行度为 32,在生产环境上的执行时间是 25分钟,加上索引重建的时间,总共时间不到 1小时,实现了优化目的。
Bug :说实话, Oracle的 PDML还不是十分成熟,在我们的测试调优过程中,遇到了多个 Bug(参见 Metalink上 Bug 4896424、 Bug 5914711)。其中还有一个 Oracle还在定位中(当 Parallel Degree大于 32时随机出现)。这些 bug在 10g中都没有补丁,虽然它们都有规避办法,但是如果直接在 online时使用 PDML总还是让人不放心,好彩我们的程序只是一个运行一次的数据补丁,并不会直接影响生产系统。
补充 1 : parallel DML的效果在很大程度上还取决于磁盘 IO的并行程度。有人可能会有一个误解,认为 parallel DML只能在分区表上起作用。但事实上,从 9iR2开始, oracle的 intra-partition技术支持在单分区的 parallel DML( update,merge.在 9iR1开始支持单分区的 parallel Insert)。
补充 2 :在分分区表或者单个分区上并行进程数量受到 Min Transaction Freelists的限制。在 MSSM的段管理方式中,可以参考下表可(并非一定等于该数值,因为不同的数据块结构会有不同 ITL数或其他信息,这些不同都会影响到 Min Transaction Freelists)。在 ASSM中, Oracle通过 bitmap管理 free block,这一限制最大可以达到 65535。
Block Size |
Min Transaction Freelists |
2k |
25 |
4k |
50 |
8k |
101 |
16k |
204 |
32k |
409 |
33 :在我的调试过程中,曾经发生一件令人费解的事——有时从会话中找不到并行会话!最终,通过 的方法找到了原因,具体可参见这片文章:《通过 Parallel Trace分析并行过程 》。