DATAGUARD中MAXIMUM AVAILABILITY+LGWR SYNC导致主库不能启动

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:DATAGUARD中MAXIMUM AVAILABILITY+LGWR SYNC导致主库不能启动

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

qq群里面的一朋友的的DG因为备库已经下架,主库重启的时候不能正常启动,帮忙处理结果如下
版本相关信息

Oracle9i Enterprise Edition Release 9.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production
ORACLE_HOME = /export/home/oracle/product/9.2.0
System name:    SunOS
Node name:      eTermSrv2
Release:        5.10
Version:        Generic
Machine:        sun4u
Instance name: abcd

数据库不能启动日志

Fri Aug 10 12:37:56 2012
ALTER DATABASE OPEN
Fri Aug 10 12:37:56 2012
LGWR: Primary database is in CLUSTER CONSISTENT mode
LGWR: Primary database is in MAXIMUM AVAILABILITY mode
LGWR: Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR
LNS0 started with pid=16
Fri Aug 10 12:37:59 2012
ORA-1013 signalled during: ALTER DATABASE OPEN...
Fri Aug 10 12:41:45 2012
LGWR: Error 12535 verifying archivelog destination LOG_ARCHIVE_DEST_2
LGWR: Continuing...
Fri Aug 10 12:41:45 2012
Errors in file /export/home/oracle/admin/abcd/bdump/abcd_lgwr_11504.trc:
ORA-12535: TNS:operation timed out
******************************************************************
LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2
******************************************************************
Creating archive destination LOG_ARCHIVE_DEST_2: 'ora9i'
LGWR: Error 12535 creating archivelog file 'ora9i'
Fri Aug 10 12:45:32 2012
Errors in file /export/home/oracle/admin/abcd/bdump/abcd_lgwr_11504.trc:
ORA-12535: TNS:operation timed out
LGWR: Completed archiving  log 1 thread 1 sequence 6808
Thread 1 advanced to log sequence 6808
Fri Aug 10 12:45:32 2012
Errors in file /export/home/oracle/admin/abcd/bdump/abcd_lgwr_11504.trc:
ORA-01157: cannot identify/lock data file 1 - see DBWR trace file
ORA-01110: data file 1: '/export/home/oracle/oradata/abcd/system01.dbf'
LGWR: terminating instance due to error 1157
Instance terminated by LGWR, pid = 11504

通过这里大概看出数据库原dg配置是MAXIMUM AVAILABILITY,因为备机下架,导致ora9i的tns不能访问,从而出现一些列错误,其中使得lgwr异常,因为oracle的某种内部机制,导致dbwr不能访问数据文件(这里体现出来是system01.dbf不能访问,但实际上应该是所有所有数据文件均不能访问,因为system01.dbf位于第一,所以报出该错误.)

*** SESSION ID:(3.1) 2012-08-10 12:37:56.847
Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR
*** 2012-08-10 12:41:45.614
Error 12535 connecting to destination LOG_ARCHIVE_DEST_2 standby host 'ora9i'
*** 2012-08-10 12:41:45.615
LGWR: Error 12535 verifying archivelog destination LOG_ARCHIVE_DEST_2
Continuing...
ORA-12535: TNS:operation timed out
*** 2012-08-10 12:45:32.514
Error 12535 connecting to destination LOG_ARCHIVE_DEST_2 standby host 'ora9i'
Error 12535 attaching to destination LOG_ARCHIVE_DEST_2 standby host 'ora9i'
*** 2012-08-10 12:45:32.514
LGWR: Error 12535 creating archivelog file 'ora9i'
*** 2012-08-10 12:45:32.514
kcrrfail: dest:2 err:12535 force:0
ORA-12535: TNS:operation timed out
error 1157 detected in background process
ORA-01157: cannot identify/lock data file 1 - see DBWR trace file
ORA-01110: data file 1: '/export/home/oracle/oradata/abcd/system01.dbf'

通过trace文件,更加清楚的说明,可能是因为lgwr异常导致dbwr访问数据文件出现问题.

问题分析/解决汇总

SQL> show parameter log_archive_dest_state_1;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_state_1             string      enable

SQL> show parameter log_archive_dest_state_2;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_state_2             string      enable

SQL> show parameter log_archive_dest_1;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_1                   string      location=/export/home/oracle/o
                                                 radata/abcd/archive

SQL> show parameter log_archive_dest_2;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_2                   string      SERVICE=ora9i LGWR SYNC AFFIRM

SQL>select  protection_mode,database_role from v$database;

PROTECTION_MODE      DATABASE_ROLE
-------------------- ----------------
MAXIMUM AVAILABILITY PRIMARY

SQL>   show parameter succ;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_min_succeed_dest         integer     1

通过上面的sql查询结果和alert日志与trace文件结合分析,大体结论是:
数据库的dg的保护模式为:MAXIMUM AVAILABILITY
数据库的日志传输方式是:LGWR SYNC AFFIRM
现在因为备机下架,主库LGWR不能通过tns访问备库,从而导致数据库的dbwr进程访问异常,是的数据库不能正常被open
这里的故障有一些巧合:MAXIMUM AVAILABILITY+LGWR SYNC AFFIRM+9.2.0.1+SunOS
我不清楚是不是ORACLE的bug导致,但是这个问题解决起来比较简单,只需要修改log_archive_dest_state_2=defer,使得log_archive_dest_2参数不生效,让lgwr不再访问备机

此条目发表在 Data Guard 分类目录。将固定链接加入收藏夹。

DATAGUARD中MAXIMUM AVAILABILITY+LGWR SYNC导致主库不能启动》有 2 条评论

  1. 江帆 说:

    貌似LGWR SYNC 实现的就是最大保护模式 虽然模式为最大可用性 但是同步方式中规定了必须同步等到对端机的响应才能执行下一步操作。所以其实你更改下sync的模式 应该就没问题了

  2. jyc 说:

    SYNC改成异步 应该也不至于当掉。