联系:手机/微信(+86 17813235971) QQ(107644445)
标题:DATAGUARD中MAXIMUM AVAILABILITY+LGWR SYNC导致主库不能启动
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
qq群里面的一朋友的的DG因为备库已经下架,主库重启的时候不能正常启动,帮忙处理结果如下
版本相关信息
Oracle9i Enterprise Edition Release 9.2.0.1.0 - 64bit Production With the Partitioning, OLAP and Oracle Data Mining options JServer Release 9.2.0.1.0 - Production ORACLE_HOME = /export/home/oracle/product/9.2.0 System name: SunOS Node name: eTermSrv2 Release: 5.10 Version: Generic Machine: sun4u Instance name: abcd
数据库不能启动日志
Fri Aug 10 12:37:56 2012 ALTER DATABASE OPEN Fri Aug 10 12:37:56 2012 LGWR: Primary database is in CLUSTER CONSISTENT mode LGWR: Primary database is in MAXIMUM AVAILABILITY mode LGWR: Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR LNS0 started with pid=16 Fri Aug 10 12:37:59 2012 ORA-1013 signalled during: ALTER DATABASE OPEN... Fri Aug 10 12:41:45 2012 LGWR: Error 12535 verifying archivelog destination LOG_ARCHIVE_DEST_2 LGWR: Continuing... Fri Aug 10 12:41:45 2012 Errors in file /export/home/oracle/admin/abcd/bdump/abcd_lgwr_11504.trc: ORA-12535: TNS:operation timed out ****************************************************************** LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2 ****************************************************************** Creating archive destination LOG_ARCHIVE_DEST_2: 'ora9i' LGWR: Error 12535 creating archivelog file 'ora9i' Fri Aug 10 12:45:32 2012 Errors in file /export/home/oracle/admin/abcd/bdump/abcd_lgwr_11504.trc: ORA-12535: TNS:operation timed out LGWR: Completed archiving log 1 thread 1 sequence 6808 Thread 1 advanced to log sequence 6808 Fri Aug 10 12:45:32 2012 Errors in file /export/home/oracle/admin/abcd/bdump/abcd_lgwr_11504.trc: ORA-01157: cannot identify/lock data file 1 - see DBWR trace file ORA-01110: data file 1: '/export/home/oracle/oradata/abcd/system01.dbf' LGWR: terminating instance due to error 1157 Instance terminated by LGWR, pid = 11504
通过这里大概看出数据库原dg配置是MAXIMUM AVAILABILITY,因为备机下架,导致ora9i的tns不能访问,从而出现一些列错误,其中使得lgwr异常,因为oracle的某种内部机制,导致dbwr不能访问数据文件(这里体现出来是system01.dbf不能访问,但实际上应该是所有所有数据文件均不能访问,因为system01.dbf位于第一,所以报出该错误.)
*** SESSION ID:(3.1) 2012-08-10 12:37:56.847 Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR *** 2012-08-10 12:41:45.614 Error 12535 connecting to destination LOG_ARCHIVE_DEST_2 standby host 'ora9i' *** 2012-08-10 12:41:45.615 LGWR: Error 12535 verifying archivelog destination LOG_ARCHIVE_DEST_2 Continuing... ORA-12535: TNS:operation timed out *** 2012-08-10 12:45:32.514 Error 12535 connecting to destination LOG_ARCHIVE_DEST_2 standby host 'ora9i' Error 12535 attaching to destination LOG_ARCHIVE_DEST_2 standby host 'ora9i' *** 2012-08-10 12:45:32.514 LGWR: Error 12535 creating archivelog file 'ora9i' *** 2012-08-10 12:45:32.514 kcrrfail: dest:2 err:12535 force:0 ORA-12535: TNS:operation timed out error 1157 detected in background process ORA-01157: cannot identify/lock data file 1 - see DBWR trace file ORA-01110: data file 1: '/export/home/oracle/oradata/abcd/system01.dbf'
通过trace文件,更加清楚的说明,可能是因为lgwr异常导致dbwr访问数据文件出现问题.
问题分析/解决汇总
SQL> show parameter log_archive_dest_state_1; NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ log_archive_dest_state_1 string enable SQL> show parameter log_archive_dest_state_2; NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ log_archive_dest_state_2 string enable SQL> show parameter log_archive_dest_1; NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ log_archive_dest_1 string location=/export/home/oracle/o radata/abcd/archive SQL> show parameter log_archive_dest_2; NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ log_archive_dest_2 string SERVICE=ora9i LGWR SYNC AFFIRM SQL>select protection_mode,database_role from v$database; PROTECTION_MODE DATABASE_ROLE -------------------- ---------------- MAXIMUM AVAILABILITY PRIMARY SQL> show parameter succ; NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ log_archive_min_succeed_dest integer 1
通过上面的sql查询结果和alert日志与trace文件结合分析,大体结论是:
数据库的dg的保护模式为:MAXIMUM AVAILABILITY
数据库的日志传输方式是:LGWR SYNC AFFIRM
现在因为备机下架,主库LGWR不能通过tns访问备库,从而导致数据库的dbwr进程访问异常,是的数据库不能正常被open
这里的故障有一些巧合:MAXIMUM AVAILABILITY+LGWR SYNC AFFIRM+9.2.0.1+SunOS
我不清楚是不是ORACLE的bug导致,但是这个问题解决起来比较简单,只需要修改log_archive_dest_state_2=defer,使得log_archive_dest_2参数不生效,让lgwr不再访问备机
貌似LGWR SYNC 实现的就是最大保护模式 虽然模式为最大可用性 但是同步方式中规定了必须同步等到对端机的响应才能执行下一步操作。所以其实你更改下sync的模式 应该就没问题了
SYNC改成异步 应该也不至于当掉。