分类目录归档:Oracle ASM

ORA-15096: lost disk write detected

又一例由于存储掉电导致asm磁盘组,由于ORA-15096: lost disk write detected,导致无法mount的恢复请求

SQL> ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:45277:148} */
NOTE: cache registered group DATA number=2 incarn=0x73886b6a
NOTE: cache began mount (first) of group DATA number=2 incarn=0x73886b6a
NOTE: Assigning number (2,2) to disk (/dev/asm-data3)
NOTE: Assigning number (2,1) to disk (/dev/asm-data2)
NOTE: Assigning number (2,0) to disk (/dev/asm-data1)
Fri Nov 06 19:06:56 2020
NOTE: GMON heartbeating for grp 2
GMON querying group 2 at 94 for pid 30, osid 11596
NOTE: cache opening disk 0 of grp 2: DATA_0000 path:/dev/asm-data1
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 of grp 2: DATA_0001 path:/dev/asm-data2
NOTE: cache opening disk 2 of grp 2: DATA_0002 path:/dev/asm-data3
NOTE: cache mounting (first) external redundancy group 2/0x73886B6A (DATA)
Fri Nov 06 19:06:57 2020
* allocate domain 2, invalid = TRUE
kjbdomatt send to inst 2
Fri Nov 06 19:06:57 2020
NOTE: attached to recovery domain 2
NOTE: starting recovery of thread=1 ckpt=25.7986 group=2 (DATA)
NOTE: starting recovery of thread=2 ckpt=33.364 group=2 (DATA)
NOTE: BWR validation signaled ORA-15096
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_11596.trc:
ORA-15096: lost disk write detected
NOTE: crash recovery signalled OER-15096
ERROR: ORA-15096 signalled during mount of diskgroup DATA
NOTE: cache dismounting (clean) group 2/0x73886B6A (DATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 11596, image: oracle@db1 (TNS V1-V3)
NOTE: lgwr not being msg'd to dismount
kjbdomdet send to inst 2
detach from dom 2, sending detach message to inst 2
freeing rdom 2
NOTE: detached from domain 2
NOTE: cache dismounted group 2/0x73886B6A (DATA)
NOTE: cache ending mount (fail) of group DATA number=2 incarn=0x73886b6a
NOTE: cache deleting context for group DATA 2/0x73886b6a
GMON dismounting group 2 at 95 for pid 30, osid 11596
NOTE: Disk DATA_0000 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0002 in mode 0x7f marked for de-assignment
ERROR: diskgroup DATA was not mounted
ORA-15032: not all alterations performed
ORA-15096: lost disk write detected
ERROR: ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:45277:148} */

通过判断,通过一系列处理之后,数据库进行了mount操作发现报错ORA-600 2130

Fri Nov 06 17:03:27 2020
ALTER DATABASE RECOVER  database
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 40 slaves
Fri Nov 06 17:03:29 2020
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_pr00_7393.trc  (incident=195869):
ORA-00600: internal error code, arguments: [2130], [2], [1], [2], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/ynhis/ynhis1/incident/incdir_195869/ynhis1_pr00_7393_i195869.trc
Fri Nov 06 17:03:30 2020
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Media Recovery failed with error 600
ORA-10877 signalled during: ALTER DATABASE RECOVER  database  ...

判断redo异常,通过resetlogs打开库,发现报错ORA-00600 2662

Fri Nov 06 18:21:32 2020
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 8670753264
Resetting resetlogs activation ID 306909514 (0x124b114a)
Redo thread 2 enabled by open resetlogs or standby activation
Fri Nov 06 18:21:39 2020
Setting recovery target incarnation to 2
Initializing SCN for created control file
Database SCN compatibility initialized to 3
Warning - High Database SCN: Current SCN value is 8670753267, threshold SCN value is 0
Fri Nov 06 18:21:39 2020
Assigning activation ID 408224320 (0x18550240)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /orabak/data/group_1.289.954514319
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Nov 06 18:21:40 2020
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_ora_24310.trc  (incident=231847):
ORA-00600: internal error code, arguments: [2662], [2], [80818679], [2], [93545365], [4194545], [], [], [], [], [],[]
Incident details in: /u01/app/oracle/diag/rdbms/ynhis/ynhis1/incident/incdir_231847/ynhis1_ora_24310_i231847.trc
Fri Nov 06 18:21:42 2020
Dumping diagnostic data in directory=[cdmp_20201106182142],requested by(instance=1,osid=24310),summary=[incident=231847]
Fri Nov 06 18:21:43 2020
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_ora_24310.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [2662], [2], [80818679], [2], [93545365],[4194545],[],[],[],[],[],[]
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_ora_24310.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [2662], [2], [80818679], [2], [93545365],[4194545],[],[],[],[],[],[]
Error 704 happened during db open, shutting down database
USER (ospid: 24310): terminating the instance due to error 704
Instance terminated by USER, pid = 24310
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (24310) as a result of ORA-1092

处理该错误之后,数据库resetlog之后,数据库open成功但是报错ORA-00600 4137

Database Characterset is ZHS16GBK
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_smon_26195.trc  (incident=255799):
ORA-00600: internal error code, arguments: [4137], [25.33.122556], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/ynhis/ynhis1/incident/incdir_255799/ynhis1_smon_26195_i255799.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
No Resource Manager plan active
Fri Nov 06 18:30:46 2020
replication_dependency_tracking turned off (no async multimaster replication found)
ORACLE Instance ynhis1 (pid = 23) - Error 600 encountered while recovering transaction (25, 33).
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_smon_26195.trc:
ORA-00600: internal error code, arguments: [4137], [25.33.122556], [0], [0], [], [], [], [], [], [], [], []

对异常undo进行处理,数据库可以正常启动关闭,然后安排数据导出导入新库操作,恢复完成.

发表在 Oracle ASM | 标签为 , , , | 评论关闭

win asm disk header 异常恢复

有朋友反馈win环境下rac异常,asm无法正常mount,检查日志发现

Fri Jul 03 03:55:46 2020
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7004.trc:
ORA-15025: could not open disk "\\.\ORCLDISKDATA1"
ORA-27041: unable to open file
OSD-04002: 无法打开文件
O/S-Error: (OS 2) 系统找不到指定的文件。
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7004.trc:
ORA-15025: could not open disk "\\.\ORCLDISKDATA1"
ORA-27041: unable to open file
OSD-04002: 无法打开文件
O/S-Error: (OS 2) 系统找不到指定的文件。
WARNING: failed to read mirror side 1 of virtual extent 0 logical extent 0 of file 267 in group [2.2254399778] 
from disk DATA_0000  allocation unit 3502 reason error; if possible, will try another mirror side
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7004.trc:
ORA-15081: failed to submit an I/O operation to a disk
Fri Jul 03 03:59:46 2020
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7328.trc:
ORA-15025: could not open disk "\\.\ORCLDISKDATA1"
ORA-27041: unable to open file
OSD-04002: 无法打开文件
O/S-Error: (OS 2) 系统找不到指定的文件。
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7328.trc:
ORA-15025: could not open disk "\\.\ORCLDISKDATA1"
ORA-27041: unable to open file
OSD-04002: 无法打开文件
O/S-Error: (OS 2) 系统找不到指定的文件。
WARNING: failed to read mirror side 1 of virtual extent 0 logical extent 0 of file 267 in group [2.2254399778] 
from disk DATA_0000  allocation unit 3502 reason error; if possible, will try another mirror side
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7328.trc:
ORA-15081: failed to submit an I/O operation to a disk

报错信息比较明显是由于无法找到\\.\ORCLDISKDATA1磁盘,因此异常,通过asmtool查看磁盘信息

C:\app\11.2.0\grid>asmtool -list
NTFS                             \Device\Harddisk0\Partition3            81920M
NTFS                             \Device\Harddisk0\Partition4           200000M
NTFS                             \Device\Harddisk0\Partition5          4293849M
                                 \Device\Harddisk1\Partition2             4062M
                                 \Device\Harddisk2\Partition2          2097022M
ORCLDISKFRA0                     \Device\Harddisk3\Partition2           511870M

明显的发现ORCLDISKDATA1磁盘丢失,通过对磁盘dd到本地然后进行分析发现,asm disk header损坏

C:\Users\Administrator>kfed read F:\temp\disk3\1\disk2.dd
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
006B38C00 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]


C:\Users\Administrator>kfed read F:\temp\disk3\1\disk2.dd blkn=2
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
kfbh.datfmt:                          2 ; 0x003: 0x02
kfbh.block.blk:                       2 ; 0x004: blk=2
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  2349305287 ; 0x00c: 0x8c078dc7
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdatb.aunum:                         0 ; 0x000: 0x00000000
kfdatb.shrink:                      448 ; 0x004: 0x01c0
kfdatb.ub2pad:                        0 ; 0x006: 0x0000
kfdatb.auinfo[0].link.next:           8 ; 0x008: 0x0008
kfdatb.auinfo[0].link.prev:           8 ; 0x00a: 0x0008
kfdatb.auinfo[1].link.next:          12 ; 0x00c: 0x000c
kfdatb.auinfo[1].link.prev:          12 ; 0x00e: 0x000c
kfdatb.auinfo[2].link.next:         456 ; 0x010: 0x01c8
kfdatb.auinfo[2].link.prev:         456 ; 0x012: 0x01c8
kfdatb.auinfo[3].link.next:         488 ; 0x014: 0x01e8
kfdatb.auinfo[3].link.prev:         488 ; 0x016: 0x01e8
kfdatb.auinfo[4].link.next:          24 ; 0x018: 0x0018
kfdatb.auinfo[4].link.prev:          24 ; 0x01a: 0x0018
kfdatb.auinfo[5].link.next:          28 ; 0x01c: 0x001c
kfdatb.auinfo[5].link.prev:          28 ; 0x01e: 0x001c
kfdatb.auinfo[6].link.next:         552 ; 0x020: 0x0228
kfdatb.auinfo[6].link.prev:        3112 ; 0x022: 0x0c28
kfdatb.spare:                         0 ; 0x024: 0x00000000
kfdate[0].discriminator:              1 ; 0x028: 0x00000001
kfdate[0].allo.lo:                    0 ; 0x028: XNUM=0x0
kfdate[0].allo.hi:              8388608 ; 0x02c: V=1 I=0 H=0 FNUM=0x0
kfdate[1].discriminator:              1 ; 0x030: 0x00000001
kfdate[1].allo.lo:                    0 ; 0x030: XNUM=0x0
kfdate[1].allo.hi:              8388608 ; 0x034: V=1 I=0 H=0 FNUM=0x0
kfdate[2].discriminator:              1 ; 0x038: 0x00000001
kfdate[2].allo.lo:                    0 ; 0x038: XNUM=0x0
kfdate[2].allo.hi:              8388609 ; 0x03c: V=1 I=0 H=0 FNUM=0x1

fra磁盘虽然磁盘asm label信息存在,但是其他信息依旧损坏,但是也只是磁盘头信息损坏
20200705203336


通过现场分析,基本上可以确定是由于某种原因导致win asm 的磁盘的所有磁盘头都损坏(两个磁盘头被置空,另外一个磁盘头基本上损坏),基于原因未知
基于客户现场的情况,以及他们有前一天的rman备份,而且客户有保障现场(进一步故障原因分析)的需求,未在现场环境进行恢复,而是在不对现场环境做任何修改的情况下,直接恢复fra里面的redo和归档日志,进而结合备份异地实现数据库恢复,实现数据0丢失,又不破坏现场的效果
20200705203742

以前遇到过类似我其他操作系统平台中asm disk header异常的case:
asm磁盘分区丢失恢复
pvid=yes导致asm无法mount
asm磁盘头全部损坏数据0丢失恢复
分区无法识别导致asm diskgroup无法mount
asm disk误设置pvid导致asm diskgroup无法mount恢复

发表在 Oracle ASM, Oracle备份恢复 | 标签为 , , | 评论关闭

ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh]故障处理

客户对asm进行扩容,由于配置不恰当,在使用asmca增加asm disk的时候直接选中了已经被用作文件系统的vg中的磁盘

Tue Nov 19 09:48:48 2019
Non critical error ORA-48180 cFri Nov 22 12:47:48 2019
SQL> ALTER DISKGROUP XIFENFEI ADD  DISK '/dev/rhdisk29' SIZE 491520M ,
'/dev/rhdisk30' SIZE 491520M ,
'/dev/rhdisk31' SIZE 491520M /* ASMCA */ 
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (4,15) to disk (/dev/rhdisk29)
NOTE: Assigning number (4,16) to disk (/dev/rhdisk30)
NOTE: Assigning number (4,17) to disk (/dev/rhdisk31)
NOTE: requesting all-instance membership refresh for group=4
NOTE: initializing header on grp 4 disk XIFENFEI_0015
NOTE: initializing header on grp 4 disk XIFENFEI_0016
NOTE: initializing header on grp 4 disk XIFENFEI_0017
NOTE: requesting all-instance disk validation for group=4
Fri Nov 22 12:47:51 2019
NOTE: skipping rediscovery for group 4/0xb08c40b (XIFENFEI) on local instance.
NOTE: requesting all-instance disk validation for group=4
NOTE: skipping rediscovery for group 4/0xb08c40b (XIFENFEI) on local instance.
Fri Nov 22 12:47:59 2019
NOTE: initiating PST update: grp = 4
Fri Nov 22 12:47:59 2019
GMON updating group 4 at 12 for pid 27, osid 12649908
NOTE: PST update grp = 4 completed successfully 
NOTE: membership refresh pending for group 4/0xb08c40b (XIFENFEI)
GMON querying group 4 at 13 for pid 18, osid 39912680
Fri Nov 22 12:48:01 2019
NOTE: cache opening disk 15 of grp 4: XIFENFEI_0015 path:/dev/rhdisk29
NOTE: cache opening disk 16 of grp 4: XIFENFEI_0016 path:/dev/rhdisk30
NOTE: cache opening disk 17 of grp 4: XIFENFEI_0017 path:/dev/rhdisk31
NOTE: Attempting voting file refresh on diskgroup XIFENFEI
NOTE: Refresh completed on diskgroup XIFENFEI. No voting file found.
GMON querying group 4 at 14 for pid 18, osid 39912680
SUCCESS: refreshed membership for 4/0xb08c40b (XIFENFEI)
SUCCESS: ALTER DISKGROUP XIFENFEI ADD  DISK '/dev/rhdisk29' SIZE 491520M ,
'/dev/rhdisk30' SIZE 491520M ,
'/dev/rhdisk31' SIZE 491520M /* ASMCA */

发现增加错磁盘之后,从vg里面强制踢掉被asm使用的磁盘,并且尝试在asm中删除这些磁盘,并加入新磁盘

Fri Nov 22 12:52:03 2019
SQL> ALTER DISKGROUP XIFENFEI DROP  DISK 'XIFENFEI_0015','XIFENFEI_0016','XIFENFEI_0017' /* ASMCA */ 
NOTE: GroupBlock outside rolling migration privileged region
Fri Nov 22 12:52:03 2019
NOTE: stopping process ARB0
NOTE: rebalance interrupted for group 4/0xb08c40b (XIFENFEI)
NOTE: requesting all-instance membership refresh for group=4
NOTE: membership refresh pending for group 4/0xb08c40b (XIFENFEI)
Fri Nov 22 12:52:12 2019
GMON querying group 4 at 15 for pid 18, osid 39912680
SUCCESS: refreshed membership for 4/0xb08c40b (XIFENFEI)
SUCCESS: ALTER DISKGROUP XIFENFEI DROP  DISK 'XIFENFEI_0015','XIFENFEI_0016','XIFENFEI_0017' /* ASMCA */
NOTE: starting rebalance of group 4/0xb08c40b (XIFENFEI) at power 1
Starting background process ARB0
…………
Fri Nov 22 12:58:26 2019
SQL> ALTER DISKGROUP XIFENFEI ADD  DISK '/dev/rhdisk7' SIZE 491520M /* ASMCA */ 
NOTE: GroupBlock outside rolling migration privileged region
Fri Nov 22 12:58:26 2019
NOTE: stopping process ARB0
NOTE: rebalance interrupted for group 4/0xb08c40b (XIFENFEI)
NOTE: ASM did background COD recovery for group 4/0xb08c40b (XIFENFEI)
NOTE: Assigning number (4,18) to disk (/dev/rhdisk7)
NOTE: requesting all-instance membership refresh for group=4
NOTE: initializing header on grp 4 disk XIFENFEI_0018
NOTE: requesting all-instance disk validation for group=4
NOTE: skipping rediscovery for group 4/0xb08c40b (XIFENFEI) on local instance.
NOTE: requesting all-instance disk validation for group=4
NOTE: skipping rediscovery for group 4/0xb08c40b (XIFENFEI) on local instance.
Fri Nov 22 12:58:41 2019
NOTE: initiating PST update: grp = 4
Fri Nov 22 12:58:41 2019
GMON updating group 4 at 16 for pid 27, osid 12649908
NOTE: PST update grp = 4 completed successfully 
Fri Nov 22 12:58:41 2019
NOTE: membership refresh pending for group 4/0xb08c40b (XIFENFEI)
GMON querying group 4 at 17 for pid 18, osid 39912680
NOTE: cache opening disk 18 of grp 4: XIFENFEI_0018 path:/dev/rhdisk7
NOTE: Attempting voting file refresh on diskgroup XIFENFEI
NOTE: Refresh completed on diskgroup XIFENFEI. No voting file found.
GMON querying group 4 at 18 for pid 18, osid 39912680
SUCCESS: refreshed membership for 4/0xb08c40b (XIFENFEI)
NOTE: starting rebalance of group 4/0xb08c40b (XIFENFEI) at power 1
SUCCESS: ALTER DISKGROUP XIFENFEI ADD  DISK '/dev/rhdisk7' SIZE 491520M /* ASMCA */
Starting background process ARB0
Fri Nov 22 12:58:46 2019
ARB0 started with pid=44, OS id=54460432 
…………
Fri Nov 22 12:59:57 2019
SQL> ALTER DISKGROUP XIFENFEI ADD  DISK '/dev/rhdisk10' SIZE 491520M ,
'/dev/rhdisk11' SIZE 491520M ,
'/dev/rhdisk8' SIZE 491520M ,
'/dev/rhdisk9' SIZE 491520M /* ASMCA */ 
NOTE: GroupBlock outside rolling migration privileged region
Fri Nov 22 12:59:57 2019
NOTE: stopping process ARB0
NOTE: rebalance interrupted for group 4/0xb08c40b (XIFENFEI)
NOTE: ASM did background COD recovery for group 4/0xb08c40b (XIFENFEI)
NOTE: Assigning number (4,19) to disk (/dev/rhdisk10)
NOTE: Assigning number (4,20) to disk (/dev/rhdisk11)
NOTE: Assigning number (4,21) to disk (/dev/rhdisk8)
NOTE: Assigning number (4,22) to disk (/dev/rhdisk9)
NOTE: requesting all-instance membership refresh for group=4
NOTE: initializing header on grp 4 disk XIFENFEI_0019
NOTE: initializing header on grp 4 disk XIFENFEI_0020
NOTE: initializing header on grp 4 disk XIFENFEI_0021
NOTE: initializing header on grp 4 disk XIFENFEI_0022
NOTE: requesting all-instance disk validation for group=4
NOTE: skipping rediscovery for group 4/0xb08c40b (XIFENFEI) on local instance.
Fri Nov 22 13:00:08 2019
NOTE: requesting all-instance disk validation for group=4
Fri Nov 22 13:00:08 2019
NOTE: skipping rediscovery for group 4/0xb08c40b (XIFENFEI) on local instance.
NOTE: initiating PST update: grp = 4
Fri Nov 22 13:00:13 2019
GMON updating group 4 at 19 for pid 27, osid 12649908
NOTE: PST update grp = 4 completed successfully 
NOTE: membership refresh pending for group 4/0xb08c40b (XIFENFEI)
GMON querying group 4 at 20 for pid 18, osid 39912680
NOTE: cache opening disk 19 of grp 4: XIFENFEI_0019 path:/dev/rhdisk10
NOTE: cache opening disk 20 of grp 4: XIFENFEI_0020 path:/dev/rhdisk11
NOTE: cache opening disk 21 of grp 4: XIFENFEI_0021 path:/dev/rhdisk8
NOTE: cache opening disk 22 of grp 4: XIFENFEI_0022 path:/dev/rhdisk9
NOTE: Attempting voting file refresh on diskgroup XIFENFEI
NOTE: Refresh completed on diskgroup XIFENFEI. No voting file found.
GMON querying group 4 at 21 for pid 18, osid 39912680
SUCCESS: refreshed membership for 4/0xb08c40b (XIFENFEI)
SUCCESS: ALTER DISKGROUP XIFENFEI ADD  DISK '/dev/rhdisk10' SIZE 491520M ,
'/dev/rhdisk11' SIZE 491520M ,
'/dev/rhdisk8' SIZE 491520M ,
'/dev/rhdisk9' SIZE 491520M /* ASMCA */
NOTE: starting rebalance of group 4/0xb08c40b (XIFENFEI) at power 1
Starting background process ARB0

asm在做着reblance的过程中遭遇到坏块,直接导致磁盘组dismount

Sun Nov 24 04:42:27 2019
NOTE: group 4 PST updated.
WARNING: cache read  a corrupt block: group=4(XIFENFEI) dsk=15 blk=258 disk=15
 (XIFENFEI_0015) incarn=1717056824 au=113792 blk=2 count=254
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_x000_28639240.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483663] [258] [56 != 0]
NOTE: a corrupted block from group XIFENFEI was dumped to
   /u01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_x000_28639240.trc
WARNING: cache read (retry) a corrupt block: group=4(XIFENFEI) 
 dsk=15 blk=258 disk=15 (XIFENFEI_0015) incarn=1717056824 au=113792 blk=2 count=1
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_x000_28639240.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483663] [258] [56 != 0]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483663] [258] [56 != 0]
ERROR: cache failed to read group=4(XIFENFEI) dsk=15 blk=258 from disk(s): 15(XIFENFEI_0015)
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483663] [258] [56 != 0]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483663] [258] [56 != 0]
NOTE: cache initiating offline of disk 15 group XIFENFEI
NOTE: process _x000_+asm2 (28639240) initiating offline of disk 
    15.1717056824 (XIFENFEI_0015) with mask 0x7e in group 4
NOTE: initiating PST update: grp = 4, dsk = 15/0x66583538, mask = 0x6a, op = clear
GMON updating disk modes for group 4 at 23 for pid 28, osid 28639240
ERROR: Disk 15 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 4)
Sun Nov 24 04:42:27 2019
NOTE: cache dismounting (not clean) group 4/0x0B08C40B (XIFENFEI) 
WARNING: Offline for disk XIFENFEI_0015 in mode 0x7f failed.
Sun Nov 24 04:42:27 2019
NOTE: halting all I/Os to diskgroup 4 (XIFENFEI)
NOTE: messaging CKPT to quiesce pins Unix process pid: 59441780, image: oracle@xifenfei2 (B000)
Sun Nov 24 04:42:27 2019
ERROR: ORA-15130 thrown in ARB0 for group number 4
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_arb0_50856926.trc:
ORA-15130: diskgroup "XIFENFEI" is being dismounted

至此两个节点的该磁盘组就陷入了不停的mount,然后dismount的轮流循环中.这里我们可以大概的分析出来,由于vg的磁盘组被写入了数据或者强制剔除的时候导致asm写入该文件的数据被破坏,导致后续的asm reblance遭遇坏块,然后直接dismount.对于该问题的解决方案,通过对对该磁盘组的acd和cod进行patch,让其不进行reblance,保持该磁盘组现在,稳定的mount状态,然后对其数据进行备份和重建该磁盘组.这个客户运气不错,vg中的asm disk磁盘写入较少,数据库运行正常.
对于这种情况,如果发生极端损坏,比如asm磁盘组无法mount,可以参考:找回ASM中数据文件
如果是asm的元数据大量损坏,无法通过asm字典级别恢复,可以通过参考:asm disk header 彻底损坏恢复

发表在 Oracle ASM | 标签为 , , , , | 评论关闭