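Contact: mobile/WeChat (+86 17813235971), QQ (107644445)
Title: Handling disk loss in the OCR disk group
Author: 惜分飞 © All rights reserved [No reproduction in any form without the author's consent; the author reserves the right to pursue legal action.]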
A fault caused disk OCR_0001 of the CRS disk group (OCR) to go offline, and the number of voting disks dropped from 3 to 2. ASM alert log:
WARNING: Write Failed. group:3 disk:1 AU:1 offset:4190208 size:4096
WARNING: Hbeat write to PST disk 1.3915948466 in group 3 failed. [4]
Mon Jun 14 15:31:11 2021
NOTE: process _b000_+asm1 (21889) initiating offline of disk 1.3915948466 (OCR_0001) with mask 0x7e in group 3
NOTE: checking PST: grp = 3
GMON checking disk modes for group 3 at 14 for pid 28, osid 21889
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: checking PST for grp 3 done.
NOTE: sending set offline flag message 1047812201 to 1 disk(s) in group 3
WARNING: Disk OCR_0001 in mode 0x7f is now being offlined
INFO: Instance #2 could not find disk 1 in group 3
NOTE: initiating PST update: grp = 3, dsk = 1/0xe968a1b2, mask = 0x6a, op = clear
GMON updating disk modes for group 3 at 15 for pid 28, osid 21889
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: PST update grp = 3 completed successfully
NOTE: initiating PST update: grp = 3, dsk = 1/0xe968a1b2, mask = 0x7e, op = clear
GMON updating disk modes for group 3 at 16 for pid 28, osid 21889
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: cache closing disk 1 of grp 3: OCR_0001
NOTE: PST update grp = 3 completed successfully
Mon Jun 14 15:31:13 2021
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR . Found 3 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Successful voting file relocation on diskgroup OCR
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR . Found 2 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Successful voting file relocation on diskgroup OCR
Mon Jun 14 15:34:08 2021
WARNING: PST-initiated drop of 1 disk(s) in group 3(.1918390620))
SQL> alter diskgroup OCR drop disk OCR_0001 force /* ASM SERVER */
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=3
Mon Jun 14 15:34:10 2021
GMON updating for reconfiguration, group 3 at 17 for pid 28, osid 21889
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: cache closing disk 1 of grp 3: (not open) OCR_0001
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: group 3 PST updated.
Mon Jun 14 15:34:10 2021
NOTE: membership refresh pending for group 3/0x7258515c (OCR)
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR . Found 2 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Successful voting file relocation on diskgroup OCR
GMON querying group 3 at 18 for pid 18, osid 8900
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: cache closing disk 1 of grp 3: (not open) _DROPPED_0001_OCR
SUCCESS: refreshed membership for 3/0x7258515c (OCR)
SUCCESS: alter diskgroup OCR drop disk OCR_0001 force /* ASM SERVER */
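When reading a long alert log like this one, it can help to pull out only the disk-level events. The shell function below is a minimal sketch, not an Oracle utility: asm_disk_events is a hypothetical name, and it assumes the real alert log layout (one message per line) with timestamp lines such as "Mon Jun 14 15:31:11 2021". It prints each "initiating offline of disk" event and each successful forced drop, tagged with the most recent timestamp.

```shell
# Hypothetical helper (not an Oracle utility): list disk offline and forced
# drop events from an ASM alert log. Assumes one message per line and
# timestamp lines like "Mon Jun 14 15:31:11 2021".
asm_disk_events() {
  awk '
    # Remember the most recent timestamp line.
    /^[A-Z][a-z][a-z] [A-Z][a-z][a-z] +[0-9]+ [0-9][0-9]:[0-9][0-9]:[0-9][0-9] [0-9][0-9][0-9][0-9][ \t]*$/ {
      ts = $0
      sub(/[ \t]+$/, "", ts)
      next
    }
    # "initiating offline of disk 1.3915948466 (OCR_0001)": take the first
    # parenthesised token that starts with a letter (skips "(21889)").
    /initiating offline of disk/ && match($0, /\([A-Z][A-Z0-9_]*\)/) {
      printf "%s | OFFLINE | %s\n", ts, substr($0, RSTART + 1, RLENGTH - 2)
      next
    }
    # "SUCCESS: alter diskgroup OCR drop disk OCR_0001 force ..."
    $1 == "SUCCESS:" && $2 == "alter" && $5 == "drop" && $6 == "disk" {
      printf "%s | DROPPED | %s %s\n", ts, $4, $7
    }
  ' "$@"
}

# Usage (alert log name assumed from the trace directory shown in this post):
#   asm_disk_events /u01/app/grid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
```

Run against the log above, it would report OCR_0001 going offline at 15:31:11 and being force-dropped by ASM about three minutes later.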
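After the first disk was dropped and the rebalance had completed, a second disk dropped. The OCR disk group itself stayed online, but because it now had only one disk, the voting files could not be relocated to other disks within the OCR disk group (note the repeated "Failed voting file relocation" messages):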
Tue Jun 15 04:41:42 2021
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
Tue Jun 15 04:41:42 2021
NOTE: process _b000_+asm1 (58548) initiating offline of disk 0.3915948465 (OCR_0000) with mask 0x7e in group 3
NOTE: checking PST: grp = 3
GMON checking disk modes for group 3 at 23 for pid 28, osid 58548
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: checking PST for grp 3 done.
NOTE: sending set offline flag message 3615961191 to 1 disk(s) in group 3
WARNING: Disk OCR_0000 in mode 0x7f is now being offlined
INFO: Instance #2 could not find disk 1 in group 3
NOTE: initiating PST update: grp = 3, dsk = 0/0xe968a1b1, mask = 0x6a, op = clear
GMON updating disk modes for group 3 at 24 for pid 28, osid 58548
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: PST update grp = 3 completed successfully
NOTE: initiating PST update: grp = 3, dsk = 0/0xe968a1b1, mask = 0x7e, op = clear
GMON updating disk modes for group 3 at 25 for pid 28, osid 58548
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: cache closing disk 0 of grp 3: OCR_0000
NOTE: PST update grp = 3 completed successfully
Tue Jun 15 04:41:44 2021
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR . Found 2 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
WARNING: Waited 18 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 18 secs for write IO to PST disk 0 in group 3.
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
Tue Jun 15 04:44:21 2021
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
Tue Jun 15 04:44:21 2021
WARNING: PST-initiated drop of 1 disk(s) in group 3(.1918390620))
SQL> alter diskgroup OCR drop disk OCR_0000 force /* ASM SERVER */
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=3
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
Tue Jun 15 04:44:24 2021
GMON updating for reconfiguration, group 3 at 26 for pid 28, osid 58548
NOTE: cache closing disk 0 of grp 3: (not open) OCR_0000
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: group 3 PST updated.
NOTE: membership refresh pending for group 3/0x7258515c (OCR)
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
GMON querying group 3 at 27 for pid 18, osid 8900
NOTE: cache closing disk 0 of grp 3: (not open) _DROPPED_0000_OCR
SUCCESS: refreshed membership for 3/0x7258515c (OCR)
NOTE: starting rebalance of group 3/0x7258515c (OCR) at power 1
SUCCESS: alter diskgroup OCR drop disk OCR_0000 force /* ASM SERVER */
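The contrast between the two drops (relocation succeeding the first time, failing repeatedly the second time) is easier to see in condensed form. Another hypothetical helper sketch, voting_file_events, under the same one-message-per-line assumption: it prints each "Found N voting file(s)" refresh result and each relocation outcome with its timestamp, then totals successful and failed relocations.

```shell
# Hypothetical helper (not an Oracle utility): show voting file refresh and
# relocation results from an ASM alert log, one message per line assumed.
voting_file_events() {
  awk '
    # Remember the most recent timestamp line.
    /^[A-Z][a-z][a-z] [A-Z][a-z][a-z] +[0-9]+ [0-9][0-9]:[0-9][0-9]:[0-9][0-9] [0-9][0-9][0-9][0-9][ \t]*$/ {
      ts = $0
      sub(/[ \t]+$/, "", ts)
      next
    }
    # "Refresh completed on diskgroup OCR . Found 2 voting file(s)."
    /Refresh completed on diskgroup/ && match($0, /Found [0-9]+ voting file/) {
      # Drop the 6-char "Found " prefix and the 12-char " voting file" suffix.
      printf "%s | FOUND %s voting file(s)\n", ts, substr($0, RSTART + 6, RLENGTH - 18)
      next
    }
    # "NOTE: Successful voting file relocation ..." or "NOTE: Failed voting file relocation ..."
    /(Successful|Failed) voting file relocation/ {
      if ($2 == "Successful") { ok++; res = "OK" } else { failed++; res = "FAILED" }
      printf "%s | RELOCATION %s\n", ts, res
    }
    END { printf "relocations: %d ok, %d failed\n", ok, failed }
  ' "$@"
}
```

A failure count that keeps growing while "Found" stays below the expected number of voting files matches the situation described above, where only one disk was left to hold them.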
Clearly only one disk is left in the OCR disk group. Query the voting disk information:
node1-> crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   3619aee7c3b04fc1bfa5c4ce659acbf7 (/dev/emcpowerc) [OCR]
 2. ONLINE   00bc3e79f7404ff2bf60925a7b8a5a6d (/dev/emcpowere) [OCR]
Located 2 voting disk(s).
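Of the two voting disks, one (/dev/emcpowere) belongs to the OCR disk group, while the other (/dev/emcpowerc) is the disk that ASM dropped from the OCR disk group. Try adding the previously dropped disk back to the OCR disk group: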
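The check made here, comparing the registered voting files against the current members of the disk group, can be scripted. A minimal sketch, assuming the outputs of crsctl query css votedisk and asmcmd lsdsk -G ocr have first been saved to files; stale_voting_disks and the /tmp file names are hypothetical. It prints every voting-file path that is not a member of the disk group.

```shell
# Hypothetical helper: print voting-file paths that are NOT members of the
# disk group. Inputs are saved command outputs (assumption):
#   $1 = output of "crsctl query css votedisk"
#   $2 = output of "asmcmd lsdsk -G <diskgroup>"
stale_voting_disks() {
  awk '
    # First file: voting-file lines look like
    #  " 1. ONLINE   <id> (/dev/emcpowerc) [OCR]"
    FILENAME == ARGV[1] {
      if (match($0, /\(\/[^)]*\)/))
        vf[substr($0, RSTART + 1, RLENGTH - 2)] = 1
      next
    }
    # Second file: each line whose first field is an absolute path is a member.
    $1 ~ /^\// { member[$1] = 1 }
    END {
      for (p in vf)
        if (!(p in member)) print p
    }
  ' "$1" "$2" | sort
}

# Usage (file names are hypothetical):
#   crsctl query css votedisk > /tmp/votedisk.txt
#   asmcmd lsdsk -G ocr       > /tmp/ocr_disks.txt
#   stale_voting_disks /tmp/votedisk.txt /tmp/ocr_disks.txt
```

With only /dev/emcpowere left in the group, this would print /dev/emcpowerc; once all three disks are members again it prints nothing.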
SQL> alter diskgroup OCR add disk '/dev/emcpowerc';
alter diskgroup OCR add disk '/dev/emcpowerc'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15033: disk '/dev/emcpowerc' belongs to diskgroup "OCR"
SQL> alter diskgroup OCR add disk '/dev/emcpowerc' force
  2  ;
alter diskgroup OCR add disk '/dev/emcpowerc' force
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 15191
Session ID: 1613 Serial number: 7
Check the ASM alert log:
SQL> alter diskgroup OCR add disk '/dev/emcpowerc' force
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (3,4) to disk (/dev/emcpowerc)
NOTE: requesting all-instance membership refresh for group=3
WARNING: ignoring disk /dev/emcpowerd in deep discovery
NOTE: initializing header on grp 3 disk OCR_0004
WARNING: ignoring disk /dev/emcpowerd in deep discovery
NOTE: requesting all-instance disk validation for group=3
NOTE: skipping rediscovery for group 3/0x725d2390 (OCR) on local instance.
NOTE: requesting all-instance disk validation for group=3
NOTE: skipping rediscovery for group 3/0x725d2390 (OCR) on local instance.
NOTE: Attempting voting file relocation on diskgroup OCR
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_12207.trc (incident=311185):
ORA-00600: internal error code, arguments: [kfdvfGetCurrent_baddsk], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_311185/+ASM1_rbal_12207_i311185.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ERROR: ORA-600 thrown in RBAL for group number 3
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_12207.trc:
ORA-00600: internal error code, arguments: [kfdvfGetCurrent_baddsk], [], [], [], [], [], [], [], [], [], [], []
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_12207.trc:
ORA-00600: internal error code, arguments: [kfdvfGetCurrent_baddsk], [], [], [], [], [], [], [], [], [], [], []
RBAL (ospid: 12207): terminating the instance due to error 488
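Adding the disk failed with ORA-600 [kfdvfGetCurrent_baddsk], raised in the RBAL process, which then terminated the ASM instance (hence the ORA-03113 in the SQL*Plus session). The votedisk query above shows why: although emcpowerc had been dropped from the OCR disk group, it was still registered as a voting disk, so it could not be added back to the group. As a workaround, add a different disk first: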
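The workaround boils down to an ordering rule: add disks that are not registered voting files first, and re-add a disk that still holds a registered voting file last. A sketch of that rule, assuming saved crsctl query css votedisk output; order_disks_for_add is a hypothetical name, it only prints paths and does not run any ALTER DISKGROUP, and the rule reflects the diagnosis made in this case rather than documented Oracle behaviour.

```shell
# Hypothetical helper: print candidate disk paths in the order they could be
# added back, putting any path that is still a registered voting file last.
# $1 = saved output of "crsctl query css votedisk"; remaining args = paths.
# It only prints paths; it does not run ALTER DISKGROUP.
order_disks_for_add() {
  votedisk_out=$1
  shift
  deferred=""
  for path in "$@"; do
    # crsctl prints the path in parentheses, e.g. "(/dev/emcpowerc)".
    if grep -qF "($path)" "$votedisk_out"; then
      deferred="$deferred $path"
    else
      printf '%s\n' "$path"
    fi
  done
  # Device paths are assumed to contain no spaces or glob characters.
  for path in $deferred; do
    printf '%s\n' "$path"
  done
}
```

For the votedisk output above and the candidates /dev/emcpowerc and /dev/emcpowerd, it prints emcpowerd first and emcpowerc second, the same order used in this recovery.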
SQL> alter diskgroup OCR add failgroup OCR_0001 disk '/dev/emcpowerd' force;
Diskgroup altered.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
node1-> crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   00bc3e79f7404ff2bf60925a7b8a5a6d (/dev/emcpowere) [OCR]
 2. ONLINE   0eef8152df5d4f41bf973ad5dc5a6cb1 (/dev/emcpowerd) [OCR]
Located 2 voting disk(s).
Once emcpowerd had been added, emcpowerc was no longer a voting disk; emcpowerd had taken its place. Now add emcpowerc again:
SQL> alter diskgroup OCR add failgroup OCR_0000 disk '/dev/emcpowerc' force;
Diskgroup altered.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
node1-> crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   00bc3e79f7404ff2bf60925a7b8a5a6d (/dev/emcpowere) [OCR]
 2. ONLINE   0eef8152df5d4f41bf973ad5dc5a6cb1 (/dev/emcpowerd) [OCR]
 3. ONLINE   4f6201f808dc4ff3bf928b14eae0d4a6 (/dev/emcpowerc) [OCR]
Located 3 voting disk(s).
ASMCMD> lsdsk -G ocr
Path
/dev/emcpowerc
/dev/emcpowerd
/dev/emcpowere
Corresponding ASM alert log:
SQL> alter diskgroup OCR add failgroup OCR_0000 disk '/dev/emcpowerc' force
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (3,0) to disk (/dev/emcpowerc)
NOTE: requesting all-instance membership refresh for group=3
NOTE: initializing header on grp 3 disk OCR_0000
NOTE: requesting all-instance disk validation for group=3
Mon Jan 24 17:47:42 2022
NOTE: skipping rediscovery for group 3/0x725dccb9 (OCR) on local instance.
NOTE: requesting all-instance disk validation for group=3
NOTE: skipping rediscovery for group 3/0x725dccb9 (OCR) on local instance.
Mon Jan 24 17:47:48 2022
GMON updating for reconfiguration, group 3 at 20 for pid 30, osid 16978
NOTE: group 3 PST updated.
NOTE: initiating PST update: grp = 3
GMON updating group 3 at 21 for pid 30, osid 16978
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0005 (PST copy 1)
NOTE: group OCR: updated PST location: disk 0000 (PST copy 2)
NOTE: PST update grp = 3 completed successfully
NOTE: membership refresh pending for group 3/0x725dccb9 (OCR)
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR . Found 2 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
GMON querying group 3 at 22 for pid 18, osid 15952
NOTE: cache opening disk 0 of grp 3: OCR_0000 path:/dev/emcpowerc
Mon Jan 24 17:47:53 2022
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR . Found 2 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
GMON querying group 3 at 23 for pid 18, osid 15952
SUCCESS: refreshed membership for 3/0x725dccb9 (OCR)
Mon Jan 24 17:47:53 2022
SUCCESS: alter diskgroup OCR add failgroup OCR_0000 disk '/dev/emcpowerc' force
NOTE: starting rebalance of group 3/0x725dccb9 (OCR) at power 1
Starting background process ARB0
Mon Jan 24 17:47:53 2022
ARB0 started with pid=31, OS id=17092
NOTE: assigning ARB0 to group 3/0x725dccb9 (OCR) with 1 parallel I/O
cellip.ora not found.
NOTE: F1X0 copy 3 relocating from 65534:4294967294 to 0:2 for diskgroup 3 (OCR)
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 3/0x725dccb9 (OCR)
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR . Found 2 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: voting file allocation on grp 3 disk OCR_0000
NOTE: Successful voting file relocation on diskgroup OCR
Mon Jan 24 17:47:57 2022
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=3
NOTE: membership refresh pending for group 3/0x725dccb9 (OCR)
Mon Jan 24 17:48:03 2022
GMON querying group 3 at 24 for pid 18, osid 15952
SUCCESS: refreshed membership for 3/0x725dccb9 (OCR)
Mon Jan 24 17:48:06 2022
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR . Found 3 voting file(s).
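The number of voting disks went from 2 back to 3, and the OCR disk group is back to its normal three disks. This completes the recovery from the OCR disk loss.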