分类目录归档:Oracle ASM

ORA-00600 kfrHtAdd01

由于存储掉电,报ORA-15096: lost disk write detected错误,无法mount磁盘组.

Sun Dec 20 16:56:51 2020
SQL> alter diskgroup data mount 
NOTE: cache registered group DATA number=1 incarn=0x0c1a7a4e
NOTE: cache began mount (first) of group DATA number=1 incarn=0x0c1a7a4e
NOTE: Assigning number (1,2) to disk (/dev/mapper/multipath12)
NOTE: Assigning number (1,5) to disk (/dev/mapper/multipath15)
NOTE: Assigning number (1,3) to disk (/dev/mapper/multipath13)
NOTE: Assigning number (1,7) to disk (/dev/mapper/multipath17)
NOTE: Assigning number (1,1) to disk (/dev/mapper/multipath11)
NOTE: Assigning number (1,6) to disk (/dev/mapper/multipath16)
NOTE: Assigning number (1,0) to disk (/dev/mapper/multipath10)
NOTE: Assigning number (1,4) to disk (/dev/mapper/multipath14)
Sun Dec 20 16:56:57 2020
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 19 for pid 32, osid 130347
NOTE: cache opening disk 0 of grp 1: DATA_0000 path:/dev/mapper/multipath10
NOTE: F1X0 found on disk 0 au 2 fcn 0.14159360
NOTE: cache opening disk 1 of grp 1: DATA_0001 path:/dev/mapper/multipath11
NOTE: F1X0 found on disk 1 au 2 fcn 0.14159360
NOTE: cache opening disk 2 of grp 1: DATA_0002 path:/dev/mapper/multipath12
NOTE: F1X0 found on disk 2 au 2 fcn 0.14159360
NOTE: cache opening disk 3 of grp 1: DATA_0003 path:/dev/mapper/multipath13
NOTE: cache opening disk 4 of grp 1: DATA_0004 path:/dev/mapper/multipath14
NOTE: cache opening disk 5 of grp 1: DATA_0005 path:/dev/mapper/multipath15
NOTE: cache opening disk 6 of grp 1: DATA_0006 path:/dev/mapper/multipath16
NOTE: cache opening disk 7 of grp 1: DATA_0007 path:/dev/mapper/multipath17
NOTE: cache mounting (first) normal redundancy group 1/0x0C1A7A4E (DATA)
Sun Dec 20 16:56:57 2020
* allocate domain 1, invalid = TRUE 
Sun Dec 20 16:56:58 2020
NOTE: attached to recovery domain 1
NOTE: starting recovery of thread=1 ckpt=233.4189 group=1 (DATA)
NOTE: starting recovery of thread=2 ckpt=542.6409 group=1 (DATA)
lost disk write detected during recovery (apply)
NOTE: recovery (pass 2) of diskgroup 1 (DATA) caught error ORA-15096
Errors in file /grid/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_130347.trc:
ORA-15096: lost disk write detected
Abort recovery for domain 1
NOTE: crash recovery signalled OER-15096
ERROR: ORA-15096 signalled during mount of diskgroup DATA
NOTE: cache dismounting (clean) group 1/0x0C1A7A4E (DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 130347, image: oracle@db1.rac.com (TNS V1-V3)
NOTE: lgwr not being msg'd to dismount

通过一系列修复之后报错如下

Sun Dec 20 20:12:35 2020
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 23 for pid 26, osid 67538
Sun Dec 20 20:12:35 2020
NOTE: cache opening disk 0 of grp 1: DATA_0000 path:/dev/mapper/multipath10
NOTE: F1X0 found on disk 0 au 2 fcn 0.14159360
NOTE: cache opening disk 1 of grp 1: DATA_0001 path:/dev/mapper/multipath11
NOTE: F1X0 found on disk 1 au 2 fcn 0.14159360
NOTE: cache opening disk 2 of grp 1: DATA_0002 path:/dev/mapper/multipath12
NOTE: F1X0 found on disk 2 au 2 fcn 0.14159360
NOTE: cache opening disk 3 of grp 1: DATA_0003 path:/dev/mapper/multipath13
NOTE: cache opening disk 4 of grp 1: DATA_0004 path:/dev/mapper/multipath14
NOTE: cache opening disk 5 of grp 1: DATA_0005 path:/dev/mapper/multipath15
NOTE: cache opening disk 6 of grp 1: DATA_0006 path:/dev/mapper/multipath16
NOTE: cache opening disk 7 of grp 1: DATA_0007 path:/dev/mapper/multipath17
NOTE: cache mounting (first) normal redundancy group 1/0x64848829 (DATA)
Sun Dec 20 20:12:36 2020
* allocate domain 1, invalid = TRUE 
Sun Dec 20 20:12:36 2020
NOTE: attached to recovery domain 1
NOTE: Fallback recovery: thread 2 read 10751 blocks oldest redo found in ABA 540.6429
NOTE: Fallback recovery: thread 1 read 10751 blocks oldest redo found in ABA 232.4218
Errors in file /grid/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_67538.trc  (incident=1692689):
ORA-00600: internal error code, arguments: [kfrHtAdd01], [2147483651], [1025], [0], [38660545], [0],
 [38687990], [1], [2], [6429], [], []
Incident details in: /grid/app/grid/diag/asm/+asm/+ASM1/incident/incdir_1692689/+ASM1_ora_67538_i1692689.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sun Dec 20 20:12:39 2020
Sweep [inc][1692689]: completed
Sweep [inc2][1692689]: completed
Errors in file /grid/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_67538.trc:
ORA-00600: internal error code, arguments: [kfrHtAdd01], [2147483651], [1025], [0], [38660545], [0],
 [38687990], [1], [2], [6429], [], []
NOTE: crash recovery signalled OER-600
ERROR: ORA-600 signalled during mount of diskgroup DATA
NOTE: cache dismounting (clean) group 1/0x64848829 (DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 67538, image: oracle@db1.rac.com (TNS V1-V3)
NOTE: lgwr not being msg'd to dismount
freeing rdom 1
NOTE: detached from domain 1
NOTE: cache dismounted group 1/0x64848829 (DATA) 
NOTE: cache ending mount (fail) of group DATA number=1 incarn=0x64848829
NOTE: cache deleting context for group DATA 1/0x64848829
GMON dismounting group 1 at 24 for pid 26, osid 67538
NOTE: Disk DATA_0000 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0002 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0003 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0004 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0005 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0006 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0007 in mode 0x7f marked for de-assignment
ERROR: diskgroup DATA was not mounted
ORA-00600: internal error code, arguments: [kfrHtAdd01], [2147483651], [1025], [0],
 [38660545], [0], [38687990], [1], [2], [6429], [], []
ERROR: alter diskgroup data mount

分析trace文件

*** 2020-12-20 20:11:54.956
kfdp_query(DATA): 19 
----- Abridged Call Stack Trace -----
ksedsts()+465<-kfdp_query()+530<-kfdPstSyncPriv()+585<-kfgFinalizeMount()+1630<-kfgscFinalize()+1433<
-kfgForEachKfgsc()+285<-kfgsoFinalize()+135<-kfgFinalize()+398<-kfxdrvMount()+5558<-kfxdrvEntry()
+2207<-opiexe()+20624<-opiosq0()+3932<-kpooprx()+274<-kpoal8()+842<-opiodr()+917<-ttcpip()
+2183<-opitsk()+1710<-opiino()+969<-opiodr()+917<-opidrv()+570<-sou2o()
+103<-opimai_real()+133<-ssthrdmain()+265<-main()+201<-__libc_start_main()+253 
----- End of Abridged Call Stack Trace -----
2020-12-20 20:11:55.393106 : Start recovery for domain=1, valid=0, flags=0x4
NOTE: starting recovery of thread=1 ckpt=233.4189 group=1 (DATA)
NOTE: starting recovery of thread=2 ckpt=542.6409 group=1 (DATA)
lost disk write detected during recovery (apply):
last written kfcn: 0.38747593 aba=233.4208 thd=1
kfcn_kfrbcd=0.38747593 flags_kfrbcd=0x001c aba=542.6410 thd=2
CE: (0x0x66edc798) group=1 (DATA) fn=4 blk=1
    hashFlags=0x0000 lid=0x0002 lruFlags=0x0000 bastCount=1
    mirror=0
    flags_kfcpba=0x38 copies=3 blockIndex=1 AUindex=0 AUcount=0 loctr fcn=0.0
    copy #0:  disk=6  au=35 flags=01
    copy #1:  disk=0  au=34 flags=01
    copy #2:  disk=4  au=52 flags=01
BH: (0x0x66e10d00) bnum=33 type=COD_RBO state=rcv chgSt=not modifying pageIn=rcvRead
    flags=0x00000000 pinmode=excl lockmode=null bf=0x66020000
    kfbh_kfcbh.fcn_kfbh = 0.38747538 lowAba=0.0 highAba=0.0
    modTime=0
    last kfcbInitSlot return code=null chgCount=0 cpkt lnk is null ralFlags=0x00000000
    PINS:
    (kfcbps) pin=91 get by kfr.c line 7879 mode=excl
             fn=4 blk=1 status=pinned
             flags=0x88000000 flags2=0x00000000
             class=0 type=INVALID stateWanted=rcvRead
             bastCount=1 waitStatus=0x00000000 relocCount=0
             scanBastCount=0 scanBxid=0 scanSkipCode=0
             last released by kfc.c 21183
NOTE: recovery (pass 2) of diskgroup 1 (DATA) caught error ORA-15096
last new 0.0
kfrPass2: dump of current log buffer for error 15096 follows
=======================
OSM metadata block dump:
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            8 ; 0x002: KFBTYP_CHNGDIR
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                   17162 ; 0x004: blk=17162
kfbh.block.obj:                       3 ; 0x008: file=3
kfbh.check:                  4226524538 ; 0x00c: 0xfbeba57a
kfbh.fcn.base:                 38747431 ; 0x010: 0x024f3d27
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfracdb.aba.seq:                    542 ; 0x000: 0x0000021e
kfracdb.aba.blk:                   6409 ; 0x004: 0x00001909
kfracdb.ents:                         1 ; 0x008: 0x0001
kfracdb.ub2spare:                     0 ; 0x00a: 0x0000
kfracdb.lge[0].valid:                 1 ; 0x00c: V=1 B=0 M=0
kfracdb.lge[0].chgCount:              1 ; 0x00d: 0x01
kfracdb.lge[0].len:                  68 ; 0x00e: 0x0044
kfracdb.lge[0].kfcn.base:      38747432 ; 0x010: 0x024f3d28
kfracdb.lge[0].kfcn.wrap:             0 ; 0x014: 0x00000000
kfracdb.lge[0].bcd[0].kfbl.blk:    1292 ; 0x018: blk=1292
kfracdb.lge[0].bcd[0].kfbl.obj:       1 ; 0x01c: file=1
kfracdb.lge[0].bcd[0].kfcn.base:38743102 ; 0x020: 0x024f2c3e
kfracdb.lge[0].bcd[0].kfcn.wrap:      0 ; 0x024: 0x00000000
kfracdb.lge[0].bcd[0].oplen:          8 ; 0x028: 0x0008
kfracdb.lge[0].bcd[0].blkIndex:      12 ; 0x02a: 0x000c
kfracdb.lge[0].bcd[0].flags:         28 ; 0x02c: F=0 N=0 F=1 L=1 V=1 A=0 C=0
kfracdb.lge[0].bcd[0].opcode:       135 ; 0x02e: 0x0087
kfracdb.lge[0].bcd[0].kfbtyp:         4 ; 0x030: KFBTYP_FILEDIR
kfracdb.lge[0].bcd[0].redund:        19 ; 0x031: SCHE=0x1 NUMB=0x3
kfracdb.lge[0].bcd[0].pad:        63903 ; 0x032: 0xf99f
kfracdb.lge[0].bcd[0].KFFFD_COMMIT.modts.hi:33108586 ; 0x034: HOUR=0xa DAYS=0x13 MNTH=0xc YEAR=0x7e4
kfracdb.lge[0].bcd[0].KFFFD_COMMIT.modts.lo:0 ; 0x038: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfracdb.lge[0].bcd[0].au[0]:     292415 ; 0x03c: 0x0004763f
kfracdb.lge[0].bcd[0].au[1]:     292452 ; 0x040: 0x00047664
kfracdb.lge[0].bcd[0].au[2]:     292474 ; 0x044: 0x0004767a
kfracdb.lge[0].bcd[0].disks[0]:       2 ; 0x048: 0x0002
kfracdb.lge[0].bcd[0].disks[1]:       1 ; 0x04a: 0x0001
kfracdb.lge[0].bcd[0].disks[2]:       0 ; 0x04c: 0x0000

彻底屏蔽asm的实例恢复,mount磁盘组,尝试启动库进行数据库恢复.如果如果此类asm无法mount问题,无法自行解决请联系我们
电话/微信:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

发表在 Oracle ASM, Oracle备份恢复 | 标签为 , , | 评论关闭

asm磁盘类似_DROPPED_0001_DATA名称故障处理

发现一客户数据库的asm磁盘组中有磁盘掉线(通过分析日志确认2016年就已经掉线,而且不在做rebalance)
20201205195855


20201205221937

进一步检查

SQL> /

NAME			       PATH		  GROUP_NUMBER DISK_NUMBER MOUNT_STATUS   HEADER_STATUS
------------------------------ --------------------- ------------ ----------- -------------- ------------------------
MODE_STATUS    STATE		FAILGROUP
-------------- ---------------- --------------------
			       ORCL:DATA2	  0		 0 CLOSED	  MEMBER
ONLINE	       NORMAL

			       ORCL:FLASH1	  0		 1 CLOSED	  MEMBER
ONLINE	       NORMAL

			       ORCL:GRID3	  0		 2 CLOSED	  MEMBER
ONLINE	       NORMAL

_DROPPED_0000_FLASH				  2		 0 MISSING	  UNKNOWN
OFFLINE        FORCING		FLASH1

_DROPPED_0001_DATA				  1		 1 MISSING	  UNKNOWN
OFFLINE        FORCING		DATA2

DATA1			       ORCL:DATA1	  1		 0 CACHED	  MEMBER
ONLINE	       NORMAL		DATA1

FLASH2			       ORCL:FLASH2	  2		 1 CACHED	  MEMBER
ONLINE	       NORMAL		FLASH2

GRID1			       ORCL:GRID1	  3		 0 CACHED	  MEMBER
ONLINE	       NORMAL		GRID1

GRID2			       ORCL:GRID2	  3		 1 CACHED	  MEMBER
ONLINE	       NORMAL		GRID2

GRID4			       ORCL:GRID4	  3		 3 CACHED	  MEMBER
ONLINE	       NORMAL		GRID4

GRID5			       ORCL:GRID5	  3		 4 CACHED	  MEMBER
ONLINE	       NORMAL		GRID5

GRID6			       ORCL:GRID6	  3		 5 CACHED	  MEMBER
ONLINE	       NORMAL		GRID6


12 rows selected.


SQL> select NAME,STATE,TYPE,OFFLINE_DISKS from v$asm_diskgroup;

NAME
------------------------------------------------------------
STATE		       TYPE	    OFFLINE_DISKS
---------------------- ------------ -------------
DATA
MOUNTED 	       NORMAL			1

FLASH
MOUNTED 	       NORMAL			1

GRID
MOUNTED 	       NORMAL			0

主要问题是由于ORCL:FLASH1和ORCL:DATA2磁盘掉线导致处于_DROPPED_0000_FLASH和_DROPPED_0001_DATA状态.底层检查,确定现在这些磁盘都正常.然后使用force命令进行强制增加掉线的磁盘到对应的磁盘组中

SQL> alter diskgroup FLASH add failgroup flg1 disk 'ORCL:FLASH1'  force;

Diskgroup altered.

SQL> alter diskgroup data add failgroup dg2 disk 'ORCL:DATA2'  force;

Diskgroup altered.

观察asm 日志,等rebalance完成

Sat Dec 05 16:48:10 2020
SQL> alter diskgroup FLASH add failgroup flg1 disk 'ORCL:FLASH1'  force 
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (2,2) to disk (ORCL:FLASH1)
NOTE: requesting all-instance membership refresh for group=2
NOTE: initializing header on grp 2 disk FLASH1
NOTE: requesting all-instance disk validation for group=2
Sat Dec 05 16:48:13 2020
NOTE: skipping rediscovery for group 2/0x58e713e7 (FLASH) on local instance.
NOTE: requesting all-instance disk validation for group=2
NOTE: skipping rediscovery for group 2/0x58e713e7 (FLASH) on local instance.
Sat Dec 05 16:48:19 2020
GMON updating for reconfiguration, group 2 at 14 for pid 34, osid 12203
NOTE: group 2 PST updated.
NOTE: initiating PST update: grp = 2
GMON updating group 2 at 15 for pid 34, osid 12203
NOTE: cache closing disk 0 of grp 2: (not open) _DROPPED_0000_FLASH
NOTE: group FLASH: updated PST location: disk 0001 (PST copy 0)
NOTE: group FLASH: updated PST location: disk 0002 (PST copy 1)
NOTE: PST update grp = 2 completed successfully 
NOTE: membership refresh pending for group 2/0x58e713e7 (FLASH)
GMON querying group 2 at 16 for pid 18, osid 41180
NOTE: cache closing disk 0 of grp 2: (not open) _DROPPED_0000_FLASH
NOTE: cache opening disk 2 of grp 2: FLASH1 label:FLASH1
NOTE: Attempting voting file refresh on diskgroup FLASH
NOTE: Refresh completed on diskgroup FLASH. No voting file found.
GMON querying group 2 at 17 for pid 18, osid 41180
NOTE: cache closing disk 0 of grp 2: (not open) _DROPPED_0000_FLASH
Sat Dec 05 16:48:25 2020
SUCCESS: refreshed membership for 2/0x58e713e7 (FLASH)
Sat Dec 05 16:48:25 2020
SUCCESS: alter diskgroup FLASH add failgroup flg1 disk 'ORCL:FLASH1'  force
NOTE: starting rebalance of group 2/0x58e713e7 (FLASH) at power 1
Starting background process ARB0
Sat Dec 05 16:48:26 2020
ARB0 started with pid=36, OS id=12451 
NOTE: assigning ARB0 to group 2/0x58e713e7 (FLASH) with 1 parallel I/O
cellip.ora not found.
NOTE: F1X0 copy 2 relocating from 0:2 to 2:2 for diskgroup 2 (FLASH)
NOTE: Attempting voting file refresh on diskgroup FLASH
NOTE: Refresh completed on diskgroup FLASH. No voting file found.
Sat Dec 05 16:48:45 2020
NOTE: Rebalance has restored redundancy for any existing control file or redo log in disk group FLASH
Sat Dec 05 16:49:06 2020
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 2/0x58e713e7 (FLASH)
Sat Dec 05 16:49:08 2020
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=2
Sat Dec 05 16:49:11 2020
GMON updating for reconfiguration, group 2 at 18 for pid 36, osid 12681
NOTE: cache closing disk 0 of grp 2: (not open) _DROPPED_0000_FLASH
NOTE: group FLASH: updated PST location: disk 0001 (PST copy 0)
NOTE: group FLASH: updated PST location: disk 0002 (PST copy 1)
NOTE: group 2 PST updated.
SUCCESS: grp 2 disk _DROPPED_0000_FLASH going offline 
GMON updating for reconfiguration, group 2 at 19 for pid 36, osid 12681
NOTE: cache closing disk 0 of grp 2: (not open) _DROPPED_0000_FLASH
NOTE: group FLASH: updated PST location: disk 0001 (PST copy 0)
NOTE: group FLASH: updated PST location: disk 0002 (PST copy 1)
NOTE: group 2 PST updated.
NOTE: membership refresh pending for group 2/0x58e713e7 (FLASH)
GMON querying group 2 at 20 for pid 18, osid 41180
GMON querying group 2 at 21 for pid 18, osid 41180
NOTE: Disk _DROPPED_0000_FLASH in mode 0x0 marked for de-assignment
SUCCESS: refreshed membership for 2/0x58e713e7 (FLASH)
Sat Dec 05 16:51:56 2020
SQL> alter diskgroup data add failgroup dg2 disk 'ORCL:DATA2'  force 
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (1,2) to disk (ORCL:DATA2)
NOTE: requesting all-instance membership refresh for group=1
NOTE: initializing header on grp 1 disk DATA2
NOTE: requesting all-instance disk validation for group=1
Sat Dec 05 16:51:57 2020
NOTE: skipping rediscovery for group 1/0x58d713e6 (DATA) on local instance.
NOTE: requesting all-instance disk validation for group=1
NOTE: skipping rediscovery for group 1/0x58d713e6 (DATA) on local instance.
Sat Dec 05 16:52:02 2020
GMON updating for reconfiguration, group 1 at 22 for pid 34, osid 12203
NOTE: group 1 PST updated.
NOTE: initiating PST update: grp = 1
GMON updating group 1 at 23 for pid 34, osid 12203
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
NOTE: group DATA: updated PST location: disk 0000 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0002 (PST copy 1)
NOTE: PST update grp = 1 completed successfully 
NOTE: membership refresh pending for group 1/0x58d713e6 (DATA)
GMON querying group 1 at 24 for pid 18, osid 41180
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
NOTE: cache opening disk 2 of grp 1: DATA2 label:DATA2
Sat Dec 05 16:52:08 2020
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
GMON querying group 1 at 25 for pid 18, osid 41180
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
SUCCESS: refreshed membership for 1/0x58d713e6 (DATA)
Sat Dec 05 16:52:08 2020
SUCCESS: alter diskgroup data add failgroup dg2 disk 'ORCL:DATA2'  force
NOTE: starting rebalance of group 1/0x58d713e6 (DATA) at power 1
Starting background process ARB0
Sat Dec 05 16:52:08 2020
ARB0 started with pid=37, OS id=13463 
NOTE: assigning ARB0 to group 1/0x58d713e6 (DATA) with 1 parallel I/O
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
Sat Dec 05 16:52:44 2020
cellip.ora not found.
NOTE: F1X0 copy 2 relocating from 1:2 to 2:2 for diskgroup 1 (DATA)
Sat Dec 05 16:53:22 2020
NOTE: Rebalance has restored redundancy for any existing control file or redo log in disk group DATA
NOTE: membership refresh pending for group 1/0x58d713e6 (DATA)
GMON querying group 1 at 27 for pid 18, osid 41180
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
SUCCESS: refreshed membership for 1/0x58d713e6 (DATA)
SUCCESS: alter diskgroup data rebalance power 11
NOTE: starting rebalance of group 1/0x58d713e6 (DATA) at power 11
Starting background process ARB0
Sat Dec 05 17:27:52 2020
ARB0 started with pid=35, OS id=23318 
NOTE: assigning ARB0 to group 1/0x58d713e6 (DATA) with 11 parallel I/Os
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
Sat Dec 05 17:28:29 2020
cellip.ora not found.
Sat Dec 05 17:28:45 2020
NOTE: Rebalance has restored redundancy for any existing control file or redo log in disk group DATA
Sat Dec 05 18:48:10 2020
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=1
Sat Dec 05 18:48:32 2020
GMON updating for reconfiguration, group 1 at 28 for pid 36, osid 47454
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
NOTE: group DATA: updated PST location: disk 0000 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0002 (PST copy 1)
Sat Dec 05 18:48:32 2020
NOTE: group 1 PST updated.
SUCCESS: grp 1 disk _DROPPED_0001_DATA going offline 
GMON updating for reconfiguration, group 1 at 29 for pid 36, osid 47454
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
NOTE: group DATA: updated PST location: disk 0000 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0002 (PST copy 1)
NOTE: group 1 PST updated.
Sat Dec 05 18:48:32 2020
NOTE: membership refresh pending for group 1/0x58d713e6 (DATA)
GMON querying group 1 at 30 for pid 18, osid 41180
GMON querying group 1 at 31 for pid 18, osid 41180
NOTE: Disk _DROPPED_0001_DATA in mode 0x0 marked for de-assignment
SUCCESS: refreshed membership for 1/0x58d713e6 (DATA)
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
Sat Dec 05 18:52:24 2020
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 1/0x58d713e6 (DATA)

查询磁盘状态,掉线磁盘已经被加入,asm磁盘组恢复正常
20201205201841


20201205201851
总结:对于normal磁盘组由于某种原因磁盘从磁盘组中掉,v$asm_disk.name类似_DROPPED_0001_DATA,v$asm_disk.state为FORCING,可以通过类似alter diskgroup data add failgroup dg2 disk ‘ORCL:DATA2′ force;方式强制增加掉线的磁盘进入磁盘组,然后待rebalance完成,问题修复

发表在 Oracle ASM | 标签为 | 评论关闭

ORA-15096: lost disk write detected

又一例由于存储掉电导致asm磁盘组,由于ORA-15096: lost disk write detected,导致无法mount的恢复请求

SQL> ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:45277:148} */
NOTE: cache registered group DATA number=2 incarn=0x73886b6a
NOTE: cache began mount (first) of group DATA number=2 incarn=0x73886b6a
NOTE: Assigning number (2,2) to disk (/dev/asm-data3)
NOTE: Assigning number (2,1) to disk (/dev/asm-data2)
NOTE: Assigning number (2,0) to disk (/dev/asm-data1)
Fri Nov 06 19:06:56 2020
NOTE: GMON heartbeating for grp 2
GMON querying group 2 at 94 for pid 30, osid 11596
NOTE: cache opening disk 0 of grp 2: DATA_0000 path:/dev/asm-data1
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 of grp 2: DATA_0001 path:/dev/asm-data2
NOTE: cache opening disk 2 of grp 2: DATA_0002 path:/dev/asm-data3
NOTE: cache mounting (first) external redundancy group 2/0x73886B6A (DATA)
Fri Nov 06 19:06:57 2020
* allocate domain 2, invalid = TRUE
kjbdomatt send to inst 2
Fri Nov 06 19:06:57 2020
NOTE: attached to recovery domain 2
NOTE: starting recovery of thread=1 ckpt=25.7986 group=2 (DATA)
NOTE: starting recovery of thread=2 ckpt=33.364 group=2 (DATA)
NOTE: BWR validation signaled ORA-15096
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_11596.trc:
ORA-15096: lost disk write detected
NOTE: crash recovery signalled OER-15096
ERROR: ORA-15096 signalled during mount of diskgroup DATA
NOTE: cache dismounting (clean) group 2/0x73886B6A (DATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 11596, image: oracle@db1 (TNS V1-V3)
NOTE: lgwr not being msg'd to dismount
kjbdomdet send to inst 2
detach from dom 2, sending detach message to inst 2
freeing rdom 2
NOTE: detached from domain 2
NOTE: cache dismounted group 2/0x73886B6A (DATA)
NOTE: cache ending mount (fail) of group DATA number=2 incarn=0x73886b6a
NOTE: cache deleting context for group DATA 2/0x73886b6a
GMON dismounting group 2 at 95 for pid 30, osid 11596
NOTE: Disk DATA_0000 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0002 in mode 0x7f marked for de-assignment
ERROR: diskgroup DATA was not mounted
ORA-15032: not all alterations performed
ORA-15096: lost disk write detected
ERROR: ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:45277:148} */

通过判断,通过一系列处理之后,数据库进行了mount操作发现报错ORA-600 2130

Fri Nov 06 17:03:27 2020
ALTER DATABASE RECOVER  database
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 40 slaves
Fri Nov 06 17:03:29 2020
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_pr00_7393.trc  (incident=195869):
ORA-00600: internal error code, arguments: [2130], [2], [1], [2], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/ynhis/ynhis1/incident/incdir_195869/ynhis1_pr00_7393_i195869.trc
Fri Nov 06 17:03:30 2020
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Media Recovery failed with error 600
ORA-10877 signalled during: ALTER DATABASE RECOVER  database  ...

判断redo异常,通过resetlogs打开库,发现报错ORA-00600 2662

Fri Nov 06 18:21:32 2020
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 8670753264
Resetting resetlogs activation ID 306909514 (0x124b114a)
Redo thread 2 enabled by open resetlogs or standby activation
Fri Nov 06 18:21:39 2020
Setting recovery target incarnation to 2
Initializing SCN for created control file
Database SCN compatibility initialized to 3
Warning - High Database SCN: Current SCN value is 8670753267, threshold SCN value is 0
Fri Nov 06 18:21:39 2020
Assigning activation ID 408224320 (0x18550240)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /orabak/data/group_1.289.954514319
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Nov 06 18:21:40 2020
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_ora_24310.trc  (incident=231847):
ORA-00600: internal error code, arguments: [2662], [2], [80818679], [2], [93545365], [4194545], [], [], [], [], [],[]
Incident details in: /u01/app/oracle/diag/rdbms/ynhis/ynhis1/incident/incdir_231847/ynhis1_ora_24310_i231847.trc
Fri Nov 06 18:21:42 2020
Dumping diagnostic data in directory=[cdmp_20201106182142],requested by(instance=1,osid=24310),summary=[incident=231847]
Fri Nov 06 18:21:43 2020
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_ora_24310.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [2662], [2], [80818679], [2], [93545365],[4194545],[],[],[],[],[],[]
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_ora_24310.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [2662], [2], [80818679], [2], [93545365],[4194545],[],[],[],[],[],[]
Error 704 happened during db open, shutting down database
USER (ospid: 24310): terminating the instance due to error 704
Instance terminated by USER, pid = 24310
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (24310) as a result of ORA-1092

处理该错误之后,数据库resetlog之后,数据库open成功但是报错ORA-00600 4137

Database Characterset is ZHS16GBK
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_smon_26195.trc  (incident=255799):
ORA-00600: internal error code, arguments: [4137], [25.33.122556], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/ynhis/ynhis1/incident/incdir_255799/ynhis1_smon_26195_i255799.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
No Resource Manager plan active
Fri Nov 06 18:30:46 2020
replication_dependency_tracking turned off (no async multimaster replication found)
ORACLE Instance ynhis1 (pid = 23) - Error 600 encountered while recovering transaction (25, 33).
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_smon_26195.trc:
ORA-00600: internal error code, arguments: [4137], [25.33.122556], [0], [0], [], [], [], [], [], [], [], []

对异常undo进行处理,数据库可以正常启动关闭,然后安排数据导出导入新库操作,恢复完成.

发表在 Oracle ASM | 标签为 , , , | 评论关闭