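Contact: mobile/WeChat (+86 17813235971), QQ (107644445)
Author: 惜分飞 © All rights reserved. [Reproduction in any form without my consent is prohibited; I reserve the right to pursue legal action.]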
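When running Oracle ASM, a mistaken operation at the operating-system level can destroy an ASM disk. This post lists several kfed error patterns seen after such damage (KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type]).
Mounting the disk group fails (ORA-15040, ORA-15042)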
SQL> alter diskgroup DATA mount;
alter diskgroup DATA mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "2" is missing from group number "2"
The ASM alert log reports errors (ORA-15335, ORA-15066, ORA-15196, and others)
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0002" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483651] [48] [0 != 1]
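When an alert log is long, it can help to pull out only the error codes discussed here. The following is a minimal sketch, not part of any Oracle tool: the function name `find_watched_errors` and the `WATCHED` set are hypothetical, and the set contains only the codes quoted in this post.

```python
import re

# Matches Oracle error codes such as ORA-15335 and captures the digits.
ORA_CODE = re.compile(r"ORA-(\d{5})")

# Hypothetical watch list: only the codes quoted in this post.
WATCHED = {"15040", "15042", "15335", "15066", "15196"}

def find_watched_errors(lines):
    """Return (line_number, code) pairs for watched ORA- codes."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for code in ORA_CODE.findall(line):
            if code in WATCHED:
                hits.append((lineno, "ORA-" + code))
    return hits

# Sample lines copied from the alert log excerpt above.
sample = [
    "ORA-15335: ASM metadata corruption detected in disk group 'DATA'",
    'ORA-15130: diskgroup "DATA" is being dismounted',
    "ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483651] [48] [0 != 1]",
]
print(find_watched_errors(sample))
```

In practice the `lines` argument would be an open alert log file; the location of that file depends on the installation and is not assumed here.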
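kfed reports errors when reading the disk header
The start of the disk (not only the 4 KB disk header block, but possibly several consecutive AUs or more) may be completely overwritten. Reading it with kfed then usually ends with an error such as KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type].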
[oracle@fcomtaep2 disks]$ kfed read ASMRECO03
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
7FC18D899400 00000000 00000000 00000000 00000000 [................]
  Repeat 27 times
7FC18D8995C0 FEEE0001 0001FFFF FFFF0000 00000FFF [................]
7FC18D8995D0 00000000 00000000 00000000 00000000 [................]
  Repeat 1 times
7FC18D8995F0 00000000 00000000 00000000 AA550000 [..............U.]
7FC18D899600 20494645 54524150 00010000 0000005C [EFI PART....\...] <==== **** Here ******
7FC18D899610 BD82BBB3 00000000 00000001 00000000 [................]
7FC18D899620 0FFFFFFF 00000000 00000022 00000000 [........".......]
7FC18D899630 0FFFFFDE 00000000 FD8857E5 42D7B49B [.........W.....B]
7FC18D899640 0901FA87 6B3DB5AA 00000002 00000000 [......=k........]
7FC18D899650 00000080 00000080 FE48EB77 00000000 [........w.H.....]
7FC18D899660 00000000 00000000 00000000 00000000 [................]
  Repeat 25 times
7FC18D899800 EBD0A0A2 4433B9E5 B668C087 C79926B7 [......3D..h..&..]
7FC18D899810 5381F6DF 4626F988 0E4F468D D78D3B28 [...S..&F.FO.(;..]
7FC18D899820 000007A1 00000000 0FFFF85F 00000000 [........_.......]
7FC18D899830 00000000 00000000 00720070 006D0069 [........p.r.i.m.]
7FC18D899840 00720061 00000079 00000000 00000000 [a.r.y...........]
7FC18D899850 00000000 00000000 00000000 00000000 [................]
  Repeat 186 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]
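"EFI PART" is partition metadata (the signature of a GPT partition table header), so the ASM disk was most likely damaged by being partitioned.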
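The hex columns in the dump can be turned back into the text on the right by hand. The sketch below shows one way to do it, assuming the dump was taken on a little-endian host, where each 4-byte group is printed as a native integer and therefore appears byte-reversed. The helper names `decode_kfed_words` and `printable` are hypothetical and are not part of kfed.

```python
import struct

def decode_kfed_words(words):
    """Convert the 32-bit hex words of one dump row into raw bytes.

    Assumption: the dump comes from a little-endian host, so each
    printed word must be packed little-endian to recover disk order.
    """
    return b"".join(struct.pack("<I", int(w, 16)) for w in words)

def printable(data):
    """Render bytes like the dump's right-hand column (dots for non-text)."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data)

# The row marked "Here" in the dump above.
row = "20494645 54524150 00010000 0000005C".split()
raw = decode_kfed_words(row)
print(raw[:8])         # the first eight bytes on disk
print(printable(raw))  # should match the bracketed column of that row
```

The first eight bytes decode to `EFI PART`, which is why that row stands out in the dump.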
[ebernal@dbaasm new2]$ kfed read emcpowerl | head -25
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
2ABD671E9400 00000000 00000000 00000000 00000000 [................]
  Repeat 31 times
2ABD671E9600 4542414C 454E4F4C 00000001 00000000 [LABELONE........]
2ABD671E9610 E4E1DDB1 00000020 324D564C 31303020 [.... ...LVM2 001] <==== **** Here ******
2ABD671E9620 50365A77 71327874 34303156 4B4E6136 [wZ6Ptx2qV1046aNK]
2ABD671E9630 35395159 5147634C 487A5A38 63575A37 [YQ95LcGQ8ZzH7ZWc]
2ABD671E9640 00000000 00000019 00030000 00000000 [................]
2ABD671E9650 00000000 00000000 00000000 00000000 [................]
2ABD671E9660 00000000 00000000 00001000 00000000 [................]
2ABD671E9670 0002F000 00000000 00000000 00000000 [................]
2ABD671E9680 00000000 00000000 00000000 00000000 [................]
  Repeat 215 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]
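"LABELONE" and "LVM2 001" are the label identifier and type string of an LVM2 physical volume. The ASM disk was most likely damaged by being initialized as an LVM physical volume and placed under LVM management.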
[ebernal@dbaasm tars]$ kfed read rhdisk16
kfbh.endian: 65 ; 0x000: 0x41
kfbh.hard: 73 ; 0x001: 0x49
kfbh.type: 88 ; 0x002: *** Unknown Enum ***
kfbh.datfmt: 32 ; 0x003: 0x20
kfbh.block.blk: 1111709260 ; 0x004: blk=1111709260
kfbh.block.obj: 1634861056 ; 0x008: file=131072
kfbh.check: 119 ; 0x00c: 0x00000077
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
2B6FE2AC1400 20584941 4243564C 61720000 00000077 [AIX LVCB..raw...] <==== **** Here ******
2B6FE2AC1410 00000000 00000000 00000000 00000000 [................]
2B6FE2AC1420 00000000 00000000 30300000 38306430 [..........000d08]
2B6FE2AC1430 30306131 34643030 30303030 31303030 [1a0000d400000001]
2B6FE2AC1440 61006533 766C6D73 7461645F 00003161 [3e.asmlv_data1..]
2B6FE2AC1450 00000000 00000000 00000000 00000000 [................]
  Repeat 2 times
2B6FE2AC1480 54000000 4D206575 20207961 31312037 [...Tue May  7 11]
2B6FE2AC1490 3A33343A 32203633 0A333130 00000000 [:43:36 2013.....]
2B6FE2AC14A0 65755400 79614D20 20372020 343A3131 [.Tue May  7 11:4]
2B6FE2AC14B0 34323A38 31303220 00000A33 44000000 [8:24 2013......D]
2B6FE2AC14C0 41313830 30303444 6D6D7900 02007900 [081AD400.ymm.y..]
2B6FE2AC14D0 0100E40C 656E6F4E 00000000 00000000 [....None........]
2B6FE2AC14E0 00000000 00000000 00000000 00000000 [................]
  Repeat 14 times
2B6FE2AC15D0 00000000 00000000 65310000 61653934 [..........1e49ea]
2B6FE2AC15E0 342E3862 00000000 00000000 00000000 [b8.4............]
2B6FE2AC15F0 00000000 00000000 00000000 00000000 [................]
  Repeat 224 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][88]
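The "AIX LVCB..raw" string here is metadata of an AIX OS volume (the logical volume control block). In other words, the ASM disk was damaged by being used as a logical volume at the AIX OS level.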
[oracle@dbep2 disks]$ kfed read asm-disk3
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
06000000 00000000 00000000 00000000 00000000 [................]
  Repeat 25 times
0602100 51e2b7f6 00ed4e00 00000000 00000001 [...Q.N..........]
0602120 00000000 0000000b 00000100 0000003c [............<...]
0602140 00000242 0000007b 5d8468e7 6147782a [B...{....h.]*xGa]
0602160 d17851a2 327552e2 00000000 00000000 [.Qx..Ru2........]
0602200 00000000 00000000 3130752f 91a4f000 [......../u01....] <==== **** Here ******
0602220 ff8808e4 d5104cff 000000ac 00000100 [.....L..........]
0602240 00000000 00000000 00000000 09d18000 [................]
  Repeat 254 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][88]
The "/u01" string here most likely means the ASM disk was overwritten by a file system.
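The fixed signatures seen in the dumps above can also be searched for directly in the first bytes of a disk or a disk image. The following is a rough sketch under stated assumptions, not a diagnostic tool: `scan_signatures` and `scan_device` are hypothetical names, the 1 MB default read length is an arbitrary choice, and `ORCLDISK` does not appear in this post (it is included on the assumption that an intact ASM disk header carries that provision string). The file-system case has no single fixed marker here, so it is not covered.

```python
# Byte signatures and what they suggest. The first entry is an assumption
# about a healthy ASM header; the others are taken from the dumps above.
SIGNATURES = {
    b"ORCLDISK": "ASM disk header (provision string)",
    b"EFI PART": "GPT partition table header",
    b"LABELONE": "LVM2 physical volume label",
    b"AIX LVCB": "AIX logical volume control block",
}

def scan_signatures(data):
    """Return sorted (offset, signature, meaning) for each signature found."""
    hits = []
    for sig, meaning in SIGNATURES.items():
        pos = data.find(sig)  # first occurrence only
        if pos != -1:
            hits.append((pos, sig.decode("ascii"), meaning))
    return sorted(hits)

def scan_device(path, length=1024 * 1024):
    """Read the first `length` bytes of a device or image (read-only) and scan them."""
    with open(path, "rb") as f:
        return scan_signatures(f.read(length))

# Synthetic 4 KB block with a GPT signature at offset 0x200, as in the first dump.
sample = bytearray(4096)
sample[0x200:0x208] = b"EFI PART"
print(scan_signatures(bytes(sample)))
```

Reading a real block device normally needs elevated privileges. The sketch only reads, which matters on a disk that still has to be recovered.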
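How to handle a damaged ASM disk depends on the redundancy of the disk group. With normal or high redundancy the ASM disk group itself is not affected, and you can consider dropping the damaged disk and then adding it back. With external redundancy, a damaged ASM disk generally causes the disk group to dismount, after which it cannot be mounted normally. If you have a backup of the disk header, you can try restoring the header, mounting the disk group, and then migrating the data in read-only mode. If there is no header backup, or the disk group still cannot be mounted after the restore, other approaches may be needed, such as using a tool to recover the data files while ASM is dismounted, or even recovering data by reassembling ASM block / Oracle block fragments. See the related articles: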
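Index of the Oracle ASM article series
pvid=yes prevents ASM from mounting
Recovering a completely destroyed ASM disk header
An unrecognized partition prevents the ASM diskgroup from mounting
Oracle ASM disk format recovery: disk formatted as an ext4 file system
Oracle ASM disk format recovery: disk formatted as an ntfs file system
Recovery after a mistakenly set pvid on an ASM disk prevents the ASM diskgroup from mounting
Case study: recovery with zero data loss after oracleasm createdisk re-created an ASM disk
ORA-15042: ASM disk "N" is missing from group number "M" failure recovery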
If you run into this kind of situation and cannot resolve it, please contact us for professional Oracle database recovery support.
Phone: 17813235971  QQ: 107644445  E-Mail: dba@xifenfei.com