标签云
asm恢复 bbed bootstrap$ dul In Memory kcbzib_kcrsds_1 kccpb_sanity_check_2 kfed MySQL恢复 ORA-00312 ORA-00607 ORA-00704 ORA-01110 ORA-01555 ORA-01578 ORA-08103 ORA-600 2131 ORA-600 2662 ORA-600 2663 ORA-600 3020 ORA-600 4000 ORA-600 4137 ORA-600 4193 ORA-600 4194 ORA-600 16703 ORA-600 kcbzib_kcrsds_1 ORA-600 KCLCHKBLK_4 ORA-15042 ORA-15196 ORACLE 12C oracle dul ORACLE PATCH Oracle Recovery Tools oracle加密恢复 oracle勒索 oracle勒索恢复 oracle异常恢复 Oracle 恢复 ORACLE恢复 ORACLE数据库恢复 oracle 比特币 OSD-04016 YOUR FILES ARE ENCRYPTED 勒索恢复 比特币加密文章分类
- Others (2)
- 中间件 (2)
- WebLogic (2)
- 操作系统 (102)
- 数据库 (1,670)
- DB2 (22)
- MySQL (73)
- Oracle (1,532)
- Data Guard (52)
- EXADATA (8)
- GoldenGate (21)
- ORA-xxxxx (159)
- ORACLE 12C (72)
- ORACLE 18C (6)
- ORACLE 19C (14)
- ORACLE 21C (3)
- Oracle 23ai (7)
- Oracle ASM (65)
- Oracle Bug (8)
- Oracle RAC (52)
- Oracle 安全 (6)
- Oracle 开发 (28)
- Oracle 监听 (28)
- Oracle备份恢复 (560)
- Oracle安装升级 (91)
- Oracle性能优化 (62)
- 专题索引 (5)
- 勒索恢复 (78)
- PostgreSQL (18)
- PostgreSQL恢复 (6)
- SQL Server (27)
- SQL Server恢复 (8)
- TimesTen (7)
- 达梦数据库 (2)
- 生活娱乐 (2)
- 至理名言 (11)
- 虚拟化 (2)
- VMware (2)
- 软件开发 (37)
- Asp.Net (9)
- JavaScript (12)
- PHP (2)
- 小工具 (20)
-
最近发表
- ORA-600 krse_arc_complete.4
- Oracle 19c 202410补丁(RUs+OJVM)
- ntfs MFT损坏(ntfs文件系统故障)导致oracle异常恢复
- .mkp扩展名oracle数据文件加密恢复
- 清空redo,导致ORA-27048: skgfifi: file header information is invalid
- A_H_README_TO_RECOVER勒索恢复
- 通过alert日志分析客户自行对一个数据库恢复的来龙去脉和点评
- ORA-12514: TNS: 监听进程不能解析在连接描述符中给出的SERVICE_NAME
- ORA-01092 ORA-00604 ORA-01558故障处理
- ORA-65088: database open should be retried
- Oracle 19c异常恢复—ORA-01209/ORA-65088
- ORA-600 16703故障再现
- 数据库启动报ORA-27102 OSD-00026 O/S-Error: (OS 1455)
- .[metro777@cock.li].Elbie勒索病毒加密数据库恢复
- 应用连接错误,初始化mysql数据库恢复
- RAC默认服务配置优先节点
- Oracle 19c RAC 替换私网操作
- 监听报TNS-12541 TNS-12560 TNS-00511错误
- drop tablespace xxx including contents恢复
- Linux 8 修改网卡名称
标签归档:asm不能mount
fdisk分区导致asm disk破坏数据库恢复
尝试mount data磁盘组
SQL> alter diskgroup DATADG mount NOTE: cache registered group DATADG number=1 incarn=0xbc43fafd NOTE: cache began mount (first) of group DATADG number=1 incarn=0xbc43fafd NOTE: Assigning number (1,0) to disk (/dev/raw/raw2) Thu Jun 02 10:14:33 2022 NOTE: GMON heartbeating for grp 1 GMON querying group 1 at 27 for pid 27, osid 3853 NOTE: Assigning number (1,1) to disk () GMON querying group 1 at 28 for pid 27, osid 3853 NOTE: cache dismounting (clean) group 1/0xBC43FAFD (DATADG) NOTE: messaging CKPT to quiesce pins Unix process pid: 3853, image: oracle@node1 (TNS V1-V3) NOTE: dbwr not being msg'd to dismount NOTE: lgwr not being msg'd to dismount NOTE: cache dismounted group 1/0xBC43FAFD (DATADG) NOTE: cache ending mount (fail) of group DATADG number=1 incarn=0xbc43fafd NOTE: cache deleting context for group DATADG 1/0xbc43fafd GMON dismounting group 1 at 29 for pid 27, osid 3853 NOTE: Disk in mode 0x8 marked for de-assignment NOTE: Disk in mode 0x8 marked for de-assignment ERROR: diskgroup DATADG was not mounted ORA-15032: not all alterations performed ORA-15040: diskgroup is incomplete ORA-15042: ASM disk "1" is missing from group number "1" ERROR: alter diskgroup DATADG mount Thu Jun 02 10:14:33 2022 ASM Health Checker found 1 new failures
报错信息比较明显 datadg的disk number 为1的磁盘丢失了。通过fdisk确认磁盘情况
Disk /dev/sdb: 42.9 GB, 42949672960 bytes 64 heads, 32 sectors/track, 40960 cylinders Units = cylinders of 2048 * 512 = 1048576 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x0006c2be Device Boot Start End Blocks Id System Disk /dev/sda: 53.7 GB, 53687091200 bytes 64 heads, 32 sectors/track, 51200 cylinders Units = cylinders of 2048 * 512 = 1048576 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00061443 Device Boot Start End Blocks Id System /dev/sda1 * 2 2049 2097152 83 Linux Partition 1 does not end on cylinder boundary. /dev/sda2 2050 10241 8388608 82 Linux swap / Solaris Partition 2 does not end on cylinder boundary. /dev/sda3 10242 12289 2097152 83 Linux Partition 3 does not end on cylinder boundary. /dev/sda4 12290 51200 39844864 5 Extended Partition 4 does not end on cylinder boundary. /dev/sda5 12291 14338 2097152 83 Linux /dev/sda6 14340 50178 36699136 83 Linux /dev/sda7 50180 51200 1045504 83 Linux Disk /dev/sdc: 214.7 GB, 214748364800 bytes 255 heads, 63 sectors/track, 26108 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x1b3fba6b Device Boot Start End Blocks Id System /dev/sdc1 1 1045 8393931 83 Linux /dev/sdc2 1046 26108 201318547+ 83 Linux Disk /dev/sdd: 536.9 GB, 536870912000 bytes 255 heads, 63 sectors/track, 65270 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x4c63ecad Device Boot Start End Blocks Id System /dev/sdd1 1 65270 524281243+ 83 Linux Disk /dev/sde: 536.9 GB, 536870912000 bytes 255 heads, 63 sectors/track, 65270 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00000000 Disk /dev/sdf: 536.9 GB, 536870912000 bytes 255 heads, 63 sectors/track, 65270 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00000000
根据客户反馈,异常的应该是一个500G的磁盘,而其中sdb为分区,通过kfed命令分析,确认sdc1为ocr磁盘,sdc2为datadg的一块磁盘,另外一块磁盘应该在sdd,sde,sdf三者之中,通过kfed分析sde,sdf均不可能是asm disk(一块是文件系统,一块是彻底没有使用的空盘),如果datadg的磁盘没有丢失,那应该就是sdd这块磁盘,通过dd 磁盘100M空间,然后通过kfed进行分析确认
E:\TEMP\xff>kfed read sdd.dd kfbh.endian: 0 ; 0x000: 0x00 kfbh.hard: 0 ; 0x001: 0x00 kfbh.type: 0 ; 0x002: KFBTYP_INVALID kfbh.datfmt: 0 ; 0x003: 0x00 kfbh.block.blk: 0 ; 0x004: blk=0 kfbh.block.obj: 0 ; 0x008: file=0 kfbh.check: 0 ; 0x00c: 0x00000000 kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 006648400 00000000 00000000 00000000 00000000 [................] Repeat 26 times 0066485B0 00000000 00000000 4C63ECAD 01000000 [..........cL....] 0066485C0 FE830001 003FFFFF CB370000 00003E7F [......?...7..>..] 0066485D0 00000000 00000000 00000000 00000000 [................] Repeat 1 times 0066485F0 00000000 00000000 00000000 AA550000 [..............U.] 006648600 00000000 00000000 00000000 00000000 [................] Repeat 223 times KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0] E:\TEMP\xff>kfed read sdd1.dd kfbh.endian: 0 ; 0x000: 0x00 kfbh.hard: 0 ; 0x001: 0x00 kfbh.type: 0 ; 0x002: KFBTYP_INVALID kfbh.datfmt: 0 ; 0x003: 0x00 kfbh.block.blk: 0 ; 0x004: blk=0 kfbh.block.obj: 0 ; 0x008: file=0 kfbh.check: 0 ; 0x00c: 0x00000000 kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 006768400 00000000 00000000 00000000 00000000 [................] Repeat 26 times 0067685B0 00000000 00000000 70D364B4 FE000000 [.........d.p....] 0067685C0 FE83FFFF D13FFFFF BB7603EB 00003A93 [......?...v..:..] 0067685D0 00000000 00000000 00000000 00000000 [................] Repeat 1 times 0067685F0 00000000 00000000 00000000 AA550000 [..............U.] 006768600 02038201 00000008 80000001 826037C1 [.............7`.] 006768EA0 00000079 00800105 0000007A 00800105 [y.......z.......] 006768EB0 0000007C 00800105 0000007D 00800105 [|.......}.......] 0067693C0 0000015C 00800105 0000015D 00800105 [\.......].......] 0067693D0 0000015F 00800105 00000160 00800105 [_.......`.......] 0067693E0 00000161 00800105 00000163 00800105 [a.......c.......] 0067693F0 00000164 00800105 00000166 00800105 [d.......f.......] KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0] E:\TEMP\xff>kfed read sdd.dd blkn=1|more kfbh.endian: 1 ; 0x000: 0x01 kfbh.hard: 130 ; 0x001: 0x82 kfbh.type: 2 ; 0x002: KFBTYP_FREESPC kfbh.datfmt: 2 ; 0x003: 0x02 kfbh.block.blk: 1 ; 0x004: blk=1 kfbh.block.obj: 2147483649 ; 0x008: disk=1 kfbh.check: 2197087544 ; 0x00c: 0x82f4e538 kfbh.fcn.base: 616391 ; 0x010: 0x000967c7 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 kfdfsb.aunum: 0 ; 0x000: 0x00000000 kfdfsb.max: 254 ; 0x004: 0x00fe kfdfsb.cnt: 254 ; 0x006: 0x00fe kfdfsb.bound: 0 ; 0x008: 0x0000 kfdfsb.flag: 1 ; 0x00a: B=1 kfdfsb.ub1spare: 0 ; 0x00b: 0x00 kfdfsb.spare[0]: 0 ; 0x00c: 0x00000000 kfdfsb.spare[1]: 0 ; 0x010: 0x00000000 kfdfsb.spare[2]: 0 ; 0x014: 0x00000000
通过上述信息分析,基本上可以确认sdd磁盘以前是asm disk,但是被fdisk进行了分区,基于这种情况,通过对磁盘组进行修复
E:\TEMP\xff>kfed read sdd.ok kfbh.endian: 1 ; 0x000: 0x01 kfbh.hard: 130 ; 0x001: 0x82 kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD kfbh.datfmt: 1 ; 0x003: 0x01 kfbh.block.blk: 0 ; 0x004: blk=0 kfbh.block.obj: 2147483649 ; 0x008: disk=1 kfbh.check: 424926402 ; 0x00c: 0x1953dcc2 kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8 kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000 kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000 kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000 kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000 kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000 kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000 kfdhdb.compat: 186646528 ; 0x020: 0x0b200000 kfdhdb.dsknum: 1 ; 0x024: 0x0001 kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER kfdhdb.dskname: DATADG_0001 ; 0x028: length=11 kfdhdb.grpname: DATADG ; 0x048: length=6 kfdhdb.fgname: DATADG_0001 ; 0x068: length=11 kfdhdb.capname: ; 0x088: length=0 kfdhdb.crestmp.hi: 33074858 ; 0x0a8: HOUR=0xa DAYS=0x15 MNTH=0xb YEAR=0x7e2 kfdhdb.crestmp.lo: 2375520256 ; 0x0ac: USEC=0x0 MSEC=0x1e4 SECS=0x19 MINS=0x23 kfdhdb.mntstmp.hi: 33074858 ; 0x0b0: HOUR=0xa DAYS=0x15 MNTH=0xb YEAR=0x7e2 kfdhdb.mntstmp.lo: 2375522304 ; 0x0b4: USEC=0x0 MSEC=0x1e6 SECS=0x19 MINS=0x23 kfdhdb.secsize: 512 ; 0x0b8: 0x0200 kfdhdb.blksize: 4096 ; 0x0ba: 0x1000 kfdhdb.ausize: 1048576 ; 0x0bc: 0x00100000 kfdhdb.mfact: 113792 ; 0x0c0: 0x0001bc80 kfdhdb.dsksize: 512000 ; 0x0c4: 0x0007d000 kfdhdb.pmcnt: 6 ; 0x0c8: 0x00000006 kfdhdb.fstlocn: 1 ; 0x0cc: 0x00000001 kfdhdb.altlocn: 2 ; 0x0d0: 0x00000002 kfdhdb.f1b1locn: 0 ; 0x0d4: 0x00000000 kfdhdb.redomirrors[0]: 0 ; 0x0d8: 0x0000 kfdhdb.redomirrors[1]: 0 ; 0x0da: 0x0000 kfdhdb.redomirrors[2]: 0 ; 0x0dc: 0x0000 kfdhdb.redomirrors[3]: 0 ; 0x0de: 0x0000 kfdhdb.dbcompat: 168820736 ; 0x0e0: 0x0a100000 kfdhdb.grpstmp.hi: 33072461 ; 0x0e4: HOUR=0xd DAYS=0xa MNTH=0x9 YEAR=0x7e2 kfdhdb.grpstmp.lo: 3452534784 ; 0x0e8: USEC=0x0 MSEC=0x260 SECS=0x1c MINS=0x33
使用rman对数据库进行备份,并且重建磁盘组实现数据0丢失
发表在 Oracle备份恢复
标签为 asm disk分区恢复, asm fdisk, asm不能mount, asm恢复, fdisk asm恢复, ORA-15040, ORA-15042
评论关闭
pvid=yes导致asm无法mount
今天凌晨接到客户恢复请求,对于aix rac,两个ibm存储做mirror的环境中,客户做存储容灾演练,发现磁盘的名称发生改变,然后对其中一个磁盘设置pvid,结果悲剧了导致asm一个磁盘组无法正常起来。然后又aix端删除这些设备,然后重新扫描设备。结果不是一个磁盘组不能mount,而是整个gi就无法正常启动。希望我们给予技术支持。
查看asm 日志,确定asm disk信息
从这里可以确定,一共有两个asm diskgroup,每个group有两个磁盘,hdisk2和hdisk3 为hisdata,hdisk4,和hdisk5为emrdata.
使用kfed分析磁盘头
dd if=/dev/rhdisk2 of=/tmp/xifenfei/rhdisk2.dd bs=1024k count=10 dd if=/dev/rhdisk3 of=/tmp/xifenfei/rhdisk3.dd bs=1024k count=10 dd if=/dev/rhdisk4 of=/tmp/xifenfei/rhdisk4.dd bs=1024k count=10 dd if=/dev/rhdisk5 of=/tmp/xifenfei/rhdisk5.dd bs=1024k count=10 --传输到我电脑上分析 C:\Users\FAL>kfed read H:\temp\xifenfei\tmp\xifenfei\rhdisk2.dd|grep name kfdhdb.dskname: HISDATA_0000 ; 0x028: length=12 kfdhdb.grpname: HISDATA ; 0x048: length=7 kfdhdb.fgname: HISDATA_0000 ; 0x068: length=12 kfdhdb.capname: ; 0x088: length=0 C:\Users\FAL>kfed read H:\temp\xifenfei\tmp\xifenfei\rhdisk3.dd|grep name kfdhdb.dskname: HISDATA_0001 ; 0x028: length=12 kfdhdb.grpname: HISDATA ; 0x048: length=7 kfdhdb.fgname: HISDATA_0001 ; 0x068: length=12 kfdhdb.capname: ; 0x088: length=0 C:\Users\FAL>kfed read H:\temp\xifenfei\tmp\xifenfei\rhdisk4.dd|grep name kfdhdb.dskname: EMRDATA_0000 ; 0x028: length=12 kfdhdb.grpname: EMRDATA ; 0x048: length=7 kfdhdb.fgname: EMRDATA_0000 ; 0x068: length=12 kfdhdb.capname: ; 0x088: length=0 C:\Users\FAL>kfed read H:\temp\xifenfei\tmp\xifenfei\rhdisk5.dd|grep name C:\Users\FAL>kfed read H:\temp\xifenfei\tmp\xifenfei\rhdisk5.dd kfbh.endian: 201 ; 0x000: 0xc9 kfbh.hard: 194 ; 0x001: 0xc2 kfbh.type: 212 ; 0x002: *** Unknown Enum *** kfbh.datfmt: 193 ; 0x003: 0xc1 kfbh.block.blk: 0 ; 0x004: blk=0 kfbh.block.obj: 0 ; 0x008: file=0 kfbh.check: 0 ; 0x00c: 0x00000000 kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 000000000 C1D4C2C9 00000000 00000000 00000000 [................] 000000010 00000000 00000000 00000000 00000000 [................] Repeat 254 times KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][212] C:\Users\FAL>kfed read H:\temp\xifenfei\tmp\xifenfei\rhdisk5.dd blkn=2|grep kfbh kfbh.endian: 0 ; 0x000: 0x00 kfbh.hard: 130 ; 0x001: 0x82 kfbh.type: 3 ; 0x002: KFBTYP_ALLOCTBL kfbh.datfmt: 2 ; 0x003: 0x02 kfbh.block.blk: 33554432 ; 0x004: blk=33554432 kfbh.block.obj: 16777344 ; 0x008: file=128 kfbh.check: 2654889601 ; 0x00c: 0x9e3e6681 kfbh.fcn.base: 1696071680 ; 0x010: 0x65180000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 C:\Users\FAL>kfed read H:\temp\xifenfei\tmp\xifenfei\rhdisk5.dd blkn=510|grep name kfdhdb.dskname: EMRDATA_0001 ; 0x028: length=12 kfdhdb.grpname: EMRDATA ; 0x048: length=7 kfdhdb.fgname: EMRDATA_0001 ; 0x068: length=12 kfdhdb.capname: ; 0x088: length=0
通过上述分析,基本上确定由于对hdisk5设置了pvid导致该asm disk的磁盘头损坏.这个可以直接使用asm repair功能修复(注意要clear pvid)
C:\Users\FAL>kfed read H:\temp\xifenfei\tmp\xifenfei\rhdisk5.dd |grep name kfdhdb.dskname: EMRDATA_0001 ; 0x028: length=12 kfdhdb.grpname: EMRDATA ; 0x048: length=7 kfdhdb.fgname: EMRDATA_0001 ; 0x068: length=12 kfdhdb.capname: ; 0x088: length=0
启动crs到cssd进程报错分析
1. 由于删除磁盘,扫描设备导致hdisk[2-5] 权限和用户组不对
2. 由于删除,扫描磁盘导致磁盘共享模式不对
修复磁盘头和解决这两个问题之后,gi启动正常,磁盘组也正常mount,数据库也正常启动,数据0丢失,至此完美恢复
类似客户恢复案例:asm disk误设置pvid导致asm diskgroup无法mount恢复
如果您遇到此类情况,无法解决请联系我们,提供专业ORACLE数据库恢复技术支持
Phone:17813235971 Q Q:107644445 E-Mail:dba@xifenfei.com
asm disk误设置pvid导致asm diskgroup无法mount恢复
有朋友找到我说他们把以前存储到AIX直连的存储切换为含光纤交换机的存储网络后,RAC无法启动,让我给予支持.通过分析是由于换链路之后开始磁盘顺序不对,维护人员对其asm disk 设置了pvid,导致asm 磁盘组无法正常mount,从而使得含votedisk的dg的asm disk无法正常访问,从而RAC的cssd进程无法启动,同样数据文件的磁盘组也无法mount,通过kfed修复成功,实现数据0丢失.
平台版本信息(2节点RAC)
$ sqlplus -v SQL*Plus: Release 11.2.0.4.0 Production $ uname -a AIX db2 1 7 00F9733E4C00
GI日志报错信息
2014-12-20 16:44:08.769: [ohasd(6946818)]CRS-2769:Unable to failover resource 'ora.diskmon'. 2014-12-20 16:44:11.775: [cssd(9502756)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log 2014-12-20 16:44:26.791: [cssd(9502756)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; 、Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log 2014-12-20 16:44:41.812: [cssd(9502756)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
从这里可以看出来是由于RAC启动过程中无法获得votedisk使得其无法正常启动,通过分析日志找出来votedisk相关磁盘
2014-12-15 17:36:15.424: [cssd(10027070)]CRS-1605:CSSD voting file is online: /dev/rhdisk4; details in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log 2014-12-15 17:36:15.433: [cssd(10027070)]CRS-1605:CSSD voting file is online: /dev/rhdisk5; details in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log 2014-12-15 17:36:15.445: [cssd(10027070)]CRS-1605:CSSD voting file is online: /dev/rhdisk6; details in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
从这里可以知道rhdisk4,5,6为votedisk对应磁盘,使用kfed查看磁盘头信息
$kfed read /dev/rhdisk4 kfbh.endian: 201 ; 0x000: 0xc9 kfbh.hard: 194 ; 0x001: 0xc2 kfbh.type: 212 ; 0x002: *** Unknown Enum *** kfbh.datfmt: 193 ; 0x003: 0xc1 kfbh.block.blk: 0 ; 0x004: blk=0 kfbh.block.obj: 0 ; 0x008: file=0 kfbh.check: 0 ; 0x00c: 0x00000000 kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 1102BEE00 C9C2D4C1 00000000 00000000 00000000 [................] 1102BEE10 00000000 00000000 00000000 00000000 [................] Repeat 6 times 1102BEE80 00F9733D 67553E0A 00000000 00000000 [..s=gU>.........] 1102BEE90 00000000 00000000 00000000 00000000 [................] Repeat 246 times KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][212] $kfed read /dev/rhdisk4 blkn=1 kfbh.endian: 0 ; 0x000: 0x00 kfbh.hard: 130 ; 0x001: 0x82 kfbh.type: 2 ; 0x002: KFBTYP_FREESPC kfbh.datfmt: 2 ; 0x003: 0x02 kfbh.block.blk: 1 ; 0x004: blk=1 kfbh.block.obj: 2147483648 ; 0x008: disk=0 kfbh.check: 3883664132 ; 0x00c: 0xe77c0304 kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 kfdfsb.aunum: 0 ; 0x000: 0x00000000 kfdfsb.max: 254 ; 0x004: 0x00fe kfdfsb.cnt: 23 ; 0x006: 0x0017 kfdfsb.bound: 0 ; 0x008: 0x0000 kfdfsb.flag: 1 ; 0x00a: B=1 kfdfsb.ub1spare: 0 ; 0x00b: 0x00 kfdfsb.spare[0]: 0 ; 0x00c: 0x00000000 kfdfsb.spare[1]: 0 ; 0x010: 0x00000000 kfdfsb.spare[2]: 0 ; 0x014: 0x00000000 kfdfse[0].fse: 119 ; 0x018: FREE=0x7 FRAG=0x7 kfdfse[1].fse: 16 ; 0x019: FREE=0x0 FRAG=0x1 ………… $kfed read /dev/rhdisk4 blkn=510 kfbh.endian: 0 ; 0x000: 0x00 kfbh.hard: 130 ; 0x001: 0x82 kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD kfbh.datfmt: 1 ; 0x003: 0x01 kfbh.block.blk: 254 ; 0x004: blk=254 kfbh.block.obj: 2147483648 ; 0x008: disk=0 kfbh.check: 3460116983 ; 0x00c: 0xce3d31f7 kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8 kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000 kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000 kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000 kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000 kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000 kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000 kfdhdb.compat: 186646528 ; 0x020: 0x0b200000 kfdhdb.dsknum: 0 ; 0x024: 0x0000 kfdhdb.grptyp: 2 ; 0x026: KFDGTP_NORMAL kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER kfdhdb.dskname: CRS_0000 ; 0x028: length=8 kfdhdb.grpname: CRS ; 0x048: length=3 kfdhdb.fgname: CRS_0000 ; 0x068: length=8 …………
由上述分析可以基本上确定是asm disk header 被破坏,进一步分析破坏原因
[db2/dev#]lspv hdisk0 00f9733ef7cf27e9 rootvg active hdisk1 00f9733e21b953e6 rootvg active hdisk2 00f9733e21b97a83 appvg active hdisk3 00f9733e21b98434 appvg active hdisk4 00f9733d67553e0a None hdisk5 00f9733d67553f31 None hdisk6 00f9733d67554011 None hdisk7 00f9733d67554165 None hdisk8 00f9733d675541e5 None hdisk9 00f9733d675542e4 None hdisk10 none None [db2/dev#]ls -l rhdisk* crw------- 2 root system 24, 1 Oct 18 11:45 rhdisk0 crw------- 1 root system 24, 3 Oct 18 13:27 rhdisk1 crw------- 1 root system 24, 5 Dec 20 20:02 rhdisk10 crw------- 1 root system 24, 2 Oct 18 13:32 rhdisk2 crw------- 1 root system 24, 0 Oct 18 13:32 rhdisk3 crw-rw---- 1 grid asmadmin 24, 8 Dec 20 20:02 rhdisk4 crw-rw---- 1 grid asmadmin 24, 9 Dec 20 20:02 rhdisk5 crw-rw---- 1 grid asmadmin 24, 10 Dec 20 20:02 rhdisk6 crw-rw---- 1 grid asmadmin 24, 4 Dec 20 20:02 rhdisk7 crw-rw---- 1 grid asmadmin 24, 6 Dec 20 20:02 rhdisk8 crw-rw---- 1 grid asmadmin 24, 7 Dec 20 20:02 rhdisk9
从这里基本上可以看出来,是由于磁盘头被重写了pvid,导致asm disk header 被破坏.进一步分析asm log,确定哪些磁盘被用作asm disk
SQL> CREATE DISKGROUP CRS NORMAL REDUNDANCY DISK '/dev/rhdisk4', '/dev/rhdisk5', '/dev/rhdisk6' ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M' /* ASMCA */ NOTE: Assigning number (1,0) to disk (/dev/rhdisk4) NOTE: Assigning number (1,1) to disk (/dev/rhdisk5) NOTE: Assigning number (1,2) to disk (/dev/rhdisk6) NOTE: initializing header on grp 1 disk CRS_0000 NOTE: initializing header on grp 1 disk CRS_0001 NOTE: initializing header on grp 1 disk CRS_0002 SQL> CREATE DISKGROUP DATA EXTERNAL REDUNDANCY DISK '/dev/rhdisk9' SIZE 614400M ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M' /* ASMCA */ NOTE: Assigning number (2,0) to disk (/dev/rhdisk9) NOTE: initializing header on grp 2 disk DATA_0000 SQL> CREATE DISKGROUP FBA EXTERNAL REDUNDANCY DISK '/dev/rhdisk8' SIZE 204800M ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M' /* ASMCA */ NOTE: Assigning number (3,0) to disk (/dev/rhdisk8) NOTE: initializing header on grp 3 disk FBA_0000 SQL> CREATE DISKGROUP ARCH EXTERNAL REDUNDANCY DISK '/dev/rhdisk7' SIZE 102400M ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M' /* ASMCA */ NOTE: Assigning number (4,0) to disk (/dev/rhdisk7) NOTE: initializing header on grp 4 disk ARCH_0000
这里可以确定asm disk为rhdisk[4-9],通过kfed分析全部和rhdisk4一样的问题,也符合lspv查询出来的结果,使用kfed repair修复asm disk header后
SQL> alter diskgroup data mount; Diskgroup altered. SQL> alter diskgroup fba mount; Diskgroup altered. SQL> alter diskgroup arch mount; Diskgroup altered. SQL> alter diskgroup crs mount; Diskgroup altered. SQL> select group_number,disk_number,path from v$asm_disk; GROUP_NUMBER DISK_NUMBER PATH ------------ ----------- -------------------------------------------------- 2 0 /dev/rhdisk4 2 1 /dev/rhdisk5 2 2 /dev/rhdisk6 1 0 /dev/rhdisk7 4 0 /dev/rhdisk8 3 0 /dev/rhdisk9 6 rows selected. SQL> select group_number,name from v$asm_diskgroup; GROUP_NUMBER NAME ------------ ------------------------------------------------------------ 1 ARCH 2 CRS 3 DATA 4 FBA
这里证明通过kfed对磁盘头的修复,asm磁盘组已经全部mount成功,GI状态也恢复正常
[db2/#]crsctl status res -t -------------------------------------------------------------------------------- NAME TARGET STATE SERVER STATE_DETAILS -------------------------------------------------------------------------------- Local Resources -------------------------------------------------------------------------------- ora.ARCH.dg ONLINE ONLINE db1 ONLINE ONLINE db2 ora.CRS.dg ONLINE ONLINE db1 ONLINE ONLINE db2 ora.DATA.dg ONLINE ONLINE db1 ONLINE ONLINE db2 ora.FBA.dg ONLINE ONLINE db1 ONLINE ONLINE db2 ora.LISTENER.lsnr ONLINE ONLINE db1 ONLINE ONLINE db2 ora.asm ONLINE ONLINE db1 Started ONLINE ONLINE db2 Started ora.gsd OFFLINE OFFLINE db1 OFFLINE OFFLINE db2 ora.net1.network ONLINE ONLINE db1 ONLINE ONLINE db2 ora.ons ONLINE ONLINE db1 ONLINE ONLINE db2 ora.registry.acfs ONLINE ONLINE db1 ONLINE ONLINE db2 -------------------------------------------------------------------------------- Cluster Resources -------------------------------------------------------------------------------- ora.LISTENER_SCAN1.lsnr 1 ONLINE ONLINE db1 ora.cvu 1 ONLINE ONLINE db1 ora.db1.vip 1 ONLINE ONLINE db1 ora.db2.vip 1 ONLINE ONLINE db2 ora.nkora.db 1 ONLINE ONLINE db1 Open 2 ONLINE ONLINE db2 Open ora.oc4j 1 ONLINE ONLINE db1 ora.scan1.vip 1 ONLINE ONLINE db1
这里忽略了一个问题,在修复磁盘头之前没有清除pvid,导致磁盘头修复后,pvid依然存储在odm中
[db2/dev#]lspv hdisk0 00f9733ef7cf27e9 rootvg active hdisk1 00f9733e21b953e6 rootvg active hdisk2 00f9733e21b97a83 appvg active hdisk3 00f9733e21b98434 appvg active hdisk4 00f9733d67553e0a None hdisk5 00f9733d67553f31 None hdisk6 00f9733d67554011 None hdisk7 00f9733d67554165 None hdisk8 00f9733d675541e5 None hdisk9 00f9733d675542e4 None hdisk10 none None
通过分析发现fba磁盘组中无任何记录,使用该磁盘组进行直接清除pvid测试
$ sqlplus / as sysasm SQL*Plus: Release 11.2.0.4.0 Production on Sun Dec 21 03:13:31 2014 Copyright (c) 1982, 2013, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Real Application Clusters and Automatic Storage Management options SQL> alter diskgroup fba dismount; Diskgroup altered. SQL> exit Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Real Application Clusters and Automatic Storage Management options $ exit You have mail in /usr/spool/mail/root [db2/#]chdev -l hdisk8 -a pv=clear hdisk8 changed [db2/#]lspv hdisk0 00f9733ef7cf27e9 rootvg active hdisk1 00f9733e21b953e6 rootvg active hdisk2 00f9733e21b97a83 appvg active hdisk3 00f9733e21b98434 appvg active hdisk4 00f9733d67553e0a None hdisk5 00f9733d67553f31 None hdisk6 00f9733d67554011 None hdisk7 00f9733d67554165 None hdisk8 none None hdisk9 00f9733d675542e4 None hdisk10 none None [db2/#]su - grid $ sqlplus / as sysasm SQL*Plus: Release 11.2.0.4.0 Production on Sun Dec 21 03:15:19 2014 Copyright (c) 1982, 2013, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Real Application Clusters and Automatic Storage Management options SQL> alter diskgroup fba mount; Diskgroup altered. SQL> exit Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Real Application Clusters and Automatic Storage Management options
通过测试直接清除pvid asm 磁盘头依然工作正常,关闭GI,使用chdev清除hdisk[4-9]所有pvid,启动GI一切正常
[db1/#]crsctl status res -t -------------------------------------------------------------------------------- NAME TARGET STATE SERVER STATE_DETAILS -------------------------------------------------------------------------------- Local Resources -------------------------------------------------------------------------------- ora.ARCH.dg ONLINE ONLINE db1 ONLINE ONLINE db2 ora.CRS.dg ONLINE ONLINE db1 ONLINE ONLINE db2 ora.DATA.dg ONLINE ONLINE db1 ONLINE ONLINE db2 ora.FBA.dg ONLINE ONLINE db1 ONLINE ONLINE db2 ora.LISTENER.lsnr ONLINE ONLINE db1 ONLINE ONLINE db2 ora.asm ONLINE ONLINE db1 Started ONLINE ONLINE db2 Started ora.gsd OFFLINE OFFLINE db1 OFFLINE OFFLINE db2 ora.net1.network ONLINE ONLINE db1 ONLINE ONLINE db2 ora.ons ONLINE ONLINE db1 ONLINE ONLINE db2 ora.registry.acfs ONLINE ONLINE db1 ONLINE ONLINE db2 -------------------------------------------------------------------------------- Cluster Resources -------------------------------------------------------------------------------- ora.LISTENER_SCAN1.lsnr 1 ONLINE ONLINE db1 ora.cvu 1 ONLINE ONLINE db1 ora.db1.vip 1 ONLINE ONLINE db1 ora.db2.vip 1 ONLINE ONLINE db2 ora.nkora.db 1 ONLINE ONLINE db1 Open 2 ONLINE ONLINE db2 Open ora.oc4j 1 ONLINE ONLINE db1 ora.scan1.vip 1 ONLINE ONLINE db1 [db1/#]lspv hdisk0 00f9733df7c7a9db rootvg active hdisk1 00f9733d21dad8fe rootvg active hdisk2 00f9733d21dbd08b appvg active hdisk3 00f9733d21dbd2ab appvg active hdisk4 none None hdisk5 none None hdisk6 none None hdisk7 none None hdisk8 none None hdisk9 none None hdisk10 none None
至此设置pvid导致asm disk header损坏的asm 恢复正常,实现数据0丢失。
温馨提示:aix asm disk磁盘中不能设置pvid,否则将会导致asm disk header 损坏,无法正常mount
如果您遇到此类情况,无法解决请联系我们,提供专业ORACLE数据库恢复技术支持
Phone:17813235971 Q Q:107644445 E-Mail:dba@xifenfei.com
发表在 Oracle ASM, 非常规恢复
标签为 asm, asm不能mount, asm恢复, kfbtTraverseBlock, KFED-00322, mount, ORA-15042, pvid
一条评论