标签归档:asm disk破坏

asm disk被加入vg恢复

接到客户恢复请求:把oracle asm datagroup中的一个磁盘增加到vg中,现在磁盘组无法mount,数据库无法正常启动.远程登录现场进行分析发现情况如下:
操作系统层面分析
history操作记录
add_asm_disk_to_vg


这里比较明显把一个磁盘做成pv,并且加入到vg中,然后再分配199G给lv_home,系统层面分析lvm信息

--查看pv信息
[root@xff1 ~]# pvdisplay 
  --- Physical volume ---
  PV Name               /dev/sda2
  VG Name               VolGroup
  PV Size               277.98 GiB / not usable 3.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              71161
  Free PE               0
  Allocated PE          71161
  PV UUID               F6QO3f-065n-mwTW-Xbq2-Xx2y-c8HD-Tkr7V7
   
  --- Physical volume ---
  PV Name               /dev/sdg    <----新加入的磁盘
  VG Name               VolGroup
  PV Size               200.00 GiB / not usable 4.00 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              51199
  Free PE               255
  Allocated PE          50944
  PV UUID               i69vUG-nCIK-dtxL-FvpD-2WZd-bvLv-n7lwrb

[root@xff1 ~]# lvdisplay 
  --- Logical volume ---
  LV Path                /dev/VolGroup/lv_root
  LV Name                lv_root
  VG Name                VolGroup
  LV UUID                JUNnkN-m4zq-D0gh-h42b-cUM1-Wh1q-ZMtQE4
  LV Write Access        read/write
  LV Creation host, time localhost.localdomain, 2017-07-19 20:08:47 +0800
  LV Status              available
  # open                 1
  LV Size                50.00 GiB
  Current LE             12800
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0
   
  --- Logical volume ---
  LV Path                /dev/VolGroup/lv_home
  LV Name                lv_home
  VG Name                VolGroup
  LV UUID                eZTkLt-cNGX-371i-m8Bd-VdD9-q6Hz-wYDRIJ
  LV Write Access        read/write
  LV Creation host, time localhost.localdomain, 2017-07-19 20:08:54 +0800
  LV Status              available
  # open                 1
  LV Size                422.97 GiB      <-----lv大小变成422G,应该是被扩了199G后结果
  Current LE             108281
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:2
   
  --- Logical volume ---
  LV Path                /dev/VolGroup/lv_swap
  LV Name                lv_swap
  VG Name                VolGroup
  LV UUID                54P9ok-VpwO-zM68-hvwY-9GBf-89yb-8xQAMn
  LV Write Access        read/write
  LV Creation host, time localhost.localdomain, 2017-07-19 20:09:23 +0800
  LV Status              available
  # open                 1
  LV Size                4.00 GiB
  Current LE             1024
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1


[root@xff1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                       50G  3.9G   43G   9% /
tmpfs                  63G  509M   63G   1% /dev/shm
/dev/sda1             477M   44M  408M  10% /boot
/dev/mapper/VolGroup-lv_home
                      417G  226G  170G  58% /home  <----增加了199g空间,剩余只剩170G,证明增加空间之后最少使用了30G以上  

基于这样的情况,基本上可以确定sdg盘加入VolGroup中并且被分配给 lv_home中,而且还写入了数据(/home空闲空间只剩余170G,lv_home当时扩了199G).
asm层面分析
asm磁盘组无法mount,提示缺少一块磁盘

SQL> ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:12056:279} */ 
NOTE: cache registered group DATA number=1 incarn=0xa1dbff16
NOTE: cache began mount (first) of group DATA number=1 incarn=0xa1dbff16
NOTE: Assigning number (1,2) to disk (/dev/asmdisk3)
NOTE: Assigning number (1,1) to disk (/dev/asmdisk2)
Sat Apr 25 13:04:58 2020
ERROR: no read quorum in group: required 1, found 0 disks
NOTE: cache dismounting (clean) group 1/0xA1DBFF16 (DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 81552, image: oracle@rac2db1 (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0xA1DBFF16 (DATA) 
NOTE: cache ending mount (fail) of group DATA number=1 incarn=0xa1dbff16
NOTE: cache deleting context for group DATA 1/0xa1dbff16
GMON dismounting group 1 at 19 for pid 30, osid 81552
NOTE: Disk DATA_0001 in mode 0x9 marked for de-assignment
NOTE: Disk DATA_0002 in mode 0x9 marked for de-assignment
ERROR: diskgroup DATA was not mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA" cannot be mounted
ORA-15040: diskgroup is incomplete
ERROR: ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:12056:279} */

kfed分析磁盘信息
kfed_lvm


报错比较明显asm disk磁盘头被lvm的信息取代(因为asm disk 被加入到vg中),根据前面的分析,该磁盘被写入数据很可能超过30G,使用kfed分析一个随意au,确认被破坏,证明开始判断基本正确

root@xff1:/home/oracle11g$kfed read /dev/asmdisk1 aun=10000
kfbh.endian:                         51 ; 0x000: 0x33
kfbh.hard:                           55 ; 0x001: 0x37
kfbh.type:                           32 ; 0x002: *** Unknown Enum ***
kfbh.datfmt:                         42 ; 0x003: 0x2a
kfbh.block.blk:              1329801248 ; 0x004: blk=1329801248
kfbh.block.obj:              1128615502 ; 0x008: file=347726
kfbh.check:                  1094999892 ; 0x00c: 0x41445f54
kfbh.fcn.base:                675103060 ; 0x010: 0x283d4154
kfbh.fcn.wrap:               1448232275 ; 0x014: 0x56524553
kfbh.spare1:                 1598374729 ; 0x018: 0x5f454349
kfbh.spare2:                 1162690894 ; 0x01c: 0x454d414e
7F7843EAD400 2A203733 4F432820 43454E4E 41445F54  [37 * (CONNECT_DA]
7F7843EAD410 283D4154 56524553 5F454349 454D414E  [TA=(SERVICE_NAME]
7F7843EAD420 6361723D 29626432 44494328 5250283D  [=rac2db)(CID=(PR]
7F7843EAD430 4152474F 3A443D4D 4341505C DFCF3153  [OGRAM=D:\PACS1..]
7F7843EAD440 B3BEB7BB 6369445C 65536D6F 72657672  [....\DicomServer]
7F7843EAD450 445C524D 6D6F6369 76726553 524D7265  [MR\DicomServerMR]
7F7843EAD460 6578652E 4F482829 573D5453 362D4E49  [.exe)(HOST=WIN-6]
7F7843EAD470 51414C38 54553645 28294A30 52455355  [8LAQE6UT0J)(USER]
7F7843EAD480 6D64413D 73696E69 74617274 2929726F  [=Administrator))]
7F7843EAD490 202A2029 44444128 53534552 5250283D  [) * (ADDRESS=(PR]
7F7843EAD4A0 434F544F 743D4C4F 28297063 54534F48  [OTOCOL=tcp)(HOST]
7F7843EAD4B0 2E30313D 2E303831 30332E31 4F502829  [=10.180.1.30)(PO]
7F7843EAD4C0 343D5452 37333539 2A202929 74736520  [RT=49537)) * est]
7F7843EAD4D0 696C6261 2A206873 63617220 20626432  [ablish * rac2db ]
7F7843EAD4E0 3231202A 0A343135 2D534E54 31353231  [* 12514.TNS-1251]
7F7843EAD4F0 54203A34 6C3A534E 65747369 2072656E  [4: TNS:listener ]
7F7843EAD500 73656F64 746F6E20 72756320 746E6572  [does not current]
7F7843EAD510 6B20796C 20776F6E 7320666F 69767265  [ly know of servi]
7F7843EAD520 72206563 65757165 64657473 206E6920  [ce requested in ]
7F7843EAD530 6E6E6F63 20746365 63736564 74706972  [connect descript]
………………
7F7843EAE300 6F636944 7265536D 4D726576 69445C52  [DicomServerMR\Di]
7F7843EAE310 536D6F63 65767265 2E524D72 29657865  [comServerMR.exe)]
7F7843EAE320 534F4828 49573D54 4F302D4E 314B304A  [(HOST=WIN-0OJ0K1]
7F7843EAE330 4955304E 55282954 3D524553 696D6441  [N0UIT)(USER=Admi]
7F7843EAE340 7473696E 6F746172 29292972 28202A20  [nistrator))) * (]
7F7843EAE350 52444441 3D535345 4F525028 4F434F54  [ADDRESS=(PROTOCO]
7F7843EAE360 63743D4C 48282970 3D54534F 312E3031  [L=tcp)(HOST=10.1]
7F7843EAE370 312E3038 2930332E 524F5028 35353D54  [80.1.30)(PORT=55]
7F7843EAE380 29383632 202A2029 61747365 73696C62  [268)) * establis]
7F7843EAE390 202A2068 32636172 2A206264 35323120  [h * rac2db * 125]
7F7843EAE3A0 540A3431 312D534E 34313532 4E54203A  [14.TNS-12514: TN]
7F7843EAE3B0 696C3A53 6E657473 64207265 2073656F  [S:listener does ]
7F7843EAE3C0 20746F6E 72727563 6C746E65 6E6B2079  [not currently kn]
7F7843EAE3D0 6F20776F 65732066 63697672 65722065  [ow of service re]
7F7843EAE3E0 73657571 20646574 63206E69 656E6E6F  [quested in conne]
7F7843EAE3F0 64207463 72637365 6F747069 34320A72  [ct descriptor.24]
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][32]

通过上述kfed可以看到第10000 au的位置被写入的是数据库异常之后listener.log的信息(该数据库安装在/home目录中),进一步证明覆盖,通过以下信息证明sdg就是asmdisk1

[root@xff1 dev]# ls -l sdg
brw-rw---- 1 root disk 8,  96 Apr 25 00:05 sdg
[root@xff1 dev]# ls -l asmdisk1
brw-rw---- 1 grid asmadmin 8,  96 Apr 25 00:05 asmdisk1

基于现在的情况,data磁盘组是由三块 200G的磁盘组成,第一块磁盘被意外加入vg,并且写入数据大于30G,无法从asm层面直接通过kfed修复磁盘组,然后直接mount,只能通过oracle asm磁盘数据块重组技术(asm disk header 彻底损坏恢复)实现没有覆盖数据的恢复.
20200426202310


该客户运气还不错,通过仅剩的2019年12月份几天的不成功备份找出来所有的数据文件(无归档),然后强制拉库成功.通过碎片恢复的最新的数据文件数据结合2019年12月份备份,实现绝大部分业务数据恢复,最大限度减少客户损失.对于oracle rac数据库服务器磁盘操作需要谨慎.
如果不幸有类似oracle asm disk被破坏(格式化,dd部分,做成lv等),需要进行恢复支持,可以联系我们,做专业的恢复评估,最大限度,最快速度抢救数据,减少损失
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com
恢复过部分asm被格式化案例:
又一例asm格式化文件系统恢复
一次完美的asm disk被格式化ntfs恢复
oracle asm disk格式化恢复—格式化为ext4文件系统
oracle asm disk格式化恢复—格式化为ntfs文件系统

发表在 非常规恢复 | 标签为 , , , , | 评论关闭