Category Archives: AIX
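pvid=yes prevents ASM from mounting
Early this morning we received a recovery request from a customer running RAC on AIX, with two IBM storage arrays mirrored to each other. During a storage disaster-recovery drill, the customer noticed that the disk names had changed and set a PVID on one of the disks. The result was that one ASM disk group could no longer mount. They then deleted these devices on the AIX side and rescanned them. After that, the problem was no longer just one disk group failing to mount: the whole GI (Grid Infrastructure) stack could not start. They asked us for technical support.
Check the ASM log to identify the ASM disks
From the log we can determine that there are two ASM disk groups with two disks each: hdisk2 and hdisk3 belong to HISDATA, and hdisk4 and hdisk5 belong to EMRDATA.
Analyze the disk headers with kfed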
dd if=/dev/rhdisk2 of=/tmp/xifenfei/rhdisk2.dd bs=1024k count=10
dd if=/dev/rhdisk3 of=/tmp/xifenfei/rhdisk3.dd bs=1024k count=10
dd if=/dev/rhdisk4 of=/tmp/xifenfei/rhdisk4.dd bs=1024k count=10
dd if=/dev/rhdisk5 of=/tmp/xifenfei/rhdisk5.dd bs=1024k count=10
-- copy the images to my PC for analysis
C:\Users\FAL>kfed read H:\temp\xifenfei\tmp\xifenfei\rhdisk2.dd|grep name
kfdhdb.dskname: HISDATA_0000 ; 0x028: length=12
kfdhdb.grpname: HISDATA ; 0x048: length=7
kfdhdb.fgname: HISDATA_0000 ; 0x068: length=12
kfdhdb.capname: ; 0x088: length=0
C:\Users\FAL>kfed read H:\temp\xifenfei\tmp\xifenfei\rhdisk3.dd|grep name
kfdhdb.dskname: HISDATA_0001 ; 0x028: length=12
kfdhdb.grpname: HISDATA ; 0x048: length=7
kfdhdb.fgname: HISDATA_0001 ; 0x068: length=12
kfdhdb.capname: ; 0x088: length=0
C:\Users\FAL>kfed read H:\temp\xifenfei\tmp\xifenfei\rhdisk4.dd|grep name
kfdhdb.dskname: EMRDATA_0000 ; 0x028: length=12
kfdhdb.grpname: EMRDATA ; 0x048: length=7
kfdhdb.fgname: EMRDATA_0000 ; 0x068: length=12
kfdhdb.capname: ; 0x088: length=0
C:\Users\FAL>kfed read H:\temp\xifenfei\tmp\xifenfei\rhdisk5.dd|grep name
C:\Users\FAL>kfed read H:\temp\xifenfei\tmp\xifenfei\rhdisk5.dd
kfbh.endian: 201 ; 0x000: 0xc9
kfbh.hard: 194 ; 0x001: 0xc2
kfbh.type: 212 ; 0x002: *** Unknown Enum ***
kfbh.datfmt: 193 ; 0x003: 0xc1
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
000000000 C1D4C2C9 00000000 00000000 00000000 [................]
000000010 00000000 00000000 00000000 00000000 [................]
  Repeat 254 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][212]
C:\Users\FAL>kfed read H:\temp\xifenfei\tmp\xifenfei\rhdisk5.dd blkn=2|grep kfbh
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 3 ; 0x002: KFBTYP_ALLOCTBL
kfbh.datfmt: 2 ; 0x003: 0x02
kfbh.block.blk: 33554432 ; 0x004: blk=33554432
kfbh.block.obj: 16777344 ; 0x008: file=128
kfbh.check: 2654889601 ; 0x00c: 0x9e3e6681
kfbh.fcn.base: 1696071680 ; 0x010: 0x65180000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
C:\Users\FAL>kfed read H:\temp\xifenfei\tmp\xifenfei\rhdisk5.dd blkn=510|grep name
kfdhdb.dskname: EMRDATA_0001 ; 0x028: length=12
kfdhdb.grpname: EMRDATA ; 0x048: length=7
kfdhdb.fgname: EMRDATA_0001 ; 0x068: length=12
kfdhdb.capname: ; 0x088: length=0
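The analysis above essentially confirms that setting a PVID on hdisk5 damaged that ASM disk's header: kfed no longer recognizes block 0 (invalid block type 212), while the backup header at blkn=510 still identifies the disk as EMRDATA_0001 in disk group EMRDATA. The header can therefore be fixed directly with the ASM repair function (kfed repair). Note that the PVID must also be cleared.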
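The diagnosis above comes down to two checks on the dd image: what now occupies block 0, and whether the backup header is still intact. Below is a minimal Python sketch of both checks, under stated assumptions. It assumes 4 KiB metadata blocks and a 1 MiB AU, which are the values that put the backup header at blkn=510 as in the kfed session. It takes the offsets of kfbh.type (0x002) and kfdhdb.grpname (0x048) from the kfed output. It also treats the bytes kfed printed for hdisk5 at offsets 0x000-0x003 (0xc9 0xc2 0xd4 0xc1) as the EBCDIC text "IBMA", which to our understanding is the signature AIX writes at the start of its IPL record when a PVID is assigned. The script only reads the image; the actual fix is still kfed repair plus clearing the PVID.

```python
"""Sketch: inspect block 0 and the backup header of an ASM disk image.

Assumptions (not from the post): 4 KiB ASM metadata blocks and 1 MiB AUs;
field offsets taken from the kfed output (kfbh.type at 0x002,
kfdhdb.grpname at 0x048, 32 bytes); EBCDIC "IBMA" at offset 0 taken to be
the AIX IPL-record signature written together with a PVID.
"""
import sys

BLOCK_SIZE = 4096            # assumed ASM metadata block size
AU_SIZE = 1024 * 1024        # assumed allocation unit size
KFBTYP_DISKHEAD = 1          # kfbh.type value kfed prints as KFBTYP_DISKHEAD
AIX_IPL_SIGNATURE = "IBMA"   # bytes C9 C2 D4 C1 decoded with code page 037


def backup_header_blkn(au_size: int = AU_SIZE, block_size: int = BLOCK_SIZE) -> int:
    """Backup disk header: second-to-last block of AU 1."""
    blocks_per_au = au_size // block_size
    return blocks_per_au + blocks_per_au - 2


def classify_block(block: bytes) -> str:
    """Describe a metadata block from its first bytes."""
    if len(block) < 0x68:
        return "too short"
    if block[:4].decode("cp037") == AIX_IPL_SIGNATURE:
        return "AIX IPL record (EBCDIC 'IBMA'), not an ASM header"
    if block[2] == KFBTYP_DISKHEAD:
        # kfdhdb.grpname: 32-byte field, NUL padded
        grpname = block[0x48:0x68].split(b"\x00")[0].decode("ascii", "replace").strip()
        return f"ASM disk header, disk group {grpname}"
    return f"unknown block type {block[2]}"


def main(path: str) -> None:
    blkn = backup_header_blkn()
    with open(path, "rb") as img:
        block0 = img.read(BLOCK_SIZE)
        img.seek(blkn * BLOCK_SIZE)
        backup = img.read(BLOCK_SIZE)
    print(f"block 0: {classify_block(block0)}")
    print(f"block {blkn}: {classify_block(backup)}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])  # e.g. one of the 10 MB images taken with dd above
```

Run against the hdisk5 image, one would expect block 0 to be reported as the AIX IPL record and block 510 as an EMRDATA header, in line with the kfed session.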
After the repair, block 0 of hdisk5 reads correctly:
C:\Users\FAL>kfed read H:\temp\xifenfei\tmp\xifenfei\rhdisk5.dd |grep name
kfdhdb.dskname: EMRDATA_0001 ; 0x028: length=12
kfdhdb.grpname: EMRDATA ; 0x048: length=7
kfdhdb.fgname: EMRDATA_0001 ; 0x068: length=12
kfdhdb.capname: ; 0x088: length=0
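Analysis of the errors reported when CRS startup reached the cssd process
1. Deleting the disks and rescanning the devices left hdisk[2-5] with the wrong permissions and owner/group.
2. Deleting and rescanning the disks also left them with the wrong sharing mode.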
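Both problems can be checked before CRS is started again. The sketch below is only an illustration: it parses `ls -l /dev/rhdiskN` lines and `lsattr -El hdiskN -a reserve_policy` lines and prints suggested fix-up commands. The owner `grid`, the group `asmadmin` and mode 660 are hypothetical placeholders for whatever the installation originally used. It also assumes that the "sharing mode" in question is the AIX `reserve_policy` attribute, with `no_reserve` for disks shared by RAC nodes. Some multipath drivers use a different attribute, so review the output rather than running it blindly.

```python
"""Sketch: suggest fixes for device ownership/mode and reserve policy.

All expected values below are hypothetical; replace them with the values
your GI installation originally used.
"""
from typing import List

EXPECTED_OWNER = "grid"       # hypothetical GI owner
EXPECTED_GROUP = "asmadmin"   # hypothetical ASM group
EXPECTED_MODE = "crw-rw----"  # character device, mode 660
SHARED_POLICY = "no_reserve"  # assumed reserve_policy for disks shared by RAC nodes


def check_ls_line(line: str) -> List[str]:
    """Fix-up commands for one `ls -l /dev/rhdiskN` line (empty if it looks right)."""
    fields = line.split()
    # mode, link count, owner, group, major/minor, date..., path
    mode, owner, group, path = fields[0].rstrip(".+"), fields[2], fields[3], fields[-1]
    fixes = []
    if (owner, group) != (EXPECTED_OWNER, EXPECTED_GROUP):
        fixes.append(f"chown {EXPECTED_OWNER}:{EXPECTED_GROUP} {path}")
    if mode != EXPECTED_MODE:
        fixes.append(f"chmod 660 {path}")
    return fixes


def check_reserve(hdisk: str, lsattr_line: str) -> List[str]:
    """Fix-up command for one `lsattr -El hdiskN -a reserve_policy` line."""
    value = lsattr_line.split()[1]  # attribute name, value, description...
    if value != SHARED_POLICY:
        return [f"chdev -l {hdisk} -a reserve_policy={SHARED_POLICY}"]
    return []
```

The checks would need to run on every node, because each node has its own device files. A `chdev` change generally has to be made while the disk is not open, which is the case here since CRS cannot start.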
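After the disk header was repaired and these two problems were fixed, GI started normally, the disk groups mounted, and the database opened normally with zero data loss. The recovery was complete.
Similar customer recovery case: recovering an ASM disk group that would not mount after a PVID was set on an ASM disk by mistake
If you run into a similar situation and cannot resolve it, please contact us for professional Oracle database recovery support.
Phone:17813235971 QQ:107644445 E-Mail:dba@xifenfei.com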
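Using procmap on AIX to view the system memory used by Oracle processes
procmap displays a process's address space. The mappings it reports as "read/write" are the process's private memory. For an Oracle server process (LOCAL=NO in its name), this is operating-system memory used by the session process itself and has nothing to do with the SGA or PGA: it is extra system memory consumed by each Oracle database process. When calculating how much memory an Oracle database consumes, count SGA + PGA + the memory used by the processes.
Using the procmap command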
$procmap 7931354
7931354 : oracleccicdx (LOCAL=NO)
100000000 95504K read/exec oracle
110000035 2399K read/write oracle
9fffffff0000000 51K read/exec /usr/ccs/bin/usla64
9fffffff000cfe2 0K read/write /usr/ccs/bin/usla64
900000000b05930 2K read/exec /usr/lib/libC.a[shr3_64.o]
9001000a0122930 0K read/write /usr/lib/libC.a[shr3_64.o]
900000000ae6b00 118K read/exec /usr/lib/libC.a[shrcore_64.o]
9001000a030a100 12K read/write /usr/lib/libC.a[shrcore_64.o]
900000000ac8000 118K read/exec /usr/lib/libC.a[ansicore_64.o]
9001000a0300e00 36K read/write /usr/lib/libC.a[ansicore_64.o]
900000000411468 0K read/exec /usr/lib/libicudata.a[shr_64.o]
9001000a0121468 0K read/write /usr/lib/libicudata.a[shr_64.o]
90000000040f738 2K read/exec /usr/lib/libC.a[shr2_64.o]
9001000a0314738 0K read/write /usr/lib/libC.a[shr2_64.o]
9000000008dd800 1699K read/exec /usr/lib/libC.a[ansi_64.o]
9001000a0315a00 277K read/write /usr/lib/libC.a[ansi_64.o]
9000000008bab00 135K read/exec /usr/lib/libC.a[shr_64.o]
9001000a030eb00 19K read/write /usr/lib/libC.a[shr_64.o]
900000000708180 1732K read/exec /usr/lib/libicuuc.a[shr_64.o]
9001000a035cdac 180K read/write /usr/lib/libicuuc.a[shr_64.o]
900000000493d80 2510K read/exec /usr/lib/libicui18n.a[shr_64.o]
9001000a038a148 270K read/write /usr/lib/libicui18n.a[shr_64.o]
900000000473200 91K read/exec /usr/lib/libsrc.a[shr_64.o]
9001000a01127a8 55K read/write /usr/lib/libsrc.a[shr_64.o]
90000000045a300 98K read/exec /usr/lib/libcorcfg.a[shr_64.o]
9001000a04147c8 18K read/write /usr/lib/libcorcfg.a[shr_64.o]
900000000b16200 750K read/exec /usr/lib/liblvm.a[shr_64.o]
9001000a03dd028 219K read/write /usr/lib/liblvm.a[shr_64.o]
900000000444f00 82K read/exec /usr/lib/libcfg.a[shr_64.o]
9001000a03d58f0 26K read/write /usr/lib/libcfg.a[shr_64.o]
90000000040e3a0 2K read/exec /usr/lib/libcrypt.a[shr_64.o]
9001000a0106948 0K read/write /usr/lib/libcrypt.a[shr_64.o]
90000001615d860 5K read/exec /usr/lib/libc.a[aio_64.o]
9001000a3aed568 0K read/write /usr/lib/libc.a[aio_64.o]
9000000003efc00 120K read/exec /usr/lib/libodm.a[shr_64.o]
9001000a0107cc8 40K read/write /usr/lib/libodm.a[shr_64.o]
900000000bd2c80 147K read/exec /usr/lib/libperfstat.a[shr_64.o]
9001000a041a960 14K read/write /usr/lib/libperfstat.a[shr_64.o]
9000000017d7000 0K read/exec /usr/lib/libdl.a[shr_64.o]
9001000a0517000 0K read/write /usr/lib/libdl.a[shr_64.o]
9000000158ed100 8636K read/exec /oracle/product/db10gr2/lib/libjox10.a[shr.o]
8001000a0000b78 587K read/write /oracle/product/db10gr2/lib/libjox10.a[shr.o]
900000000a87000 257K read/exec /usr/lib/libpthreads.a[shr_xpg5_64.o]
9001000a0274000 559K read/write /usr/lib/libpthreads.a[shr_xpg5_64.o]
900000000000800 4025K read/exec /usr/lib/libc.a[shr_64.o]
9001000a0000020 1047K read/write /usr/lib/libc.a[shr_64.o]
Total 121863K
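A simplified command that lists only the private memory: procmap 7931354 | grep "read/write" | awk -F " " '{print $2}'. Adding up these values shows that, on this operating system and database version, one LOCAL=NO process uses 5758 KB of system memory.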
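The grep/awk pipeline lists the read/write sizes but leaves the addition to the reader. Here is a minimal Python equivalent that also computes the sum. It assumes the procmap line layout shown in the transcript above: address, size with a K suffix, permissions, path. Summing the read/write rows of that transcript this way gives 5758 KB, the figure quoted in the text. The branch that calls procmap on a live PID only works on AIX.

```python
"""Sketch: private (read/write) memory of one process from procmap output.

Equivalent to: procmap <pid> | grep "read/write" | awk '{print $2}'
plus summing the sizes. Assumes the AIX procmap layout
<address> <size>K <permissions> <path>.
"""
import subprocess
import sys


def private_kb(procmap_output: str) -> int:
    """Sum the size column (KB) of all read/write mappings."""
    total = 0
    for line in procmap_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "read/write" and fields[1].endswith("K"):
            total += int(fields[1][:-1])
    return total


if __name__ == "__main__" and len(sys.argv) > 1:
    # procmap exists only on AIX; pass a real PID, e.g. 7931354
    out = subprocess.run(["procmap", sys.argv[1]], capture_output=True,
                         text=True, check=True).stdout
    print(f"{sys.argv[1]}: {private_kb(out)}K private (read/write)")
```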
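Notes
1. Operating system version
$oslevel -r
6100-06
2. Database version
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE 10.2.0.4.0 Production
TNS for IBM/AIX RISC System/6000: Version 10.2.0.4.0 - Productio
NLSRTL Version 10.2.0.4.0 - Production
3. Tracing several LOCAL=NO processes showed that they all use the same amount of system memory, so this value can be used to roughly estimate how much memory the system's Oracle processes use.
4. This confirms that the memory Oracle uses is not just SGA + PGA, as commonly assumed, but SGA + PGA + the memory used by all Oracle processes.
5. On Linux, use pmap to view the same information.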
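Notes 3 and 4 amount to a simple estimate: total ≈ SGA + PGA + (number of Oracle processes × per-process private memory). The sketch below only does that arithmetic. The SGA, PGA and process-count values in the example are made up for illustration. The 5758 KB default is the LOCAL=NO figure measured above on AIX 6.1 with 10.2.0.4; background processes, other versions and other platforms may differ and should be measured the same way.

```python
def estimate_total_mb(sga_mb: float, pga_mb: float, n_processes: int,
                      per_process_kb: float = 5758) -> float:
    """SGA + PGA + per-process private memory, all in MB."""
    return sga_mb + pga_mb + n_processes * per_process_kb / 1024


if __name__ == "__main__":
    # Hypothetical instance: 8 GB SGA, 2 GB PGA, 300 dedicated server processes
    total = estimate_total_mb(8192, 2048, 300)
    print(f"estimated total: {total:.1f} MB")  # prints "estimated total: 11926.9 MB"
```

In this made-up example the 300 processes alone add about 1687 MB on top of SGA + PGA, which is why leaving them out can noticeably understate memory demand.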
Why Oracle on AIX produces the SOFTWARE PROGRAM ABNORMALLY TERMINATED warning
The following error was found in the database.
Solution for this error: ORA-07445[dbgrmqmqpk_query_pick_key()+0f88]
Dump file /oracle/diag/rdbms/sgerp5/sgerp5/incident/incdir_579300/sgerp5_m000_7602504_i579300.trc
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/product/11.1.0/db_1
System name: AIX
Node name: sgerp5
Release: 1
Version: 6
Machine: 00C8F0564C00
Instance name: sgerp5
Redo thread mounted by this instance: 1
Oracle process number: 138
Unix process pid: 7602504, image: oracle@sgerp5 (m000)
*** 2012-05-11 03:52:35.200
*** SESSION ID:(752.5029) 2012-05-11 03:52:35.200
*** CLIENT ID:() 2012-05-11 03:52:35.200
*** SERVICE NAME:(SYS$BACKGROUND) 2012-05-11 03:52:35.200
*** MODULE NAME:(MMON_SLAVE) 2012-05-11 03:52:35.200
*** ACTION NAME:(Auto-Purge Slave Action) 2012-05-11 03:52:35.200
Dump continued from file: /oracle/diag/rdbms/sgerp5/sgerp5/trace/sgerp5_m000_7602504.trc
ORA-07445: exception encountered: core dump [dbgrmqmqpk_query_pick_key()+0f88] [SIGSEGV] [ADDR:0xB38F0000000049][PC:0x100213C08] [Address not mapped to object] []
errpt error description
At the same time as the ORA-07445 error, the AIX system error log recorded a SOFTWARE PROGRAM ABNORMALLY TERMINATED error:
sgerp5_[oracle]-->errpt -aj A924A5FC
---------------------------------------------------------------------------
LABEL: CORE_DUMP
IDENTIFIER: A924A5FC
Date/Time: Fri May 11 03:52:55 BEIST 2012
Sequence Number: 471
Machine Id: 00C8F0564C00
Node Id: sgerp5
Class: S
Type: PERM
WPAR: Global
Resource Name: SYSPROC
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Probable Causes
SOFTWARE PROGRAM
User Causes
USER GENERATED SIGNAL
Recommended Actions
CORRECT THEN RETRY
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
RERUN THE APPLICATION PROGRAM
IF PROBLEM PERSISTS THEN DO THE FOLLOWING
CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
SIGNAL NUMBER
6
USER'S PROCESS ID:
7602504
FILE SYSTEM SERIAL NUMBER
14
INODE NUMBER
0 367648
CORE FILE NAME
/oracle/diag/rdbms/sgerp5/sgerp5/cdump/core_7602504/core
PROGRAM NAME
oracle
STACK EXECUTION DISABLED
0
COME FROM ADDRESS REGISTER
sskgmcrea 0
PROCESSOR ID
hw_fru_id: 1
hw_cpu_id: 2
ADDITIONAL INFORMATION
skgdbgcra 224
??
ksdbgcra 3D0
ssexhd 978
??
Symptom Data
REPORTABLE
1
INTERNAL ERROR
0
SYMPTOM CODE
PCSS/SPI2 FLDS/oracle SIG/6 FLDS/skgdbgcra VALU/224
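In errpt -a output, each field name in the Detail Data section sits on its own line with the value on the next, so pulling out the fields that tie this entry to an Oracle process is simple. The sketch below is only an illustration. It assumes that label/value layout, and it interprets SIGNAL NUMBER with the local platform's signal table (6 is SIGABRT on both AIX and Linux). Note that errpt reports signal 6 even though the ORA-07445 trace shows SIGSEGV. Presumably Oracle's own handler (the ssexhd frame under ADDITIONAL INFORMATION) trapped the SIGSEGV and then aborted the process to write the core, but that is our inference, not something the post states.

```python
"""Sketch: pull the fields that tie an errpt CORE_DUMP entry to a process."""
import signal
from typing import Dict

# Field labels as printed by `errpt -a`; each is followed by its value line.
LABELS = ("SIGNAL NUMBER", "USER'S PROCESS ID:", "CORE FILE NAME", "PROGRAM NAME")


def parse_errpt_detail(text: str) -> Dict[str, str]:
    """Map each label (trailing ':' dropped) to the next non-empty line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    fields = {}
    for i, line in enumerate(lines[:-1]):
        if line in LABELS:
            fields[line.rstrip(":")] = lines[i + 1]
    return fields


def signal_of(fields: Dict[str, str]) -> signal.Signals:
    """Interpret SIGNAL NUMBER using the local signal table."""
    return signal.Signals(int(fields["SIGNAL NUMBER"]))
```

Fed the text of `errpt -aj A924A5FC` shown above, this yields PID 7602504, the core file /oracle/diag/rdbms/sgerp5/sgerp5/cdump/core_7602504/core, program oracle, and signal 6 (SIGABRT).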
Cause of the error
"This error is logged when a software program abnormally ends and causes a core dump. Users might not be exiting applications correctly, the system might have been shut down while users were working in application, or the user's terminal might have locked up and the application stopped."
1) In other words, if an Oracle process on AIX terminates abnormally and produces a core dump file, the SOFTWARE PROGRAM ABNORMALLY TERMINATED warning is logged.
2) A user logged in to the system did not exit normally, and the system was shut down.
3) A user's terminal locked up, causing the process to stop.
Cause of the AIX warning in this case: process 7602504 terminated abnormally (the ORA-07445 error) and produced the core dump file /oracle/diag/rdbms/sgerp5/sgerp5/cdump/core_7602504/core, which generated the SOFTWARE PROGRAM ABNORMALLY TERMINATED warning in AIX.
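This conclusion rests on matching one process ID across three artifacts: the ORA-07445 trace file name, the errpt USER'S PROCESS ID, and the core directory name. A small sketch of that cross-check follows. It assumes the ADR naming patterns visible in this post, `<sid>_<process>_<ospid>[_i<incident>].trc` for traces and `cdump/core_<ospid>/core` for cores. These patterns hold for this 11.1 instance but may not hold for every version.

```python
"""Sketch: check that a trace file, an errpt PID and a core path refer to
the same OS process. The naming patterns are assumptions based on the
11.1 ADR paths shown in this post."""
import re
from typing import Optional

# <sid>_<process>_<ospid>.trc or <sid>_<process>_<ospid>_i<incident>.trc
TRACE_PID = re.compile(r"_(\d+)(?:_i\d+)?\.trc$")
# .../cdump/core_<ospid>/core
CORE_PID = re.compile(r"/core_(\d+)/core$")


def trace_pid(path: str) -> Optional[int]:
    m = TRACE_PID.search(path)
    return int(m.group(1)) if m else None


def core_pid(path: str) -> Optional[int]:
    m = CORE_PID.search(path)
    return int(m.group(1)) if m else None


def same_process(trace_path: str, core_path: str, errpt_pid: int) -> bool:
    """True only when all three sources name the same OS process ID."""
    return trace_pid(trace_path) == core_pid(core_path) == errpt_pid
```

For the files in this case, the incident trace sgerp5_m000_7602504_i579300.trc, the core path core_7602504/core and the errpt PID 7602504 all agree, which is what ties the AIX warning to the ORA-07445.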