Category archive: Oracle RAC
Removing the OFFLINE ora.asmgroup resource records
After switching to Flex ASM, the cluster status always shows a third, permanently OFFLINE instance of the ora.asmgroup-related resources. Forcing the ASM cardinality down to 2 with srvctl modify asm -count 2 removes these OFFLINE entries.
[grid@dbserver1 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.chad
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.net1.network
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.ons
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.proxy_advm
               OFFLINE OFFLINE      dbserver1                STABLE
               OFFLINE OFFLINE      dbserver2                STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        ONLINE  OFFLINE                               STABLE
ora.ASMNET2LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        ONLINE  OFFLINE                               STABLE
ora.DATA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.FRA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.SYSDG.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.asm(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                Started,STABLE
      2        ONLINE  ONLINE       dbserver2                Started,STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.asmnet2.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver2.vip
      1        ONLINE  ONLINE       dbserver2                STABLE
ora.xff.db
      1        ONLINE  ONLINE       dbserver1                Open,HOME=/u01/app/oracle/product/19c/db_1,STABLE
      2        ONLINE  ONLINE       dbserver2                Open,HOME=/u01/app/oracle/product/19c/db_1,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
--------------------------------------------------------------------------------
[grid@dbserver1 ~]$ srvctl modify asm -count 2
[grid@dbserver1 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.chad
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.net1.network
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.ons
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.proxy_advm
               OFFLINE OFFLINE      dbserver1                STABLE
               OFFLINE OFFLINE      dbserver2                STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.ASMNET2LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.DATA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.FRA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.SYSDG.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.asm(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                Started,STABLE
      2        ONLINE  ONLINE       dbserver2                Started,STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.asmnet2.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.cvu
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver2.vip
      1        ONLINE  ONLINE       dbserver2                STABLE
ora.xff.db
      1        ONLINE  ONLINE       dbserver1                Open,HOME=/u01/app/oracle/product/19c/db_1,STABLE
      2        ONLINE  ONLINE       dbserver2                Open,HOME=/u01/app/oracle/product/19c/db_1,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
--------------------------------------------------------------------------------
[grid@dbserver1 ~]$
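For reference, a minimal sketch of checking the ASM cardinality before and after the change, assuming a standard Flex ASM grid environment (output wording varies by version):

# Show the configured Flex ASM cardinality ("ASM instance count")
srvctl config asm

# Force the cardinality to 2 so no third, permanently OFFLINE instance is tracked
srvctl modify asm -count 2

# If a third node is added later, the cardinality could be raised again, e.g.:
# srvctl modify asm -count 3     (or: -count ALL)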
NIC failure causing a database instance to fail to start
In this cluster one node started normally while the other node could not bring its instance up. Alert log of the failing node:
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Receiver ospid 6386 [
Tue Mar 07 19:07:29 2023
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_lms0_6386.trc:
IPC Send timeout detected. Receiver ospid 6402 [
Tue Mar 07 19:07:29 2023
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_lms4_6402.trc:
Tue Mar 07 19:07:29 2023
Received an instance abort message from instance 1
Please check instance 1 alert and LMON trace files for detail.
System state dump requested by (instance=2, osid=6384 (LMD0)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_diag_6374_20230307190729.trc
LMD0 (ospid: 6384): terminating the instance due to error 481
Dumping diagnostic data in directory=[cdmp_20230307190729], requested by (instance=2, osid=6384 (LMD0)), summary=[abnormal instance termination].
Instance terminated by LMD0, pid = 6384
Alert log of the healthy node:
Tue Mar 07 19:02:07 2023
Reconfiguration started (old inc 20, new inc 22)
List of instances:
 1 2 (myinst: 1)
 Global Resource Directory frozen
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Tue Mar 07 19:02:08 2023
 LMS 5: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 7: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
Tue Mar 07 19:02:08 2023
 LMS 4: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 LMS 6: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Submitted all GCS remote-cache requests
 Fix write in gcs resources
Tue Mar 07 19:02:27 2023
IPC Send timeout detected. Sender: ospid 6936 [oracle@xffnode1.localdomain (PING)]
Receiver: inst 2 binc 441249706 ospid 59731
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Sender: ospid 6946 [oracle@xffnode1.localdomain (LMS0)]
Receiver: inst 2 binc 429479852 ospid 6386
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Sender: ospid 6962 [oracle@xffnode1.localdomain (LMS4)]
Receiver: inst 2 binc 429479854 ospid 6402
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Sender: ospid 6966 [oracle@xffnode1.localdomain (LMS5)]
The logs above confirm that the two nodes could not communicate with each other, so the joining node never completed cluster reconfiguration and its instance failed to start. In this situation the first thing to check is the private interconnect; analysis of the OSWatcher oswnetstat data showed that the "packet reassembles failed" counter was climbing very rapidly.
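A quick way to watch this counter outside of OSWatcher is the kernel's IP statistics; this is only a sketch of the kind of check that was done here:

# Print the IP fragment-reassembly statistics; a steadily growing
# "packet reassembles failed" value points at interconnect fragmentation problems
netstat -s | grep -i "reassembles failed"

# Sample the counter twice to confirm it is still increasing
netstat -s | grep -i "reassembles failed"; sleep 10; netstat -s | grep -i "reassembles failed"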
This symptom is usually caused by the default ipfrag_*_thresh values being too small, so they were raised to the following values (a sketch of how to apply them follows):
net.ipv4.ipfrag_high_thresh = 16777216
net.ipv4.ipfrag_low_thresh = 15728640
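A minimal sketch of applying and persisting such values, assuming a standard /etc/sysctl.conf layout:

# Apply at runtime
sysctl -w net.ipv4.ipfrag_high_thresh=16777216
sysctl -w net.ipv4.ipfrag_low_thresh=15728640

# Persist across reboots: add the two lines above to /etc/sysctl.conf, then reload
sysctl -p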
Even so, "packet reassembles failed" kept increasing. Further analysis of the network cards showed that one of the two 10GbE NICs used for HAIP (redundant interconnect) was faulty.
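The checks below are a hedged sketch of how the faulty interface can be identified; eth2 and eth3 are placeholder names for the two HAIP private interfaces:

# Which interfaces are registered as cluster_interconnect (run as grid)
oifcfg getif

# Link status of each private NIC: "Link detected: no" or a flapping speed
# points at a hardware/cabling problem (interface names are examples)
ethtool eth2
ethtool eth3

# Error/drop counters on the suspect NIC
ip -s link show eth2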

To limit the impact on database performance, the faulty NIC was temporarily disabled and the database restarted normally.
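A sketch of temporarily taking the bad NIC out of service (again with a placeholder interface name); with HAIP the highly available interconnect addresses are expected to fail over to the surviving private NIC:

# Take the faulty private interface down (example name)
ip link set eth3 down          # or: ifdown eth3

# Verify the HAIP resource is still ONLINE afterwards
crsctl status res ora.cluster_interconnect.haip -init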

The NIC will be re-enabled once the network-level problem has been fixed.
11.2 CRS startup timeout: the dd npohasd workaround
At a customer site a fibre-channel link failure made the voting disks inaccessible and the host rebooted; after the reboot the clusterware did not come up.
Operating system and CRS versions:
[root@rac1 ~]# cat /etc/redhat-release
CentOS release 6.9 (Final)
[root@rac1 ~]# sqlplus -v
SQL*Plus: Release 11.2.0.4.0 Production
Starting CRS manually hung for a while and then returned an error:
[root@rac1 ~]# crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.
Check the startup processes:
[grid@rac1 ~]$ ps -ef|grep d.bin
root      7043     1  0 11:48 ?        00:00:00 /u01/app/grid/product/11.2.0/bin/ohasd.bin reboot
root      8311     1  0 11:53 ?        00:00:00 /u01/app/grid/product/11.2.0/bin/ohasd.bin reboot
grid     10984 10954  0 12:10 pts/2    00:00:00 grep d.bin
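A hedged way to confirm that these processes are indeed stuck sleeping, as the bug title describes (PIDs taken from the ps output above; wait-channel names vary by kernel):

# Show process state and wait channel for the two ohasd.bin reboot processes
ps -o pid,stat,wchan= -p 7043,8311

# Per-process state from /proc as an alternative
grep State /proc/7043/status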
Based on experience this failure is most likely BUG 17229230 – DURING REBOOT, "OHASD.BIN REBOOT" REMAINS SLEEPING. The temporary workaround is to start CRS in one session and then, from a second session, run:
/bin/dd if=/var/tmp/.oracle/npohasd of=/dev/null bs=1024 count=1
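For clarity, the two-session sequence used here; the dd simply reads from the npohasd named pipe that ohasd is waiting on, which lets the startup continue:

# Session 1 (root): start CRS; with this bug it hangs at this point
crsctl start crs

# Session 2 (root), while session 1 is still hanging:
/bin/dd if=/var/tmp/.oracle/npohasd of=/dev/null bs=1024 count=1

# Once CRS startup proceeds in session 1, the dd can be interrupted with Ctrl-C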
After that, CRS started normally:
[root@rac1 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@rac1 ~]# crsctl status res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               Instance Shutdown
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE
ora.crf
      1        ONLINE  ONLINE       rac1
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  OFFLINE                               STARTING
ora.cssdmonitor
      1        ONLINE  ONLINE       rac1
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  OFFLINE
ora.gipcd
      1        ONLINE  ONLINE       rac1
ora.gpnpd
      1        ONLINE  ONLINE       rac1
ora.mdnsd
      1        ONLINE  ONLINE       rac1
The dd command was then terminated and the cluster started up normally.