Category archive: Oracle RAC
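udev_start causes VIP failover (common case: adding disks online in RAC)
A customer was expanding ASM. After the udev_start command was run, every VIP failed over to another node and all business traffic was interrupted.
The first priority was to restore service, so all VIPs were relocated back to their home nodes: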
[grid@rac3 ~]$ srvctl relocate vip -i rac1 -n rac1 -f -v
VIP was relocated successfully.
[grid@rac3 ~]$ srvctl relocate vip -i rac2 -n rac2 -f -v
VIP was relocated successfully.
[grid@rac3 ~]$ srvctl relocate vip -i rac3 -n rac3 -f -v
VIP was relocated successfully.
[grid@rac3 ~]$ srvctl relocate vip -i rac4 -n rac4 -f -v
VIP was relocated successfully.
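The four commands above follow one pattern, so they can be generated rather than typed. The following is a minimal sketch, assuming (as in the transcript) that each VIP is named after its home node; `relocate_cmds` is a hypothetical helper name, and it only prints the commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Print one "srvctl relocate vip" command per node, sending each VIP
# back to the node of the same name.
# Assumption: VIP name == node name, as in the transcript above.
relocate_cmds() {
    for node in "$@"; do
        printf 'srvctl relocate vip -i %s -n %s -f -v\n' "$node" "$node"
    done
}

# Dry run: prints the commands only, nothing is executed.
relocate_cmds rac1 rac2 rac3 rac4
```

Once the printed commands look right, they can be run as the grid user, for example by piping the output to `sh`.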
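The root cause is that udev_start briefly took the network interfaces down, and that short outage was enough to make the VIPs fail over.
Check the ifcfg configuration files of the affected interfaces.
Because udev acts on the network interfaces as well as the disks, the recommended fix is to add HOTPLUG="no" to the corresponding ifcfg files (public, private, and any other network that matters).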
Reference: Network interface going down when dynamically adding disks to storage using udev in RHEL 6 (Doc ID 1569028.1)
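Removing the OFFLINE ora.asmgroup resource entries
With Flex ASM in use, the cluster status on a two-node cluster shows a third ora.asmgroup instance in the OFFLINE state for each ASM-related resource. Running srvctl modify asm -count 2 sets the ASM instance count to 2, after which no OFFLINE entries remain.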
[grid@dbserver1 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.chad
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.net1.network
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.ons
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.proxy_advm
               OFFLINE OFFLINE      dbserver1                STABLE
               OFFLINE OFFLINE      dbserver2                STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        ONLINE  OFFLINE                               STABLE
ora.ASMNET2LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        ONLINE  OFFLINE                               STABLE
ora.DATA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.FRA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.SYSDG.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.asm(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                Started,STABLE
      2        ONLINE  ONLINE       dbserver2                Started,STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.asmnet2.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver2.vip
      1        ONLINE  ONLINE       dbserver2                STABLE
ora.xff.db
      1        ONLINE  ONLINE       dbserver1                Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
      2        ONLINE  ONLINE       dbserver2                Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
--------------------------------------------------------------------------------
[grid@dbserver1 ~]$ srvctl modify asm -count 2
[grid@dbserver1 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.chad
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.net1.network
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.ons
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.proxy_advm
               OFFLINE OFFLINE      dbserver1                STABLE
               OFFLINE OFFLINE      dbserver2                STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.ASMNET2LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.DATA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.FRA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.SYSDG.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.asm(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                Started,STABLE
      2        ONLINE  ONLINE       dbserver2                Started,STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.asmnet2.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.cvu
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver2.vip
      1        ONLINE  ONLINE       dbserver2                STABLE
ora.xff.db
      1        ONLINE  ONLINE       dbserver1                Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
      2        ONLINE  ONLINE       dbserver2                Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
--------------------------------------------------------------------------------
[grid@dbserver1 ~]$
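Comparing two long listings by eye is error-prone. As a rough aid, the sketch below filters `crsctl status res -t` output on standard input and prints only the ora.asmgroup instances whose State column is OFFLINE. `list_offline_asmgroup` is a hypothetical helper name, and the parsing assumes the column layout shown in the listings above.

```shell
#!/bin/sh
# Read "crsctl status res -t" output on stdin and print every
# ora.asmgroup resource instance whose State column is OFFLINE.
list_offline_asmgroup() {
    awk '
        # Header line of an asmgroup resource: remember its name.
        /\(ora\.asmgroup\)/ { res = $1; next }
        # Header line of any other resource: stop tracking.
        /^ora\./            { res = ""; next }
        # Instance line: <number> <target> <state> ...
        res != "" && $1 ~ /^[0-9]+$/ && $3 == "OFFLINE" {
            print res, "instance", $1
        }
    '
}
```

A typical call would be `crsctl status res -t | list_offline_asmgroup`; empty output means no ora.asmgroup instance is OFFLINE.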
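NIC fault prevents a database instance from starting
In a two-node cluster, one node started normally while the instance on the other node could not start. The alert log on the failing node showed: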
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Receiver ospid 6386 [
Tue Mar 07 19:07:29 2023
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_lms0_6386.trc:
IPC Send timeout detected. Receiver ospid 6402 [
Tue Mar 07 19:07:29 2023
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_lms4_6402.trc:
Tue Mar 07 19:07:29 2023
Received an instance abort message from instance 1
Please check instance 1 alert and LMON trace files for detail.
System state dump requested by (instance=2, osid=6384 (LMD0)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_diag_6374_20230307190729.trc
LMD0 (ospid: 6384): terminating the instance due to error 481
Dumping diagnostic data in directory=[cdmp_20230307190729], requested by (instance=2, osid=6384 (LMD0)), summary=[abnormal instance termination].
Instance terminated by LMD0, pid = 6384
The alert log on the healthy node:
Tue Mar 07 19:02:07 2023
Reconfiguration started (old inc 20, new inc 22)
List of instances:
 1 2 (myinst: 1)
 Global Resource Directory frozen
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Tue Mar 07 19:02:08 2023
 LMS 5: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 7: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
Tue Mar 07 19:02:08 2023
 LMS 4: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 LMS 6: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Submitted all GCS remote-cache requests
 Fix write in gcs resources
Tue Mar 07 19:02:27 2023
IPC Send timeout detected. Sender: ospid 6936 [oracle@xffnode1.localdomain (PING)]
Receiver: inst 2 binc 441249706 ospid 59731
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Sender: ospid 6946 [oracle@xffnode1.localdomain (LMS0)]
Receiver: inst 2 binc 429479852 ospid 6386
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Sender: ospid 6962 [oracle@xffnode1.localdomain (LMS4)]
Receiver: inst 2 binc 429479854 ospid 6402
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Sender: ospid 6966 [oracle@xffnode1.localdomain (LMS5)]
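These logs show that the two nodes could not communicate properly, so the starting instance could not join the cluster (the reconfiguration never completed) and its startup failed. In this kind of situation the first thing to check is the private interconnect. The OSWatcher netstat (oswnetstat) records showed a very high and growing "packet reassembles failed" count.
This symptom is usually attributed to the default ipfrag_*_thresh values being too small, so the following settings were applied: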
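To confirm whether the counter is still climbing, two samples of `netstat -s` can be compared. The helpers below are a minimal sketch with hypothetical names (`reasm_failed`, `reasm_delta`); they only assume that the counter line has the usual form of a number followed by "packet reassembles failed".

```shell
#!/bin/sh
# Extract the "packet reassembles failed" counter from `netstat -s`
# text on stdin; prints 0 if the line is absent.
reasm_failed() {
    awk '/packet reassembles failed/ { print $1; found = 1; exit }
         END { if (!found) print 0 }'
}

# Difference between two samples: $1 = earlier value, $2 = later value.
reasm_delta() {
    echo $(( $2 - $1 ))
}

# Example usage on a live node (not run here):
#   before=$(netstat -s | reasm_failed)
#   sleep 10
#   after=$(netstat -s | reasm_failed)
#   reasm_delta "$before" "$after"
```

A delta that stays above zero across repeated samples indicates that fragments on the interconnect are still failing to reassemble.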
net.ipv4.ipfrag_high_thresh = 16777216
net.ipv4.ipfrag_low_thresh = 15728640
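Before changing kernel parameters it helps to see which ones are actually below the target. The following sketch prints a `sysctl -w` command for each of the two thresholds that is currently lower than the value above. `ipfrag_plan` is a hypothetical helper; its optional directory argument exists only so the logic can be tried against copies of the files instead of the live /proc tree.

```shell
#!/bin/sh
# Print a "sysctl -w" command for each ipfrag threshold whose current
# value is below the target used in this case.
# $1: directory holding the parameter files
#     (default /proc/sys/net/ipv4; override for a dry run).
ipfrag_plan() {
    dir="${1:-/proc/sys/net/ipv4}"
    # high_thresh is listed first so it is raised before low_thresh.
    for pair in ipfrag_high_thresh=16777216 ipfrag_low_thresh=15728640; do
        name="${pair%%=*}"
        target="${pair#*=}"
        # Skip parameters that cannot be read.
        current=$(cat "$dir/$name" 2>/dev/null) || continue
        if [ "$current" -lt "$target" ]; then
            printf 'sysctl -w net.ipv4.%s=%s\n' "$name" "$target"
        fi
    done
}
```

Values set with `sysctl -w` last only until reboot; to persist them, the same two lines go into /etc/sysctl.conf.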
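After this change the "packet reassembles failed" count kept increasing. Inspecting the network interfaces then showed that one of the two 10GbE NICs used for HAIP was faulty.
To keep database performance from being badly affected, the faulty NIC was temporarily disabled, after which the database restarted normally.
The NIC will be re-enabled once the problem has been fixed at the network layer.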