分类目录归档:Oracle RAC

udev_start导致vip漂移(常见情况:rac在线加盘操作引起)

客户对asm进行扩容,执行udev_start命令之后,所有的vip全部漂移,业务全部中断
20230513203654


优先恢复业务,把所有vip漂移回来

[grid@rac3 ~]$  srvctl relocate vip -i rac1 -n rac1 -f -v
VIP was relocated successfully.
[grid@rac3 ~]$  srvctl relocate vip -i rac2 -n rac2 -f -v
VIP was relocated successfully.
[grid@rac3 ~]$  srvctl relocate vip -i rac3 -n rac3 -f -v
VIP was relocated successfully.
[grid@rac3 ~]$  srvctl relocate vip -i rac4 -n rac4 -f -v
VIP was relocated successfully.

vip恢复正常,业务也恢复正常
20230513203712


出现该问题的原因是由于udev_start命令引起网卡瞬间中断,从而使得vip发生漂移
20230513212316

查看ifcfg配置文件
20230513212440

引起该问题的原因是udev对网卡进行了操作,从而引起该问题,处理建议在对应的ifcfg文件中加上 HOTPLUG=”no” (pulbic,private和其他需要关注的网络)
参考:Network interface going down when dynamically adding disks to storage using udev in RHEL 6 (Doc ID 1569028.1)
20230513213713

发表在 Linux, Oracle RAC | 标签为 , , | 评论关闭

删除ora.asmgroup资源offline记录

采用了fix asm之后,查看集群状态的时候会有一个ora.asmgroup相关是offline状态,可以通过srvctl modify asm -count 2命令强制把asm count设置为2从而就不会有offline的资源存在

[grid@dbserver1 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.chad
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.net1.network
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.ons
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.proxy_advm
               OFFLINE OFFLINE      dbserver1                STABLE
               OFFLINE OFFLINE      dbserver2                STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        ONLINE  OFFLINE                               STABLE
ora.ASMNET2LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        ONLINE  OFFLINE                               STABLE
ora.DATA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.FRA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.SYSDG.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.asm(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                Started,STABLE
      2        ONLINE  ONLINE       dbserver2                Started,STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.asmnet2.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver2.vip
      1        ONLINE  ONLINE       dbserver2                STABLE
ora.xff.db
      1        ONLINE  ONLINE       dbserver1                Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
      2        ONLINE  ONLINE       dbserver2                Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
--------------------------------------------------------------------------------
[grid@dbserver1 ~]$ srvctl modify asm -count 2
[grid@dbserver1 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.chad
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.net1.network
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.ons
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.proxy_advm
               OFFLINE OFFLINE      dbserver1                STABLE
               OFFLINE OFFLINE      dbserver2                STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.ASMNET2LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.DATA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.FRA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.SYSDG.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.asm(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                Started,STABLE
      2        ONLINE  ONLINE       dbserver2                Started,STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.asmnet2.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.cvu
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver2.vip
      1        ONLINE  ONLINE       dbserver2                STABLE
ora.xff.db
      1        ONLINE  ONLINE       dbserver1                Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
      2        ONLINE  ONLINE       dbserver2                Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
--------------------------------------------------------------------------------
[grid@dbserver1 ~]$ 
发表在 Oracle RAC | 标签为 | 评论关闭

网卡异常导致数据库实例启动异常

一套集群,一个节点启动正常,另外一个节点无法正常启动实例,启动异常节点alert日志

Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Receiver ospid 6386 [
Tue Mar 07 19:07:29 2023
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_lms0_6386.trc:
IPC Send timeout detected. Receiver ospid 6402 [
Tue Mar 07 19:07:29 2023
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_lms4_6402.trc:
Tue Mar 07 19:07:29 2023
Received an instance abort message from instance 1
Please check instance 1 alert and LMON trace files for detail.
System state dump requested by (instance=2, osid=6384 (LMD0)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_diag_6374_20230307190729.trc
LMD0 (ospid: 6384): terminating the instance due to error 481
Dumping diagnostic data in directory=[cdmp_20230307190729],
      requested by (instance=2, osid=6384 (LMD0)), summary=[abnormal instance termination].
Instance terminated by LMD0, pid = 6384

正常节点alert日志

Tue Mar 07 19:02:07 2023
Reconfiguration started (old inc 20, new inc 22)
List of instances:
 1 2 (myinst: 1)
 Global Resource Directory frozen
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Tue Mar 07 19:02:08 2023
 LMS 5: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 7: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
Tue Mar 07 19:02:08 2023
 LMS 4: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 LMS 6: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Submitted all GCS remote-cache requests
 Fix write in gcs resources
Tue Mar 07 19:02:27 2023
IPC Send timeout detected. Sender: ospid 6936 [oracle@xffnode1.localdomain (PING)]
Receiver: inst 2 binc 441249706 ospid 59731
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Sender: ospid 6946 [oracle@xffnode1.localdomain (LMS0)]
Receiver: inst 2 binc 429479852 ospid 6386
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Sender: ospid 6962 [oracle@xffnode1.localdomain (LMS4)]
Receiver: inst 2 binc 429479854 ospid 6402
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Sender: ospid 6966 [oracle@xffnode1.localdomain (LMS5)]

通过上述日志,可以确认主要由于两个节点之间无法正常通讯,从而使得新节点无法加入到集群(无法完成集群重组),从而使得实例启动异常.一般出现这类情况最检查的就是私网异常,通过分析oswnetstat记录发现packet reassembles failed特别严重
20230307230341


一般出现该问题,考虑是由于ipfrag_*_thresh默认值不足导致,通过设置

net.ipv4.ipfrag_high_thresh = 16777216
net.ipv4.ipfrag_low_thresh = 15728640

临时请库成功,但是数据库实例重组时间依旧过长
20230307230658


packet reassembles failed依旧在增加,通过分析网卡情况发现网卡异常,采用haip(双万兆网卡)的其中一块网卡异常
20230307230813

为了数据库性能不收太大影响,临时禁用异常网卡,重启库正常
20230308001033

后续等网络层面解决之后再启用该网卡

发表在 Oracle RAC | 评论关闭