Category archive: Oracle RAC

Removing the OFFLINE ora.asmgroup resource entries

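After adopting Flex ASM, the cluster status shows ora.asmgroup-related resource instances in the OFFLINE state (instance 3 of each resource in the first listing, on this two-node cluster). Running srvctl modify asm -count 2 forces the ASM count to 2, after which no OFFLINE ora.asmgroup entries remain.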

[grid@dbserver1 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.chad
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.net1.network
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.ons
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.proxy_advm
               OFFLINE OFFLINE      dbserver1                STABLE
               OFFLINE OFFLINE      dbserver2                STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        ONLINE  OFFLINE                               STABLE
ora.ASMNET2LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        ONLINE  OFFLINE                               STABLE
ora.DATA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.FRA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.SYSDG.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.asm(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                Started,STABLE
      2        ONLINE  ONLINE       dbserver2                Started,STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.asmnet2.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver2.vip
      1        ONLINE  ONLINE       dbserver2                STABLE
ora.xff.db
      1        ONLINE  ONLINE       dbserver1                Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
      2        ONLINE  ONLINE       dbserver2                Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
--------------------------------------------------------------------------------
[grid@dbserver1 ~]$ srvctl modify asm -count 2
[grid@dbserver1 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.chad
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.net1.network
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.ons
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.proxy_advm
               OFFLINE OFFLINE      dbserver1                STABLE
               OFFLINE OFFLINE      dbserver2                STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.ASMNET2LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.DATA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.FRA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.SYSDG.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.asm(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                Started,STABLE
      2        ONLINE  ONLINE       dbserver2                Started,STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.asmnet2.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.cvu
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver2.vip
      1        ONLINE  ONLINE       dbserver2                STABLE
ora.xff.db
      1        ONLINE  ONLINE       dbserver1                Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
      2        ONLINE  ONLINE       dbserver2                Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
--------------------------------------------------------------------------------
[grid@dbserver1 ~]$ 
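
The before/after comparison above can also be checked mechanically. The following is a minimal sketch, assuming the output of crsctl status res -t has been captured as text in the layout shown above; the function name offline_instances and the SAMPLE excerpt are illustrative and not part of any Oracle tooling.

```python
import re

# Instance rows look like: "      3        OFFLINE OFFLINE       <server>   STABLE"
ROW = re.compile(r"\s+(\d+)\s+(ONLINE|OFFLINE)\s+(ONLINE|OFFLINE)\b")


def offline_instances(crsctl_output):
    """Return (resource, instance number) pairs whose State column is OFFLINE.

    Only the "Cluster Resources" section is inspected, because only that
    section carries numbered instances.
    """
    results = []
    current = None
    in_cluster = False
    for line in crsctl_output.splitlines():
        stripped = line.strip()
        if stripped == "Cluster Resources":
            in_cluster = True
            continue
        if not in_cluster or not stripped or stripped.startswith("---"):
            continue
        if not line[0].isspace():
            # Resource names start in column 0, e.g. ora.DATA.dg(ora.asmgroup)
            current = stripped
            continue
        match = ROW.match(line)
        if match and current and match.group(3) == "OFFLINE":
            results.append((current, int(match.group(1))))
    return results


# Shortened excerpt of the first listing in this post.
SAMPLE = """\
Cluster Resources
--------------------------------------------------------------------------------
ora.DATA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       dbserver1                STABLE
"""

if __name__ == "__main__":
    for name, number in offline_instances(SAMPLE):
        print(name, number)
```

Run against the listing taken after srvctl modify asm -count 2, the function should return no ora.asmgroup entries.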

NIC fault causes database instance startup failure

In one cluster, one node started normally, but the instance on the other node would not start. Alert log from the failing node:

Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Receiver ospid 6386 [
Tue Mar 07 19:07:29 2023
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_lms0_6386.trc:
IPC Send timeout detected. Receiver ospid 6402 [
Tue Mar 07 19:07:29 2023
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_lms4_6402.trc:
Tue Mar 07 19:07:29 2023
Received an instance abort message from instance 1
Please check instance 1 alert and LMON trace files for detail.
System state dump requested by (instance=2, osid=6384 (LMD0)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_diag_6374_20230307190729.trc
LMD0 (ospid: 6384): terminating the instance due to error 481
Dumping diagnostic data in directory=[cdmp_20230307190729],
      requested by (instance=2, osid=6384 (LMD0)), summary=[abnormal instance termination].
Instance terminated by LMD0, pid = 6384

Alert log from the healthy node:

Tue Mar 07 19:02:07 2023
Reconfiguration started (old inc 20, new inc 22)
List of instances:
 1 2 (myinst: 1)
 Global Resource Directory frozen
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Tue Mar 07 19:02:08 2023
 LMS 5: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 7: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
Tue Mar 07 19:02:08 2023
 LMS 4: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 LMS 6: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Submitted all GCS remote-cache requests
 Fix write in gcs resources
Tue Mar 07 19:02:27 2023
IPC Send timeout detected. Sender: ospid 6936 [oracle@xffnode1.localdomain (PING)]
Receiver: inst 2 binc 441249706 ospid 59731
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Sender: ospid 6946 [oracle@xffnode1.localdomain (LMS0)]
Receiver: inst 2 binc 429479852 ospid 6386
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Sender: ospid 6962 [oracle@xffnode1.localdomain (LMS4)]
Receiver: inst 2 binc 429479854 ospid 6402
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Sender: ospid 6966 [oracle@xffnode1.localdomain (LMS5)]
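
To see at a glance which instance the timeouts point at, the sender/receiver pairs in the log above can be grouped by receiving instance. This is a rough sketch that assumes the two-line "Sender ... / Receiver ..." layout shown in this alert log; the regular expressions and the function name timeouts_by_receiver are my own and may need adjusting for other Oracle versions.

```python
import re
from collections import defaultdict

SENDER = re.compile(
    r"IPC Send timeout detected\. Sender: ospid (\d+) \[\S+ \((\w+)\)\]"
)
RECEIVER = re.compile(r"Receiver: inst (\d+) binc \d+ ospid (\d+)")


def timeouts_by_receiver(alert_text):
    """Map receiving instance number -> list of sending process names."""
    result = defaultdict(list)
    lines = alert_text.splitlines()
    for i in range(len(lines) - 1):
        sender = SENDER.search(lines[i])
        receiver = RECEIVER.search(lines[i + 1])
        # Only count a timeout when the Receiver line directly follows.
        if sender and receiver:
            result[int(receiver.group(1))].append(sender.group(2))
    return dict(result)


# Excerpt copied from the healthy node's alert log above.
SAMPLE = """\
Tue Mar 07 19:02:27 2023
IPC Send timeout detected. Sender: ospid 6936 [oracle@xffnode1.localdomain (PING)]
Receiver: inst 2 binc 441249706 ospid 59731
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Sender: ospid 6946 [oracle@xffnode1.localdomain (LMS0)]
Receiver: inst 2 binc 429479852 ospid 6386
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Sender: ospid 6962 [oracle@xffnode1.localdomain (LMS4)]
Receiver: inst 2 binc 429479854 ospid 6402
"""

if __name__ == "__main__":
    print(timeouts_by_receiver(SAMPLE))
```

In this excerpt every timeout has instance 2 as the receiver, which matches the node that failed to start.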

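The logs above confirm that the two nodes could not communicate properly, so the new node could not join the cluster (the cluster reconfiguration could not complete) and the instance failed to start. In cases like this, the first thing to check is usually the private interconnect. Analysis of the oswnetstat records showed that the "packet reassembles failed" counter was climbing severely.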
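
A quick way to quantify this is to compare the counter between two netstat -s snapshots. The sketch below assumes Linux netstat -s text containing a line of the form "N packet reassembles failed"; the snapshot contents and numbers are made up for illustration and are not taken from the affected system.

```python
import re

REASM_FAILED = re.compile(r"^\s*(\d+) packet reassembles failed", re.MULTILINE)


def reassembles_failed(netstat_s_output):
    """Return the 'packet reassembles failed' counter, or 0 if it is absent."""
    match = REASM_FAILED.search(netstat_s_output)
    return int(match.group(1)) if match else 0


def failure_delta(before, after):
    """Number of new reassembly failures between two snapshots."""
    return reassembles_failed(after) - reassembles_failed(before)


# Illustrative snapshots only; the values are invented.
BEFORE = """\
Ip:
    1000 total packets received
    120 packet reassembles failed
"""
AFTER = """\
Ip:
    5000 total packets received
    520 packet reassembles failed
"""

if __name__ == "__main__":
    print(failure_delta(BEFORE, AFTER))
```

A delta that keeps growing between consecutive snapshots is the symptom described here; a counter that is large but static only reflects past failures.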


This problem is usually attributed to the default ipfrag_*_thresh values being too small, so we set:

net.ipv4.ipfrag_high_thresh = 16777216
net.ipv4.ipfrag_low_thresh = 15728640
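
As a sanity check after changing these settings, the running values can be read back from /proc/sys and compared with the targets. This is a sketch under the assumption of a Linux host; the helper names are hypothetical, and the target numbers are simply the two values listed above.

```python
from pathlib import Path

# Target values taken from the settings listed above.
TARGETS = {
    "net.ipv4.ipfrag_high_thresh": 16777216,
    "net.ipv4.ipfrag_low_thresh": 15728640,
}


def sysctl_path(key):
    """Translate a sysctl key into its /proc/sys path."""
    return Path("/proc/sys") / key.replace(".", "/")


def read_current(keys):
    """Read the running values; keys missing on this host are skipped."""
    values = {}
    for key in keys:
        path = sysctl_path(key)
        if path.exists():
            values[key] = int(path.read_text().split()[0])
    return values


def below_target(current, targets=TARGETS):
    """Return {key: (current, target)} for every key that is below its target."""
    return {
        key: (current.get(key), target)
        for key, target in targets.items()
        if current.get(key, 0) < target
    }


if __name__ == "__main__":
    for key, (now, want) in below_target(read_current(TARGETS)).items():
        print(f"{key}: current={now} target={want}")
```

Values set only with sysctl -w do not survive a reboot, so the same comparison is worth repeating after the next restart.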

With this change the database could be started for the time being, but instance reconfiguration still took far too long.


The "packet reassembles failed" counter kept increasing. Inspecting the NICs revealed a hardware-level problem: one of the two 10GbE NICs used for HAIP was faulty.

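To keep database performance from being affected too much, the faulty NIC was temporarily disabled; after that, the database restarted normally.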

The NIC will be re-enabled once the problem has been fixed at the network layer.


11.2 CRS startup timeout: the dd npohasd workaround

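At a customer site, a fibre link failure caused a voting disk fault, which in turn rebooted the host. After the reboot, the cluster did not start properly.
Operating system and CRS versions: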

[root@rac1 ~]# cat /etc/redhat-release 
CentOS release 6.9 (Final)
[root@rac1 ~]# sqlplus -v

SQL*Plus: Release 11.2.0.4.0 Production

Starting CRS manually hangs for a while and then fails:

[root@rac1 ~]# crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.

Check the startup processes:

[grid@rac1 ~]$ ps -ef|grep d.bin
root       7043      1  0 11:48 ?        00:00:00 /u01/app/grid/product/11.2.0/bin/ohasd.bin reboot
root       8311      1  0 11:53 ?        00:00:00 /u01/app/grid/product/11.2.0/bin/ohasd.bin reboot
grid      10984  10954  0 12:10 pts/2    00:00:00 grep d.bin
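
The same check can be scripted. The sketch below assumes the default ps -ef column layout shown above (UID PID PPID C STIME TTY TIME CMD); the function name ohasd_reboot_pids is illustrative. It only lists the processes and does not by itself prove that the bug discussed in this post is present.

```python
def ohasd_reboot_pids(ps_output):
    """Return the PIDs of processes whose command is '<path>/ohasd.bin reboot'."""
    pids = []
    for line in ps_output.splitlines():
        # Split into the 7 fixed columns plus the full command string.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        argv = fields[7].split()
        if argv and argv[0].endswith("/ohasd.bin") and argv[1:2] == ["reboot"]:
            pids.append(int(fields[1]))
    return pids


# Copied from the ps -ef output above.
SAMPLE = """\
root       7043      1  0 11:48 ?        00:00:00 /u01/app/grid/product/11.2.0/bin/ohasd.bin reboot
root       8311      1  0 11:53 ?        00:00:00 /u01/app/grid/product/11.2.0/bin/ohasd.bin reboot
grid      10984  10954  0 12:10 pts/2    00:00:00 grep d.bin
"""

if __name__ == "__main__":
    print(ohasd_reboot_pids(SAMPLE))
```

Matching on the command path rather than on the substring d.bin keeps the grep process itself out of the result.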

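From experience, this failure is very likely BUG:17229230 - DURING REBOOT, "OHASD.BIN REBOOT" REMAINS SLEEPING. As a temporary workaround, start CRS in one session, then run the following in a second session: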

/bin/dd if=/var/tmp/.oracle/npohasd of=/dev/null bs=1024 count=1

After that, CRS started normally:

[root@rac1 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@rac1 ~]# crsctl status res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               Instance Shutdown   
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE                                                   
ora.crf
      1        ONLINE  ONLINE       rac1                                         
ora.crsd
      1        ONLINE  OFFLINE                                                   
ora.cssd
      1        ONLINE  OFFLINE                               STARTING            
ora.cssdmonitor
      1        ONLINE  ONLINE       rac1                                         
ora.ctssd
      1        ONLINE  OFFLINE                                                   
ora.diskmon
      1        OFFLINE OFFLINE                                                   
ora.evmd
      1        ONLINE  OFFLINE                                                   
ora.gipcd
      1        ONLINE  ONLINE       rac1                                         
ora.gpnpd
      1        ONLINE  ONLINE       rac1                                         
ora.mdnsd
      1        ONLINE  ONLINE       rac1                                         

After the dd command was terminated, the cluster finished starting normally.
