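Category archive: Oracle RAC
has a disk HB, but no network HB: caused by traceroute being blocked
After CRS was restarted, the cssd process failed to start.
The private network was clearly at fault. Further analysis showed that the nodes could ping each other over the private network, but traceroute to the other node failed.
The customer reported that security software had been installed recently. Once they stopped it, traceroute worked again
and the cluster started normally: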
[root@his01 cssd]# crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       his01
               ONLINE  ONLINE       his02
ora.FRA.dg
               ONLINE  ONLINE       his01
               ONLINE  ONLINE       his02
ora.LISTENER.lsnr
               ONLINE  ONLINE       his01
               ONLINE  ONLINE       his02
ora.OCRVOTE.dg
               ONLINE  ONLINE       his01
               ONLINE  ONLINE       his02
ora.asm
               ONLINE  ONLINE       his01                    Started
               ONLINE  ONLINE       his02                    Started
ora.gsd
               OFFLINE OFFLINE      his01
               OFFLINE OFFLINE      his02
ora.net1.network
               ONLINE  ONLINE       his01
               ONLINE  ONLINE       his02
ora.ons
               ONLINE  ONLINE       his01
               ONLINE  ONLINE       his02
ora.registry.acfs
               ONLINE  ONLINE       his01
               ONLINE  ONLINE       his02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       his02
ora.cvu
      1        ONLINE  ONLINE       his02
ora.his01.vip
      1        ONLINE  ONLINE       his01
ora.his02.vip
      1        ONLINE  ONLINE       his02
ora.oc4j
      1        ONLINE  ONLINE       his02
ora.orcl.db
      1        ONLINE  ONLINE       his01                    Open
      2        ONLINE  ONLINE       his02                    Open
ora.scan1.vip
      1        ONLINE  ONLINE       his02
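The symptom in this case (ping succeeds, traceroute fails) is easy to miss if only ping is checked. One plausible reason the two can differ is that Linux traceroute sends UDP probes by default while ping uses ICMP echo, so host security software can pass one and drop the other. Below is a minimal Python sketch that checks both paths to a peer. The function names, the injectable `run` parameter, and the peer address are hypothetical, and it assumes the Linux `ping` and `traceroute` commands are installed.

```python
import subprocess
from typing import Callable, List, Tuple

Runner = Callable[[List[str]], Tuple[int, str]]


def run_command(cmd: List[str]) -> Tuple[int, str]:
    """Run cmd and return (exit code, stdout); 127 means the binary is missing."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
        return proc.returncode, proc.stdout
    except FileNotFoundError:
        return 127, ""
    except subprocess.TimeoutExpired:
        return 124, ""


def traceroute_reached(output: str, peer_ip: str) -> bool:
    """True if any hop line (the header line is skipped) shows the peer address."""
    hop_lines = output.splitlines()[1:]
    return any(peer_ip in line.split() for line in hop_lines)


def check_private_path(peer_ip: str, run: Runner = run_command) -> str:
    """Classify the private-network path to peer_ip using ping and traceroute."""
    ping_rc, _ = run(["ping", "-c", "3", "-W", "2", peer_ip])
    # -n keeps addresses numeric so the peer can be matched in the hop lines
    _, trace_out = run(["traceroute", "-n", "-w", "2", "-m", "5", peer_ip])
    if ping_rc != 0:
        return "ping failed"
    if not traceroute_reached(trace_out, peer_ip):
        return "ping ok, traceroute blocked"
    return "ok"


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder; use the other node's private address
    print(check_private_path("192.0.2.10"))
```

The traceroute result is judged from the hop lines, not the exit code, because traceroute can exit successfully even when every hop times out.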
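12.1: manually changing the operating system time causes a database failure
A customer running a 12.1.0.1 RAC database had an instance restart unexpectedly and asked for help finding the cause.
The database alert log reported ORA-15064: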
Mon Apr 15 15:06:26 2019
WARNING: inbound connection timed out (ORA-3136)
Mon Apr 15 15:41:26 2019
NOTE: ASMB terminating
Mon Apr 15 15:41:26 2019
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_asmb_61426.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 1892 Serial number: 29
Mon Apr 15 15:41:26 2019
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_asmb_61426.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 1892 Serial number: 29
Mon Apr 15 15:41:26 2019
System state dump requested by (instance=1, osid=61426 (ASMB)), summary=[abnormal instance termination].
Mon Apr 15 15:41:26 2019
USER (ospid: 61426): terminating the instance due to error 15064
Mon Apr 15 15:41:26 2019
System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_61287.trc
Mon Apr 15 15:41:27 2019
opiodr aborting process unknown ospid (1171) as a result of ORA-1092
Mon Apr 15 15:41:27 2019
ORA-1092 : opitsk aborting process
This shows that the ASMB process failed, so the database instance could no longer reach ASM and crashed.
Next, the ASM alert log:
Mon Apr 15 15:41:26 2019
WARNING: client [+ASM1:+ASM] not responsive for 2069s; state=0x1. pid 23155
NOTE: umbilicus traces dumped to /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_gen0_23050.trc
WARNING: client [orcl1:orcl] not responsive for 2069s; state=0x1. killing pid 61436
NOTE: umbilicus traces dumped to /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_gen0_23050.trc
WARNING: fencing client [orcl1:orcl] after 2069 seconds (mbr 2)
WARNING: client [-MGMTDB:_mgmtdb] not responsive for 2070s; state=0x1. killing pid 24026
NOTE: umbilicus traces dumped to /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_gen0_23050.trc
WARNING: fencing client [-MGMTDB:_mgmtdb] after 2070 seconds (mbr 1)
Mon Apr 15 15:41:26 2019
NOTE: cleaned up ASM client -MGMTDB:_mgmtdb
NOTE: cleaned up ASM client orcl1:orcl
Mon Apr 15 15:41:43 2019
NOTE: Standard client -MGMTDB:_mgmtdb registered, osid 183707, mbr 0x1 (reg:1371965153)
Mon Apr 15 15:42:16 2019
NOTE: Standard client orcl1:orcl registered, osid 184063, mbr 0x2 (reg:2088418628)
Mon Apr 15 15:44:30 2019
Warning: VKTM detected a time drift.
Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.
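To relate the "not responsive for Ns" figures to a wall-clock moment, the logged duration can be subtracted from the entry's timestamp. The sketch below does only that arithmetic on lines copied from the log above; the function name is hypothetical. Because the result is derived from the logged clock, and the clock on this system was later found to have been changed, it may reflect the size of the clock jump instead of a real period of unresponsiveness.

```python
import re
from datetime import datetime, timedelta

ASM_LOG = """\
Mon Apr 15 15:41:26 2019
WARNING: client [+ASM1:+ASM] not responsive for 2069s; state=0x1. pid 23155
WARNING: client [orcl1:orcl] not responsive for 2069s; state=0x1. killing pid 61436
WARNING: client [-MGMTDB:_mgmtdb] not responsive for 2070s; state=0x1. killing pid 24026
"""

# Alert log timestamp layout; %a and %b assume an English/C locale
TS_FORMAT = "%a %b %d %H:%M:%S %Y"
PATTERN = re.compile(
    r"client \[(?P<client>[^\]]+)\] not responsive for (?P<secs>\d+)s"
)


def unresponsive_since(log_text):
    """Map each client to (entry timestamp - logged unresponsive seconds)."""
    current_ts = None
    result = {}
    for line in log_text.splitlines():
        try:
            # A line that parses as a timestamp dates the entries after it
            current_ts = datetime.strptime(line.strip(), TS_FORMAT)
            continue
        except ValueError:
            pass
        match = PATTERN.search(line)
        if match and current_ts is not None:
            seconds = int(match.group("secs"))
            result[match.group("client")] = current_ts - timedelta(seconds=seconds)
    return result


if __name__ == "__main__":
    for client, since in unresponsive_since(ASM_LOG).items():
        print(client, since)
```

For the `orcl1:orcl` client this gives 15:41:26 minus 2069 seconds, that is 15:06:57 on the same day.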
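The ASM log closely matches the MOS note "GEN0 terminating the ASM instance due to error 15082" (Doc ID 2096988.1). The customer reported that they had changed the system time through NTP, so it is fairly certain that the failure was triggered by Oracle Bug 19032250 (fixed in 12.1.0.2) when NTP adjusted the time by too large a step. Changing the time manually can cause similar problems.
Recommendations for changing the time on a RAC system:
1. If the clock is slow: shut down the database and the cluster, move the time forward, then start the cluster and the database.
2. If the clock is fast: shut down the database and the cluster, wait until the real time has passed the clock time at which they were shut down, then move the time back and start the cluster and the database.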
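The two recommendations above can be sketched as a small planning function. It assumes the clock reading and the correct time are sampled at the same instant, at the moment the database and cluster are shut down; the function name and the returned keys are hypothetical and only illustrate the reasoning that timestamps written before the shutdown should never lie in the future after the restart.

```python
from datetime import datetime, timedelta


def plan_clock_fix(system_clock: datetime, true_time: datetime) -> dict:
    """Plan a clock correction to carry out while database and cluster are down.

    system_clock: what the node's clock reads at shutdown
    true_time:    the correct time at that same instant
    """
    offset = system_clock - true_time
    if offset > timedelta(0):
        # Clock is fast: wait out the offset so real time passes the
        # shutdown reading, then set the clock back and start up.
        return {"direction": "backward", "wait_before_adjust": offset}
    if offset < timedelta(0):
        # Clock is slow: the time can be moved forward right away.
        return {"direction": "forward", "wait_before_adjust": timedelta(0)}
    return {"direction": "none", "wait_before_adjust": timedelta(0)}


if __name__ == "__main__":
    # Hypothetical example: the node's clock is 10 minutes fast at shutdown
    plan = plan_clock_fix(
        system_clock=datetime(2019, 4, 15, 16, 10, 0),
        true_time=datetime(2019, 4, 15, 16, 0, 0),
    )
    print(plan["direction"], plan["wait_before_adjust"])
```

In the fast-clock case the wait equals the clock offset, because the real time must catch up with the reading the clock showed when everything was stopped.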
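Aftermath of a direct-connected private network: one node failing to boot prevents HAIP from starting on the other node
This case is a two-node RAC (11.2.0.4) whose private network is a direct cable connection between the nodes. One host failed and could not boot, and when the cluster was started on the surviving node, HAIP would not start: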
# crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       xifenfei1                Started
ora.cluster_interconnect.haip                                >>>> OFFLINE
      1        ONLINE  OFFLINE
ora.crf
      1        ONLINE  ONLINE       xifenfei1
ora.crsd
      1        ONLINE  OFFLINE                               >>>> OFFLINE
ora.cssd
      1        ONLINE  ONLINE       xifenfei1
ora.cssdmonitor
      1        ONLINE  ONLINE       xifenfei1
ora.ctssd
      1        ONLINE  ONLINE       xifenfei1                OBSERVER
ora.diskmon
      1        OFFLINE OFFLINE
ora.drivers.acfs
      1        ONLINE  ONLINE       xifenfei1
ora.evmd
      1        ONLINE  INTERMEDIATE xifenfei1
ora.gipcd
      1        ONLINE  ONLINE       xifenfei1
ora.gpnpd
      1        ONLINE  ONLINE       xifenfei1
ora.mdnsd
      1        ONLINE  ONLINE       xifenfei1
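Spotting the resources whose TARGET is ONLINE but whose STATE is not can be automated when the listing is long. The following is a minimal sketch that parses text in the tabular layout `crsctl stat res -t` prints, using an abbreviated sample taken from the output above; the function name is hypothetical, and the parsing assumes resource names start with `ora.` and each state row follows its resource name on a separate line.

```python
SAMPLE = """\
ora.asm
      1        ONLINE  ONLINE       xifenfei1                Started
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE
ora.crsd
      1        ONLINE  OFFLINE
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  INTERMEDIATE xifenfei1
"""


def resources_not_online(crsctl_text):
    """Return (resource, state) pairs where TARGET is ONLINE but STATE is not."""
    problems = []
    resource = None
    for line in crsctl_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].startswith("ora."):
            # A resource name line starts a new group of state rows
            resource = tokens[0]
            continue
        if resource is None:
            continue  # header or separator lines before the first resource
        if tokens[0].isdigit():
            tokens = tokens[1:]  # drop the instance number column
        if len(tokens) >= 2 and tokens[0] == "ONLINE" and tokens[1] != "ONLINE":
            problems.append((resource, tokens[1]))
    return problems


if __name__ == "__main__":
    for name, state in resources_not_online(SAMPLE):
        print(name, state)
```

Resources whose TARGET is OFFLINE, such as `ora.diskmon` here, are skipped because they are not expected to be running.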
The Grid Infrastructure alert log (alert<hostname>.log):
2018-09-02 10:38:56.767: [/u01/app/11.2.0/grid/bin/orarootagent.bin(7866)]CRS-5818:Aborted command 'start' for resource 'ora.cluster_interconnect.haip'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/agent/ohasd/orarootagent_root/orarootagent_root.log.
2018-09-02 10:39:00.771: [ohasd(7495)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.cluster_interconnect.haip'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/ohasd/ohasd.log.
2018-09-02 10:40:00.802: [/u01/app/11.2.0/grid/bin/orarootagent.bin(7866)]CRS-5818:Aborted command 'start' for resource 'ora.cluster_interconnect.haip'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/agent/ohasd/orarootagent_root/orarootagent_root.log.
2018-09-02 10:40:04.806: [ohasd(7495)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.cluster_interconnect.haip'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/ohasd/ohasd.log.
The orarootagent_root log:
2018-09-02 10:37:56.805: [ USRTHRD][3650455296]{0:0:2} No HAIP info configured in GPNP, using defaults
2018-09-02 10:37:56.805: [ USRTHRD][3650455296]{0:0:2} The final CIDR subnet 169.254/16
2018-09-02 10:37:56.805: [ default][3650455296]clsvactversion:4: Retrieving Active Version from local storage.
2018-09-02 10:37:56.809: [ USRTHRD][3650455296]{0:0:2} HAIP: mbr num is 0.
[ CLWAL][3650455296]clsw_Initialize: OLR initlevel [70000]
2018-09-02 10:37:56.843: [ USRTHRD][3650455296]{0:0:2} HAIP: initializing to 1 interfaces
2018-09-02 10:37:56.844: [ USRTHRD][3650455296]{0:0:2} HAIP: configured to use 1 interfaces
The gipcd.log:
2018-09-02 10:38:56.787: [ CLSINET][2477147904] Returning NETDATA: 0 interfaces
2018-09-02 10:38:56.988: [GIPCDCLT][2477147904] gipcdClientInterfaceRequest: sent local interface list back to client
2018-09-02 10:38:56.822: [GIPCHDEM][2468742912] gipchaDaemonInfRequest: sent local interfaceRequest, hctx 0x1369730 [0000000000000010] { gipchaContext : host 'xifenfei1', name 'gipcd_ha_name', luid '184dd356-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2018-09-02 10:38:56.822: [GIPCDCLT][2477147904] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceRequest, endp 00000000000002cb
2018-09-02 10:38:56.822: [GIPCDCLT][2477147904] gipcdClientInterfaceRequest: Received type(gipcdmsgtypeInterfaceRequest), endp(00000000000002cb), len(1032), buf(0x7fab858b7a78):[hostname(xifenfei1), retStatus(gipcretSuccess)]
2018-09-02 10:38:56.822: [GIPCDCLT][2477147904] gipcdClientInterfaceQueryToMonitor: enqueue local interface query (2) to worklist
2018-09-02 10:38:56.823: [GIPCDCLT][2477147904] gipcdClientInterfaceRequest: sent local interface query
2018-09-02 10:38:56.823: [GIPCDMON][2472945408] gipcdMonitorCheckXfer: set new infQuery
2018-09-02 10:38:56.831: [ GIPCLIB][2477147904] gipclibSetTraceLevel: to set level to 0
The ohasd.log:
2018-09-02 10:38:52.494: [GIPCHDEM][1878710016]gipchaDaemonInfRequest: sent local interfaceRequest, hctx 0x2749eb0 [0000000000000010] { gipchaContext : host 'xifenfei1', name 'CLSFRAME_oracler-cluster', luid '47624c02-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2018-09-02 10:38:57.255: [ AGFW][3305629440]{0:0:2} Received the reply to the message: RESOURCE_START[ora.cluster_interconnect.haip 1 1] ID 4098:502 from the agent /u01/app/11.2.0/grid/bin/orarootagent_root
2018-09-02 10:38:57.255: [ AGFW][3305629440]{0:0:2} Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.cluster_interconnect.haip 1 1] ID 4098:500
2018-09-02 10:38:57.255: [ CRSPE][3295123200]{0:0:2} Received reply to action [Start] message ID: 500
2018-09-02 10:38:57.256: [ CRSPE][3295123200]{0:0:2} Got agent-specific msg: CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error: Start action for HAIP aborted. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0/grid/log/xifenfei1/agent/ohasd/orarootagent_root/orarootagent_root.log".
2018-09-02 10:38:57.500: [GIPCHDEM][1878710016]gipchaDaemonInfRequest: sent local interfaceRequest, hctx 0x2749eb0 [0000000000000010] { gipchaContext : host 'xifenfei1', name 'CLSFRAME_oracler-cluster', luid '47624c02-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
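Checking the private network showed that the link on eth1 was down. The private network is a direct connection, and the machine at the other end could not boot: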
[root@xifenfei1 rules.d]# ethtool eth1
Settings for eth1:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supported pause frame use: Symmetric
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Speed: Unknown!
        Duplex: Unknown! (255)
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: Unknown
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: no      ====> NIC link state is abnormal
[root@xifenfei1 rules.d]# ifconfig
eth0      Link encap:Ethernet  HWaddr 6C:92:BF:2B:7B:36
          inet addr:10.10.17.42  Bcast:172.17.17.255  Mask:255.255.255.0
          inet6 addr: fe80::6e92:bfff:fe2b:7b36/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1      ---------> note the RUNNING flag
          RX packets:234424 errors:0 dropped:0 overruns:0 frame:0
          TX packets:160916 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:16926236 (16.1 MiB)  TX bytes:24269882 (23.1 MiB)
          Memory:91160000-91180000
eth1      Link encap:Ethernet  HWaddr 6C:92:BF:2B:7B:37
          inet addr:11.1.1.2  Bcast:11.1.1.255  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1      ---------> note that RUNNING is missing
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Memory:91140000-91160000
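For the MOS description of a NIC link fault preventing HAIP from starting, see "CRSD & HAIP Resources Remain In OFFLINE as Private Network Interface is Partially Up" (Doc ID 1529721.1). This case is a direct consequence of using a direct connection for the private network of an 11.2 cluster, which is strongly discouraged.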
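The tell-tale sign in the ifconfig output is an interface that is UP but lacks the RUNNING flag. A minimal sketch of detecting this from the text follows. It assumes the older net-tools output layout shown in this case (flags on the line that carries `MTU:`); newer ifconfig versions print flags in a different form and would need a different parser. The function name is hypothetical and the sample is trimmed from the case's output.

```python
IFCONFIG_SAMPLE = """\
eth0      Link encap:Ethernet  HWaddr 6C:92:BF:2B:7B:36
          inet addr:10.10.17.42  Bcast:172.17.17.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
eth1      Link encap:Ethernet  HWaddr 6C:92:BF:2B:7B:37
          inet addr:11.1.1.2  Bcast:11.1.1.255  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
"""


def interfaces_without_running(ifconfig_text):
    """Return interfaces that are UP but do not carry the RUNNING flag."""
    affected = []
    current = None
    for line in ifconfig_text.splitlines():
        if line and not line[0].isspace():
            # An unindented line starts a new interface block
            current = line.split()[0]
        elif current and "MTU:" in line:
            # The flags are the words that precede "MTU:" on this line
            flags = line.split("MTU:")[0].split()
            if "UP" in flags and "RUNNING" not in flags:
                affected.append(current)
    return affected


if __name__ == "__main__":
    print(interfaces_without_running(IFCONFIG_SAMPLE))
```

On the sample above only eth1 is reported, matching the "Link detected: no" result from ethtool.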