模拟19c数据库pdb undo异常恢复

对于19c在pdb情况下三种常见故障进行了模拟测试:
模拟19c数据库redo异常恢复
模拟19c数据库pdb undo异常恢复
模拟19c数据库root pdb undo异常恢复
测试在有事务的情况下,删除pdb中的undo数据库异常情况测试
会话1在root pdb中删除表记录,不提交

SQL> show pdbs;

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB                            READ WRITE NO
SQL> delete from system.t_xifenfei;

2316160 rows deleted.

会话2在pdb中删除表记录,不提交

SQL> conn xff/oracle@127.0.0.1/pdb
Connected.
SQL> delete from xff.t_xifenfei;

72351 rows deleted.

会话3 直接abort库

SQL> shutdown abort;
ORACLE instance shut down.

删除pdb中undo文件

[oracle@localhost pdb]$ rm -rf undotbs01.dbf 
[oracle@localhost pdb]$ 

启动数据库

SQL> startup
ORACLE instance started.

Total System Global Area 4999609088 bytes
Fixed Size                  9145088 bytes
Variable Size             905969664 bytes
Database Buffers         4076863488 bytes
Redo Buffers                7630848 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 11 - see DBWR trace file
ORA-01110: data file 11: '/u01/app/oracle/oradata/ORA19C/pdb/undotbs01.dbf'

offline异常文件,再open库

SQL> alter database datafile 11 offline drop;
alter database datafile 11 offline drop
*
ERROR at line 1:
ORA-01516: nonexistent log file, data file, or temporary file "11" in the current container

SQL> show pdbs;

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       MOUNTED
         3 PDB                            MOUNTED
SQL> alter session set container=pdb;

Session altered.

SQL> alter database datafile 11 offline drop;

Database altered.

SQL> conn / as sysdba
Connected.

SQL> startup 
ORACLE instance started.

Total System Global Area 4999609088 bytes
Fixed Size                  9145088 bytes
Variable Size             905969664 bytes
Database Buffers         4076863488 bytes
Redo Buffers                7630848 bytes
Database mounted.
Database opened.

open pdb

SQL> alter session set container=pdb;

Session altered.

SQL> 
SQL> alter database open;

Database altered.

SQL> conn / as sysdba
Connected.
SQL> show pdbs;

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB                            READ WRITE NO

测试中库open比较简单,后续只要对异常undo进行处理即可

SQL> create undo tablespace undotbs2 datafile '/u01/app/oracle/oradata/ORA19C/pdb/undotbs02.dbf' size 128M autoextend on;

Tablespace created.

SQL> alter system set undo_tablespace=undotbs2;

System altered.

SQL> select tablespace_name,segment_name,status from dba_rollback_segs;

TABLESPACE_NAME                SEGMENT_NAME                   STATUS
------------------------------ ------------------------------ --------------------------------
SYSTEM                         SYSTEM                         ONLINE
UNDOTBS1                       _SYSSMU1_3588498444$           NEEDS RECOVERY
UNDOTBS1                       _SYSSMU2_2971032042$           NEEDS RECOVERY
UNDOTBS1                       _SYSSMU3_3657342154$           NEEDS RECOVERY
UNDOTBS1                       _SYSSMU4_811969446$            NEEDS RECOVERY
UNDOTBS1                       _SYSSMU5_3018429039$           NEEDS RECOVERY
UNDOTBS1                       _SYSSMU6_442110264$            NEEDS RECOVERY
UNDOTBS1                       _SYSSMU7_2728255665$           NEEDS RECOVERY
UNDOTBS1                       _SYSSMU8_801938064$            NEEDS RECOVERY
UNDOTBS1                       _SYSSMU9_647420285$            NEEDS RECOVERY
UNDOTBS1                       _SYSSMU10_2262159254$          NEEDS RECOVERY

SQL>  drop tablespace undotbs1 including contents and datafiles;

Tablespace dropped.

在测试中,undo有事务的情况下,数据库可以正常open,而且运行了一段时间未crash,在这个方面确实比11g及其以前版本有很大改进.当然由于测试环境本身比较单一,可能实际生产中会此类故障处理比较复杂

发表在 Oracle | 标签为 | 评论关闭

模拟19c数据库redo异常恢复

对于19c在pdb情况下三种常见故障进行了模拟测试:
模拟19c数据库redo异常恢复
模拟19c数据库pdb undo异常恢复
模拟19c数据库root pdb undo异常恢复
模拟oracle 19c数据库redo丢失的恢复操作,模拟数据库有事务,在没有提交的情况下redo丢失故障

[oracle@localhost oradata]$ ss

SQL*Plus: Release 19.0.0.0.0 - Production on Mon Nov 16 16:11:16 2020
Version 19.5.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.5.0.0.0

SQL> conn xff/oracle@127.0.0.1/pdb
Connected.
SQL> create table t_xifenfei as select * from dba_objects;

Table created.

SQL> insert into t_xifenfei select * from t_xifenfei;
insert into t_xifenfei select * from t_xifenfei;
insert into t_xifenfei select * from t_xifenfei;
insert into t_xifenfei select * from t_xifenfei;
insert into t_xifenfei select * from t_xifenfei;

72351 rows created.

SQL> 
144702 rows created.

SQL> 
289404 rows created.

SQL> 
578808 rows created.

SQL> 

1157616 rows created.

另外一个会话kill数据库并且删除redo

[root@localhost ~]# ps -ef|grep pmon
oracle    38500      1  0 16:08 ?        00:00:00 ora_pmon_ora19c
root      39030  39009  0 16:11 pts/2    00:00:00 grep --color=auto pmon
[root@localhost ~]# kill -9 38500
[root@localhost ~]# ps -ef|grep pmon
root      39042  39009  0 16:11 pts/2    00:00:00 grep --color=auto pmon
[root@localhost ~]# ls -l /u01/app/oracle/oradata/ORA19C/redo*.log
ls: cannot access /u01/app/oracle/oradata/ORA19C/redo*.log: No such file or directory

启动数据库报错ORA-00313 ORA-00312 ORA-27037

SQL> startup 
ORACLE instance started.

Total System Global Area 4999609088 bytes
Fixed Size                  9145088 bytes
Variable Size             905969664 bytes
Database Buffers         4076863488 bytes
Redo Buffers                7630848 bytes
Database mounted.
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/u01/app/oracle/oradata/ORA19C/redo03.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 7

因为redo全部丢失只能尝试强制拉库

SQL> startup mount pfile='/tmp/pfile'
ORACLE instance started.

Total System Global Area 4999609088 bytes
Fixed Size                  9145088 bytes
Variable Size             905969664 bytes
Database Buffers         4076863488 bytes
Redo Buffers                7630848 bytes
Database mounted.
SQL>  recover database until cancel;
ORA-00279: change 2335666 generated at 11/16/2020 16:08:42 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/product/19.2/db_1/dbs/arch1_12_1056620100.dbf
ORA-00280: change 2335666 for thread 1 is in sequence #12


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/app/oracle/oradata/ORA19C/system01.dbf'


ORA-01112: media recovery not started


SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [],
[], [], [], [], [], [], []
Process ID: 39588
Session ID: 9 Serial number: 32012

数据库报ORA-600 kcbzib_kcrsds_1错误是由于在强制拉库过程中文件异常导致,通过对异常文件进行处理数据库open成功

SQL> recover database;
Media recovery complete.
SQL> alter database open;

Database altered.

SQL> 

SQL> alter session set container=pdb;

Session altered.

SQL> alter database open;

SQL> show pdbs;

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB                            READ WRITE NO

这个是模拟redo丢失或者损坏故障,在实际的生产故障中可能要比这个复杂很多.

发表在 Oracle | 标签为 , , | 评论关闭

记录一次pdb恢复过程中遇到的大量bug

12C版本使用pdb的Oracle数据库,由于在创建index的过程中强制终止,导致业务大量阻塞,然后重启数据库几次之后直接crash,最后直接无法open成功,报ORA-00600 6856

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 22, block # 993741)
ORA-01110: data file 22:
'+DATA/XIFENFEI/96D12F7BA1E2CE57E0532506060A4A2D/DATAFILE/xff01.309.1050943869'
ORA-10564: tablespace CWBASEMS01
ORA-01110: data file 22:
'+DATA/XIFENFEI/96D12F7BA1E2CE57E0532506060A4A2D/DATAFILE/xff01.309.1050943869'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 76286
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [6856], [0], [12], [], [], [], [],[], [], [], [], []
2020-11-14T15:56:50.736722+08:00
Started redo scan
2020-11-14T15:56:50.977023+08:00
Completed redo scan
 read 82825 KB redo, 10769 data blocks need recovery
2020-11-14T15:56:51.147256+08:00
Started redo application at
 Thread 1: logseq 120309, block 48, offset 0
 Thread 2: logseq 74989, block 2, offset 16, scn 0x00000000f69e1f8d
2020-11-14T15:56:51.151007+08:00
Recovery of Online Redo Log: Thread 1 Group 1 Seq 120309 Reading mem 0
  Mem# 0: +DATA/XIFENFEI/ONLINELOG/group_1.262.1023806467
2020-11-14T15:56:51.153989+08:00
Recovery of Online Redo Log: Thread 2 Group 7 Seq 74989 Reading mem 0
  Mem# 0: +DATA/XIFENFEI/ONLINELOG/group_7.274.1023806785
Errors in file /u01/app/oracle/……/xifenfei1/trace/xifenfei1_p00d_469777.trc(incident=10079552)(PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [6856], [0], [12], [], [], [], [], [], [], [], [], []
2020-11-14T15:56:52.089726+08:00
(3):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

这个错误比较明显是由于ORA-600 6856错误导致数据库在启动的时候无法进行实例恢复,出现这个错误原因是由于客户创建index的过程中强制终止引起Bug 17437634 – ORA-1578 or ORA-600 [6856] transient in-memory corruption on TEMP segment during transaction recovery / ROLLBACK (eg: after Ctrl-C) – superseded (Doc ID 17437634.8),屏蔽该文件实例恢复,cdb启动成功,但是pdb无法正常open

SQL> alter session set container=pdb1;
 
Session altered.

SQL> alter database open;
alter database open 
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [kcffo_online_pdb_check:fno_system], [3], [], []
Process ID: 224476
Session ID: 13311 Serial number: 59525

这个错误比较明显是由于ORA-600 kcffo_online_pdb_check:fno_system,数据库未正常检测到pdb的system文件导致该问题,通过对pdb的system文件进行操作,让数据库识别到该文件,然后继续open库

SQL> alter database open;
alter database open 
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [kcvfdb_pdb_set_clean_scn: cleanckpt],[3],[4138003978],[4289274940], [2]
Process ID: 224476
Session ID: 13311 Serial number: 59525

该错误是由于数据库在恢复的过程中推了scn,触发了oracle 某种bug导致该问题,通过一些操作之后,数据库可以open,尝试temp表空间增加临时数据文件报ORA-00600 [kcffo_add_tmpf-1] 错误(Bug 29379978 – ORA-00600 [kcffo_add_tmpf-1] when trying to add temp file (Doc ID 29379978.8)).由于该文件无法加入,数据库无法导出
20201115115951


最后没有办法换了思路直接bbed修改文件头,open cdb库,然后open pdb,顺利导出数据.这次的恢复中,深刻的体验到pdb在open过程中的各种bug,实在比较厌烦.

发表在 Oracle备份恢复 | 标签为 , , , | 评论关闭