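Contact: Mobile/WeChat (+86 17813235971) QQ (107644445)
Title: Fixing ORA-600 [kafspa:columnBuffer1], an aftereffect of recovering a database after a storage failure
Author: 惜分飞 © All rights reserved [Reproduction in any form without the author's permission is prohibited; the author reserves the right to take further legal action.]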
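First, some background. This database had earlier suffered a storage failure. The hardware vendor rebuilt the RAID, and I then recovered the data. After recovery, the application vendor validated the data and re-entered what was missing, and the database was migrated to another machine for production use. Nobody had looked at it closely since then. During a recent health check I found ORA-600 [kafspa:columnBuffer1] errors, which I resolved by deleting the bad row.
Database alert log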
Mon Aug 10 00:00:21 2015
LNS: Standby redo logfile selected for thread 1 sequence 617 for destination LOG_ARCHIVE_DEST_2
Mon Aug 10 00:00:33 2015
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_j002_6900.trc  (incident=146517):
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [2883], [1], [], [], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\incident\incdir_146517\xff_j002_6900_i146517.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_j002_6900.trc:
ORA-12012: error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_280"
ORA-20011: Approximate NDV failed: ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [2883], [1], [], [], [], [], [], [], [], [], []
ORA-06512: at "SYS.DBMS_STATS", line 31228
Analyzing the trace file shows the statement that failed:
*** 2015-07-19 06:00:30.231
*** SESSION ID:(578.751) 2015-07-19 06:00:30.231
*** CLIENT ID:() 2015-07-19 06:00:30.231
*** SERVICE NAME:(SYS$USERS) 2015-07-19 06:00:30.231
*** MODULE NAME:(DBMS_SCHEDULER) 2015-07-19 06:00:30.231
*** ACTION NAME:(ORA$AT_OS_OPT_SY_220) 2015-07-19 06:00:30.231

Dump continued from file: D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_j001_4444.trc
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [2883], [1], [], [], [], [], [], [], [], [], []

========= Dump for incident 146142 (ORA 600 [kafspa:columnBuffer1]) ========
*** 2015-07-19 06:00:30.231
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=g0q33k8qtbcpd) -----
/* SQL Analyze(1) */ select /*+ full(t) no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad …………
to_char(substrb(dump(max("LIST_NO"),16,0,32),1,120)) from "CHF"."T_XIFENFEI" t …………
Gathering statistics on the table manually
SQL> EXEC DBMS_STATS.gather_table_stats('CHF','T_XIFENFEI',CASCADE=>TRUE) ;
BEGIN DBMS_STATS.gather_table_stats('CHF','T_XIFENFEI',CASCADE=>TRUE); END;

*
ERROR at line 1:
ORA-20011: Approximate NDV failed: ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [2883], [1], [], [], [], [], [], [], [], [], []
ORA-06512: at "SYS.DBMS_STATS", line 24232
ORA-06512: at "SYS.DBMS_STATS", line 24332
ORA-06512: at line 1

SQL> desc "CHF"."T_XIFENFEI"
 Name                                      Null?    Type
 ----------------------------------------- -------- -----------------
 VISIT_DATE                                         DATE
 …………
 GETDRUG_FLAG                                       VARCHAR2(2)
 …………
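From the alert log, the trace file and the manual statistics run above, the cause can basically be pinned down: when the automatic statistics job gathers statistics on the "CHF"."T_XIFENFEI" table, it runs into some bad data and raises this error. Searching MOS shows that this kind of problem is mainly caused by VARCHAR2 data whose stored length exceeds the length defined for the column.
Verifying the MOS explanation with exp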
C:\Users\Administrator>exp "'/ as sysdba'" tables="CHF"."T_XIFENFEI" file =y:/1.dmp log=y:/1.log

Export: Release 11.2.0.4.0 - Production on Thursday Aug 13 11:03:22 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in ZHS16GBK character set and AL16UTF16 NCHAR character set

About to export specified tables via Conventional Path ...
Current user changed to CHF
. . exporting table                     T_XIFENFEI
EXP-00015: error on row 1339552 of table T_XIFENFEI, column GETDRUG_FLAG, datatype 1
EXP-00001: data field truncation - column length=2, buffer size=2 actual size=17
Errors in file :
OCI-21500: internal error code, arguments: [kghfrempty:ds], [0x00652FCC8], [], [], [], [], [], []
----- Call Stack Trace -----
calling                       call     entry                     argument values in hex
location                      type     point                     (? means dubious value)
----------------------------- -------- ------------------------- ----------------------------
kgerinv_internal()+139        CALL???  skgudmp()                 000000000 006447680 000000000 006447680
kgerinv()+49                  CALL???  kgerinv_internal()        000000001 000676B4D 0064985B0 000000000
kgerin()+49                   CALL???  kgerinv()                 000000018 000799612 000072000 000000000
kghnerror()+294               CALL???  kgerin()                  006447680 00645092C 006447680 000000001
kghfrempty()+639              CALL???  kghnerror()               0000001F0 000000000 BE019800000000 7E01960000
kghgex()+1433                 CALL???  kghfrempty()+368          000000000 00652CAD8 000000000 000000000
kghfnd()+808                  CALL???  kghgex()                  001004000 000000000 001BEDD10 001A7131C
kghalo()+610                  CALL???  kghfnd()                  00012C450 00012C4A0 000000000 006446FD0
kghgex()+445                  CALL???  kghalo()                  006494848 000000000 001BEDD10 00190A575
kghfnd()+808                  CALL???  kghgex()                  000000001 0000001A0 000000000 006493D68
kghalo()+610                  CALL???  kghfnd()                  000000000 006447680 0FFFFFFFF 006447680
kpuhhalo()+358                CALL???  kghalo()                  000000000 000000178 07FFFFFFF 000000001
kpuertb_reallocTempBuf()+192  CALL???  kpuhhalo()                00652C498 000003E84 001C0EA44 000000000
kpuex_reallocTempBuf()+67     CALL???  kpuertb_reallocTempBuf()  000004007 0018BA3BF 00012CAB0 001AB296F
kpudefn()+347                 CALL???  kpuex_reallocTempBuf()    00012CC38 001004000 001BEDD44 000000004
kpudfn()+1506                 CALL???  kpudefn()                 00012F3D0 000000004 006520044 000000000
OCIDefineByPos()+102          CALL???  kpudfn()                  004327570 000000000 00012F3D0 000000004
00000001400116E5              CALL???  OCIDefineByPos()          1043B9300 0043B92C0 0044002B8 004401394
000000014004AFC7              CALL???  00000001400113BA          00012F380 00012F0E0 000000068 14004B2B6
000000014001E784              CALL???  000000014004A37E          000013F30 140095A71 140097520 14009F540
00000001400027A7              CALL???  000000014001E39F          14009F838 00012FB5C 140097520 14009F540
000000014000102C              CALL???  0000000140001E2C          000000005 004327570 1D0D5749D21764D 000000000
000000014006BEF0              CALL???  000000014000100E          000130000 1AFBFE2D0D8 000000000 000000000
000000007748652D              CALL???  000000014006BDD0          000000000 000000000 000000000 000000000
00000000775BC521              CALL???  0000000077486520          000000000 000000000 000000000 000000000

call stack performance statistics:
total                  : 0.778000 sec
setup                  : 0.350000 sec
stack unwind           : 0.099000 sec
symbol translation     : 0.021000 sec
printing the call stack: 0.304000 sec
printing frame data    : 0.000000 sec
printing argument data : 0.000000 sec

----- End of Call Stack Trace -----
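The exp run confirms that the data in column GETDRUG_FLAG is abnormal: the column is defined with length 2, but the actual data is 17 bytes long, which clearly does not match.
Locating the exact bad rowid with PL/SQL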
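The check that exp tripped over can be sketched in Python: compare each value's byte length, in the database character set, against the declared column width. This is a minimal sketch assuming byte length semantics (the VARCHAR2 default unless NLS_LENGTH_SEMANTICS is CHAR) and using Python's `gbk` codec as a stand-in for ZHS16GBK; `byte_length`, `find_overlong` and the sample values are hypothetical. In this case the bad value could not be read through SQL at all, so the sketch only illustrates the invariant that was violated.

```python
def byte_length(value: str, encoding: str = "gbk") -> int:
    """Length of value in bytes; Python's 'gbk' codec stands in for ZHS16GBK."""
    return len(value.encode(encoding))


def find_overlong(rows, declared_bytes, encoding="gbk"):
    """Return (row_number, value, actual_bytes) for every value wider than the column.

    rows: iterable of (row_number, value) pairs; value may be None (SQL NULL).
    """
    overlong = []
    for row_number, value in rows:
        if value is None:
            continue  # NULL never exceeds the declared width
        actual = byte_length(value, encoding)
        if actual > declared_bytes:
            overlong.append((row_number, value, actual))
    return overlong


if __name__ == "__main__":
    # Hypothetical GETDRUG_FLAG values checked against VARCHAR2(2)
    sample = [(1, "Y"), (2, None), (3, "是"), (4, "x" * 17)]
    for row_number, value, actual in find_overlong(sample, declared_bytes=2):
        print(f"row {row_number}: actual size={actual}, column length=2")
    # prints: row 4: actual size=17, column length=2
```

A two-byte Chinese character such as "是" still fits VARCHAR2(2) under GBK, which is why the comparison has to be done on encoded bytes rather than on characters.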
SQL> set serveroutput on
SQL> DECLARE
  -- Prerequisites (created beforehand, not shown): an empty copy CHF.T_XIFENFEI_new
  -- and a log table system.had_rows (table name, rowid, error code).
  TYPE RowIDTab IS TABLE OF ROWID INDEX BY BINARY_INTEGER;
  CURSOR c1 IS select /*+index(t PK_T_XIFENFEI_BAK_NEW)*/ rowid from CHF.T_XIFENFEI t;
  r RowIDTab;
  rows NATURAL := 20000;   -- batch size for each bulk fetch
  bad_rows number := 0 ;
  errors number;
  error_code number;
  myrowid rowid;
BEGIN
  OPEN c1;
  LOOP
    FETCH c1 BULK COLLECT INTO r LIMIT rows;
    EXIT WHEN r.count=0;
    BEGIN
      -- Copy the batch row by row via rowid; SAVE EXCEPTIONS keeps going past failing rows
      FORALL i IN r.FIRST..r.LAST SAVE EXCEPTIONS
        insert into CHF.T_XIFENFEI_new
        select /*+ ROWID(A) */ *
        from CHF.T_XIFENFEI A where rowid = r(i);
    EXCEPTION
      when OTHERS then
        BEGIN
          -- Log the rowid and error code of every row that failed in this batch
          errors := SQL%BULK_EXCEPTIONS.COUNT;
          FOR err1 IN 1..errors LOOP
            error_code := SQL%BULK_EXCEPTIONS(err1).ERROR_CODE;
            myrowid := r(SQL%BULK_EXCEPTIONS(err1).ERROR_INDEX);
            bad_rows := bad_rows + 1;
            insert into system.had_rows values('CHF.T_XIFENFEI',myrowid, error_code);
          END LOOP;
        END;
    END;
    commit;
  END LOOP;
  commit;
  CLOSE c1;
  dbms_output.put_line('Total Bad Rows: '||bad_rows);
END;
/
Total Bad Rows: 1

PL/SQL procedure successfully completed.

SQL> SELECT row_id FROM system.had_rows ;

ROW_ID
------------------
AAAT8wAAEAAAM29AAX

SQL> select * from CHF.T_XIFENFEI WHERE ROWID='AAAT8wAAEAAAM29AAX';
select * from CHF.T_XIFENFEI WHERE ROWID='AAAT8wAAEAAAM29AAX'
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [2883], [1], [], [], [], [], [], [], [], [], []
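The PL/SQL block above relies on FORALL ... SAVE EXCEPTIONS: rowids are fetched in batches, each row is copied into a scratch table, and instead of aborting on the first failure the rowid and error of every failing row are recorded. A minimal Python model of that control flow, assuming a hypothetical `copy_row` callback that stands in for the per-rowid INSERT ... SELECT (it is not an Oracle API):

```python
from typing import Callable, List, Sequence, Tuple


def copy_with_save_exceptions(
    rowids: Sequence[str],
    copy_row: Callable[[str], None],
    batch_size: int = 20000,
) -> List[Tuple[str, str]]:
    """Copy rows batch by batch and return (rowid, error) for each failure.

    Like FORALL ... SAVE EXCEPTIONS, a failing row is recorded and the loop
    carries on with the remaining rows instead of aborting the batch.
    """
    bad_rows: List[Tuple[str, str]] = []
    for start in range(0, len(rowids), batch_size):  # FETCH ... LIMIT batch_size
        for rowid in rowids[start:start + batch_size]:
            try:
                copy_row(rowid)  # hypothetical stand-in for the INSERT ... SELECT by rowid
            except Exception as exc:  # plays the role of SQL%BULK_EXCEPTIONS
                bad_rows.append((rowid, str(exc)))
        # The PL/SQL block commits after each batch at this point.
    return bad_rows


if __name__ == "__main__":
    corrupt = {"AAAT8wAAEAAAM29AAX"}  # hypothetical: the one unreadable row
    copied = []

    def fake_copy(rowid: str) -> None:
        if rowid in corrupt:
            raise RuntimeError("ORA-00600: [kafspa:columnBuffer1]")
        copied.append(rowid)

    bad = copy_with_save_exceptions(["r1", "r2", "AAAT8wAAEAAAM29AAX", "r3"], fake_copy, batch_size=2)
    print("Total Bad Rows:", len(bad))  # prints: Total Bad Rows: 1
```

The point of copying row by row inside a bulk operation is that a single unreadable row only costs its own insert, so the full table can be swept in one pass and the bad rowids collected for follow-up.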
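This essentially pins the problem down to the row with this rowid. After checking with the business side, we confirmed that the row could be deleted (it could not be read anyway, so keeping it would serve no purpose).
Deleting the bad row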
SQL> delete from CHF.T_XIFENFEI WHERE ROWID='AAAT8wAAEAAAM29AAX';

1 row deleted.

SQL> commit;

Commit complete.
Gathering statistics
SQL> EXEC DBMS_STATS.gather_table_stats('CHF','T_XIFENFEI',CASCADE=>TRUE) ;

PL/SQL procedure successfully completed.
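After the bad row was removed, the database could gather statistics normally and ORA-00600 [kafspa:columnBuffer1] no longer appeared, so the problem was resolved fairly cleanly.
A few additional observations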
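1. analyze table "CHF"."T_XIFENFEI" estimate statistics; completes normally, but gathering with dbms_stats fails (because dbms_stats effectively scans the values of every column, whereas analyze presumably does not).
2. While ORA-00600 [kafspa:columnBuffer1] is being raised, CTAS still succeeds but a conventional insert fails (because CTAS behaves like a direct-path APPEND operation). So in some situations APPEND should be used with caution, especially when logical block corruption is present, since the bad data may be carried over without any error.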