异常症状

2012-12-21下午Oracle ERP在销售模块做ATO产品订单进展时,没有弹出任何错误,也没有 弹出订单进展的”确认”提示,直接跳到订单头信息,行信息中也没有出现带*的标准成品料 号。且所有的ATO订单都无法完成订单进展。导致特制单不能及时录入系统,问题较为严重。

解决思路

使用topas查看操作系统是否有异常进程;

登录系统查看并发请求状态;

查系统当前会话、进程数;

看数据库警告日志;

详细过程

1. OS进程检查

登录到erp,运行topas观察系统进程使用资源情况。

2. 并发请求

职责路径:system administrator » Concurrent » Manager » Administer 此时再查看并发管理运行状态,等待处理的请求 因只有系统管理员有该权限,大家也可以通过SQL脚本进行查询,脚本如下(可以APPS运行):

WITH requests_count AS
(SELECT COUNT(1) counts
   	FROM apps.fnd_concurrent_requests fcr
WHERE fcr.request_date > trunc(SYSDATE - 1)
 AND fcr.requested_start_date < trunc(SYSDATE + 1)
 AND fcr.phase_code IN ('P', 'R'))
SELECT 'Attention! Concurrent requests are up to ' || rc.counts "attention"
 	FROM requests_count rc
WHERE rc.counts > 300;

3. 系统当前进程、会话数

Oracle数据库当前数据库设置的最大进程数、会话数在数据初始参数中已经设置,可以查询如下:

1)使用PL/SQL Developer,打开Command Windows即可(以APPS运行)

SQL> show parameter processes
 
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes                      integer     2
db_writer_processes                  integer     1
job_queue_processes                  integer     10
log_archive_max_processes            integer     2
processes                            integer     800
 
SQL> show parameter sessions
 
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
java_max_sessionspace_size           integer     0
java_soft_sessionspace_limit         integer     0
license_max_sessions                 integer     0
license_sessions_warning             integer     0
logmnr_max_persistent_sessions       integer     1
mts_sessions                         integer     0
sessions                             integer     1600
shared_server_sessions               integer     0

2)使用SQL脚本查询,打开SQL Windows,或者SQLPlus(以APPS运行)

select * from v$parameter p where p.NAME in ('processes','sessions');

再查看当前系统的进程数、会话数。

SELECT COUNT(*) FROM v$process;	--当前进程数

SELECT COUNT(*) FROM v$session; --当前会话数

当以上数目非常接近系统设定的最大值时,系统就可能出现类似以上的问题,主要是超出最 大会话数,无法进行新的会话、连接等响应。

此时一般会手动处理一些过期的会话,主要是以下方式

--类似session/cookie的会话
SELECT 'kill -9 ' || b.spid
  FROM v$session a,
       v$process b,
       v$sqltext c
 WHERE a.paddr = b.addr
   AND a.username IS NOT NULL
   AND c.hash_value = a.sql_hash_value
   AND c.sql_text LIKE '%SESSION_COOKIE_DOMAIN%'
   AND a.sid != (SELECT sid FROM v$mystat WHERE rownum = 1);

--当前之前还没有正常回收的forms会话
SELECT 'alter system kill session' || '''' || s.sid || ',' || s.serial# || ''';' oracle_level_kill
  FROM v$session s,
       v$process p
 WHERE s.paddr = p.addr
   AND s.sid IN (SELECT se.sid
                   FROM v$session se
                  WHERE sid IN (SELECT session_id FROM v$locked_object)
                    AND se.logon_time < trunc(SYSDATE))
   AND s.action LIKE 'FRM:%'
   AND s.STATUS = 'INACTIVE';

4.查看数据库警告日志

数据库警告日志中记录了数据库在运行过程中的状态信息,包括发生的错误。 登录到服务器主机,如prodora,警告日志文件路径:

注:
以上路径也是在数据库的初始化参数设定,查询如下:
--PL/SQL Developer的Command Windows
SQL> show parameter background_dump_dest;
 
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
background_dump_dest                 string      /u1/PROD/prodora/proddb/9.2.0/admin/PROD_erpprod/bdump

--sql脚本
select * from v$parameter p where p.NAME in ('background_dump_dest');

如今天的警告日志截取如下:
Errors in file /u1/PROD/prodora/proddb/9.2.0/admin/PROD_erpprod/udump/prod_ora_5173482.trc:
ORA-00600: 内部错误代码,参数: [15419], [severe error during PL/SQL execution], [], [], [], [], [], []
ORA-06544: PL/SQL: 内部错误,参数: [77406], [], [], [], [], [], [], []
ORA-06553: PLS-801: 内部错误 [77406]
ORA-00018: 超出最大会话数
ORA-06512: 在"APPS.FND_SIGNON", line 239
Fri Dec 21 15:32:03 2012
Timed out trying to start process J002.
Fri Dec 21 15:32:43 2012
Errors in file /u1/PROD/prodora/proddb/9.2.0/admin/PROD_erpprod/udump/prod_ora_3883030.trc:
ORA-00600: 内部错误代码,参数: [4414], [0], [0], [5454], [2], [], [], []
ORA-00018: 超出最大会话数
ORA-00018: 超出最大会话数
Fri Dec 21 15:33:55 2012
Errors in file /u1/PROD/prodora/proddb/9.2.0/admin/PROD_erpprod/udump/prod_ora_688340.trc:
ORA-00600: 内部错误代码,参数: [15419], [severe error during PL/SQL execution], [], [], [], [], [], []
ORA-06544: PL/SQL: 内部错误,参数: [77406], [], [], [], [], [], [], []
ORA-06553: PLS-801: 内部错误 [77406]
ORA-00018: 超出最大会话数
ORA-06512: 在"APPS.FND_SIGNON", line 239
Fri Dec 21 15:33:55 2012
Errors in file /u1/PROD/prodora/proddb/9.2.0/admin/PROD_erpprod/udump/prod_ora_6520900.trc:
ORA-00600: 内部错误代码,参数: [15419], [severe error during PL/SQL execution], [], [], [], [], [], []
ORA-06544: PL/SQL: 内部错误,参数: [77406], [], [], [], [], [], [], []
ORA-06553: PLS-801: 内部错误 [77406]
ORA-20001: Oracle 错误 -18:FND_SIGNON.AUDIT_END 中检测到 ORA-00018: 超出最大会话数。
ORA-06512: 在"APPS.APP_EXCEPTION", line 72
ORA-06512: 在"APPS.FND_SIGNON", line 18
ORA-06512: 在"APPS.FND_SIGNON", line 317
Fri Dec 21 15:34:43 2012
Completed checkpoint up to RBA [0x514c6.2.10], SCN: 0x0771.4352af6c
Fri Dec 21 15:40:51 2012
Restarting dead background process QMN0
QMN0 started with pid=600
Fri Dec 21 15:41:06 2012
Errors in file /u1/PROD/prodora/proddb/9.2.0/admin/PROD_erpprod/udump/prod_ora_6520900.trc:
ORA-00600: 内部错误代码,参数: [17285], [0x1103D2D58], [4294967295], [0x70000008B9DF418], [], [], [], []
Fri Dec 21 15:41:07 2012
Errors in file /u1/PROD/prodora/proddb/9.2.0/admin/PROD_erpprod/udump/prod_ora_6520900.trc:
ORA-00600: 内部错误代码,参数: [17285], [0x1103D2D58], [4294967295], [0x70000008B9DF418], [], [], [], []
ORA-00600: 内部错误代码,参数: [17285], [0x1103D2D58], [4294967295], [0x70000008B9DF418], [], [], [], []
Fri Dec 21 15:42:37 2012
Errors in file /u1/PROD/prodora/proddb/9.2.0/admin/PROD_erpprod/udump/prod_ora_4214832.trc:
ORA-00604: 递归 SQL 层 1 出现错误
ORA-00018: 超出最大会话数
ORA-06512: 在line 1
Fri Dec 21 15:42:40 2012
Errors in file /u1/PROD/prodora/proddb/9.2.0/admin/PROD_erpprod/udump/prod_ora_3190972.trc:
ORA-00604: 递归 SQL 层 1 出现错误
ORA-00018: 超出最大会话数
ORA-06512: 在line 1

通过发生的ORA-错误,以及应用的异常时间,基本可以确定是由于会话引起的以上异常,清 理一些过期的会话就可以解决。

特别说明

在操作系统中已经做了一些cron任务,定期预警或者执行会话清理。主要如下:

  • 定期清理过期会话(session)
  • 定期检查死锁(deadlock)
  • 定期检查请求(requests)
  • 定期检查无效对象、表空间等(invalidobject, tablespace, tablespaces)
  • 定期检查数据库警告日志(checkalert.sh)

以上脚本均会发送邮件通知。

查看prodora定义的cron job

$ crontab -l
45 07,13,16 * * *   /u1/PROD/prodora/dailyduty.sh session  > /dev/null 2>&1
00 08 * * *   /u1/PROD/prodora/dailyduty.sh database > /dev/null 2>&1
05 08 * * *   /u1/PROD/prodora/dailyduty.sh instance > /dev/null 2>&1
10 08 * * *   /u1/PROD/prodora/dailyduty.sh datafiles > /dev/null 2>&1
15 08 * * *   /u1/PROD/prodora/dailyduty.sh tablespaces > /dev/null 2>&1
20 08 * * 5   /u1/PROD/prodora/dailyduty.sh tablespace > /dev/null 2>&1
00 10,16 * * *   /u1/PROD/prodora/dailyduty.sh deadlock > /dev/null 2>&1
00 15,16,17 * * *   /u1/PROD/prodora/dailyduty.sh requests > /dev/null 2>&1
00 08 * * 5   /u1/PROD/prodora/dailyduty.sh invalidobject > /dev/null 2>&1
55 23 * * 0-4   /u1/PROD/prodora/checkalert.sh > /dev/null 2>&1

主要脚本dailyduty.sh

#!/bin/ksh
#abstract:
#oracle database alert jobs
#history:
#2012-06-11     [email protected]         first release
#variables
script_basepath=/u1/PROD/prodora/sql
mail_date=$(date +%Y-%m-%d\ %H:%M:%S)
receipt=[email protected]
hostname=$(hostname)

#path
ORACLE_HOME=/u1/PROD/prodora/proddb/9.2.0
export ORACLE_HOME
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:$ORACLE_HOME/bin
export PATH
ORACLE_SID=PROD
export ORACLE_SID


#variabls
script_name=$1
if [ "${script_name}" == "" ]; then
   echo "script file name was empty!"
   exit 0
fi
script="${script_basepath}/${script_name}.sql"
if [ ! -e ${script} ]; then
   echo "sql script file ${script} does not exist!"
   exit 0
fi

#sql spool log file
sqllog="${script_basepath}/${script_name}.log"

#sqlplus logon
sqlplus -s '/as sysdba' << EOF
set feed off
set linesize 200
set pagesize 200
spool ${sqllog}
@${script}
spool off
exit
EOF

#grep spool file and kill processes
if [ "${script_name}" == "session" ]; then
   grep "kill -9" ${sqllog} | awk 'NR >1 {print $0}' | ksh
fi

#mail
if [ "${script_name}" != "session" ]; then
   if [ `cat ${sqllog} | wc -l` -gt 0 ]; then
       cat ${sqllog} | mail -s "${mail_date}:${hostname} dba daily check of ${script_name}" ${receipt} 
   fi
fi

参考

  • Oracle Database Administrator Guide
  • Oracle Application Administrator Guide


blog comments powered by Disqus

Published

23 December 2012

Categories

Tags

Github