A Case of ATO Exception
异常症状
2012-12-21下午Oracle ERP在销售模块做ATO产品订单进展时,没有弹出任何错误,也没有 弹出订单进展的”确认”提示,直接跳到订单头信息,行信息中也没有出现带*
的标准成品料 号。且所有的ATO订单都无法完成订单进展。导致特制单不能及时录入系统,问题较为严重。
解决思路
使用topas查看操作系统是否有异常进程;
登录系统查看并发请求状态;
查系统当前会话、进程数;
看数据库警告日志;
详细过程
1. OS进程检查
登录到erp,运行topas观察系统进程使用资源情况。
2. 并发请求
职责路径:system administrator » Concurrent » Manager » Administer 此时再查看并发管理运行状态,等待处理的请求 因只有系统管理员有该权限,大家也可以通过SQL脚本进行查询,脚本如下(可以APPS运行):
WITH requests_count AS
(SELECT COUNT(1) counts
FROM apps.fnd_concurrent_requests fcr
WHERE fcr.request_date > trunc(SYSDATE - 1)
AND fcr.requested_start_date < trunc(SYSDATE + 1)
AND fcr.phase_code IN ('P', 'R'))
SELECT 'Attention! Concurrent requests are up to ' || rc.counts "attention"
FROM requests_count rc
WHERE rc.counts > 300;
3. 系统当前进程、会话数
Oracle数据库当前数据库设置的最大进程数、会话数在数据初始参数中已经设置,可以查询如下:
1)使用PL/SQL Developer,打开Command Windows即可(以APPS运行)
SQL> show parameter processes
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes integer 2
db_writer_processes integer 1
job_queue_processes integer 10
log_archive_max_processes integer 2
processes integer 800
SQL> show parameter sessions
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
java_max_sessionspace_size integer 0
java_soft_sessionspace_limit integer 0
license_max_sessions integer 0
license_sessions_warning integer 0
logmnr_max_persistent_sessions integer 1
mts_sessions integer 0
sessions integer 1600
shared_server_sessions integer 0
2)使用SQL脚本查询,打开SQL Windows,或者SQLPlus(以APPS运行)
select * from v$parameter p where p.NAME in ('processes','sessions');
再查看当前系统的进程数、会话数。
SELECT COUNT(*) FROM v$process; --当前进程数
SELECT COUNT(*) FROM v$session; --当前会话数
当以上数目非常接近系统设定的最大值时,系统就可能出现类似以上的问题,主要是超出最 大会话数,无法进行新的会话、连接等响应。
此时一般会手动处理一些过期的会话,主要是以下方式
--类似session/cookie的会话
SELECT 'kill -9 ' || b.spid
FROM v$session a,
v$process b,
v$sqltext c
WHERE a.paddr = b.addr
AND a.username IS NOT NULL
AND c.hash_value = a.sql_hash_value
AND c.sql_text LIKE '%SESSION_COOKIE_DOMAIN%'
AND a.sid != (SELECT sid FROM v$mystat WHERE rownum = 1);
--当前之前还没有正常回收的forms会话
SELECT 'alter system kill session' || '''' || s.sid || ',' || s.serial# || ''';' oracle_level_kill
FROM v$session s,
v$process p
WHERE s.paddr = p.addr
AND s.sid IN (SELECT se.sid
FROM v$session se
WHERE sid IN (SELECT session_id FROM v$locked_object)
AND se.logon_time < trunc(SYSDATE))
AND s.action LIKE 'FRM:%'
AND s.STATUS = 'INACTIVE';
4.查看数据库警告日志
数据库警告日志中记录了数据库在运行过程中的状态信息,包括发生的错误。 登录到服务器主机,如prodora,警告日志文件路径:
注:
以上路径也是在数据库的初始化参数设定,查询如下:
--PL/SQL Developer的Command Windows
SQL> show parameter background_dump_dest;
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
background_dump_dest string /u1/PROD/prodora/proddb/9.2.0/admin/PROD_erpprod/bdump
--sql脚本
select * from v$parameter p where p.NAME in ('background_dump_dest');
如今天的警告日志截取如下:
Errors in file /u1/PROD/prodora/proddb/9.2.0/admin/PROD_erpprod/udump/prod_ora_5173482.trc:
ORA-00600: 内部错误代码,参数: [15419], [severe error during PL/SQL execution], [], [], [], [], [], []
ORA-06544: PL/SQL: 内部错误,参数: [77406], [], [], [], [], [], [], []
ORA-06553: PLS-801: 内部错误 [77406]
ORA-00018: 超出最大会话数
ORA-06512: 在"APPS.FND_SIGNON", line 239
Fri Dec 21 15:32:03 2012
Timed out trying to start process J002.
Fri Dec 21 15:32:43 2012
Errors in file /u1/PROD/prodora/proddb/9.2.0/admin/PROD_erpprod/udump/prod_ora_3883030.trc:
ORA-00600: 内部错误代码,参数: [4414], [0], [0], [5454], [2], [], [], []
ORA-00018: 超出最大会话数
ORA-00018: 超出最大会话数
Fri Dec 21 15:33:55 2012
Errors in file /u1/PROD/prodora/proddb/9.2.0/admin/PROD_erpprod/udump/prod_ora_688340.trc:
ORA-00600: 内部错误代码,参数: [15419], [severe error during PL/SQL execution], [], [], [], [], [], []
ORA-06544: PL/SQL: 内部错误,参数: [77406], [], [], [], [], [], [], []
ORA-06553: PLS-801: 内部错误 [77406]
ORA-00018: 超出最大会话数
ORA-06512: 在"APPS.FND_SIGNON", line 239
Fri Dec 21 15:33:55 2012
Errors in file /u1/PROD/prodora/proddb/9.2.0/admin/PROD_erpprod/udump/prod_ora_6520900.trc:
ORA-00600: 内部错误代码,参数: [15419], [severe error during PL/SQL execution], [], [], [], [], [], []
ORA-06544: PL/SQL: 内部错误,参数: [77406], [], [], [], [], [], [], []
ORA-06553: PLS-801: 内部错误 [77406]
ORA-20001: Oracle 错误 -18:FND_SIGNON.AUDIT_END 中检测到 ORA-00018: 超出最大会话数。
ORA-06512: 在"APPS.APP_EXCEPTION", line 72
ORA-06512: 在"APPS.FND_SIGNON", line 18
ORA-06512: 在"APPS.FND_SIGNON", line 317
Fri Dec 21 15:34:43 2012
Completed checkpoint up to RBA [0x514c6.2.10], SCN: 0x0771.4352af6c
Fri Dec 21 15:40:51 2012
Restarting dead background process QMN0
QMN0 started with pid=600
Fri Dec 21 15:41:06 2012
Errors in file /u1/PROD/prodora/proddb/9.2.0/admin/PROD_erpprod/udump/prod_ora_6520900.trc:
ORA-00600: 内部错误代码,参数: [17285], [0x1103D2D58], [4294967295], [0x70000008B9DF418], [], [], [], []
Fri Dec 21 15:41:07 2012
Errors in file /u1/PROD/prodora/proddb/9.2.0/admin/PROD_erpprod/udump/prod_ora_6520900.trc:
ORA-00600: 内部错误代码,参数: [17285], [0x1103D2D58], [4294967295], [0x70000008B9DF418], [], [], [], []
ORA-00600: 内部错误代码,参数: [17285], [0x1103D2D58], [4294967295], [0x70000008B9DF418], [], [], [], []
Fri Dec 21 15:42:37 2012
Errors in file /u1/PROD/prodora/proddb/9.2.0/admin/PROD_erpprod/udump/prod_ora_4214832.trc:
ORA-00604: 递归 SQL 层 1 出现错误
ORA-00018: 超出最大会话数
ORA-06512: 在line 1
Fri Dec 21 15:42:40 2012
Errors in file /u1/PROD/prodora/proddb/9.2.0/admin/PROD_erpprod/udump/prod_ora_3190972.trc:
ORA-00604: 递归 SQL 层 1 出现错误
ORA-00018: 超出最大会话数
ORA-06512: 在line 1
通过发生的ORA-错误,以及应用的异常时间,基本可以确定是由于会话引起的以上异常,清 理一些过期的会话就可以解决。
特别说明
在操作系统中已经做了一些cron任务,定期预警或者执行会话清理。主要如下:
- 定期清理过期会话(session)
- 定期检查死锁(deadlock)
- 定期检查请求(requests)
- 定期检查无效对象、表空间等(invalidobject, tablespace, tablespaces)
- 定期检查数据库警告日志(checkalert.sh)
以上脚本均会发送邮件通知。
查看prodora定义的cron job
$ crontab -l
45 07,13,16 * * * /u1/PROD/prodora/dailyduty.sh session > /dev/null 2>&1
00 08 * * * /u1/PROD/prodora/dailyduty.sh database > /dev/null 2>&1
05 08 * * * /u1/PROD/prodora/dailyduty.sh instance > /dev/null 2>&1
10 08 * * * /u1/PROD/prodora/dailyduty.sh datafiles > /dev/null 2>&1
15 08 * * * /u1/PROD/prodora/dailyduty.sh tablespaces > /dev/null 2>&1
20 08 * * 5 /u1/PROD/prodora/dailyduty.sh tablespace > /dev/null 2>&1
00 10,16 * * * /u1/PROD/prodora/dailyduty.sh deadlock > /dev/null 2>&1
00 15,16,17 * * * /u1/PROD/prodora/dailyduty.sh requests > /dev/null 2>&1
00 08 * * 5 /u1/PROD/prodora/dailyduty.sh invalidobject > /dev/null 2>&1
55 23 * * 0-4 /u1/PROD/prodora/checkalert.sh > /dev/null 2>&1
主要脚本dailyduty.sh
#!/bin/ksh
#abstract:
#oracle database alert jobs
#history:
#2012-06-11 [email protected] first release
#variables
script_basepath=/u1/PROD/prodora/sql
mail_date=$(date +%Y-%m-%d\ %H:%M:%S)
[email protected]
hostname=$(hostname)
#path
ORACLE_HOME=/u1/PROD/prodora/proddb/9.2.0
export ORACLE_HOME
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:$ORACLE_HOME/bin
export PATH
ORACLE_SID=PROD
export ORACLE_SID
#variabls
script_name=$1
if [ "${script_name}" == "" ]; then
echo "script file name was empty!"
exit 0
fi
script="${script_basepath}/${script_name}.sql"
if [ ! -e ${script} ]; then
echo "sql script file ${script} does not exist!"
exit 0
fi
#sql spool log file
sqllog="${script_basepath}/${script_name}.log"
#sqlplus logon
sqlplus -s '/as sysdba' << EOF
set feed off
set linesize 200
set pagesize 200
spool ${sqllog}
@${script}
spool off
exit
EOF
#grep spool file and kill processes
if [ "${script_name}" == "session" ]; then
grep "kill -9" ${sqllog} | awk 'NR >1 {print $0}' | ksh
fi
#mail
if [ "${script_name}" != "session" ]; then
if [ `cat ${sqllog} | wc -l` -gt 0 ]; then
cat ${sqllog} | mail -s "${mail_date}:${hostname} dba daily check of ${script_name}" ${receipt}
fi
fi
参考
- Oracle Database Administrator Guide
- Oracle Application Administrator Guide
blog comments powered by Disqus