Description
Linkis Component
linkis-computation-governance/linkis-manager/linkis-ecm
What happened
After the ECM (EngineConnManager) service shuts down abnormally and is restarted, querying ECM management information throws a TooManyResultsException.
Problem Description:
When ECM is shut down abnormally (e.g., the process is force-killed or the server loses power) and then restarted, querying ECM management information through the Linkis management console or the API fails with a TooManyResultsException. The error message reads: "Expected one result (or null) to be returned by selectOne(), but found: 2"
Root Cause:
When ECM shuts down abnormally, the ECM status in the database is not updated. On restart, a new ECM record is created while the stale one remains, so the database holds duplicate records. The query then matches more than one row, and selectOne() throws the exception.
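The failure mode described above can be illustrated with a small, dependency-free sketch. It is not Linkis code: the class `EcmDuplicateDemo`, the instance string `ecm-host:9102`, and the in-memory list are all hypothetical stand-ins, and a plain `IllegalStateException` takes the place of MyBatis's `TooManyResultsException`. It only assumes, as this issue suspects, that registration inserts a row without checking for an existing one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for the ECM record table; this is not Linkis code.
class EcmDuplicateDemo {
    // Each entry is the identifier of one registered ECM row (illustrative only).
    private final List<String> rows = new ArrayList<>();

    // Unconditional insert: nothing checks for an existing row with the same identifier.
    void register(String instance) {
        rows.add(instance);
    }

    // Mirrors the selectOne() contract: null for no match, the row for exactly one
    // match, and an exception for more. MyBatis throws TooManyResultsException here;
    // IllegalStateException stands in so the sketch has no dependencies.
    String selectOne(String instance) {
        int matches = 0;
        for (String row : rows) {
            if (row.equals(instance)) {
                matches++;
            }
        }
        if (matches > 1) {
            throw new IllegalStateException(
                    "Expected one result (or null) to be returned by selectOne(), but found: "
                            + matches);
        }
        return matches == 1 ? instance : null;
    }

    public static void main(String[] args) {
        EcmDuplicateDemo table = new EcmDuplicateDemo();
        table.register("ecm-host:9102"); // first start
        // Abnormal shutdown: no code path removes or deactivates the row.
        table.register("ecm-host:9102"); // restart registers the same instance again
        try {
            table.selectOne("ecm-host:9102");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the real registration path behaves like `register` here, every abnormal restart adds one more row, so the count in the error message would grow with each restart.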
What you expected to happen
After an abnormal ECM shutdown and restart:
- Querying ECM management information should return normally
- The system should not throw TooManyResultsException
- No duplicate ECM records should exist in the database
- Old ECM records should be marked as inactive or removed
How to reproduce
- Start the ECM (EngineConnManager) service
- Shut the ECM service down abnormally (e.g., force-kill the process or cut power to the server)
- Restart the ECM service
- Query ECM management information through the Linkis management console or the API
- Observe that the system throws TooManyResultsException
Anything else
Error Details:
TooManyResultsException: Expected one result (or null) to be returned by selectOne(), but found: 2
Suggested Solutions:
- Add ECM startup cleanup logic: before registering a new ECM, check for and clean up old records with the same identifier
- Replace the selectOne() query: use selectList() and handle multiple results explicitly (e.g., select the latest record and mark the others as inactive)
- Add a unique constraint: add a database unique constraint to prevent duplicate ECM records
- Implement a heartbeat mechanism: add ECM heartbeat detection so that inactive ECMs are marked automatically
- Add a graceful shutdown hook: ensure the ECM status is updated in the database during shutdown
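The first two suggestions above (startup cleanup and a selectList()-style lookup that picks the latest record) can be sketched as follows. This is a minimal illustration under assumptions, not the Linkis implementation: `EcmRegistrySketch`, `EcmRecord`, the `registeredAtMillis` field, and the in-memory list are hypothetical, and the real fix would live in the registration service and the MyBatis mapper.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical in-memory registry; the names do not come from the Linkis code base.
class EcmRegistrySketch {
    static final class EcmRecord {
        final String instance;
        final long registeredAtMillis;

        EcmRecord(String instance, long registeredAtMillis) {
            this.instance = instance;
            this.registeredAtMillis = registeredAtMillis;
        }
    }

    private final List<EcmRecord> rows = new ArrayList<>();

    // Unconditional insert, kept only to reproduce the duplicate state.
    void registerUnchecked(String instance, long nowMillis) {
        rows.add(new EcmRecord(instance, nowMillis));
    }

    // Startup cleanup: drop stale rows for the same identifier, then insert the new one.
    void registerWithCleanup(String instance, long nowMillis) {
        rows.removeIf(r -> r.instance.equals(instance));
        rows.add(new EcmRecord(instance, nowMillis));
    }

    // selectList()-style lookup: returns every matching row instead of failing on more than one.
    List<EcmRecord> selectList(String instance) {
        List<EcmRecord> matches = new ArrayList<>();
        for (EcmRecord r : rows) {
            if (r.instance.equals(instance)) {
                matches.add(r);
            }
        }
        return matches;
    }

    // Tolerant lookup: when duplicates exist, the most recently registered row wins.
    Optional<EcmRecord> findLatest(String instance) {
        return selectList(instance).stream()
                .max(Comparator.comparingLong((EcmRecord r) -> r.registeredAtMillis));
    }
}
```

The two measures complement each other: cleanup stops new duplicates from appearing, while the tolerant lookup keeps queries working against databases that already contain them.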
Related Code:
- ECM registration logic in LinkisManager
- MyBatis mapper selectOne() queries
- ECM lifecycle management
- Database schema for ECM tables
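The heartbeat suggestion deserves a sketch of its own, because a JVM shutdown hook does not run when the process is killed with SIGKILL or the machine loses power, which are exactly the cases in this report. The following is a minimal sketch under assumptions: `EcmHeartbeatSketch`, the numeric record IDs, and the timeout value are hypothetical, and timestamps are passed in explicitly so the logic can be exercised without a clock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical heartbeat tracker; not part of Linkis.
class EcmHeartbeatSketch {
    private final long timeoutMillis;
    // Hypothetical ECM record ID -> timestamp of the last heartbeat received for it.
    // TreeMap keeps the IDs sorted so the result order is deterministic.
    private final Map<Long, Long> lastBeatMillis = new TreeMap<>();

    EcmHeartbeatSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called whenever a heartbeat arrives for a record.
    void heartbeat(long recordId, long nowMillis) {
        lastBeatMillis.put(recordId, nowMillis);
    }

    // Returns the IDs whose last heartbeat is older than the timeout. A caller would
    // mark these records inactive (or delete them) in the database.
    List<Long> findInactive(long nowMillis) {
        List<Long> inactive = new ArrayList<>();
        for (Map.Entry<Long, Long> entry : lastBeatMillis.entrySet()) {
            if (nowMillis - entry.getValue() > timeoutMillis) {
                inactive.add(entry.getKey());
            }
        }
        return inactive;
    }
}
```

For example, with a 30-second timeout, a record last seen at time 0 is reported as inactive at 45 seconds, while a record that sent a heartbeat at 40 seconds is not.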
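Suspected cause: when ECM shuts down abnormally, the ECM status in the database is not updated, so a new ECM record is generated after the restart. The database then contains duplicate records, and the selectOne() query returns multiple results.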