
[Bug][ECM] TooManyResultsException error when querying ECM management after abnormal restart #5316

@v-kkhuang


Linkis Component

linkis-computation-governance/linkis-manager/linkis-ecm

What happened

English:

After the ECM (EngineConnManager) service shuts down abnormally and restarts, querying ECM management information throws a TooManyResultsException.

Problem Description:
When ECM is terminated abnormally (e.g., the process is force-killed or the server loses power) and then restarted, querying ECM management information through the Linkis management console or API fails with a TooManyResultsException. The error message reads: "Expected one result (or null) to be returned by selectOne(), but found: 2"

Root Cause (suspected):
When ECM shuts down abnormally, its status in the database is not updated. On restart, ECM registers a new record while the stale one remains, so the database holds duplicate records for the same ECM. A query that uses selectOne() then matches more than one row and throws the exception.
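The failure mode described above can be reproduced without a database. The following is a minimal sketch, not Linkis code: `EcmRecord`, `SelectOneDemo`, `findByInstance`, and the instance string `host-a:9102` are hypothetical names chosen for illustration, and `IllegalStateException` stands in for MyBatis's `TooManyResultsException` so the example has no external dependency. The `selectOne` helper only mirrors the documented contract of MyBatis `selectOne()`: one row is returned, zero rows give `null`, and more than one row raises an error.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a row in the ECM instance table; field names are assumptions.
final class EcmRecord {
    final long id;
    final String instance;

    EcmRecord(long id, String instance) {
        this.id = id;
        this.instance = instance;
    }
}

class SelectOneDemo {
    // Mirrors the selectOne() contract: one row -> that row,
    // zero rows -> null, more than one row -> exception.
    // IllegalStateException is a stand-in for TooManyResultsException.
    static <T> T selectOne(List<T> rows) {
        if (rows.size() == 1) {
            return rows.get(0);
        }
        if (rows.size() > 1) {
            throw new IllegalStateException(
                "Expected one result (or null) to be returned by selectOne(), but found: "
                    + rows.size());
        }
        return null;
    }

    // Returns every row whose instance matches, like a "WHERE instance = ?" query.
    static List<EcmRecord> findByInstance(List<EcmRecord> table, String instance) {
        List<EcmRecord> matches = new ArrayList<>();
        for (EcmRecord r : table) {
            if (r.instance.equals(instance)) {
                matches.add(r);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<EcmRecord> table = new ArrayList<>();
        table.add(new EcmRecord(1, "host-a:9102")); // row left behind by the killed process
        table.add(new EcmRecord(2, "host-a:9102")); // row inserted again on restart
        try {
            selectOne(findByInstance(table, "host-a:9102"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a single row per instance the lookup succeeds; the exception appears only once the second, duplicate row exists, which matches the behavior reported after the restart.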



What you expected to happen

English:

After an abnormal ECM shutdown and restart:

  1. Querying ECM management information should return normally
  2. The system should not throw TooManyResultsException
  3. No duplicate ECM records should exist in the database
  4. Old ECM records should be marked as inactive or removed


How to reproduce

English:

  1. Start the ECM (EngineConnManager) service
  2. Terminate the ECM service abnormally (e.g., force-kill the process or cut power to the server)
  3. Restart the ECM service
  4. Query ECM management information through the Linkis management console or API
  5. Observe that the system throws TooManyResultsException


Anything else

English:

Error Details:

TooManyResultsException: Expected one result (or null) to be returned by selectOne(), but found: 2

Suggested Solutions:

  1. Add ECM startup cleanup logic: before registering a new ECM, check for old records with the same identifier and clean them up
  2. Replace the selectOne() query: use selectList() and handle multiple results explicitly (e.g., select the latest record and mark the others as inactive)
  3. Add a unique constraint: add a database unique constraint so that duplicate ECM records cannot be inserted
  4. Implement a heartbeat mechanism: add ECM heartbeat detection so that unresponsive ECMs are marked inactive automatically
  5. Add a graceful shutdown hook: ensure the ECM status is updated in the database during shutdown
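Suggestions 1 and 2 can be sketched together in a few lines. This is an illustration under stated assumptions rather than the Linkis implementation: `EcmRow`, `EcmRegistry`, and every method and field name below are hypothetical, an in-memory list stands in for the ECM table, and "latest" is assumed to mean the row with the greatest creation timestamp.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical ECM row; the real Linkis table and column names may differ.
final class EcmRow {
    final String instance;
    final long createdAtMillis;
    boolean active = true;

    EcmRow(String instance, long createdAtMillis) {
        this.instance = instance;
        this.createdAtMillis = createdAtMillis;
    }
}

// In-memory stand-in for the ECM table and its mapper.
class EcmRegistry {
    private final List<EcmRow> rows = new ArrayList<>();

    // Simulates the current behavior: insert a row without looking at existing ones.
    EcmRow insertWithoutCleanup(String instance, long nowMillis) {
        EcmRow row = new EcmRow(instance, nowMillis);
        rows.add(row);
        return row;
    }

    // Suggestion 1: deactivate stale rows for the same instance before registering.
    EcmRow register(String instance, long nowMillis) {
        for (EcmRow r : rows) {
            if (r.instance.equals(instance)) {
                r.active = false;
            }
        }
        return insertWithoutCleanup(instance, nowMillis);
    }

    // Suggestion 2: tolerate duplicates by returning the newest active row
    // instead of failing when more than one row matches.
    Optional<EcmRow> findLatestActive(String instance) {
        return rows.stream()
            .filter(r -> r.active && r.instance.equals(instance))
            .max(Comparator.comparingLong((EcmRow r) -> r.createdAtMillis));
    }

    // Number of active rows for an instance; more than one means duplicates exist.
    long countActive(String instance) {
        return rows.stream()
            .filter(r -> r.active && r.instance.equals(instance))
            .count();
    }
}
```

The two measures are complementary: cleanup on registration stops new duplicates from accumulating, while the tolerant lookup keeps queries working against duplicates that are already in the database. A unique constraint (suggestion 3) would enforce the same invariant at the schema level.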

Related Code:

  • ECM registration logic in LinkisManager
  • MyBatis mapper selectOne() queries
  • ECM lifecycle management
  • Database schema for ECM tables

