@XingY
Contributor

@XingY XingY commented Jan 15, 2026

Rationale

The thread dump from a reported issue reveals a deadlock situation where:

  • Thread 1 (search indexer triggered by a previous updateDomain.api) is unable to perform a SELECT over exp.data JOIN expdataclass.provisioned JOIN exp.dataclass
  • Thread 2 (updateDomain.api) is unable to perform an UPDATE on exp.dataclass

A SELECT won't typically end up in a deadlock. However, ADD COLUMN is an exception: it takes an ACCESS EXCLUSIVE lock, which blocks both reads and writes. So in this case, thread 1 is waiting for thread 2 to release its lock on expdataclass.provisioned, while thread 2 is waiting for thread 1 to release its lock on exp.dataclass (caused by the indexer updating exp.dataclass.lastindex early in the transaction).
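The cycle described above can be sketched as a small wait-for graph. This is only an illustrative model, not LabKey or PostgreSQL code: the class `WaitForGraphSketch`, its methods, and the session names `indexer` and `updateDomain` are hypothetical, and the model assumes each session waits on at most one other session.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a wait-for graph; names are illustrative only.
class WaitForGraphSketch {
    // Maps each waiting session to the session holding the lock it needs.
    private final Map<String, String> waitsFor = new HashMap<>();

    void addWait(String waiter, String holder) {
        waitsFor.put(waiter, holder);
    }

    // Follows wait edges from 'start'; revisiting a session means the waits form a cycle.
    boolean hasCycleFrom(String start) {
        Set<String> seen = new HashSet<>();
        String current = start;
        while (current != null) {
            if (!seen.add(current)) {
                return true;
            }
            current = waitsFor.get(current);
        }
        return false;
    }

    // The situation described in the rationale, under the assumptions stated above.
    static WaitForGraphSketch describedSituation() {
        WaitForGraphSketch graph = new WaitForGraphSketch();
        // The indexer updated exp.dataclass early, so it holds that row lock
        // and now waits to read expdataclass.provisioned.
        graph.addWait("indexer", "updateDomain");
        // updateDomain holds ACCESS EXCLUSIVE on the provisioned table (ADD COLUMN)
        // and waits for the row lock on exp.dataclass.
        graph.addWait("updateDomain", "indexer");
        return graph;
    }

    public static void main(String[] args) {
        System.out.println("cycle: " + describedSituation().hasCycleFrom("indexer"));
    }
}
```

If the indexer held no lock on exp.dataclass at the point where it reads the provisioned table, the second edge would be absent and the walk would end without revisiting a session.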

To help avoid this deadlock, this PR moves the indexing of exp.dataclass to after the indexing of exp.data. The bulk of the work happens on exp.data, and there is no need to lock exp.dataclass for the whole transaction.

Related Pull Requests

Changes

  • Move exp.dataclass and exp.materialsource indexing to after exp.data and exp.material indexing, to avoid holding on to the lock.
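The effect of the reordering can be sketched by counting how many indexing steps run after the step that first writes to exp.dataclass, since a row lock taken by an UPDATE is held until the transaction ends. This is a simplified model under stated assumptions: the class `IndexOrderSketch`, its method, and the exact step lists are hypothetical and are not taken from the actual indexer code.

```java
import java.util.List;

// Hypothetical model of indexing step order within one transaction.
class IndexOrderSketch {
    // Returns the number of steps that run after 'lockingStep', i.e. while
    // the lock taken by that step is still held. Returns 0 if the step is absent.
    static int stepsRunWhileHolding(List<String> steps, String lockingStep) {
        int index = steps.indexOf(lockingStep);
        return index < 0 ? 0 : steps.size() - 1 - index;
    }

    // Assumed order before the change: definition tables first.
    static List<String> before() {
        return List.of("exp.dataclass", "exp.materialsource", "exp.data", "exp.material");
    }

    // Assumed order after the change: bulk tables first, definition tables last.
    static List<String> after() {
        return List.of("exp.data", "exp.material", "exp.dataclass", "exp.materialsource");
    }

    public static void main(String[] args) {
        System.out.println("before: " + stepsRunWhileHolding(before(), "exp.dataclass"));
        System.out.println("after: " + stepsRunWhileHolding(after(), "exp.dataclass"));
    }
}
```

In this model the exp.data step, where most of the work happens, runs while the exp.dataclass lock is held only in the assumed "before" order.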

@XingY XingY requested a review from labkey-jeckels January 15, 2026 17:58
@labkey-jeckels
Contributor

I see the search indexer activity in later dumps but it looks to be idle in the first dumps in the log. Thus, I don't think it's causing the deadlock here, though it is piling on.

In the first dump, I see two problems, which may or may not be connected.

One is synchronization in PipelineQueueImpl. There's a job cancellation attempt from https-jsse-nio-443-exec-1 that's holding the lock but apparently not actually closing the DB connection. Other threads like https-jsse-nio-443-exec-10 and https-jsse-nio-443-exec-16 are trying to get the lock.

The second is related to the construct domain. https-jsse-nio-443-exec-2 and https-jsse-nio-443-exec-11 are both trying to update the domain. One is trying to update the exp.dataclass row while the other is trying to add the column to the provisioned table. https-jsse-nio-443-exec-24 and other threads are blocked trying to query that domain. JobThread-2.1 is running a domain validation job that's also blocked trying to query it.

Can you take another look at the earliest dump in the log and see if you agree with my assessment? And if so, do you think your patch will help, or should we pursue a different fix? Your patch seems OK to me, but I'm worried it won't address the root problem.
