Skip to content

Conversation

@dariusarnold
Copy link
Contributor

@dariusarnold dariusarnold commented Mar 8, 2025

Fixes #172.

This race condition would make infinisim eventually crash with the message "Mutex released without being held" on my machine.

The function xSemaphoreTake waits for the semaphore to be available to take by looping + sleeping until there is an element in the pxQueue->queue. Then it locks the pxQueue->mutex and checks the queuesize again. Since the first check is not done under the mutex, some other thread could remove the element from the queue before the mutex is locked. Then the second check in xSemaphoreTake will return false. The method DateTimeController::CurrentDateTime() uses a semaphore as a mutex to guard calls to UpdateTime(). But it (and the other methods from DateTimeController) do not check if xSemaphoreTake actually gave them the semaphore. They will try to give it back even if they failed to take it, which leads to the crash in xSemaphoreGive. The other thread in this crash was calling the watchface refresh method, which also calls CurrentDateTime().

To fix the race condition, lock the mutex guarding pxQeue->queue before checking the queue size and unlock before sleeping. When implemented like this, xSemaphoreTake from InfiniSim follows the API from FreeRTOS more closely, which waits the full xTicksToWait while InfiniSim would abort prematurely in some cases.

With the assumption that UpdateTime() will finish within some finite amount of time, this makes the methods in DateTimerController work even without error handling since they have such long wait times.

This race condition would make infinisim eventually crash with the message "Mutex released without being held" on my machine.

The function `xSemaphoreTake` waits for the semaphore to be available to take by  looping + sleeping until there is an element in the `pxQueue->queue`. Then it locks the `pxQueue->mutex` and checks the queuesize again. Since the first check is not done under the mutex, some other thread could remove the element from the queue before the mutex is locked. Then the second check in `xSemaphoreTake` will return false.
The method `DateTimeController::CurrentDateTime()` uses a semaphore as a mutex to guard calls to `UpdateTime()`. But it (and the other methods from `DateTimeController`) do not check if `xSemaphoreTake` actually gave them the semaphore. They will try to give it back even if they failed to take it, which leads to the crash in `xSemaphoreGive`.
The other thread in this crash was calling the watchface refresh method, which also calls `CurrentDateTime()`.

To fix the race condition, lock the mutex guarding `pxQeue->queue`  before checking the queue size and unlock before sleeping. When implemented like this, `xSemaphoreTake` from InfiniSim follows the API from FreeRTOS more closely, which waits the full `xTicksToWait` while InfiniSim would abort prematurely in some cases.

With the assumption that `UpdateTime()` will finish within some finite amount of time, this make the methods in `DateTimerController` work even without error handling since they have such long wait times.
Copy link
Collaborator

@NeroBurner NeroBurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny nitpicks. Other than that a BIG thank you! The semaphore crash was driving me crazy for a while, but I didn't find time to investigate.

Thanks for the thorough explanation and methodical fix!

@NeroBurner NeroBurner added the bug Something isn't working label Mar 9, 2025
…TEMPTS

Since xTicksToWait is TickType_t (uint32) and SDL_Delay takes uint32 as well, we should avoid any chance of funny business due to differences in signedness.
@dariusarnold
Copy link
Contributor Author

I was wondering if someone else was effected by this, on my machine it would occur pretty frequently. I am happy to help out.

Copy link
Collaborator

@NeroBurner NeroBurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loocking good 😁

@NeroBurner NeroBurner merged commit 56ed755 into InfiniTimeOrg:main Mar 9, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Crash with message "Mutex released without being held" due to race condition in xSemaphoreTake

2 participants