@ryanbreen
Owner

Summary

  • Fixes spinlock deadlock that caused ~35 ARM64 tests to hang after exec()
  • Root cause: run_userspace_from_ext2() held ext2 lock across non-returning ERET to userspace
  • Fix: explicit drop(fs_guard) after ELF read, before userspace jump

Root Cause Analysis

  1. run_userspace_from_ext2() acquires ext2::root_fs() spinlock
  2. Loads init_shell ELF binary from filesystem
  3. Jumps to userspace via ERET (never returns)
  4. MutexGuard never dropped → spinlock held forever
  5. Child processes calling exec() deadlock trying to acquire the same lock
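The sequence above can be reproduced in miniature on a host machine. This is only a sketch under stated assumptions: `std::sync::Mutex` stands in for the kernel spinlock, `std::mem::forget` models a guard whose destructor never runs because the ERET never returns, and a spawned thread plays the child that calls exec(). The function name `child_exec_can_lock` is hypothetical and does not appear in the kernel.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Models steps 1-5: returns whether a later exec() could take the lock.
fn child_exec_can_lock() -> bool {
    // Hypothetical stand-in for the ext2 root filesystem behind its lock.
    let root_fs = Arc::new(Mutex::new(*b"\x7fELF"));

    // Steps 1-2: acquire the lock and read the ELF bytes.
    let fs_guard = root_fs.lock().unwrap();
    let _elf_data = fs_guard.to_vec();

    // Steps 3-4: the jump to userspace never returns, so the guard's
    // destructor never runs. `forget` models exactly that.
    std::mem::forget(fs_guard);

    // Step 5: the child tries to take the same lock. A real spinlock would
    // spin forever; `try_lock` lets the demo observe the failure instead.
    let fs = Arc::clone(&root_fs);
    thread::spawn(move || {
        let ok = fs.try_lock().is_ok();
        ok
    })
    .join()
    .unwrap()
}
```

Calling `child_exec_can_lock()` returns `false`: the lock is still held when the child tries to acquire it.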

Changes

  • kernel/src/main_aarch64.rs: Add drop(fs_guard) after ELF data read
  • kernel/src/arch_impl/aarch64/trace.rs: Lock-free trace buffer for debugging
  • kernel/src/arch_impl/aarch64/syscall_entry.rs: Trace points for exec path
  • arm64-parity.md: ARM64 parity tracking document
  • Test infrastructure for ARM64 userspace tests
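The contents of trace.rs are not shown in this PR description. As a rough illustration of what a lock-free trace buffer can look like, here is a minimal ring buffer built on atomics. Every name here (`TraceBuffer`, `TRACE_SLOTS`, `record`, `snapshot`) is hypothetical, and the sketch uses `std` so it runs on a host; kernel code would use `core::sync::atomic` instead.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};

// Hypothetical capacity; old events are overwritten once the ring wraps.
const TRACE_SLOTS: usize = 8;

struct TraceBuffer {
    next: AtomicUsize,               // monotonically increasing write index
    slots: [AtomicU64; TRACE_SLOTS], // one event word per slot
}

impl TraceBuffer {
    const fn new() -> Self {
        const EMPTY: AtomicU64 = AtomicU64::new(0);
        TraceBuffer {
            next: AtomicUsize::new(0),
            slots: [EMPTY; TRACE_SLOTS],
        }
    }

    /// Records an event without taking any lock, so it is usable on paths
    /// where a lock might already be held.
    fn record(&self, event: u64) {
        let i = self.next.fetch_add(1, Ordering::Relaxed) % TRACE_SLOTS;
        self.slots[i].store(event, Ordering::Release);
    }

    /// Copies out the current slot contents in slot order (not event order).
    fn snapshot(&self) -> [u64; TRACE_SLOTS] {
        let mut out = [0u64; TRACE_SLOTS];
        for (dst, src) in out.iter_mut().zip(self.slots.iter()) {
            *dst = src.load(Ordering::Acquire);
        }
        out
    }
}
```

Because `record` only performs one atomic add and one atomic store, it cannot itself deadlock, which is the property that matters when tracing a suspected lock problem.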

Test plan

  • fork_test: PASS (child exit 42, parent exit 0)
  • exec_from_ext2_test: PASS (exec'd /bin/hello_world)
  • ARM64 boot test: PASS
  • Full ARM64 test suite re-run (expected significant improvement from 41% baseline)

🤖 Generated with Claude Code

The run_userspace_from_ext2() function acquires the ext2 root_fs() spinlock
to load the init_shell ELF binary, but then jumps directly to userspace via
ERET without returning. Since return_to_userspace() never returns, the
MutexGuard is never dropped and the spinlock remains held forever.

When userspace calls fork() and then exec(), the child process tries to load
the new program via load_elf_from_ext2(), which attempts to acquire the
same spinlock. Because the lock is never released, the child deadlocks.

The fix adds an explicit drop(fs_guard) after reading the ELF data but
before jumping to userspace. This releases the lock at the earliest safe
point and allows subsequent exec() calls to succeed.
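The shape of the fix can be sketched as follows. This is not the code from main_aarch64.rs: `std::sync::Mutex` stands in for the kernel spinlock, `ROOT_FS`, `run_userspace`, and `jump_to_userspace` are hypothetical names, and the stand-in jump returns a bool (whether a later exec() could take the lock) so the effect is observable, whereas the real jump never returns.

```rust
use std::sync::Mutex;

// Hypothetical stand-in for the lock returned by ext2::root_fs().
static ROOT_FS: Mutex<[u8; 4]> = Mutex::new(*b"\x7fELF");

/// Stand-in for the non-returning ERET. Reports whether the filesystem
/// lock is free at the moment of the jump.
fn jump_to_userspace(_elf: &[u8]) -> bool {
    let ok = ROOT_FS.try_lock().is_ok();
    ok
}

fn run_userspace(release_before_jump: bool) -> bool {
    let fs_guard = ROOT_FS.lock().unwrap();

    // Copy the ELF bytes into an owned buffer so nothing borrows from the
    // guard after this point.
    let elf_data = fs_guard.to_vec();

    if release_before_jump {
        // The fix: release the lock explicitly, because the end of this
        // scope is never reached when the jump does not return.
        drop(fs_guard);
    }

    jump_to_userspace(&elf_data)
}
```

With `release_before_jump` set to `true` the lock is free at the jump; with `false` it is still held, which is the pre-fix behaviour. The explicit `drop` works only because the ELF data was copied out first; data still borrowed from the guard would keep the lock alive.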

This unblocks ~35 ARM64 tests that were hanging after exec().

Also includes:
- Lock-free trace buffer (trace.rs) for debugging critical paths
- Trace points in syscall_entry.rs for exec path debugging
- ARM64 parity tracking document
- Test infrastructure for ARM64 userspace tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@ryanbreen ryanbreen merged commit c482bc1 into main Feb 2, 2026
1 of 2 checks passed
@ryanbreen ryanbreen deleted the arm-exec-fix branch February 2, 2026 09:32
2 participants