
Conversation

Contributor

@andreas-karlsson andreas-karlsson commented Dec 10, 2025

This PR proposes adding distinct methods for each atomic memory instruction to the Memory interface, with the following benefits:

  • It enables future optimization of atomic operations in Memory implementations.
  • Memory.lock(), which is a somewhat leaky abstraction, is no longer used outside of the Memory implementations and has been deprecated.
  • The changes are backward compatible.
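The backward-compatibility claim can be illustrated with a sketch. This is a hypothetical stand-alone example, not the actual Chicory code; the method names `atomicReadByte` and `lock` appear in the review below, while `DemoByteArrayMemory` and the exact signatures are assumptions. The idea is that each new atomic method gets a default implementation falling back to the existing `lock()`-based scheme, so old Memory implementations keep working unchanged:

```java
// Hypothetical sketch of the backward-compatible interface evolution.
interface Memory {
    byte read(int addr);

    void write(int addr, byte value);

    @Deprecated
    Object lock(int addr); // the leaky abstraction, kept only for the defaults

    // New per-instruction atomic method; VarHandle-backed implementations
    // can override this with a truly atomic read.
    default byte atomicReadByte(int addr) {
        synchronized (lock(addr)) {
            return read(addr);
        }
    }
}

// Minimal implementation that relies entirely on the defaults.
class DemoByteArrayMemory implements Memory {
    private final byte[] data = new byte[65536];
    private final Object lock = new Object();

    public byte read(int addr) { return data[addr]; }
    public void write(int addr, byte value) { data[addr] = value; }
    public Object lock(int addr) { return lock; }
}
```

Because the new methods are `default`, an implementation written against the old interface compiles and runs unmodified, it just doesn't get the fast path.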

The PR initially also contained changes to make ByteArrayMemory implement atomic operations based on VarHandle. This showed very promising performance improvements, but oddly broke on Java 25, where atomic operations on byteArrayViewVarHandle are no longer supported. That might be a bug, as it's not mentioned in the release notes. It could also safely be feature-gated, as VarHandle.isAccessModeSupported(...) correctly returned false on Java 25. But I felt it better to leave it out for a follow-up PR when more is known. (The feature-gated implementations still exist on a branch here.)
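The feature gate described above could look roughly like the following. This is an assumed illustration, not code from the PR; `AtomicGate` and `compareAndSwapInt` are hypothetical names, and the class-level monitor stands in for the real `Memory.lock(addr)`:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Sketch of feature-gating byte[] view VarHandle atomics: probe whether
// this JDK supports atomic access modes on the view (Java 25 reports
// false here) and only take the fast path when it does.
public class AtomicGate {
    private static final VarHandle INT_VIEW =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    static final boolean ATOMICS_SUPPORTED =
            INT_VIEW.isAccessModeSupported(VarHandle.AccessMode.COMPARE_AND_SET);

    // Hypothetical helper: CAS an int in a byte[], falling back to a
    // coarse lock when view-VarHandle atomics are unavailable.
    public static boolean compareAndSwapInt(byte[] mem, int addr, int expected, int replacement) {
        if (ATOMICS_SUPPORTED) {
            return INT_VIEW.compareAndSet(mem, addr, expected, replacement);
        }
        synchronized (AtomicGate.class) { // stand-in for Memory.lock(addr)
            int current = (int) INT_VIEW.get(mem, addr); // plain get is always supported
            if (current != expected) {
                return false;
            }
            INT_VIEW.set(mem, addr, replacement);
            return true;
        }
    }
}
```

Note that plain `get`/`set` access modes remain supported on byte[] views on all Java versions; only the atomic modes are affected, so the fallback path compiles and runs everywhere.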

@andreas-karlsson andreas-karlsson marked this pull request as ready for review December 11, 2025 01:26
Collaborator

@andreaTP andreaTP left a comment


@andreas-karlsson The changes look good, it’s a nice cleanup, and I agree that lock is a bit of a leaky abstraction.

That said, without the “atomic operations based on VarHandle” portion (which I think is a great idea!), the overall value of this PR feels reduced.

Would you be up to opening another PR in parallel to show the direction we’re aiming for? Even with CI broken, it would give us a sense of how far we are from the desired goal.

I honestly assumed the VarHandle API was more stable, but given the incompatibilities you mentioned, I’m now wondering how well this will work with native-image. I think we’re missing a CI step to verify that compatibility.

    var replacement = (int) stack.pop(); // c3
    var expected = (int) stack.pop(); // c2
    var ptr = readMemPtr(stack, operands); // i
    var replacement = (int) stack.pop();
Collaborator


nit: I think the names in the comments reflect the spec; I'd leave them for the future reader.

    synchronized (memory.lock(ptr)) {
        return memory.read(ptr);
    }
    return Byte.toUnsignedInt(memory.atomicReadByte(ptr));
Collaborator


Let me reason out loud:

  • assume we release those changes in a 1.7.0 release
  • a user compiles using the build time compiler with 1.7.0
  • attempting to use the generated classes with 1.6.0 of the runtime is going to break

correct?

Collaborator


We discussed this in #883 and we are aiming for forward, not backward, compatibility of generated code.

Contributor Author


Yes, calling new runtime methods in Shaded will break on an older runtime... What's the rationale behind this requirement? Setups where one dynamically pushes newly compiled classes into an otherwise static installation? I don't see any way around it. What does that mean? Would this PR require a new major version?

@andreas-karlsson
Contributor Author

@andreaTP I did more investigation into the Java 25 issue. It turns out atomic VarHandle operations still work on ByteBuffers but not on regular byte arrays. This test confirms it.

Since isAccessModeSupported returns the correct status, I can gate the implementation granularly. I can add commits enabling atomic optimizations for both memory implementations; ByteArrayMemory will simply fall back to locks on Java 25.
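The observation that atomics still work through ByteBuffer views can be demonstrated with a small stand-alone example. This is an assumed illustration, not Chicory code; `BufferCas` and its helpers are hypothetical names. Atomic access modes remain available through ByteBuffer view VarHandles on direct buffers at aligned offsets:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Demonstrates CAS through a ByteBuffer view VarHandle on a direct buffer.
public class BufferCas {
    private static final VarHandle INT_VIEW =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Direct buffers have a stable, sufficiently aligned native address.
    public static ByteBuffer newDirect(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    public static boolean casInt(ByteBuffer buf, int addr, int expected, int replacement) {
        return INT_VIEW.compareAndSet(buf, addr, expected, replacement);
    }

    public static int readInt(ByteBuffer buf, int addr) {
        return (int) INT_VIEW.getVolatile(buf, addr);
    }
}
```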

It would be great to pair this with a performance test running on all platforms/versions in CI. I've noticed JMH in the repo and will look into adding a benchmark.

That said, I still believe the refactoring warrants inclusion independent of the optimization. I hadn't thought about the forward compatibility issue though! Will need to wrap my head around that..

@andreaTP
Collaborator

I did more investigation into the Java 25 issue

I'll follow up asking around to relevant people 👍

I can add commits enabling atomic optimizations for both memory implementations

Thanks, looks like a reasonable path forward

JMH in the repo and will look into adding a benchmark.

I was going to propose it, sounds great!

@dmlloyd
Collaborator

dmlloyd commented Dec 11, 2025

The issue with Java 25 most likely stems from this issue (and its fix): https://bugs.openjdk.org/browse/JDK-8318966

Basically, you can't (generally speaking) perform multi-byte atomic operations on byte[] because its minimum element alignment is 1 byte, and most CPUs require a minimum alignment that is equal to the size of the atomic type.

The best fix is to detect Java 22 or later, and use MemorySegment (or VarHandle derived from MemorySegment) wrapping a long[] instead of wrapping a byte[].
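The alignment argument can be seen without the Java 22+ MemorySegment API: `long[]` elements are always 8-byte aligned, so atomic operations on them are supported on every JDK, unlike `byte[]` whose 1-byte element alignment is too weak. The sketch below is an assumed illustration (the `AlignedStore` name and API are hypothetical); it shows the same underlying guarantee that a MemorySegment wrapping a `long[]` would build on:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Backs storage with long[] so every element is naturally 8-byte aligned,
// which makes atomic access modes available on all JDK versions.
public class AlignedStore {
    private static final VarHandle LONG_ELEM =
            MethodHandles.arrayElementVarHandle(long[].class);

    private final long[] words;

    public AlignedStore(int numWords) {
        words = new long[numWords];
    }

    public boolean casLong(int index, long expected, long replacement) {
        return LONG_ELEM.compareAndSet(words, index, expected, replacement);
    }

    public long getLong(int index) {
        return (long) LONG_ELEM.getVolatile(words, index);
    }
}
```

A MemorySegment view over the same `long[]` would additionally allow byte-granular addressing on Java 22 and later.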

@andreas-karlsson
Contributor Author

@dmlloyd Thanks! That's most likely it. I'll take a proper look at switching the implementation to MemorySegment when available.

@andreaTP I took the liberty to squash and rebase the branch. Then fired off two hopeful commits in quick succession, but both turned very red. I'll put this PR back in draft and work out something more solid.

@andreas-karlsson andreas-karlsson marked this pull request as draft December 11, 2025 16:44
@andreas-karlsson
Contributor Author

I only tried the ThreadsProposalTest before pushing, and that worked on both Java 21 and 25. But it turns out that in other cases we do a bunch of operations on ByteBuffers that don't work with direct buffers, and I guess there was a reason ByteBufferMemory didn't use direct buffers. I'll focus the solution on getting ByteArrayMemory to work first.

@andreas-karlsson andreas-karlsson force-pushed the atomics-in-memory branch 3 times, most recently from ef94562 to 71141f0 Compare December 12, 2025 08:01
@andreas-karlsson
Contributor Author

andreas-karlsson commented Dec 12, 2025

@andreaTP I've just pushed a new benchmark based on parallel sieve primes computation. It uses a lot of atomics and also some locking. I also pushed an optimized ByteArrayMemory, but it's basically the same as before. Using MemorySegment as suggested isn't a clean fit in ByteArrayMemory; I think it would rather be a separate Memory impl.

      (memoryType)  (numWorkers)   Mode  Cnt   Score   Error  Units
   ByteArrayMemory             4  thrpt    5  86,899 ± 2,367  ops/s
  ByteBufferMemory             4  thrpt    5  24,947 ± 1,094  ops/s

These are the scores I'm getting when ByteArrayMemory is optimized and ByteBufferMemory is not. It'd probably be better to compare just ByteArrayMemory optimized vs. unoptimized, but to do that in CI I guess it would need separate PRs?

Note: The reason it works on Java 25 now is because it falls back to locking.

@andreas-karlsson andreas-karlsson marked this pull request as ready for review December 12, 2025 08:15
@andreas-karlsson andreas-karlsson marked this pull request as draft December 12, 2025 09:25
@andreas-karlsson
Contributor Author

This is still very flaky 😞 I'll need to do more testing in CI

@andreaTP
Collaborator

@andreas-karlsson hope this helps. To fight flaky tests I usually start with something like andreaTP@33ef658 and run the checks on my fork.

Numbers are great!

@andreas-karlsson
Contributor Author

@andreaTP Ah! Smart!

@andreas-karlsson
Contributor Author

andreas-karlsson commented Dec 12, 2025

@andreaTP It looks like it was the unguarded grow #1137 that caused the flakiness*. On that branch I'm not using any var-handle atomics but guard grow behind a rw-lock, and I'm confident it would also work with var-handle atomics.

Fixing the grow bug is easy but can't be done in the "old" Memory interface; it would spill the grow lock into the compiler and interpreter. So that's my biggest question right now, what are the prospects of the Memory interface refactor? Will it need a breaking release? If so, when can that be done? I know you're about to go on vacation, but it would be nice to have some guidance on this because I will have plenty of free time in the coming weeks 🙂

*Edit: The 2/100 failures are unrelated.

@andreaTP
Collaborator

@andreas-karlsson thanks a lot for the help here! Much appreciated!

So that's my biggest question right now, what are the prospects of the Memory interface refactor? Will it need a breaking release? If so, when can that be done?

If we manage to have the right deprecation in place and we verify forward compatibility, we should be able to release a 1.7 with the changes 👍 Not a lot of modules are really leveraging the threads proposal at the moment, and the risk is low.

@andreas-karlsson
Contributor Author

andreas-karlsson commented Dec 14, 2025

verify forward compatibility

@andreaTP Does this mean creating something like a memory adapter in the compiler? That could be done, but I think the bugs in the current impl. make it unusable for anything non-trivial, so I don't know if it's worth the effort to salvage forward compatibility in this case?

@andreaTP
Collaborator

Fair, feel free to move on 👍

@andreaTP
Collaborator

andreaTP commented Jan 6, 2026

@andreas-karlsson hi and happy new year!
I'm around now and happy to pick up those improvements again; let me know when is a good time for review 🙏

@andreas-karlsson
Contributor Author

@andreaTP Happy new year to you as well!

I've just reacquainted myself with the PR and the state is as follows:

  • Trying to use a read-lock to guard memory grow was disastrous(!) for performance, so I switched to allocating pages in separate chunks, which I think makes a lot of sense.
  • I think the memory benchmark is a good addition, using a "real" calculation compiled from Rust, pushing both atomics and locks.
  • The current ByteArrayMemory impl. works really well, except for Java 25 where it falls back to locks, which leads to performance degradation but is still correct.
  • The PR doesn't address ByteBufferMemory at all.
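The "allocating pages in separate chunks" idea from the first bullet can be sketched as follows. This is a hypothetical illustration, not the actual implementation (`ChunkedPages` and its API are assumed names): grow() fills in new fixed-size pages instead of reallocating one big array, so existing pages never move and concurrent readers need no lock against grow.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Pages allocated as separate chunks: grow never relocates existing data.
public class ChunkedPages {
    static final int PAGE_SIZE = 65536; // wasm page size

    private final byte[][] pages;
    private final AtomicInteger pageCount = new AtomicInteger();

    public ChunkedPages(int initialPages, int maxPages) {
        pages = new byte[maxPages][];
        for (int i = 0; i < initialPages; i++) {
            pages[i] = new byte[PAGE_SIZE];
        }
        pageCount.set(initialPages);
    }

    // Returns the previous size in pages, or -1 on failure (like memory.grow).
    public synchronized int grow(int delta) {
        int current = pageCount.get();
        if (current + delta > pages.length) {
            return -1;
        }
        for (int i = current; i < current + delta; i++) {
            pages[i] = new byte[PAGE_SIZE];
        }
        pageCount.set(current + delta); // publish only after the pages exist
        return current;
    }

    public byte read(int addr) {
        return pages[addr / PAGE_SIZE][addr % PAGE_SIZE];
    }

    public void write(int addr, byte value) {
        pages[addr / PAGE_SIZE][addr % PAGE_SIZE] = value;
    }
}
```

The trade-off is an extra indirection on every access; whether that beats a read-lock is exactly what the benchmark above is meant to show.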

I got a bit bogged down before holidays in how to reason about the two memory implementations, and how to introduce a third one(?!) based on the new MemorySegment. There are many options. Should we have a separate implementation of ByteArrayMemory that gets selected by Multi Release Jar? But why would a MemorySegment impl. masquerade as a byte-array impl.? And what's actually the rationale for having even two memory implementations? An alternative would be to instead deprecate ByteArrayMemory and focus on the ByteBufferMemory, which (if using direct buffers) works well with var-handle atomics across all Java versions. We could just make direct buffers an option and fall back to locks for atomics otherwise. This would consolidate things and lessen the maintenance burden.

Another option would be to scale this PR back to just include the performance benchmark and the changes to the Memory interface and its usage in compiler/interpreter. I slightly favour this because it'd be nice to have the benchmark giving comparable numbers when working on changes to the memory implementations.

Sorry for the long post, but I need some guidance 😅

@andreaTP
Collaborator

andreaTP commented Jan 7, 2026

Thanks a lot for keeping the engagement @andreas-karlsson !

Trying to use a read-lock to guard memory grow was disastrous(!) for performance, so I switched to allocating pages in separate chunks, which I think makes a lot of sense.

Looks fair!

I think the memory benchmark is a good addition, using a "real" calculation compiled from Rust, pushing both atomics and locks.

Agree 👍 thanks!

The current ByteArrayMemory impl. works really well, except for Java 25 where it falls back to locks, which leads to performance degradation but is still correct.

I think it's acceptable; IIRC there is an open issue for Java 25, so the problem will eventually get solved.

The PR doesn't address ByteBufferMemory at all.

Currently, we are using ByteBufferMemory as a safe default, as it works on Android too.
I think the changes can be included in a separate PR if we feel the need for it.

I got a bit bogged down before holidays in how to reason about the two memory implementations, and how to introduce a third one(?!) based on the new MemorySegment. There are many options.

That's correct, at the moment the situation is:

  • ByteArrayMemory JVM only, high perf
  • ByteBufferMemory default, safe

I'd be interested in looking at performance numbers to see if another implementation is worth it (viable options are, for example: keeping it in the runtime module, splitting out a separate sub-module, publishing from another repo, etc.); maybe we can consider changing the internals of ByteArrayMemory?

Should we have a separate implementation of ByteArrayMemory that gets selected by Multi Release Jar?

In principle, I'd refrain from adding additional complexity if not well justified.

An alternative would be to instead deprecate ByteArrayMemory and focus on the ByteBufferMemory

I'm afraid this is not a desirable option; the gut feeling is that we would end up spending time chasing Android vs. JVM differences and support instead of focusing on more meaningful targets like perf on the JVM.

Another option would be to scale this PR back to just include the performance benchmark and the changes to the Memory interface and its usage in compiler/interpreter. I slightly favour this because it'd be nice to have the benchmark giving comparable numbers when working on changes to the memory implementations.

This is all good! 🙂 whatever flow works the best for you!

Sorry for the long post, but I need some guidance 😅

Very welcome, you are doing a great job in this project and I'm grateful for the time you are spending.

3 participants