
Conversation

Contributor

@andreas-karlsson andreas-karlsson commented Dec 10, 2025

This PR proposes adding distinct methods for each atomic memory instruction to the Memory interface, with the following benefits:

  • It enables future optimization of atomic operations in Memory implementations.
  • Memory.lock(), which is a somewhat leaky abstraction, is no longer used outside of the Memory implementations and has been deprecated.
  • The changes are backward compatible.
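The backward-compatibility claim can be illustrated with a sketch. This is a hypothetical stand-alone example, not the actual Chicory code; the method names `atomicReadByte` and `lock` appear in the review below, while `DemoByteArrayMemory` and the exact signatures are assumptions. The idea is that each new atomic method gets a default implementation falling back to the existing `lock()`-based scheme, so old Memory implementations keep working unchanged:

```java
// Hypothetical sketch of the backward-compatible interface evolution.
interface Memory {
    byte read(int addr);

    void write(int addr, byte value);

    @Deprecated
    Object lock(int addr); // the leaky abstraction, kept only for the defaults

    // New per-instruction atomic method; VarHandle-backed implementations
    // can override this with a truly atomic read.
    default byte atomicReadByte(int addr) {
        synchronized (lock(addr)) {
            return read(addr);
        }
    }
}

// Minimal implementation that relies entirely on the defaults.
class DemoByteArrayMemory implements Memory {
    private final byte[] data = new byte[65536];
    private final Object lock = new Object();

    public byte read(int addr) { return data[addr]; }
    public void write(int addr, byte value) { data[addr] = value; }
    public Object lock(int addr) { return lock; }
}
```

Because the new methods are `default`, an implementation written against the old interface compiles and runs unmodified, it just doesn't get the fast path.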

The PR initially also contained changes to make ByteArrayMemory implement atomic operations based on VarHandle. This showed very promising performance improvements, but oddly broke on Java 25, where atomic operations on byteArrayViewVarHandle are no longer supported. That might be a bug, as it's not mentioned in the release notes. It could also safely be feature-gated, as VarHandle.isAccessModeSupported(...) correctly returned false on Java 25. But I felt it better to leave it out for a follow-up PR when more is known. (The feature-gated implementations still exist on a branch here.)
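The feature gate described above could look roughly like the following. This is an assumed illustration, not code from the PR; `AtomicGate` and `compareAndSwapInt` are hypothetical names, and the class-level monitor stands in for the real `Memory.lock(addr)`:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Sketch of feature-gating byte[] view VarHandle atomics: probe whether
// this JDK supports atomic access modes on the view (Java 25 reports
// false here) and only take the fast path when it does.
public class AtomicGate {
    private static final VarHandle INT_VIEW =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    static final boolean ATOMICS_SUPPORTED =
            INT_VIEW.isAccessModeSupported(VarHandle.AccessMode.COMPARE_AND_SET);

    // Hypothetical helper: CAS an int in a byte[], falling back to a
    // coarse lock when view-VarHandle atomics are unavailable.
    public static boolean compareAndSwapInt(byte[] mem, int addr, int expected, int replacement) {
        if (ATOMICS_SUPPORTED) {
            return INT_VIEW.compareAndSet(mem, addr, expected, replacement);
        }
        synchronized (AtomicGate.class) { // stand-in for Memory.lock(addr)
            int current = (int) INT_VIEW.get(mem, addr); // plain get is always supported
            if (current != expected) {
                return false;
            }
            INT_VIEW.set(mem, addr, replacement);
            return true;
        }
    }
}
```

Note that plain `get`/`set` access modes remain supported on byte[] views on all Java versions; only the atomic modes are affected, so the fallback path compiles and runs everywhere.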

@andreas-karlsson andreas-karlsson marked this pull request as ready for review December 11, 2025 01:26
Collaborator

@andreaTP andreaTP left a comment


@andreas-karlsson The changes look good, it’s a nice cleanup, and I agree that lock is a bit of a leaky abstraction.

That said, without the “atomic operations based on VarHandle” portion (which I think is a great idea!), the overall value of this PR feels reduced.

Would you be up to opening another PR in parallel to show the direction we’re aiming for? Even with CI broken, it would give us a sense of how far we are from the desired goal.

I honestly assumed the VarHandle API was more stable, but given the incompatibilities you mentioned, I’m now wondering how well this will work with native-image. I think we’re missing a CI step to verify that compatibility.

    var replacement = (int) stack.pop(); // c3
    var expected = (int) stack.pop(); // c2
    var ptr = readMemPtr(stack, operands); // i
    var replacement = (int) stack.pop();
Collaborator


nit: I think the names in the comments reflect the spec; I'd leave them for the future reader.

    synchronized (memory.lock(ptr)) {
        return memory.read(ptr);
    }
    return Byte.toUnsignedInt(memory.atomicReadByte(ptr));
Collaborator


Let me reason out loud:

  • assume we release those changes in a 1.7.0 release
  • a user compiles using the build time compiler with 1.7.0
  • attempting to use the generated classes with 1.6.0 of the runtime is going to break

correct?

Collaborator


We discussed this in #883 and we are aiming for forward, not backward, compatibility of generated code.

Contributor Author


Yes, calling new runtime methods in Shaded will break on an older runtime... What's the rationale behind this requirement? Setups where one dynamically pushes newly compiled classes into an otherwise static installation? I don't see any way around it. What does that mean? Would this PR require a new major version?

@andreas-karlsson
Contributor Author

@andreaTP I did more investigation into the Java 25 issue. It turns out atomic VarHandle operations still work on ByteBuffers but not on regular byte arrays. This test confirms it.

Since isAccessModeSupported returns the correct status, I can gate the implementation granularly. I can add commits enabling atomic optimizations for both memory implementations; ByteArrayMemory will simply fall back to locks on Java 25.
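The observation that atomics still work through ByteBuffer views can be demonstrated with a small stand-alone example. This is an assumed illustration, not Chicory code; `BufferCas` and its helpers are hypothetical names. Atomic access modes remain available through ByteBuffer view VarHandles on direct buffers at aligned offsets:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Demonstrates CAS through a ByteBuffer view VarHandle on a direct buffer.
public class BufferCas {
    private static final VarHandle INT_VIEW =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Direct buffers have a stable, sufficiently aligned native address.
    public static ByteBuffer newDirect(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    public static boolean casInt(ByteBuffer buf, int addr, int expected, int replacement) {
        return INT_VIEW.compareAndSet(buf, addr, expected, replacement);
    }

    public static int readInt(ByteBuffer buf, int addr) {
        return (int) INT_VIEW.getVolatile(buf, addr);
    }
}
```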

It would be great to pair this with a performance test running on all platforms/versions in CI. I've noticed JMH in the repo and will look into adding a benchmark.

That said, I still believe the refactoring warrants inclusion independent of the optimization. I hadn't thought about the forward compatibility issue though! Will need to wrap my head around that..

@andreaTP
Collaborator

I did more investigation into the Java 25 issue

I'll follow up asking around to relevant people 👍

I can add commits enabling atomic optimizations for both memory implementations

Thanks, looks like a reasonable path forward

JMH in the repo and will look into adding a benchmark.

I was going to propose it, sounds great!

@dmlloyd
Collaborator

dmlloyd commented Dec 11, 2025

The issue with Java 25 most likely stems from this issue (and its fix): https://bugs.openjdk.org/browse/JDK-8318966

Basically, you can't (generally speaking) perform multi-byte atomic operations on byte[] because its minimum element alignment is 1 byte, and most CPUs require a minimum alignment that is equal to the size of the atomic type.

The best fix is to detect Java 22 or later, and use MemorySegment (or VarHandle derived from MemorySegment) wrapping a long[] instead of wrapping a byte[].
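The alignment argument can be seen without the Java 22+ MemorySegment API: `long[]` elements are always 8-byte aligned, so atomic operations on them are supported on every JDK, unlike `byte[]` whose 1-byte element alignment is too weak. The sketch below is an assumed illustration (the `AlignedStore` name and API are hypothetical); it shows the same underlying guarantee that a MemorySegment wrapping a `long[]` would build on:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Backs storage with long[] so every element is naturally 8-byte aligned,
// which makes atomic access modes available on all JDK versions.
public class AlignedStore {
    private static final VarHandle LONG_ELEM =
            MethodHandles.arrayElementVarHandle(long[].class);

    private final long[] words;

    public AlignedStore(int numWords) {
        words = new long[numWords];
    }

    public boolean casLong(int index, long expected, long replacement) {
        return LONG_ELEM.compareAndSet(words, index, expected, replacement);
    }

    public long getLong(int index) {
        return (long) LONG_ELEM.getVolatile(words, index);
    }
}
```

A MemorySegment view over the same `long[]` would additionally allow byte-granular addressing on Java 22 and later.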

@andreas-karlsson
Contributor Author

@dmlloyd Thanks! That's most likely it. I'll take a proper look at switching the implementation to MemorySegment when available.

@andreaTP I took the liberty to squash and rebase the branch. Then fired off two hopeful commits in quick succession, but both turned very red. I'll put this PR back in draft and work out something more solid.

@andreas-karlsson andreas-karlsson marked this pull request as draft December 11, 2025 16:44
@andreas-karlsson
Contributor Author

I only tried the ThreadsProposalTest before pushing, and that worked on both Java 21 and 25. But it turns out that in other cases we do a bunch of operations on ByteBuffers that don't work with direct buffers, and I guess there was a reason ByteBufferMemory didn't use direct buffers. I'll focus the solution on getting ByteArrayMemory to work first.

@andreas-karlsson andreas-karlsson force-pushed the atomics-in-memory branch 3 times, most recently from ef94562 to 71141f0 Compare December 12, 2025 08:01
@andreas-karlsson
Contributor Author

andreas-karlsson commented Dec 12, 2025

@andreaTP I've just pushed a new benchmark based on parallel sieve primes computation. It uses a lot of atomics and also some locking. I also pushed an optimized ByteArrayMemory, but it's basically the same as before. Using MemorySegment as suggested isn't a clean fit in ByteArrayMemory; I think it would rather be a separate Memory impl.

      (memoryType)  (numWorkers)   Mode  Cnt   Score   Error  Units
   ByteArrayMemory             4  thrpt    5  86,899 ± 2,367  ops/s
  ByteBufferMemory             4  thrpt    5  24,947 ± 1,094  ops/s

These are the scores I'm getting when ByteArrayMemory is optimized and ByteBufferMemory is not. It'd probably be better to compare just ByteArrayMemory optimized vs. unoptimized, but to do that in CI I guess it would need separate PRs?

Note: The reason it works on Java 25 now is because it falls back to locking.

@andreas-karlsson andreas-karlsson marked this pull request as ready for review December 12, 2025 08:15
@andreas-karlsson andreas-karlsson marked this pull request as draft December 12, 2025 09:25
@andreas-karlsson
Contributor Author

This is still very flaky 😞 I'll need to do more testing in CI

@andreaTP
Collaborator

@andreas-karlsson hope this helps. To fight flaky tests I usually start with something like andreaTP@33ef658 and run the checks on my fork.

Numbers are great!

@andreas-karlsson
Contributor Author

@andreaTP Ah! Smart!

@andreas-karlsson
Contributor Author

andreas-karlsson commented Dec 12, 2025

@andreaTP It looks like it was the unguarded grow #1137 that caused the flakiness*. On that branch I'm not using any var-handle atomics but guard grow behind a rw-lock, and I'm confident it would also work with var-handle atomics.

Fixing the grow bug is easy but can't be done in the "old" Memory interface; it would spill the grow lock into the compiler and interpreter. So that's my biggest question right now, what are the prospects of the Memory interface refactor? Will it need a breaking release? If so, when can that be done? I know you're about to go on vacation, but it would be nice to have some guidance on this because I will have plenty of free time in the coming weeks 🙂

*Edit: The 2/100 failures are unrelated.

@andreaTP
Collaborator

@andreas-karlsson thanks a lot for the help here! Much appreciated!

So that's my biggest question right now, what are the prospects of the Memory interface refactor? Will it need a breaking release? If so, when can that be done?

If we manage to have the right deprecation in place and we verify forward compatibility, we should be able to release a 1.7 with the changes 👍 Not a lot of modules are really leveraging the threads proposal at the moment, and the risk is low.

@andreas-karlsson
Contributor Author

andreas-karlsson commented Dec 14, 2025

verify forward compatibility

@andreaTP Does this mean creating something like a memory adapter in the compiler? That could be done, but I think the bugs in the current impl. make it unusable for anything non-trivial, so I don't know if it's worth the effort to salvage forward compatibility in this case?

@andreaTP
Collaborator

Fair, feel free to move on 👍

@andreaTP
Collaborator

andreaTP commented Jan 6, 2026

@andreas-karlsson hi and happy new year!
I'm around now and happy to pick up those improvements again; let me know when is a good time for review 🙏

@andreas-karlsson
Contributor Author

@andreaTP Happy new year to you as well!

I've just reacquainted myself with the PR and the state is as follows:

  • Trying to use a read-lock to guard memory grow was disastrous(!) for performance, so I switched to allocating pages in separate chunks, which I think makes a lot of sense.
  • I think the memory benchmark is a good addition, using a "real" calculation compiled from Rust, pushing both atomics and locks.
  • The current ByteArrayMemory impl. works really well, except for Java 25 where it falls back to locks, which leads to performance degradation but is still correct.
  • The PR doesn't address ByteBufferMemory at all.
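The "allocating pages in separate chunks" idea from the first bullet can be sketched as follows. This is a hypothetical illustration, not the actual implementation (`ChunkedPages` and its API are assumed names): grow() fills in new fixed-size pages instead of reallocating one big array, so existing pages never move and concurrent readers need no lock against grow.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Pages allocated as separate chunks: grow never relocates existing data.
public class ChunkedPages {
    static final int PAGE_SIZE = 65536; // wasm page size

    private final byte[][] pages;
    private final AtomicInteger pageCount = new AtomicInteger();

    public ChunkedPages(int initialPages, int maxPages) {
        pages = new byte[maxPages][];
        for (int i = 0; i < initialPages; i++) {
            pages[i] = new byte[PAGE_SIZE];
        }
        pageCount.set(initialPages);
    }

    // Returns the previous size in pages, or -1 on failure (like memory.grow).
    public synchronized int grow(int delta) {
        int current = pageCount.get();
        if (current + delta > pages.length) {
            return -1;
        }
        for (int i = current; i < current + delta; i++) {
            pages[i] = new byte[PAGE_SIZE];
        }
        pageCount.set(current + delta); // publish only after the pages exist
        return current;
    }

    public byte read(int addr) {
        return pages[addr / PAGE_SIZE][addr % PAGE_SIZE];
    }

    public void write(int addr, byte value) {
        pages[addr / PAGE_SIZE][addr % PAGE_SIZE] = value;
    }
}
```

The trade-off is an extra indirection on every access; whether that beats a read-lock is exactly what the benchmark above is meant to show.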

I got a bit bogged down before holidays in how to reason about the two memory implementations, and how to introduce a third one(?!) based on the new MemorySegment. There are many options. Should we have a separate implementation of ByteArrayMemory that gets selected by Multi Release Jar? But why would a MemorySegment impl. masquerade as a byte-array impl.? And what's actually the rationale for having even two memory implementations? An alternative would be to instead deprecate ByteArrayMemory and focus on the ByteBufferMemory, which (if using direct buffers) works well with var-handle atomics across all Java versions. We could just make direct buffers an option and fall back to locks for atomics otherwise. This would consolidate things and lessen the maintenance burden.

Another option would be to scale this PR back to just include the performance benchmark and the changes to the Memory interface and its usage in compiler/interpreter. I slightly favour this because it'd be nice to have the benchmark giving comparable numbers when working on changes to the memory implementations.

Sorry for the long post, but I need some guidance 😅

@andreaTP
Collaborator

andreaTP commented Jan 7, 2026

Thanks a lot for keeping the engagement @andreas-karlsson !

Trying to use a read-lock to guard memory grow was disastrous(!) for performance, so I switched to allocating pages in separate chunks, which I think makes a lot of sense.

Looks fair!

I think the memory benchmark is a good addition, using a "real" calculation compiled from Rust, pushing both atomics and locks.

Agree 👍 thanks!

The current ByteArrayMemory impl. works really well, except for Java 25 where it falls back to locks, which leads to performance degradation but is still correct.

I think it's acceptable; IIRC there is an open issue for Java 25, so the problem will eventually get solved.

The PR doesn't address ByteBufferMemory at all.

Currently, we are using ByteBufferMemory as a safe default, as it works on Android too.
I think the changes can be included in a separate PR if we feel the need for it.

I got a bit bogged down before holidays in how to reason about the two memory implementations, and how to introduce a third one(?!) based on the new MemorySegment. There are many options.

That's correct, at the moment the situation is:

  • ByteArrayMemory JVM only, high perf
  • ByteBufferMemory default, safe

I'd be interested in looking at performance numbers to see if another implementation is worth it (viable options are, for example: keeping it in the runtime module, splitting out a separate sub-module, publishing from another repo, etc.); maybe we can consider changing the internals of ByteArrayMemory?

Should we have a separate implementation of ByteArrayMemory that gets selected by Multi Release Jar?

In principle, I'd refrain from adding additional complexity if not well justified.

An alternative would be to instead deprecate ByteArrayMemory and focus on the ByteBufferMemory

I'm afraid this is not a desirable option; the gut feeling is that we would end up spending time chasing Android vs. JVM differences and support instead of focusing on more meaningful targets like perf on the JVM.

Another option would be to scale this PR back to just include the performance benchmark and the changes to the Memory interface and its usage in compiler/interpreter. I slightly favour this because it'd be nice to have the benchmark giving comparable numbers when working on changes to the memory implementations.

This is all good! 🙂 whatever flow works the best for you!

Sorry for the long post, but I need some guidance 😅

Very welcome, you are doing a great job in this project and I'm grateful for the time you are spending.

3 participants