Skip to content

Conversation

@rahul2393
Copy link
Contributor

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> ☕️

If you write sample code, please follow the samples format.

This commit adds the server abstraction interfaces for location-aware
routing:

- ChannelFinderServer: Interface representing a Spanner server endpoint
  with address, health check, and channel access
- ChannelFinderServerFactory: Factory interface for creating and caching
  server connections
- GrpcChannelFinderServerFactory: gRPC implementation that creates and
  manages gRPC channels for different server endpoints

These interfaces enable the client to maintain connections to multiple
Spanner servers and route requests directly to the appropriate server
based on key location information.

This is part of the experimental location-aware routing for improved
latency.
@rahul2393 rahul2393 requested review from a team as code owners January 8, 2026 05:57
@product-auto-label product-auto-label bot added the size: l Pull request size is large. label Jan 8, 2026
@google-cla
Copy link

google-cla bot commented Jan 8, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@product-auto-label product-auto-label bot added the api: spanner Issues related to the googleapis/java-spanner API. label Jan 8, 2026
@gemini-code-assist
Copy link

Summary of Changes

Hello @rahul2393, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a sophisticated client-side key range caching system for Google Cloud Spanner. The primary goal is to enhance the efficiency of request routing by allowing the client to intelligently determine the optimal Spanner backend server for a given key or key range. This is achieved through a hierarchical cache that stores information about key ranges, the groups of tablets responsible for those ranges, and the individual tablets and their associated gRPC channels, complete with health checks and leader preference.

Highlights

  • New Key Range Caching Mechanism: Introduces a sophisticated KeyRangeCache to store and manage routing information for Google Cloud Spanner, including tablets, groups, and key ranges, enabling more efficient client-side routing.
  • Abstracted Server Connection Management: Defines ChannelFinderServer and ChannelFinderServerFactory interfaces to abstract the management of gRPC channels to Spanner backend servers, supporting lazy initialization and health checks.
  • gRPC Channel Implementation: Provides GrpcChannelFinderServerFactory as an implementation for managing gRPC channels, utilizing InstantiatingGrpcChannelProvider to build and maintain connections to various server addresses.
  • Intelligent Request Routing Logic: The KeyRangeCache implements logic to determine the optimal server for a given key or key range, considering tablet health, leader preference within a group, and the boundaries of cached key ranges.
  • Resource Lifecycle Management: Incorporates reference counting for CachedGroup and ServerEntry instances to ensure proper lifecycle management and prevent premature cleanup of shared resources within the cache.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@rahul2393 rahul2393 changed the title Feat/key range cache feat: experimental location aware routing- Part 4 Jan 8, 2026
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a KeyRangeCache for routing information, which is a valuable performance enhancement. However, the current implementation has several critical issues that need to be addressed. There are significant concerns regarding thread safety and resource management, including memory and channel leaks stemming from an incomplete reference counting mechanism. Additionally, there are some medium-severity issues, such as the use of System.out.println for logging. I've provided specific comments and suggestions to resolve these problems.

Comment on lines +36 to +56
synchronized (channelBuilder) {
InstantiatingGrpcChannelProvider.Builder newBuilder =
channelBuilder.setEndpoint(addr);
return new GrpcChannelFinderServer(addr, newBuilder.build());
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The create method has a concurrency issue. It modifies the shared channelBuilder by calling setEndpoint(), but it doesn't restore the original endpoint. This can lead to other threads creating channels for incorrect endpoints. You should save and restore the original endpoint within the synchronized block to ensure thread safety.

            synchronized (channelBuilder) {
              String originalEndpoint = channelBuilder.getEndpoint();
              try {
                InstantiatingGrpcChannelProvider.Builder newBuilder =
                    channelBuilder.setEndpoint(addr);
                return new GrpcChannelFinderServer(addr, newBuilder.build());
              } finally {
                channelBuilder.setEndpoint(originalEndpoint);
              }
            }

Comment on lines +42 to +49
private final NavigableMap<ByteString, CachedRange> ranges =
new TreeMap<>(ByteString.unsignedLexicographicalComparator());

// Groups indexed by group_uid
private final Map<Long, CachedGroup> groups = new HashMap<>();

// Servers indexed by address - shared across all tablets
private final Map<String, ServerEntry> servers = new HashMap<>();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This class is not thread-safe. It uses non-concurrent collections like TreeMap and HashMap for its internal state (ranges, groups, servers), and the public methods that access this state (addRanges, fillRoutingInfo, clear, debugString) are not synchronized. If this cache is used by multiple threads, this will lead to race conditions and ConcurrentModificationException. Consider making the public methods synchronized or using concurrent collections.

Comment on lines +104 to +107
if (!serverAddress.equals(tabletIn.getServerAddress())) {
serverAddress = tabletIn.getServerAddress();
server = null; // Will be lazily initialized
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

When a tablet's server address changes, the old server reference is nulled out, but its ServerEntry is not unreferenced. This leads to a resource leak. You should unref the old server entry before assigning the new address.

Suggested change
if (!serverAddress.equals(tabletIn.getServerAddress())) {
serverAddress = tabletIn.getServerAddress();
server = null; // Will be lazily initialized
}
if (!serverAddress.equals(tabletIn.getServerAddress())) {
if (server != null) {
ServerEntry entry = servers.get(serverAddress);
unref(entry);
}
serverAddress = tabletIn.getServerAddress();
server = null; // Will be lazily initialized
}

Comment on lines +211 to +234
return;
}
}

// Tablets changed - rebuild the list, reusing existing tablets where possible
System.out.println("DEBUG [BYPASS]: Rebuilding tablet list");
Map<Long, CachedTablet> tabletsByUid = new HashMap<>();
for (CachedTablet tablet : tablets) {
tabletsByUid.put(tablet.tabletUid, tablet);
}

List<CachedTablet> newTablets = new ArrayList<>(groupIn.getTabletsCount());
for (int t = 0; t < groupIn.getTabletsCount(); t++) {
Tablet tabletIn = groupIn.getTablets(t);
CachedTablet tablet = tabletsByUid.get(tabletIn.getTabletUid());
if (tablet == null) {
tablet = new CachedTablet();
System.out.println(
"DEBUG [BYPASS]: Created new tablet for uid " + tabletIn.getTabletUid());
}
tablet.update(tabletIn);
System.out.println(
"DEBUG [BYPASS]: Tablet["
+ t

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

When rebuilding the tablet list in a group, CachedTablet objects for tablets that are no longer part of the group are discarded, but their associated ServerEntry objects are not unreferenced. This causes a resource leak. You should track which tablets are removed and call unref on their ServerEntry. A common pattern is to put all old tablets in a map, remove them as they are reused, and then iterate over the remaining tablets in the map to unreference their resources.

Comment on lines +334 to +311
long splitId = 0;
ByteString generation;

CachedRange(ByteString startKey, CachedGroup group, long splitId, ByteString generation) {
this.startKey = startKey;
this.group = group;
this.splitId = splitId;
this.generation = generation;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The unref(ServerEntry) method is never called, which is the root cause of multiple resource leaks. Additionally, when a ServerEntry's reference count reaches zero, the underlying ManagedChannel should be shut down to release its resources. This is not currently done.

Suggested change
long splitId = 0;
ByteString generation;
CachedRange(ByteString startKey, CachedGroup group, long splitId, ByteString generation) {
this.startKey = startKey;
this.group = group;
this.splitId = splitId;
this.generation = generation;
private void unref(ServerEntry serverEntry) {
if (serverEntry == null) {
return;
}
if (--serverEntry.refs == 0) {
servers.remove(serverEntry.server.getAddress());
serverEntry.server.getChannel().shutdown();
}
}

Comment on lines +364 to +341
private void unref(ServerEntry serverEntry) {
if (serverEntry == null) {
return;
}
if (--serverEntry.refs == 0) {
servers.remove(serverEntry.server.getAddress());
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

When a CachedGroup's reference count drops to zero, it is removed from the cache, but the ServerEntrys for its tablets are not unreferenced. This will cause a resource leak. Before removing the group, you should iterate through its tablets and unref their servers.

  private void unref(CachedGroup group) {
    if (group == null) {
      return;
    }
    if (--group.refs == 0) {
      for (CachedTablet tablet : group.tablets) {
        if (tablet.server != null) {
          ServerEntry entry = servers.get(tablet.serverAddress);
          unref(entry);
        }
      }
      groups.remove(group.groupUid);
    }
  }

Comment on lines +172 to +174
System.out.println(
"DEBUG [BYPASS]: Group.update for group "
+ groupUid

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The code uses System.out.println for debugging purposes. This is not ideal for a library, as it pollutes standard output and cannot be configured. Please replace these with a proper logging framework like java.util.logging.Logger or SLF4J.

This commit adds the KeyRangeCache class that maps key ranges to
specific server locations for routing decisions.

Key features:
- TabletEntry class for tablet metadata (UID, server address, incarnation)
- ServerEntry class for server connection management
- Key range to tablet mapping with efficient lookup
- Lazy server initialization for on-demand connections
- Integration with ChannelFinderServer interfaces

This is part of the experimental location-aware routing for improved latency.
@rahul2393 rahul2393 force-pushed the feat/key-range-cache branch from f4b90d4 to 34a715d Compare January 8, 2026 06:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: spanner Issues related to the googleapis/java-spanner API. size: l Pull request size is large.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants