@capistrant (Contributor) commented Dec 15, 2025

Description

Add new functionality to Compaction Supervisors. Instead of storing compaction state for segments individually, centralize the state storage in a new metadata table. Individual segments will store a computed fingerprint that references a compaction state in the new metadata table. Since many segments will eventually end up sharing common compaction states, this should greatly reduce duplication in metadata storage.

Note: This applies only to compaction supervisors. Scheduled compaction on the Coordinator (the CompactSegments duty) does not get fingerprint support.

Compaction State Fingerprinting

Instead of storing CompactionState as the lastCompactionState field in every compacted segment, generate a fingerprint for a CompactionState and attach that to compacted segments. Add new centralized storage for CompactionState where individual states can be looked up by that fingerprint. Since it is common for many segments in a data source to share a single CompactionState, this greatly reduces the metadata storage overhead for storing compaction states.
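
To illustrate the idea, here is a minimal sketch of deriving a fingerprint from a state. It assumes the state can be reduced to string key/value pairs and that SHA-256 over a key-sorted serialization is an acceptable hash. Both are assumptions for illustration only: the code under review appears to serialize CompactionState with a key-sorted Jackson ObjectMapper, the hash actually used is not stated here, and the class and method names below are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration; not a class from this PR. */
final class FingerprintSketch
{
  private FingerprintSketch() {}

  /**
   * Builds a canonical string with keys in sorted order, so two states with the
   * same entries yield the same text regardless of insertion order.
   * Simplification: keys and values are assumed not to contain '=' or '\n'.
   */
  static String canonicalize(Map<String, String> state)
  {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String> e : new TreeMap<>(state).entrySet()) {
      sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
    }
    return sb.toString();
  }

  /** Returns the SHA-256 digest of the canonical form as lowercase hex (assumed hash choice). */
  static String fingerprint(Map<String, String> state)
  {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      byte[] hash = digest.digest(canonicalize(state).getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder(hash.length * 2);
      for (byte b : hash) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    }
    catch (NoSuchAlgorithmException e) {
      // SHA-256 is a required algorithm on every Java platform.
      throw new IllegalStateException(e);
    }
  }
}
```

The property that matters is determinism: equal states must always map to the same fingerprint, otherwise segments compacted with the same configuration would reference different rows.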

Metadata Store Changes
druid_segments

Add new column compaction_state_fingerprint that stores the fingerprint representation of the segment's current compaction state. It can be null if no compaction has taken place.

druid_compactionStates

New metadata table that stores the full CompactionState associated with a fingerprint. Segments can look up their full compaction state here by using the compaction_state_fingerprint that they are associated with.

CompactionStateManager

The CompactionStateManager is responsible for managing the persistence and lifecycle of compaction states. It stores unique compaction configurations (identified by fingerprints) in the metadata database. The manager tracks which compaction states are actively referenced by segments, marking unreferenced states as unused and periodically cleaning up old unused states. This fingerprinting approach allows Druid to efficiently store and retrieve compaction metadata without duplicating identical compaction configurations across multiple segments.
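
The lifecycle described above can be sketched with an in-memory model. This is only an illustration under stated assumptions: the class, its methods, and the use of millisecond timestamps are hypothetical, and the real manager works against the metadata database rather than a map.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of the fingerprint -> state table; not the PR's class. */
final class CompactionStateStoreSketch
{
  /** One row: the serialized state, the used flag, and when that flag last changed. */
  private static final class Row
  {
    final String payload;
    boolean used = true;
    long usedStatusLastUpdatedMillis;

    Row(String payload, long nowMillis)
    {
      this.payload = payload;
      this.usedStatusLastUpdatedMillis = nowMillis;
    }
  }

  private final Map<String, Row> rows = new HashMap<>();

  /** Stores a state once per fingerprint; a repeat call only revives an unused row. */
  void persist(String fingerprint, String payload, long nowMillis)
  {
    Row row = rows.get(fingerprint);
    if (row == null) {
      rows.put(fingerprint, new Row(payload, nowMillis));
    } else if (!row.used) {
      row.used = true;
      row.usedStatusLastUpdatedMillis = nowMillis;
    }
  }

  /** Sets each row's used flag to whether any segment still references its fingerprint. */
  void updateUsedFlags(Set<String> referencedFingerprints, long nowMillis)
  {
    for (Map.Entry<String, Row> e : rows.entrySet()) {
      boolean referenced = referencedFingerprints.contains(e.getKey());
      Row row = e.getValue();
      if (row.used != referenced) {
        row.used = referenced;
        row.usedStatusLastUpdatedMillis = nowMillis;
      }
    }
  }

  /** Deletes rows that became unused before the cutoff; returns how many were removed. */
  int deleteUnusedOlderThan(long cutoffMillis)
  {
    int removed = 0;
    Iterator<Row> it = rows.values().iterator();
    while (it.hasNext()) {
      Row row = it.next();
      if (!row.used && row.usedStatusLastUpdatedMillis < cutoffMillis) {
        it.remove();
        removed++;
      }
    }
    return removed;
  }

  /** Returns the stored payload, or null if the fingerprint is unknown. */
  String lookup(String fingerprint)
  {
    Row row = rows.get(fingerprint);
    return row == null ? null : row.payload;
  }

  int size()
  {
    return rows.size();
  }
}
```

Keeping a retention window between marking a state unused and deleting it gives in-flight tasks and rollbacks a chance to reference the state again before it disappears.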

OnHeapCompactionStateManager

Meant to serve as a mechanism for testing and simulations where metadata persistence may not be available or needed.

CompactionStateCache

CompactionStateCache is a new component of the HeapMemorySegmentMetadataCache. It is modeled strongly after the existing datasource schema cache. This is where the existing compaction states are cached for reference by compaction supervisors.

CompactSegments Coordinator Duty Roadmap

This PR does not add support for compaction state fingerprinting to the Coordinator-based scheduled compaction that is carried out by the CompactSegments coordinator duty. This is because the Druid roadmap is to move all scheduled compaction to compaction supervisors. Forgoing compaction state fingerprint support for the legacy duty-based compaction is a conscious choice to help drive usage of supervisors and limit changes to the legacy duty-based compaction code. Another PR should be spun up to officially deprecate legacy scheduled compaction on the Coordinator.

Legacy lastCompactionState Roadmap

This PR implements no automatic transition to fingerprints for segments that were already compacted and store CompactionState in their lastCompactionState field. Instead, this PR continues to support lastCompactionState in compaction decision making for segments compacted before fingerprinting. This means that legacy segments will not have to be re-compacted simply because they are not fingerprinted, as long as their CompactionState matches the one specified by the compaction configuration for the data source in question.
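
The fallback described above can be sketched as follows. This is a simplified illustration, not the logic in CompactionStatus: states are modeled as plain strings, all names are hypothetical, and the sketch assumes that a fingerprint, when present, takes precedence over the legacy field.

```java
/** Hypothetical model of the skip-or-compact check; names are illustrative, not from the PR. */
final class CompactionCheckSketch
{
  private CompactionCheckSketch() {}

  /**
   * @param segmentFingerprint  fingerprint stored on the segment, or null for legacy segments
   * @param segmentLastState    legacy lastCompactionState (modeled as a String), or null
   * @param expectedFingerprint fingerprint of the state the current config would produce
   * @param expectedState       the state the current config would produce
   * @return true if the segment already matches the config and needs no compaction
   */
  static boolean isUpToDate(
      String segmentFingerprint,
      String segmentLastState,
      String expectedFingerprint,
      String expectedState
  )
  {
    if (segmentFingerprint != null) {
      // Fingerprinted segment: a cheap string comparison is enough (assumed precedence).
      return segmentFingerprint.equals(expectedFingerprint);
    }
    // Legacy segment: compare the full stored state. A segment that was never
    // compacted has neither field and is therefore not up to date.
    return segmentLastState != null && segmentLastState.equals(expectedState);
  }
}
```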

This PR also continues to write both the new fingerprint and the legacy lastCompactionState by default. This allows the normal rolling upgrade order as well as Druid version rollback without unneeded re-compaction. An operator can disable writing lastCompactionState by updating the cluster compaction config after the Druid upgrade completes. Eventually, the Druid code base will cease writing lastCompactionState at all and instead force using fingerprinting going forward. I think this should be done in the Druid version following the first version that ships this feature. Even at that point, lastCompactionState will need to continue to be supported for already written segments, unless we want to devise an automated migration plan that can run in the background of a cluster to get all compacted segments migrated to fingerprinting.

Release note

coming soon

Upgrade Note

Metadata store changes are required for this upgrade. If you already have druid.metadata.storage.connector.createTables set to true no action is needed. If you have this feature disabled, you will need to alter the segments table and create the compactionStates table. Postgres DDL is provided below as a guide. You will have to adapt the syntax to your metadata store backend as well as use proper table naming depending on your configured table prefix and database.

-- create the compaction states lookup table and associated indices
CREATE TABLE druid_compactionStates (
    id BIGSERIAL NOT NULL,
    created_date VARCHAR(255) NOT NULL,
    datasource VARCHAR(255) NOT NULL,
    fingerprint VARCHAR(255) NOT NULL,
    payload BYTEA NOT NULL,
    used BOOLEAN NOT NULL,
    used_status_last_updated VARCHAR(255) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE (fingerprint)
);

CREATE INDEX idx_druid_compactionStates_fingerprint ON druid_compactionStates(fingerprint);
CREATE INDEX idx_druid_compactionStates_used ON druid_compactionStates(used, used_status_last_updated);

-- modify druid_segments table to have a column for storing compaction state fingerprints
ALTER TABLE druid_segments ADD COLUMN compaction_state_fingerprint VARCHAR(255);

Key changed/added classes in this PR
  • CompactionStatus
  • CompactionConfigBasedJobTemplate
  • CompactionState
  • SQLMetadataConnector
  • CompactionStateManager
  • CompactionStateCache
  • CompactSegments
  • KillUnreferencedCompactionState

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@LifecycleStop
public void stop()
{
  fingerprintCache.invalidateAll();

Contributor Author

does this cache object need any other lifecycle cleanup?

Contributor Author

What if the operator has table creation disabled and does not properly create the table before upgrading?

@kfaraz (Contributor) left a comment

Thanks for the feature, @capistrant !

I have started going through the PR, leaving a partial review here.
I am yet to go through several changes, such as the ones made in CompactionStatus, DatasourceCompactibleSegmentIterator, etc.

* <p>
* Useful for simulations and unit tests where database persistence is not needed.
*/
public class HeapMemoryCompactionStateManager extends CompactionStateManager

Contributor

Might be cleaner to let CompactionStateManager be an interface, and let both the heap-based and the concrete class implement it.
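
A minimal sketch of what that suggestion could look like, with hypothetical interface and method names (the PR's actual API may differ). A metadata-store-backed class would implement the same interface; only the heap-based one is shown because it needs no database.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical interface; method names are illustrative, not the PR's API. */
interface StateStorageSketch
{
  /** Stores the state for a fingerprint if it is not already present. */
  void persist(String fingerprint, String serializedState);

  /** Looks up the state for a fingerprint. */
  Optional<String> find(String fingerprint);
}

/** Heap-backed implementation, suitable for simulations and unit tests. */
final class HeapStateStorageSketch implements StateStorageSketch
{
  private final Map<String, String> states = new ConcurrentHashMap<>();

  @Override
  public void persist(String fingerprint, String serializedState)
  {
    states.putIfAbsent(fingerprint, serializedState);
  }

  @Override
  public Optional<String> find(String fingerprint)
  {
    return Optional.ofNullable(states.get(fingerprint));
  }
}
```

With this shape, callers depend only on the interface, so simulations can swap in the heap implementation without subclassing the database-backed one.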

* In-memory implementation of {@link CompactionStateManager} that stores
* compaction state fingerprints in heap memory without requiring a database.
* <p>
* Useful for simulations and unit tests where database persistence is not needed.

Contributor

If this is used only in tests, we should probably put it in the test source root src/test/java.

Contributor Author

That is where I originally put it, but then I tried to use it in a simulation class which is in the app code, not test. Let me review this though; maybe I am mistaken about how it all works with the simulations.

Contributor

Oh, I see. Are you referring to CoordinatorSimulationBuilder or some other class?

Contributor Author

No, CompactionRunSimulator: https://github.com/apache/druid/pull/18844/files#diff-b8a4fdf52e09ff26fa6f5610c021d196b9fa99673b83051de794ed07257be13b ... It creates a CompactSegments instance, which as of now requires a CompactionStateManager. But I guess if we go the route of not supporting fingerprinting in the compaction led by the coordinator duty, this may not be a problem and it can be moved to the test space.

Comment on lines 814 to 815
|`druid.manager.compactionState.cacheSize`|The maximum number of compaction state fingerprints to cache in memory on the coordinator and overlord. Compaction state fingerprints are used to track the compaction configuration applied to segments. Consider increasing this value if you have a large number of datasources with compaction configurations.|`100`|
|`druid.manager.compactionState.prewarmSize`|The number of most recently used compaction state fingerprints to load into cache on Coordinator startup. This pre-warms the cache to improve performance immediately after startup.|`100`|

Contributor

Both Coordinator and Overlord (with segment metadata caching enabled) already keep all used segments in memory, including the respective (interned) CompactionState objects as well.
I don't think the number of distinct CompactionState objects that we keep in memory will increase after this patch.

Do we still need to worry about the cache size of these objects?
Does a cache miss trigger a fetch from metadata store?

{

/**
* Lazy initialization holder for deterministic ObjectMapper.

Contributor

I wonder if we shouldn't just inject this mapper annotated with @Sorted or @Deterministic as a lazy singleton. It may be injected into CompactionStateManager and fingerprints will always be created by that class rather than using a static utility method.

if (segmentIterator.hasNext()) {
  // If we are going to create compaction jobs for this compaction state, we need to persist the fingerprint -> state
  // mapping so compacted segments from these jobs can reference a valid compaction state.
  params.getCompactionStateManager().persistCompactionState(

Contributor

The templates should only perform lightweight (i.e. non-IO) read-only operations as createCompactionJobs may be called frequently.
We should not do any persistence here.
Instead, the params can hold some mapping where we can add this compaction state and perform persistence on-demand (perhaps in the CompactionJobQueue).
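
One way to picture this suggestion is a holder that collects states in memory during job creation and persists them later in a single step. The sketch below is an assumption-laden illustration: the class, its methods, and the BiConsumer standing in for the persistence call are all hypothetical, and thread safety is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical holder for states awaiting persistence; not a class from the PR. */
final class PendingStatesSketch
{
  // Insertion-ordered so states are persisted in the order they were registered.
  private final Map<String, String> pending = new LinkedHashMap<>();

  /** Called from job-creation code: a cheap in-memory write with no IO. */
  void register(String fingerprint, String serializedState)
  {
    pending.putIfAbsent(fingerprint, serializedState);
  }

  /**
   * Called right before jobs are submitted: hands each pending state to the
   * persister exactly once, then clears the holder. Returns the number persisted.
   */
  int flush(BiConsumer<String, String> persister)
  {
    int count = pending.size();
    pending.forEach(persister);
    pending.clear();
    return count;
  }
}
```

Because registration is idempotent per fingerprint, job-creation code can call it as often as it runs without causing repeated writes to the metadata store.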

Contributor Author

Thank you for the guidance. Will work on how to get this out of the hot path.

}
}

private static Function<Set<DataSegment>, Set<DataSegment>> addCompactionStateFingerprintToSegments(String compactionStateFingerprint)

Contributor

Let's re-use the static function from AbstractTask itself?

Contributor Author

sure! I didn't know if it was bad form to reach into that class from MSQ. But I like having just one impl

Contributor

I think it is fine to use AbstractTask in the MSQ code. Alternatively, you can put the method in IndexTaskUtils too.

Tasks.DEFAULT_STORE_COMPACTION_STATE
);

String compactionStateFingerprint = querySpec.getContext()

Contributor

Suggested change
- String compactionStateFingerprint = querySpec.getContext()
+ final String compactionStateFingerprint = querySpec.getContext()

pre-compute
pre-computed
pre-computing
pre-dates

Contributor

predates need not be hyphenated.

Contributor Author

sometimes my inability to spell, compounded by my inability to google how to spell, is embarrassing. this is one of those times. will fix

* </p>
*/
@ManageLifecycle
public class CompactionStateManager

Contributor

I don't feel that pre-warming the cache is really necessary. The fingerprint needs to be retrieved only when running the CompactionJobQueue on Overlord or CompactSegments on Coordinator.

  1. Let's always keep all the compaction states in memory. We are already keeping all the used segments in memory. The distinct CompactionState objects will be fairly small in number and size.
  2. The states can be cached in HeapMemorySegmentMetadataCache which already serves as a cache for used segments, pending segments and schemas.
  3. If possible, let's support the fingerprint flow only with compaction supervisors and not the Coordinator-based CompactSegments duty. That would simplify the new flow and be another motivation for users to migrate to using compaction supervisors.

Contributor Author

> If possible, let's support the fingerprint flow only with compaction supervisors and not the Coordinator-based CompactSegments duty. That would simplify the new flow and be another motivation for users to migrate to using compaction supervisors.

would we want to deprecate CompactSegments compaction on the coordinator in this case? so we aren't forever supporting compaction without fingerprints + compaction with fingerprints?

Contributor

Yes, the plan was to deprecate CompactSegments once compaction supervisors took off. I don't fully recall if compaction supervisors are already marked GA or not. They would also have to be made the default, if we want to start deprecation of CompactSegments.

But I feel all of this should be out of scope for the current PR.

If supporting the fingerprint logic in CompactSegments is not additional work and does not complicate the flow, we can leave it as is.

My only concern is that there should be just one service that is responsible for persisting new fingerprints. I would prefer that to be the Overlord, so that it always has a consistent cache state. So we either just don't support fingerprints on the Coordinator or we handle persistence by calling an Overlord API.

(I am yet to go through the whole PR to identify all the call sites that may persist a compaction state. I have only found the one in CompactionConfigBasedJobTemplate so far.)

{
  final ObjectMapper sortedMapper = new DefaultObjectMapper();
  sortedMapper.configure(SerializationFeature.ORDER_MAP_ENTRIES_BY_KEYS, true);
  sortedMapper.configure(MapperFeature.SORT_PROPERTIES_ALPHABETICALLY, true);

Check notice: Code scanning / CodeQL

Deprecated method or constructor invocation (Note)

Invoking ObjectMapper.configure should be avoided because it has been deprecated.

{
  ObjectMapper mapper = new DefaultObjectMapper();
  mapper.configure(SerializationFeature.ORDER_MAP_ENTRIES_BY_KEYS, true);
  mapper.configure(MapperFeature.SORT_PROPERTIES_ALPHABETICALLY, true);

Check notice: Code scanning / CodeQL

Deprecated method or constructor invocation (Note, test)

Invoking ObjectMapper.configure should be avoided because it has been deprecated.

    .withGranularitySpec(new UserCompactionTaskGranularityConfig(Granularities.DAY, null, null))
    .build();

CompactionState expectedState = CompactionStatus.createCompactionStateFromConfig(compactionConfig);

Check notice: Code scanning / CodeQL

Unread local variable (Note, test)

Variable 'CompactionState expectedState' is never read.

@capistrant (Contributor Author)

@kfaraz I pushed up changes addressing the review you had to date. I ended up going with the idea of dropping support for fingerprinting from the CompactSegments duty. I also did my best to wrap my head around the segment metadata caching and tying compaction state caching into that instead of having the caching as a one-off in CompactionStateManager. Now the manager just handles persists and then the lifecycle management of states in the database as they go unused and age out. I also took a crack at moving when the compaction states are persisted, per your suggestion. Persistence no longer happens in the template code, but rather when tasks are about to be run.

I have talked briefly with @clintropolis and I think he will be submitting a review soon with thoughts on changing the naming in the metastore to drop the "compaction" from the names, since we already do more than just compaction and are trending towards adding even more functionality than pure compaction. If you have any thoughts on table and column naming, I'd love to hear them. If we do refactor the naming, I think it will be up for debate how much we change naming in the app code in this PR vs another refactoring PR that focuses on making the naming more generic in the app code when it comes to compaction supervisors.

@kfaraz (Contributor) commented Jan 5, 2026

Thanks for the update, @capistrant !
I will try to finish a review of the latest changes today.

@techdocsmith (Contributor)

A couple of questions re metadata naming. Forgive me if I don't understand the naming conventions well. Take or leave as you will.

druid_segments
Is druid necessary here? What other segments might we distinguish the Druid segments from?

druid_compactionStates
Suggest using similar underscore/all lowercase instead of underscore + camel case: compaction_states. Same question about needing to specify druid.

compaction_state_fingerprint
If this is a column within the druid_compactionStates, could it just be called fingerprint ?

@kfaraz (Contributor) commented Jan 7, 2026

@techdocsmith , I have tried to answer your queries below.

> druid_segments
> Is druid necessary here? What other segments might we distinguish the Druid segments from?

This is an existing convention in the Druid codebase. The default names of the metadata tables use the prefix druid_ followed by a camel-cased name. However, users can always configure their cluster to use some other name.

> compaction_state_fingerprint
> If this is a column within the druid_compactionStates, could it just be called fingerprint ?

This is the foreign key column in the druid_segments table which refers to the fingerprint column in the druid_compactionStates table.
