Merge System Bottleneck Causing 35-Minute Write Lag at 55K rows/sec - Merge Throughput Unable to Keep Up with Insert Rate
Environment
ClickHouse Altinity Version: 25.3.6
Cluster: 3-node cluster (replication enabled)
Table Engine: ReplicatedMergeTree
Workload: High-volume time-series ingestion
Operating System: Rocky Linux 9
Memory: 128 GB, 32 GB, 32 GB (per node)
CPU: 16 cores per node
Kafka: 10 partitions, 10 Kafka connectors writing 100K-row batches from the Kafka topic to the database
We discovered a merge system performance bottleneck when ingesting high-volume time-series data into a replicated 3-node ClickHouse cluster.
The data pipeline stalled due to the merge system bottleneck, eventually leading to growing consumer lag and slower insertions:
| Metric | Value |
|---|---|
| Insert Rate | 51.6K rows/sec |
| Merge Rate | ~41K rows/sec |
| Deficit | 10K rows/sec backing up |
| Write Lag | 35 minutes |
| Kafka Backlog | 22M rows |
| System Status | Stable but at ceiling |
After 156 minutes: 485M rows ingested, but a 35-minute delay before data becomes queryable.
Key Findings
Performance Metrics
CPU Usage:
- Peak: 491% (4.9 cores equivalent)
- Active merge threads: 3.07 cores (19% utilization)
- Observation: NOT CPU-bound; the system has 16 cores but only ~3 are in use
Memory Usage:
- Peak: 6.49 GB RSS (20% of 32GB available)
- Status: NOT memory-constrained, plenty of headroom
I/O Performance:
- 306 parts being managed (stable, not growing)
- 32 concurrent merges (saturated at max limit)
- Replication factor 3 = 918 total part files across cluster
- Average part size: 1.59M rows per part
- Disk throughput: Bottleneck identified
Others:
- Insert Rate: 55.6K rows/sec (sustained for 156 minutes)
- Write Lag: 35 minutes (data queued in Kafka; consumer lag around 22-28M rows)
- Consumer Backlog: 22M rows accumulating
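For reference, a minimal sketch of system-table queries that can cross-check the part and merge counts above on a single node (using ClickHouse's system.merges and system.parts; exact column availability can vary slightly by version):

```sql
-- Active merges and the rows they are currently processing.
SELECT count() AS active_merges, sum(rows_read) AS rows_being_merged
FROM system.merges
WHERE table = 'tdf-data';

-- Active part count and average rows per part for the ingest table.
SELECT count() AS active_parts, round(avg(rows)) AS avg_rows_per_part
FROM system.parts
WHERE table = 'tdf-data' AND active;
```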
Merge Activity Progression
| Time | Total Rows | Rate | Write Lag | Merges | Parts |
|---|---|---|---|---|---|
| T+30m | ~150M | 52K/s | 5-10m | 10-15 | ~150 |
| T+60m | ~280M | 54K/s | 15-20m | 25-30 | ~250 |
| T+90m | ~330M | 52K/s | ~25m | 30-32 | ~290 |
| T+120m | 403.5M | 56K/s | ~32m | 32 | 305 |
| T+156m | 485.4M | 51.6K/s | ~35m | 32 | 306 |
Critical observation: After T+90min, merge count plateaus at 32 (max concurrent limit), part count stabilizes around 300-310, and write lag continues accumulating.
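One way to confirm that the ceiling being hit is the replicated merge queue is to break the queue down by entry type and look at the background merge pool metrics (a sketch; the BackgroundMerge* metric names assume a reasonably recent ClickHouse release):

```sql
-- Breakdown of the replication queue by entry type (MERGE_PARTS, GET_PART, ...).
SELECT type, count() AS entries
FROM system.replication_queue
WHERE table = 'tdf-data'
GROUP BY type;

-- Background merge pool usage vs. its configured size.
SELECT metric, value
FROM system.metrics
WHERE metric LIKE 'BackgroundMerge%';
```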
Root Cause
Replication serialization blocks merge completion:
Merge Operation Timeline:
1. Merge part locally → 100-500ms
2. Write merged part to disk → 200-800ms
3. Replicate to Node 2 → 1000-2000ms (network/disk wait)
4. Replicate to Node 3 → 1000-2000ms (sequential or parallel?)
5. Keeper coordination sync → 50-100ms
6. Acknowledge completion → 50-100ms
Total: ~2000-3500ms per merge operation
With 32 concurrent merges × 2.5 seconds average = 80 seconds per full merge cycle
Kafka batches arrive every 30 seconds
Result: Queue builds faster than merges complete → lag accumulates
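The per-step timings above are estimates; actual merge durations can be measured from the part log, roughly as follows (a sketch; requires system.part_log to be enabled in the server configuration):

```sql
-- Median and tail merge durations for the ingest table over the last hour.
SELECT
    quantile(0.5)(duration_ms) AS p50_ms,
    quantile(0.95)(duration_ms) AS p95_ms,
    count() AS merges_last_hour
FROM system.part_log
WHERE table = 'tdf-data'
  AND event_type = 'MergeParts'
  AND event_time > now() - INTERVAL 1 HOUR;
```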
Why CPU/Memory are Low but System is Bottlenecked:
- Disk I/O operations are blocking (waiting for network replication)
- CPU has nothing to do while waiting for disk/network
- Memory is allocated but unused (merge buffers waiting on I/O)
- Replication latency is the constraint, not compute resources
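A quick check that supports (or refutes) the replication-wait hypothesis is to look at in-flight part fetches from the other replicas (a sketch; column names may differ slightly between versions):

```sql
-- Parts currently being fetched from other replicas; long-running entries
-- with low progress would point at replication as the constraint.
SELECT table, elapsed, progress, result_part_name, total_size_bytes_compressed
FROM system.replicated_fetches
WHERE table = 'tdf-data';
```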
System Equilibrium Reached:
- Merge throughput: ~41K rows/sec (calculated from sustained 10K rows/sec deficit)
- Insert throughput: 55.6K rows/sec (from Kafka)
- Deficit: 10K rows/sec accumulates as write lag
- At 35 minutes: 10K × 2,100 sec = 21M rows in backlog (matches observed 22M)
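The deficit can also be estimated directly from the server's cumulative counters (a sketch; these counters are server-wide rather than per-table, so sampling twice and taking the difference gives approximate row rates):

```sql
-- Cumulative inserted vs. merged rows since server start; diff two samples
-- taken N seconds apart to get rows/sec and compare with the figures above.
SELECT event, value
FROM system.events
WHERE event IN ('InsertedRows', 'MergedRows');
```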
Current Table Schema
```sql
CREATE TABLE default.`tdf-data` ON CLUSTER 'clickhouse' (
    name String,
    value Variant(Float64, Int64, String, BOOLEAN),
    status BOOLEAN,
    timestamp DateTime64(6),
    received_at DateTime64(6) DEFAULT now()
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/tdf-data', '{replica}')
PARTITION BY toStartOfDay(timestamp)
PRIMARY KEY (name, toStartOfHour(timestamp))
ORDER BY (name, toStartOfHour(timestamp), timestamp)
TTL toDateTime(timestamp) + INTERVAL 1 DAY
SETTINGS
    max_partitions_per_insert_block = 100,
    max_replicated_merges_in_queue = 32,
    max_parts_in_total = 300,
    parts_to_delay_insert = 100,
    parts_to_throw_insert = 150,
    max_bytes_to_merge_at_max_space_in_pool = 536870912,
    max_bytes_to_merge_at_min_space_in_pool = 10737418240,
    max_insert_block_size = 100000,
    merge_max_block_size = 100000,
    max_delay_to_insert = 10,
    min_delay_to_insert_ms = 100,
    index_granularity = 32768,
    old_parts_lifetime = 120;
```

We have tried almost every configuration and have come to the conclusion that merging is too slow to keep up with the insertion rate, which in turn slows down insertions and delays results.
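For completeness, the effective values of these settings can be double-checked as follows (a sketch; server-wide MergeTree defaults live in system.merge_tree_settings, while the table-level overrides above appear in the SHOW CREATE TABLE output):

```sql
-- Server-wide defaults for the merge-related settings tuned above.
SELECT name, value, changed
FROM system.merge_tree_settings
WHERE name IN (
    'max_replicated_merges_in_queue',
    'max_bytes_to_merge_at_max_space_in_pool',
    'parts_to_delay_insert',
    'parts_to_throw_insert'
);

-- Table-level overrides as actually applied.
SHOW CREATE TABLE default.`tdf-data`;
```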
Looking for someone to share their thoughts or a possible solution on this.