Add recovery mechanism for node failure scenarios. #277
This commit introduces a comprehensive recovery system that enables data synchronization when a node fails in a multi-node replication cluster. The implementation adds rescue subscriptions that recover missing transactions from peer nodes using recovery slots and forwarding mechanisms. This addresses the critical need to keep data consistent across all nodes in a distributed replication environment when one or more nodes become unavailable or fall behind in replication progress.
The recovery system tracks subscription state through additional fields that indicate rescue status, temporary subscription flags, and recovery boundaries defined by LSN positions and timestamps. These fields allow the system to distinguish between normal subscriptions and temporary rescue subscriptions created during recovery operations. Recovery slots preserve WAL history for rescue operations, allowing lagging nodes to catch up by replaying transactions from a more advanced peer node that has successfully received and applied the missing data.
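As a rough sketch of the state described above (the actual catalog column and struct names are not shown in this PR, so all names here are hypothetical), the extra per-subscription fields and the catch-up check might look like:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;        /* WAL position (LSN), as in PostgreSQL */
typedef int64_t  TimestampTz;       /* microseconds since epoch, as in PostgreSQL */

/* Hypothetical extra subscription state: rescue status, temporary flag,
 * and recovery boundaries given as LSN positions and a timestamp. */
typedef struct SubscriptionState
{
    bool        is_rescue;          /* true for a temporary rescue subscription */
    bool        is_temporary;       /* drop automatically once recovery finishes */
    XLogRecPtr  recovery_start_lsn; /* first LSN the rescue must replay */
    XLogRecPtr  recovery_end_lsn;   /* LSN at which recovery is complete */
    TimestampTz recovery_deadline;  /* optional time bound on the operation */
} SubscriptionState;

/* A rescue subscription has caught up once it has applied WAL at or
 * past its recovery boundary; normal subscriptions never report this. */
static bool
rescue_caught_up(const SubscriptionState *sub, XLogRecPtr applied_lsn)
{
    return sub->is_rescue && applied_lsn >= sub->recovery_end_lsn;
}
```

The boundary fields are what let the system tell a temporary rescue subscription apart from a normal one and decide when cleanup can begin.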
A forwarding-based recovery procedure configures subscriptions to forward transactions from failed node origins, enabling automatic recovery without requiring manual WAL replay. This approach leverages the existing replication infrastructure to cascade transactions from nodes that have the missing data to nodes that need to catch up. The forwarding mechanism works by updating subscription parameters to include all transaction origins, ensuring that transactions originally from the failed node are propagated through the replication topology to reach the lagging node.
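In stock PostgreSQL the subscription parameter that controls this is `origin`; setting it to `'any'` makes the subscription accept transactions from all origins, including those that originated on the failed node. A minimal sketch of composing that command (the subscription name and buffer handling are illustrative only):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds the SQL that switches a subscription to forward transactions
 * from all origins, so changes that originated on the failed node can
 * cascade through a surviving peer to the lagging node. */
static int
build_forwarding_command(char *buf, size_t buflen, const char *subname)
{
    return snprintf(buf, buflen,
                    "ALTER SUBSCRIPTION %s SET (origin = 'any');", subname);
}
```

Because this reuses the existing subscription machinery, no manual WAL replay is needed; the replication topology itself carries the missing transactions to the node that needs them.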
The system includes helper functions for monitoring recovery progress and verifying data consistency across nodes. These functions allow administrators to track the status of recovery operations, verify that data has been successfully synchronized, and ensure that all nodes have reached a consistent state. The recovery process can be monitored through subscription status views and custom recovery status functions that report the current state of rescue operations.
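The consistency check those helpers perform can be sketched as follows, assuming a hypothetical per-node status record of the kind a recovery status function might report (field names are illustrative, not the actual API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;        /* WAL position (LSN) */

/* Hypothetical per-node summary for monitoring recovery progress. */
typedef struct NodeStatus
{
    uint64_t    row_count;          /* rows in the replicated table */
    XLogRecPtr  applied_lsn;        /* furthest remote LSN this node has applied */
    bool        rescue_active;      /* a rescue subscription is still running */
} NodeStatus;

/* The cluster is considered synchronized once no rescue subscription is
 * still active, every node reports the same row count, and each node has
 * applied WAL at least up to the recovery target. */
static bool
cluster_synchronized(const NodeStatus *nodes, int nnodes, XLogRecPtr target_lsn)
{
    for (int i = 0; i < nnodes; i++)
    {
        if (nodes[i].rescue_active ||
            nodes[i].row_count != nodes[0].row_count ||
            nodes[i].applied_lsn < target_lsn)
            return false;
    }
    return true;
}
```

An administrator polling subscription status views would apply essentially this predicate to decide when a recovery operation has finished.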
Recovery slots are managed through a dedicated shared memory context that tracks active recovery slots across the cluster. The recovery slot management system ensures that WAL is preserved for rescue operations by maintaining logical replication slots that can be cloned for use by rescue subscriptions. The slot management includes mechanisms to advance recovery slots to the minimum position across all peer subscriptions, ensuring that historical transactions remain available for recovery operations even as normal replication progresses.
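The advance rule for a recovery slot reduces to a minimum over peer positions with a monotonicity guard. A minimal sketch, with hypothetical names (the real slot management lives in shared memory and is not shown here):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;        /* WAL position (LSN) */

/* A recovery slot may only move forward, and never past the slowest peer
 * subscription, so WAL needed by any lagging peer stays available for
 * rescue operations while normal replication progresses. */
static XLogRecPtr
recovery_slot_target(XLogRecPtr current,
                     const XLogRecPtr *peer_confirmed, int npeers)
{
    XLogRecPtr min = peer_confirmed[0];

    for (int i = 1; i < npeers; i++)
        if (peer_confirmed[i] < min)
            min = peer_confirmed[i];

    /* never move the slot backwards */
    return (min > current) ? min : current;
}
```

Cloning such a slot for a rescue subscription then gives the rescue a starting point at which all historical transactions it might need are still retained.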
The implementation includes a cluster management script that facilitates testing and demonstration of recovery scenarios. This script automates the creation of multi-node replication clusters, simulates node failures, and verifies recovery operations. The script provides detailed output about the state of each node including row counts, LSN positions, and subscription statuses, making it easier to understand and debug recovery scenarios.
Recovery operations are designed to be transparent to applications running on the cluster. The system automatically handles the creation and cleanup of temporary rescue subscriptions, ensuring that recovery operations do not interfere with normal replication once recovery is complete. The recovery system integrates seamlessly with the existing subscription management infrastructure, allowing recovery to proceed without manual intervention once initiated.