Open
Description
Assuming gossip is the approach used to fully decentralize the PD service, we'd like to know:
- what stats need to be collected and shared across the cluster?
- how are different versions of the stats reported by the same replica maintained?
- when a scheduling action is required for a given replica, which node should be responsible for making that scheduling decision?
- in a fully decentralized environment, it is always possible for more than one node to believe that it is the decision maker. How are such conflicts handled or resolved? For example, when nodes X and Y both believe that replica A of shard B needs to be replaced, they might reach different conclusions: X might want to replace A with a replica C on node Z, while Y might want to replace A with a replica D on node W. How should such a situation be handled?
- what changes to MatrixCube are required to support the conflict-resolution solution for item 4 above?
- when certain nodes are isolated from ALL other nodes, how should that situation be handled?
- when certain nodes are isolated from most other nodes, will they still be able to get a full picture of the cluster (probably with higher delays)? will they still be able to function normally?
- what stats need to be persisted to disk, and why do they need to be persisted?
- what testing techniques can be applied?
- how can such a system be broken down into high-level components, and what are those components?
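
To make the questions about stats versioning and competing decision makers more concrete, here is a minimal sketch in Go of one possible direction. It is not a proposal for the actual design and does not reflect any existing MatrixCube API. All names (`ReplicaStat`, `Merge`, `DecisionMaker`) are hypothetical. It assumes each replica stamps its stats with a monotonically increasing version, and that each node derives the decision maker for a shard deterministically from its own view of cluster membership.

```go
package main

import (
	"fmt"
	"sort"
)

// ReplicaStat is a hypothetical gossip payload describing one replica.
// Version is assumed to be increased by the reporting replica on every update.
type ReplicaStat struct {
	ReplicaID uint64
	Version   uint64
	Keys      uint64
}

// Merge folds gossiped stats into the local view, keeping only the entry
// with the highest version for each replica. Stale or duplicated gossip
// messages are therefore ignored.
func Merge(local map[uint64]ReplicaStat, incoming []ReplicaStat) {
	for _, s := range incoming {
		if cur, ok := local[s.ReplicaID]; !ok || s.Version > cur.Version {
			local[s.ReplicaID] = s
		}
	}
}

// DecisionMaker picks the node responsible for scheduling a shard.
// Nodes that share the same membership view always pick the same node,
// regardless of the order in which they learned about members.
func DecisionMaker(shardID uint64, nodes []string) string {
	if len(nodes) == 0 {
		return ""
	}
	sorted := append([]string(nil), nodes...) // do not mutate the caller's slice
	sort.Strings(sorted)
	return sorted[shardID%uint64(len(sorted))]
}

func main() {
	view := map[uint64]ReplicaStat{}
	Merge(view, []ReplicaStat{{ReplicaID: 1, Version: 2, Keys: 10}})
	Merge(view, []ReplicaStat{{ReplicaID: 1, Version: 1, Keys: 99}}) // stale, ignored
	fmt.Println(view[1].Keys)

	// X and Y hold the same membership in a different order.
	fmt.Println(DecisionMaker(7, []string{"Y", "X", "Z", "W"}))
	fmt.Println(DecisionMaker(7, []string{"W", "Z", "X", "Y"}))
}
```

This only narrows the conflict described in item 4: two nodes whose membership views differ, for example during a partition, can still pick different decision makers, so the questions above about conflict resolution and isolated nodes remain open.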