Investigate how prophet can be made fully decentralized #492

@lni

Assuming gossip is the approach used to fully decentralize the PD service, we'd like to know:

  1. What stats need to be collected and shared across the cluster?
  2. How are different versions of the stats reported by the same replica maintained?
  3. When scheduling actions are required for a given replica, which node should be responsible for making the scheduling decision?
  4. In a fully decentralized environment, it is always possible for more than one node to believe that it is the decision maker. How are such conflicts handled and resolved? For example, when both node X and node Y believe that replica A of shard B needs to be replaced, X and Y might reach different conclusions: X might want replica A replaced by replica C on node Z, while Y might want it replaced by replica D on node W. How should such a situation be handled?
  5. To support the solution for item 4 above, what changes are required in MatrixCube?
  6. When certain nodes are isolated from ALL other nodes, how should that situation be handled?
  7. When certain nodes are isolated from most other nodes, will they still be able to get a full picture of the cluster (probably with higher delays)? Will they still be able to function normally?
  8. What stats need to be persistently stored on disk, and why do they need to be persisted?
  9. What testing techniques can be applied?
  10. How can such a system be broken down into high-level components, and what are they?
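One possible direction for item 2, offered as a sketch under stated assumptions rather than a decided design: each replica's stats record carries a version counter that only the reporting store increments, and every node keeps the highest-versioned record it has heard for each replica, discarding stale or duplicate gossip. All names here (`ReplicaStats`, `GossipView`, `approximate_size`, `write_qps`) are hypothetical and do not come from prophet or MatrixCube.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class ReplicaStats:
    """Hypothetical stats record gossiped for one replica.

    Assumption: `version` is a counter that only the store owning the
    replica increments each time it reports fresh stats for it.
    """
    replica_id: int
    version: int
    approximate_size: int  # hypothetical metric, e.g. bytes
    write_qps: float       # hypothetical metric


@dataclass
class GossipView:
    """One node's local view of every replica's latest known stats."""
    entries: Dict[int, ReplicaStats] = field(default_factory=dict)

    def apply(self, incoming: ReplicaStats) -> bool:
        """Store `incoming` only if it is newer than what we already hold.

        Returns True when the local view changed; a gossip layer could use
        this to decide whether the update is worth forwarding to peers.
        """
        current = self.entries.get(incoming.replica_id)
        if current is not None and current.version >= incoming.version:
            return False  # stale or duplicate report: ignore it
        self.entries[incoming.replica_id] = incoming
        return True

    def merge(self, other: "GossipView") -> int:
        """Anti-entropy merge of a peer's view; returns how many entries changed."""
        return sum(self.apply(stats) for stats in other.entries.values())


if __name__ == "__main__":
    a, b = GossipView(), GossipView()
    a.apply(ReplicaStats(replica_id=1, version=3, approximate_size=100, write_qps=5.0))
    b.apply(ReplicaStats(replica_id=1, version=2, approximate_size=90, write_qps=4.0))
    b.apply(ReplicaStats(replica_id=2, version=1, approximate_size=50, write_qps=1.0))
    a.merge(b)  # a learns about replica 2, keeps its newer record for replica 1
    b.merge(a)  # b replaces its stale record for replica 1
    print(a.entries == b.entries)
```

Because the merge keeps the maximum version per replica, it is order-insensitive, so two nodes that exchange views end up with the same entries regardless of which side gossips first. One caveat of this sketch: if a store's version counter restarts from zero after a crash, its fresh reports would be discarded as stale, which bears on item 8 (such a counter, or an epoch paired with it, may need to be persisted).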
