Open Problems
Replacing Timely Dataflow's Non-BFT Scheduler
The timely dataflow library currently relies on a "decentralized" scheduling system whereby streams are partitioned evenly between worker nodes and each worker node processes the entire dataflow graph on its "shard" of the data. However, it operates under the assumptions of a typical distributed system deployed in a data center; it is not suitable for adversarial environments where worker nodes can exhibit arbitrary Byzantine failures. This is by far the biggest open problem Maru has.
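To make the status quo concrete, here is a minimal timely dataflow program using the crate's standard operators. Every worker executes the same dataflow graph, and `exchange` routes each record to the worker that owns its shard; the data and routing function here are placeholders, not part of Maru:

```rust
use timely::dataflow::operators::{Exchange, Inspect, ToStream};

fn main() {
    // Every worker runs this same closure, i.e. the entire dataflow graph,
    // on its own shard of the data.
    timely::execute_from_args(std::env::args(), |worker| {
        let index = worker.index();
        worker.dataflow::<u64, _, _>(|scope| {
            (0..16u64)
                .to_stream(scope)
                // Route each record to the worker that owns it (x % num_workers).
                .exchange(|x| *x)
                .inspect(move |x| println!("worker {} got {}", index, x));
        });
    })
    .unwrap();
}
```

That said, at a high level, we've already given this a fair bit of thought (just not as much as we would have liked to), and we have the following partial solution, which is probably broken and needs more work: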
- Every epoch, worker nodes that meet a minimum stake "commit" to participating via an on-chain smart contract. This smart contract counts the number of participating worker nodes, `num_workers`.
- A random value `r` is sampled from a randomness beacon. Each node downloads and verifies `r`.
- Every node syncs its view of the dataflow graph with an on-chain "operator registry" smart contract, which contains all of the dataflow operators and links to decentralized storage (e.g. Arweave) from which to pull their definitions and bytecode.
- The elements of every stream are partitioned into `num_workers` buckets using a (non-cryptographic) hash function: `bucket(x) = h(x || r) % num_workers`. Redundancy can be added by assigning multiple shards to each worker, repeating the process with permuted versions of the hash (e.g. `h(x || r || 0)`, `h(x || r || 1)`, etc.); see the sketch after this list.
- If a node receives deltas, and any of those deltas are assigned to a node other than itself, the sender is slashed and removed from the active set.
- When a node receives deltas in a particular stream, it updates its multiset hash for the stream (one possible construction is sketched below).
- When a node receives an accumulated proof for a stream, it verifies the proof against the block hash(es) of whatever source chains are needed. If the proof is invalid, the sender is slashed.
- Everything else is exactly the same as in timely dataflow.
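As a rough illustration of the bucketing and misrouting checks above, here's a hypothetical Rust sketch. The function names, the choice of hasher, and the `replica`/`num_replicas` parameters are illustrative assumptions, not a fixed part of the design:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Assign a stream element to a bucket for the current epoch.
/// `r` is the epoch randomness from the beacon; `replica` permutes the hash
/// (h(x || r || 0), h(x || r || 1), ...) so each element can be assigned to
/// multiple workers for redundancy.
fn bucket(x: &[u8], r: &[u8; 32], replica: u64, num_workers: u64) -> u64 {
    let mut h = DefaultHasher::new(); // non-cryptographic, per the scheme
    x.hash(&mut h);
    r.hash(&mut h);
    replica.hash(&mut h);
    h.finish() % num_workers
}

/// A receiver with worker id `me` can recompute the assignment to detect
/// misrouted deltas; a sender caught misrouting would be slashed.
fn is_misrouted(x: &[u8], r: &[u8; 32], num_replicas: u64, num_workers: u64, me: u64) -> bool {
    (0..num_replicas).all(|i| bucket(x, r, i, num_workers) != me)
}
```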
This is, more or less, a way to securely and randomly re-shuffle data shards and to allow nodes to join or leave at each epoch boundary. It deals with nodes violating the stream partition assignments, nodes executing operators incorrectly, and nodes withholding the deltas they're supposed to send (thanks to the multiset hashes).
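For intuition, here's one hypothetical shape a multiset hash could take: the digest is a wrapping sum of per-element hashes, which makes updates commutative, so it can be maintained incrementally and is independent of delivery order. This only sketches the interface; an actual deployment would need a collision-resistant multiset hash (e.g. an elliptic-curve multiset hash), not a 64-bit sum:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Order-independent digest of the deltas seen so far in a stream.
/// Two nodes that received the same multiset of deltas, in any order,
/// arrive at the same digest. NOT collision-resistant; illustration only.
#[derive(Default, PartialEq, Eq)]
struct MultisetHash(u64);

impl MultisetHash {
    /// Fold one delta into the digest.
    fn update(&mut self, delta: &[u8]) {
        let mut h = DefaultHasher::new();
        delta.hash(&mut h);
        self.0 = self.0.wrapping_add(h.finish());
    }
}
```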
Note that this solution does not yet address data availability for intermediate streams.