Dataflow

Differential Dataflow is already a well-defined computational model, and there already exists a robust distributed implementation. Thus we can implement Maru operators "abstractly" as a composition of multiple differential dataflow operators.

Even though the existing implementation relies upon a centralized scheduler, a decentralized implementation can be made of the same computational model, upon which we can run Maru operators just the same as in the centralized version.

Aside: You might ask the question, 'why not just make it decentralized from the beginning"? Because proving complex streaming operations is hard enough, not to mention the fact that we need to build a business. One problem at a time, anon.

In this article, we refer to Multiset Hashes as "MSH". See the chapter about them in background-knowledge for more information before reading this article.

Maru Operator Definition

A Maru operator is the combination of a set of input maru streams, output maru streams, some form of logic that, given a delta, produces another delta, and an optional arrangement (In Maru's initial release, arrangements will not be supported).

A "maru stream" is defined as a group of three independent differential dataflow streams:

  • Delta stream
    • stream of differential dataflow "deltas".
  • cumulative MSH stream
    • each element is the cumulative multiset hash of every delta that came "before" that element's timestamp.
  • cumulative proof stream
    • Proofs take the cumulative multiset hash of both the previous operator's input and the prevoius operator's output as public input, and prove that the collection committed to by the output MSH is correct given the collection committed to by the input MSH

Unary Operator Implementation

A unary operator's execution goes like this:

  1. for each input delta, run the operator's logic to determine the corresponding set of output deltas. Emit the execution trace for the logic.
  2. concurrently:
    • a. generate / accumulate proofs:
      • i. for each execution trace, generate a STARK-AIR proof that the logic was executed correctly.
      • ii. recursively accumulate each of these proofs into a single "cumulative operator proof", and, in doing so, "build up" the "expected" cumulative multiset hash in the proofs
      • iii. recursively accumulate this "cumulative operator proof" with that of the previous operator's cumulative stream proofs, checking that the input MSH for the current operator and the output MSH for the previous operator match. The result is the output cumulative proof stream.
    • b. update output's cumulative multiset hash with new deltas produced by the operator's logic.
  3. output the output delta stream, cumulative MSH stream, and cumulative proof stream.

We can implement this using differential dataflow primitives using the configuration shown in the diagram below.

Unary Maru operator using differential dataflow operators

Non-Unary Operator Implementation

Operators with different numbers of inputs and/or outputs are implemented the same way, simply duplicating the relevant parts.

Firstly, we generate one STARK proof for the emission of all outputs given all inputs. Then we simply have multiple recursive accumulation pipelines running in parallel, one for each output maru stream. In each of those pipelines, we have a sequence of "binary map" operators that merge the cumulative operator proofs with the previous operator's cumulative stream proofs, one for each input stream.