In this file we document how the GraphZeppelin control works using flowcharts and step by step descriptions. In our flow chart we connect objects A and B with a directed edge A --> B that indicates object A calling a method of object B.
The driver is responsible for managing and coordinating the CPU WorkerThreadGroup, the GutteringSystem, and the SketchAlgorithm. The driver can do this for any vertex based sketch algorithm (which the driver is templatized upon) so long as the algorithm implements the required functions.
Steps to initialize the sketch algorith and driver.
- User constructs the
SketchAlgorithm. - User constructs the
GraphSketchDriver. GraphSketchDriverpullsnum_verticesfrom algorithm.GraphSketchDriverrequests the desired batch size from the algorithm.GraphSketchDrivertells the algorithm to allocate space for its worker threads.GraphSketchDriverconstructsGutteringSystem.GraphSketchDriverconstructsWorkerThreadGroup.
flowchart TD
A[User] -->|2. Construct| B[GraphSketchDriver]
A -->|1. Construct| C
B -->|3. get_num_vertices\n4. get_desired_update_batch\n5. allocate_worker_memory\n8. print_configuration| C[Sketch Algorithm]
B -->|6. Construct| D[GutteringSystem]
B -->|7. Construct| E[WorkerThreadGroup]
When processing a stream, the driver coordinates its own threads, the GutteringSystem which batches updates, the WorkerThreadGroup which applies sketch updates, and application of updates to the graph sketch algorithm. Once the setup steps 1-2 complete, for each stream update until the breakpoint (either query or end of stream) we perform steps 3-7.
- User tells the driver to read stream until a query breakpoint or end of stream.
GraphSketchDriverresumes its CPU worker threads (may have been paused by a query).- When processing an update
GraphSketchDriverfirst calls the algorithm'spre_insertfunction to allow the algorithm to do per update work before the updates are batched. Maintaining a cached query result for Connected Components is an example of why we allow this option. GraphSketchDriverinserts the update into theGutteringSystem.WorkerThreadGrouppulls a batch of updates from theGutteringSystem.WorkerThreadGroupdoes not know what sketch algorithm we are running so callsGraphSketchDriver'sbatch_callbackfunction.GraphSketchDrivercalls the algorithm'sapply_update_batch()function.
flowchart TD
A[User] -->|1. process_stream_until| B[GraphSketchDriver]
B -->|2. resume| E[WorkerThreadGroup]
B -->|4. insert| D[GutteringSystem]
E -->|5. get_data| D
E -->|6. batch_callback| B
B --->|3. pre_insert\n7. apply_update_batch| C[Sketch Algorithm]
To perform a query, the user must first call driver.prep_query() in which the driver ensures the query is safe to perform. Specifically, the driver must ensure that all stream updates have been processed before allowing the query to continue. If step 2 has_cached_query() returns true, the driver can safely skip steps 3-4 and immediately allow the user to perform the query.
- User wants to preform a query so calls
prep_query. GraphSketchDriverchecks if the algorithm has a valid cached query answer. In this case it can immediately return control to the user.GraphSketchDriverflushes theGutteringSystemof all updates.GraphSketchDrivercallspauseto wait for theWorkerThreadGroupto finish applying all updates.- The user preforms the desired query. For example
connected_components().
flowchart TD
A[User] -->|1. prep_query| B[GraphSketchDriver]
A -->|5. query| C
B -->|2. has_cached_query| C[Sketch Algorithm]
B -->|3. flush| D[GutteringSystem]
B -->|4. pause| E[WorkerThreadGroup]