0x371 Computing

This note covers topics in shared-nothing parallel computing.

1. MPI

MPI defines the syntax and semantics of portable message-passing library routines.

Check this tutorial

1.1. Communicator

Communicator objects connect groups of processes in an MPI session. The default communicator is MPI_COMM_WORLD.

1.2. Point-to-Point Communication

1.2.1. Send and Recv

MPI_Send and MPI_Recv are both blocking APIs that send and receive count elements of type datatype from a data buffer, with the following signatures:

MPI_Send(
    void* data,
    int count,
    MPI_Datatype datatype,
    int destination,
    int tag,
    MPI_Comm communicator)

MPI_Recv(
    void* data,
    int count,
    MPI_Datatype datatype,
    int source,
    int tag,
    MPI_Comm communicator,
    MPI_Status* status)
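
A minimal point-to-point sketch of the same pattern in Python using mpi4py (the lowercase send/recv calls transfer pickled Python objects; the script name hello_sendrecv.py is just an example). Launch with e.g. mpirun -n 2 python hello_sendrecv.py:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    data = {"payload": [1, 2, 3]}
    # Blocking send to rank 1 with tag 0.
    comm.send(data, dest=1, tag=0)
elif rank == 1:
    # Blocking receive matching the sender's rank and tag.
    data = comm.recv(source=0, tag=0)
    print("rank 1 received", data)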

1.2.2. All-to-All

1.3. Collective Operations

1.4. Python API

mpi4py provides a Python API; see the following example.

It can be launched with something like mpirun -n 8 python hello_mpi.py

import os

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()    # this process's index within the communicator
world = comm.Get_size()   # total number of processes

# Pin each rank to its own GPU (one visible device per process).
os.environ["CUDA_VISIBLE_DEVICES"] = str(rank)
print("rank ", rank, " world ", world)

2. Batch Processing

2.1. MapReduce

MapReduce (Dean and Ghemawat, 2008)1 is a programming model for batch processing.

2.1.1. Abstraction

MapReduce has three phases, sketched in the word-count example below:

Map phase:

  • read a collection of values or KV pairs from an input source
  • the Mapper is applied to each input element and emits zero or more KV pairs

Shuffle phase:

  • group the KV pairs emitted by the Mapper by key, and output each key together with a stream of its values

Reduce phase:

  • take the key-grouped data and reduce the values of each key in parallel
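
A framework-free word-count sketch of the three phases (the helper names here are illustrative, not part of any MapReduce API):

from collections import defaultdict

def mapper(line):
    # Map phase: emit a (word, 1) KV pair for every word in the input.
    for word in line.split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle phase: group values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reducer(key, values):
    # Reduce phase: reduce the grouped values of one key.
    return (key, sum(values))

lines = ["the quick brown fox", "the lazy dog"]
mapped = (pair for line in lines for pair in mapper(line))
print([reducer(k, vs) for k, vs in shuffle(mapped)])
# [('the', 2), ('quick', 1), ('brown', 1), ('fox', 1), ('lazy', 1), ('dog', 1)]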

2.2. Spark

2.2.1. Lower Abstraction (RDD)

RDD (Resilient Distributed Dataset)

  • low-level API that describes how to do a computation
  • compile-time type safety (e.g. Integer, String, Binary)

There are two kinds of operations (see the sketch below):

  • transformations: lazy operations, e.g. map, filter, flatMap, reduceByKey
  • actions: compute a result (e.g. count, collect, reduce, take); they are eager and trigger execution
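
A minimal PySpark sketch of transformations vs. actions, assuming a local Spark installation:

from pyspark import SparkContext

sc = SparkContext("local[*]", "rdd-example")

rdd = sc.parallelize(["a b a", "b c"])
# Transformations are lazy: nothing is executed here.
counts = (rdd.flatMap(lambda line: line.split())
             .map(lambda word: (word, 1))
             .reduceByKey(lambda a, b: a + b))
# collect is an action: it triggers execution of the whole lineage.
print(counts.collect())  # e.g. [('a', 2), ('b', 2), ('c', 1)]

sc.stop()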

2.2.2. Higher Abstraction

High-level abstractions are automatically converted into RDDs via plan optimization (see the sketch below):

  • Spark SQL
  • DataFrame API
  • Dataset API
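
A minimal sketch of the higher-level APIs in PySpark (the table and column names are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("df-example").getOrCreate()

df = spark.createDataFrame([("alice", 3), ("bob", 5)], ["name", "cnt"])
# DataFrame API: a declarative plan that is optimized and executed as RDD operations.
df.filter(df["cnt"] > 3).select("name").show()

# The same query through Spark SQL.
df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE cnt > 3").show()

spark.stop()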

2.3. FlumeJava

FlumeJava (Chambers et al., 2010)2 is a library of immutable parallel collections and of operations for processing them in parallel.

Execution is based on a dataflow graph constructed internally, which gets optimized and mapped onto appropriate primitives (e.g. MapReduce).

2.3.1. PCollection

PCollection<T> is an unordered, immutable bag of elements; it can be created from an in-memory Java Collection<T>.
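
In the Apache Beam Python SDK (used in the examples below), a PCollection can likewise be created from an in-memory collection; a minimal sketch:

import apache_beam as beam

with beam.Pipeline() as p:
    # An immutable, unordered PCollection built from an in-memory list.
    words = p | beam.Create(["to", "be", "or", "not", "to", "be"])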

2.3.2. PTransform

PTransform is the name used in the Apache Beam API for an operation that transforms a PCollection. The general-purpose predefined PTransforms are the following:

2.3.2.1. ParDo

parallelDo (ParDo) is the main way to manipulate a PCollection: it invokes a data-parallel operation on each element. It takes a DoFn object:

# The input PCollection of Strings.
words = ...

# The DoFn to perform on each element in the input PCollection.
class ComputeWordLengthFn(beam.DoFn):
  def process(self, element):
    return [len(element)]

# Apply a ParDo to the PCollection "words" to compute lengths for each word.
word_lengths = words | beam.ParDo(ComputeWordLengthFn())

There are a few lightweight DoFn wrappers provided, sketched below:

  • beam.Map: emits exactly one output for each input element (e.g. words | beam.Map(len))
  • beam.FlatMap: emits zero, one, or more outputs for each input element
  • beam.Filter: copies an input element to the output only if it satisfies a predicate
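
A small self-contained sketch of the wrappers above (the element values and transform labels are illustrative):

import apache_beam as beam

with beam.Pipeline() as p:
    words = p | beam.Create(["to", "be", "or", "not"])
    lengths = words | "Lengths" >> beam.Map(len)                     # one output per input
    letters = words | "Letters" >> beam.FlatMap(list)                # zero or more outputs per input
    longer  = words | "Longer" >> beam.Filter(lambda w: len(w) > 2)  # conditional copy
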
2.3.2.2. GroupByKey

GroupByKey is a reduction operation, analogous to the Shuffle phase in MapReduce. It has the signature PTable<K,V> -> PTable<K, Collection<V>>.
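
A minimal Beam sketch of GroupByKey (the KV pairs are illustrative):

import apache_beam as beam

with beam.Pipeline() as p:
    (
        p
        | beam.Create([("cat", 1), ("dog", 5), ("cat", 9)])
        # (K, V) pairs become (K, iterable of V), one output per distinct key.
        | beam.GroupByKey()
        | beam.Map(lambda kv: (kv[0], list(kv[1])))
        | beam.Map(print)  # e.g. ('cat', [1, 9]) and ('dog', [5])
    )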

2.3.2.3. Combine

Combine is a transform that reduces a collection of elements. The combining function has to be commutative and associative to enable optimization (see combiner lifting below).
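
A minimal Beam sketch of a per-key Combine using sum, which is commutative and associative:

import apache_beam as beam

with beam.Pipeline() as p:
    (
        p
        | beam.Create([("cat", 1), ("dog", 5), ("cat", 9)])
        # sum is commutative and associative, so partial sums can be
        # computed before the shuffle (combiner lifting).
        | beam.CombinePerKey(sum)
        | beam.Map(print)  # e.g. ('cat', 10) and ('dog', 5)
    )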

2.3.3. Optimization

The FlumeJava optimizer transforms an execution plan into an optimized one through a series of independent graph transformations:

ParallelDo Fusion: separate parallelDo operations f and g are replaced by a single \(g \circ f\)

Combiner Lifting: input elements are combined per key and window before they are shuffled
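
A toy illustration of ParallelDo fusion as function composition (plain Python, not the optimizer itself):

f = lambda x: x + 1          # first parallelDo
g = lambda x: x * 2          # second parallelDo
fused = lambda x: g(f(x))    # g composed with f, applied in a single pass

data = [1, 2, 3]
assert list(map(fused, data)) == list(map(g, map(f, data)))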

3. Stream Processing

3.1. MillWheel

MillWheel (Akidau et al., 2013)3 is a framework for fault-tolerant stream processing at internet scale. Its design focuses on:

  • latency reduction
  • unbounded data
  • correctness
  • out-of-order processing

4. References

  1. Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107–113. 

  2. Craig Chambers, Ashish Raniwala, Frances Perry, Stephen Adams, Robert R Henry, Robert Bradshaw, and Nathan Weizenbaum. 2010. FlumeJava: Easy, efficient data-parallel pipelines. ACM SIGPLAN Notices, 45(6):363–375. 

  3. Tyler Akidau, Alex Balikov, Kaya Bekiroğlu, Slava Chernyak, Josh Haberman, Reuven Lax, Sam McVeety, Daniel Mills, Paul Nordstrom, and Sam Whittle. 2013. MillWheel: Fault-tolerant stream processing at internet scale. Proceedings of the VLDB Endowment, 6(11):1033–1044.