0x371 Computing

This note covers topics in shared-nothing parallel computing

1. MPI

MPI defines the syntax and semantics of portable message-passing library routines.

Check this tutorial

1.1. Communicator

Communicator objects connect groups of processes in the MPI session. The default communicator is MPI_COMM_WORLD, which contains all processes.

1.2. Point-to-Point Communication

1.2.1. Send and Recv

MPI_Send and MPI_Recv are both blocking APIs that transfer count elements of type datatype, with the following signatures:

MPI_Send(
    void* data,
    int count,
    MPI_Datatype datatype,
    int destination,
    int tag,
    MPI_Comm communicator)

MPI_Recv(
    void* data,
    int count,
    MPI_Datatype datatype,
    int source,
    int tag,
    MPI_Comm communicator,
    MPI_Status* status)
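
The same pattern in mpi4py (the Python API covered in section 1.4), as a minimal sketch; the filename, payload, and tag value are illustrative choices:

# Point-to-point sketch with mpi4py; run with e.g. mpirun -n 2 python ptp.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    comm.send({"payload": [1, 2, 3]}, dest=1, tag=42)   # blocking send to rank 1
elif rank == 1:
    data = comm.recv(source=0, tag=42)                  # blocking receive from rank 0
    print("rank 1 received", data)

The lowercase send/recv methods pickle arbitrary Python objects; the uppercase Send/Recv variants take buffer arguments, closer to the C signatures above.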

1.2.2. All-to-All
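
In an all-to-all exchange, every rank sends a distinct piece of data to every other rank. A minimal mpi4py sketch, assuming the object-based alltoall where element j of the send list goes to rank j:

# All-to-all sketch with mpi4py: rank i sends sendbuf[j] to rank j and
# receives from every rank. Run with e.g. mpirun -n 4 python a2a.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

sendbuf = ["from %d to %d" % (rank, j) for j in range(size)]
recvbuf = comm.alltoall(sendbuf)
print("rank", rank, "received", recvbuf)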

1.3. Collective Operations
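
Collective operations involve all ranks in a communicator (e.g. broadcast, scatter, gather, reduce). A minimal mpi4py sketch of bcast and allreduce; the values are illustrative:

# Collective sketch with mpi4py: broadcast from rank 0, then sum across ranks.
# Run with e.g. mpirun -n 4 python coll.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

value = comm.bcast("config" if rank == 0 else None, root=0)  # every rank gets rank 0's value
total = comm.allreduce(rank, op=MPI.SUM)                     # sum over all ranks, available on every rank
print("rank", rank, "got", value, "sum of ranks =", total)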

1.4. Python API

mpi4py provides a Python API. The following example can be launched with something like mpirun -n 8 python hello_mpi.py:

from mpi4py import MPI
import os

comm = MPI.COMM_WORLD
rank = comm.Get_rank()    # index of this process within the communicator
world = comm.Get_size()   # total number of processes

# Give each rank its own GPU by restricting the visible CUDA devices
os.environ["CUDA_VISIBLE_DEVICES"] = str(rank)
print("rank ", rank, " world ", world)

2. Batch Processing

2.1. MapReduce

2.1.1. Abstraction

MapReduce has three phases, sketched in the toy example after this list:

Map phase:

  • reads a collection of values or key-value (KV) pairs from the input source
  • the Mapper is applied to each input element and emits zero or more KV pairs

Shuffle phase:

  • groups the KV pairs emitted by the Mapper by key and outputs each key together with the stream of values for that key

Reduce phase:

  • takes the key-grouped data and reduces the values for each key in parallel
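
A toy, single-process Python sketch of the three phases for word count; map_fn and reduce_fn are illustrative names, not part of any real MapReduce API:

# Toy single-process word count showing the three phases; no real framework is used
from collections import defaultdict

def map_fn(line):                      # Map: emit zero or more (key, value) pairs per input
    for word in line.split():
        yield (word, 1)

def reduce_fn(key, values):            # Reduce: fold the grouped values for one key
    return (key, sum(values))

inputs = ["the quick brown fox", "the lazy dog"]

# Map phase
pairs = [kv for line in inputs for kv in map_fn(line)]

# Shuffle phase: group values by key
groups = defaultdict(list)
for key, value in pairs:
    groups[key].append(value)

# Reduce phase (each key could be reduced in parallel)
print([reduce_fn(k, vs) for k, vs in groups.items()])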

2.2. Spark

2.2.1. Lower Abstraction (RDD)

RDD (Resilient Distributed Dataset)

  • low-level API that specifies how to perform a computation
  • compile-time type safety (e.g. Integer, String, Binary)

There are two kinds of operations (see the PySpark sketch after this list):

  • transformations: lazy operations, e.g. map, filter, flatMap, reduceByKey
  • actions: compute a result (e.g. count, collect, reduce, take); they are eager and trigger execution
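
A minimal PySpark sketch, assuming a local Spark installation; the transformations only build a lazy lineage, and collect() triggers the actual execution:

# Minimal RDD sketch with PySpark (assumes pyspark is installed; runs locally)
from pyspark import SparkContext

sc = SparkContext("local[*]", "rdd-sketch")

lines = sc.parallelize(["the quick brown fox", "the lazy dog"])
counts = (lines.flatMap(lambda line: line.split())      # transformation: split into words
               .map(lambda word: (word, 1))             # transformation: emit (word, 1)
               .reduceByKey(lambda a, b: a + b))        # transformation: sum per key

print(counts.collect())                                 # action: triggers the whole pipeline
sc.stop()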

2.2.2. Higher Abstraction

High-level abstractions are automatically converted into RDD operations via query-plan optimization; a sketch follows the list below.

  • Spark SQL
  • DataFrame API
  • Dataset API
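
A minimal PySpark sketch of the DataFrame API and Spark SQL, assuming a local Spark installation; table and column names are illustrative:

# DataFrame sketch with PySpark: the queries below are optimized by Catalyst
# before being executed as RDD operations
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("df-sketch").getOrCreate()

df = spark.createDataFrame([("alice", 34), ("bob", 4)], ["name", "age"])
df.createOrReplaceTempView("people")

df.filter(df.age > 18).select("name").show()                  # DataFrame API
spark.sql("SELECT name FROM people WHERE age > 18").show()    # Spark SQL
spark.stop()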

2.3. FlumeJava

FlumeJava is a library of immutable parallel collections and operations for processing them in parallel.

Execution is based on a dataflow graph constructed internally, which is optimized and mapped onto appropriate primitives (e.g. MapReduce).

2.3.1. Abstraction

PCollection<T> is an unordered, immutable bag of elements; it can be created from an in-memory Java Collection<T> or by reading a file.

2.3.2. Elementwise Transforms

The main way to manipulate a PCollection is to invoke a data-parallel primitive such as parallelDo (ParDo); a toy sketch of the elementwise transforms follows this list.

  • Map: emits exactly one output for each input element
  • FlatMap: emits zero, one, or more outputs for each input element
  • Filter: copies an input element to the output only if it satisfies a predicate
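
FlumeJava itself is a Java library; the following plain-Python toy only illustrates the semantics of the three elementwise transforms (pmap, pflatmap, pfilter are stand-in names, not the FlumeJava API):

# Toy Python stand-ins for the elementwise transforms; not the FlumeJava API
def pmap(fn, pcoll):                 # Map: exactly one output per input
    return [fn(x) for x in pcoll]

def pflatmap(fn, pcoll):             # FlatMap: zero or more outputs per input
    return [y for x in pcoll for y in fn(x)]

def pfilter(pred, pcoll):            # Filter: keep inputs satisfying the predicate
    return [x for x in pcoll if pred(x)]

lines = ["the quick brown fox", ""]
words = pflatmap(lambda line: line.split(), lines)   # the empty line emits nothing
lengths = pmap(len, words)
long_words = pfilter(lambda w: len(w) > 3, words)
print(words, lengths, long_words)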

2.3.3. Aggregation

3. Stream Processing

3.1. MillWheel

https://research.google/pubs/millwheel-fault-tolerant-stream-processing-at-internet-scale/
