This page summarizes protocols, common tools and software for distributed computing.
How to use Slurm
To return a down or drained node to service:
scontrol update nodename=<nodename> state=resume
Distributed File Systems
NFS (Network File System) is an open protocol developed by Sun; it specifies only the protocol, not implementation details. It was one of the first uses of distributed systems. It enables easy sharing of files across clients and centralized administration.
The NFS protocol is stateless (to simplify crash recovery): each call contains all the information needed to complete it. A call usually carries a file handle that identifies the file; the file handle can be thought of as a tuple of (file system identifier, file identifier), where the second element can be an inode number.
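As a rough illustration of why statelessness helps crash recovery, here is a minimal sketch of a read call that carries everything the server needs. The names `FileHandle`, `ReadRequest` and `ToyServer` are hypothetical and are not part of any NFS implementation; the "restart" is simulated by building a fresh server object over the same stored data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHandle:
    fs_id: int    # file system identifier
    file_id: int  # file identifier, e.g. an inode number


@dataclass(frozen=True)
class ReadRequest:
    handle: FileHandle  # which file
    offset: int         # where to start reading
    count: int          # how many bytes to read


class ToyServer:
    """Stores file contents but keeps no per-client state (no open-file table, no cursors)."""

    def __init__(self):
        self.files = {}  # FileHandle -> bytes

    def read(self, req: ReadRequest) -> bytes:
        data = self.files[req.handle]
        return data[req.offset:req.offset + req.count]


handle = FileHandle(fs_id=1, file_id=42)
server = ToyServer()
server.files[handle] = b"hello distributed world"

req = ReadRequest(handle, offset=6, count=11)
first = server.read(req)

# Simulated crash and restart: a new server object over the same stored data.
restarted = ToyServer()
restarted.files = dict(server.files)
second = restarted.read(req)  # the client simply retries the same request
```

Because the request names the file, offset and length itself, the retried call returns the same bytes without the server having to remember anything about the client.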
To improve performance, NFS caches blocks in client memory, which introduces a cache consistency problem. To address it, NFS uses:
- flush-on-close: when a file is closed, its modified contents are flushed to the server
- attribute-cache: the client checks for modifications using a local attribute cache, whose entries are invalidated by a timer
- cache strategy: files are cached on the client's local disk
- cache strategy: in chunk server and no cache client server
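The flush-on-close and attribute-cache bullets for NFS above can be sketched as a toy client. This is only an illustration under stated assumptions: `ToyFileServer` and `ToyClient` are made-up names, the server holds a single file, the modification time is a simple counter, the 3-second timeout is arbitrary, and the clock is passed in as `now` so the example runs without sleeping.

```python
class ToyFileServer:
    """Holds one file plus a logical modification counter."""

    def __init__(self):
        self.data = b""
        self.mtime = 0

    def getattr(self):
        return self.mtime

    def read(self):
        return self.data

    def write(self, data):
        self.data = data
        self.mtime += 1


class ToyClient:
    def __init__(self, server, attr_timeout=3.0):
        self.server = server
        self.attr_timeout = attr_timeout  # assumed value, not an NFS default
        self.cached_data = None
        self.cached_mtime = None
        self.attr_checked_at = None
        self.dirty = None

    def read(self, now):
        attrs_fresh = (
            self.attr_checked_at is not None
            and now - self.attr_checked_at < self.attr_timeout
        )
        if not attrs_fresh:
            # Attribute cache expired: ask the server whether the file changed.
            mtime = self.server.getattr()
            self.attr_checked_at = now
            if mtime != self.cached_mtime:
                self.cached_data = self.server.read()
                self.cached_mtime = mtime
        return self.cached_data

    def write(self, data):
        self.dirty = data  # buffered on the client only

    def close(self):
        # Flush-on-close: push buffered writes to the server.
        if self.dirty is not None:
            self.server.write(self.dirty)
            self.dirty = None


server = ToyFileServer()
server.write(b"v1")
reader, writer = ToyClient(server), ToyClient(server)

r0 = reader.read(now=0.0)      # fetches and caches the file
writer.write(b"v2")
before_close = server.read()   # server has not seen the write yet
writer.close()
r1 = reader.read(now=1.0)      # attribute cache still fresh: stale data
r2 = reader.read(now=4.0)      # timer expired: change detected, data refetched
```

The stale read at `now=1.0` is the price of the attribute cache: the client trades a bounded window of staleness for fewer attribute requests to the server.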
Kafka is a general publish/subscribe messaging system. Producers publish messages from different sources into multiple topics, and subscribers receive the messages of the topics they subscribe to.
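The publish/subscribe idea can be sketched with a toy in-memory broker. This is not the Kafka client API; `ToyBroker`, `publish` and `poll` are hypothetical names, and the sketch only assumes that each topic is an append-only log and that each subscriber tracks its own read position.

```python
from collections import defaultdict


class ToyBroker:
    def __init__(self):
        self.topics = defaultdict(list)  # topic -> append-only list of messages
        self.offsets = defaultdict(int)  # (subscriber, topic) -> next index to read

    def publish(self, topic, message):
        self.topics[topic].append(message)

    def poll(self, subscriber, topic):
        """Return the messages this subscriber has not seen yet on the topic."""
        log = self.topics[topic]
        start = self.offsets[(subscriber, topic)]
        self.offsets[(subscriber, topic)] = len(log)
        return log[start:]


broker = ToyBroker()
broker.publish("clicks", "user1 clicked")
broker.publish("orders", "order 7 placed")
broker.publish("clicks", "user2 clicked")

analytics_clicks = broker.poll("analytics", "clicks")
billing_orders = broker.poll("billing", "orders")

broker.publish("clicks", "user3 clicked")
analytics_next = broker.poll("analytics", "clicks")  # only the new message
audit_clicks = broker.poll("audit", "clicks")        # a new subscriber sees the whole log
```

Keeping the position per subscriber, rather than deleting messages once delivered, lets several independent subscribers read the same topic at their own pace.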
- denormalization: precompute join queries (to prevent multiple
- caching: add Memcached/Redis above the database (to prevent querying the database)
- master/slave replication: master read/write, slaves read
- master/master replication: hard to achieve absolute consistency
- sharding: split database into different ranges
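As a small illustration of the sharding bullet, the sketch below routes a key to a shard by key range. The boundaries and the shard names (`db-0`, `db-1`, `db-2`) are made up for the example.

```python
import bisect

# Range boundaries between shards (assumed values for illustration):
#   db-0: key < "h"      db-1: "h" <= key < "p"      db-2: key >= "p"
BOUNDARIES = ["h", "p"]
SHARDS = ["db-0", "db-1", "db-2"]


def shard_for(key: str) -> str:
    """Binary-search the boundaries to find the shard that owns the key."""
    return SHARDS[bisect.bisect_right(BOUNDARIES, key)]


routes = {name: shard_for(name) for name in ["alice", "harry", "zoe"]}
```

Range sharding keeps adjacent keys on the same shard, which makes range scans cheap, but it can produce hot shards when keys are not evenly distributed.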
HBase (modeled on Bigtable) is a sparse, multi-dimensional map:
$$\text{key}(\text{row}, \text{column}, \text{timestamp}) \rightarrow \text{value}$$
- No query language, just CRUD API (very fast)
- HBase shell
- Java API (with Python and Scala wrappers)
- built-in REST service
- fast access to a row with
- each row has a small number of column families
- a column family can have multiple columns (can be a very large number, can be sparse)
- each cell can have multiple versions indexed by timestamps
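The data model in the list above can be sketched as nested dictionaries. This is a toy model, not the HBase API: `ToyTable`, `put` and `get` are hypothetical names, the row key and column names loosely follow the webtable example in the Bigtable paper, and the cell values and timestamps are invented for illustration.

```python
from collections import defaultdict


class ToyTable:
    """rows[row_key][(family, qualifier)] -> {timestamp: value}"""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, family, qualifier, timestamp, value):
        self.rows[row][(family, qualifier)][timestamp] = value

    def get(self, row, family, qualifier, timestamp=None):
        """Return the newest version at or before `timestamp` (newest overall if None)."""
        versions = self.rows.get(row, {}).get((family, qualifier), {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]


table = ToyTable()
table.put("com.cnn.www", "contents", "", 3, "<html>v3")
table.put("com.cnn.www", "contents", "", 5, "<html>v5")
table.put("com.cnn.www", "anchor", "cnnsi.com", 9, "CNN")

latest = table.get("com.cnn.www", "contents", "")
older = table.get("com.cnn.www", "contents", "", timestamp=4)
missing = table.get("com.cnn.www", "anchor", "example.org")
```

Only cells that were written take up space, which is what "sparse" means here: a row can have a handful of columns out of a very large possible set.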
The following table is an example row from the Bigtable paper; the anchor column family is used to compute page ranks.
Cassandra (modeled on Dynamo) has no master node
- High Availability, High Partition Tolerance
- Low Consistency (Eventual Consistency)
- CQL (a query language similar to a limited subset of SQL)
- Replication: Consistent Hashing
- membership: Gossip-based protocol
- Aggregation: aggregation pipeline, MapReduce
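The consistent hashing mentioned in the replication bullet above can be sketched as a hash ring. This is a minimal illustration and not Cassandra's actual partitioner: `HashRing`, the use of MD5, and the choice of 8 virtual nodes per server are all assumptions made for the example.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a position on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=8):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (ring_position(f"{node}#{i}"), node))

    def remove(self, node):
        self.ring = [(pos, n) for pos, n in self.ring if n != node]

    def replicas(self, key, n=2):
        """Walk clockwise from the key's position, collecting n distinct nodes."""
        start = bisect.bisect_right(self.ring, (ring_position(key), ""))
        found = []
        for i in range(len(self.ring)):
            node = self.ring[(start + i) % len(self.ring)][1]
            if node not in found:
                found.append(node)
                if len(found) == n:
                    break
        return found


ring = HashRing(["a", "b", "c"])
keys = [f"user{i}" for i in range(200)]
before = {k: ring.replicas(k, n=1)[0] for k in keys}

ring.remove("c")
after = {k: ring.replicas(k, n=1)[0] for k in keys}

# Keys that were not owned by the removed node keep their owner.
moved = [k for k in keys if before[k] != "c" and before[k] != after[k]]
```

The point of the ring is visible in `moved`: removing a node only reassigns the keys that node owned, instead of reshuffling every key as a plain `hash(key) % num_nodes` scheme would.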
CDN (Content Delivery Network)
A CDN is essentially geographically aware caching
A load balancer (LB) can serve 1M queries per second
DNS side Network Load Balancing (L3/L4)
HTTP Load Balancing (L7)
This section collects some articles about some actual systems deployed globally.
This section collects some tips and my personal measurements of several cloud vendors.
- ping latency (same zone us-centrala): 1 ms
My bandwidth measurements with iperf:
- same zone: 1.94 Gbits/sec
- same region: 1.93 Gbits/sec
- us-europe: 236 Mbits/sec
- us-asia: 143 Mbits/sec
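To put these numbers in perspective, the sketch below estimates how long it would take to move 1 GB over each link. It is back-of-the-envelope arithmetic only: it assumes the measured rate is sustained, takes 1 GB as 10^9 bytes, and ignores protocol overhead.

```python
# Measured iperf throughput from the list above, in bits per second.
MEASURED_BITS_PER_SEC = {
    "same zone": 1.94e9,
    "same region": 1.93e9,
    "us-europe": 236e6,
    "us-asia": 143e6,
}


def transfer_seconds(size_bytes: float, bits_per_sec: float) -> float:
    """Ideal transfer time: payload size in bits divided by throughput."""
    return size_bytes * 8 / bits_per_sec


ONE_GB = 1e9  # bytes
estimates = {
    link: transfer_seconds(ONE_GB, bps)
    for link, bps in MEASURED_BITS_PER_SEC.items()
}

for link, seconds in estimates.items():
    print(f"{link}: {seconds:.1f} s per GB")
```

Under these assumptions, a transfer within a zone takes about 4 seconds per GB, while the us-asia link takes close to a minute.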