0x373 Storage

1. NFS

NFS (Network File System) is an open protocol developed by Sun (the specification defines only the protocol, not implementation details). It was one of the earliest widely deployed distributed systems. It makes it easy to share files across clients and enables centralized administration.

The NFS protocol is stateless (to simplify crash recovery): each request carries all the information needed to complete it. A request typically includes a file handle that identifies the file; the handle can be thought of as a tuple of (file system identifier, file identifier), where the file identifier may be an inode number.
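
The idea can be sketched as follows. This is a hypothetical toy model, not the real NFS wire protocol: every read carries the file handle plus offset and count, so the server keeps no per-client session state and can crash and restart between requests.

```python
# Toy sketch of a stateless NFS-style read. All names are illustrative.
from collections import namedtuple

# A file handle as the tuple described above: (fs_id, inode).
FileHandle = namedtuple("FileHandle", ["fs_id", "inode"])

class Server:
    def __init__(self, files):
        # files maps a FileHandle -> bytes; no open-file table is kept.
        self.files = files

    def read(self, handle, offset, count):
        # Each call is self-contained: handle, offset, and count arrive
        # together, so no earlier request needs to be remembered.
        return self.files[handle][offset:offset + count]

server = Server({FileHandle(1, 42): b"hello nfs"})
h = FileHandle(fs_id=1, inode=42)
print(server.read(h, 0, 5))  # b'hello'
print(server.read(h, 6, 3))  # b'nfs'
```

Because the two reads are independent, a server restart between them is invisible to the client apart from latency.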

To improve performance, NFS caches blocks in client memory, which introduces a cache-consistency problem. NFS addresses it with:

  • flush-on-close: when a client closes a file, its dirty contents are forced back to the server
  • attribute cache: the client checks for modifications using a locally cached copy of the file's attributes; a timer periodically invalidates the cache
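
The two mechanisms above can be sketched together. This is an illustrative model with made-up names, not NFS client code: writes are buffered and flushed on close, and reads revalidate against the server only after the attribute-cache timeout expires.

```python
# Toy sketch of flush-on-close plus an attribute cache with a timeout.
import time

ATTR_TTL = 3.0  # attribute-cache timeout in seconds; illustrative value

class Server:
    def __init__(self):
        self.data, self.mtime = b"", 0.0

    def write(self, data):
        self.data, self.mtime = data, time.time()

    def getattr(self):
        return self.mtime  # cheap attribute check, no data transfer

    def read(self):
        return self.data

class Client:
    def __init__(self, server):
        self.server = server
        self.cache = server.read()           # cached file contents
        self.cached_mtime = server.getattr() # cached attributes
        self.checked_at = time.time()
        self.dirty = False

    def read(self):
        # Attribute cache: only ask the server about staleness
        # once the timer has expired.
        if time.time() - self.checked_at > ATTR_TTL:
            if self.server.getattr() != self.cached_mtime:
                self.cache = self.server.read()        # refetch
                self.cached_mtime = self.server.getattr()
            self.checked_at = time.time()
        return self.cache

    def write(self, data):
        self.cache, self.dirty = data, True  # buffer locally

    def close(self):
        # Flush-on-close: force dirty contents back to the server.
        if self.dirty:
            self.server.write(self.cache)
            self.dirty = False

s = Server(); s.write(b"v1")
c = Client(s)
c.write(b"v2")
c.close()           # flush-on-close makes the write visible
print(s.read())     # b'v2'
```

Note the trade-off: a longer `ATTR_TTL` means fewer attribute checks but a larger window in which a client can read stale data.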

2. AFS

cache strategy: whole files are cached on the client's local disk
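
A minimal sketch of this whole-file strategy, under simplifying assumptions (no callbacks or invalidation; `afs_open` and the cache layout are made up for illustration): on the first open the client fetches the entire file into an on-disk cache, and all subsequent reads hit the local copy.

```python
# Toy sketch of AFS-style whole-file caching on the client's local disk.
import os
import tempfile

CACHE_DIR = tempfile.mkdtemp()  # stands in for the client's disk cache

def afs_open(server_files, name):
    path = os.path.join(CACHE_DIR, name)
    if not os.path.exists(path):          # cache miss: fetch whole file
        with open(path, "wb") as f:
            f.write(server_files[name])
    return open(path, "rb")               # reads are served from disk

server_files = {"notes.txt": b"cached on local disk"}
with afs_open(server_files, "notes.txt") as f:
    print(f.read())  # b'cached on local disk'
```

Caching to disk (rather than memory, as in NFS) lets the cache survive client reboots and hold much larger files.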

3. HDFS

NameNode

  • the NameNode stores metadata
  • Hadoop 2 removed the NameNode as a single point of failure (see the HDFS High Availability documentation)
  • basically, two or more redundant NameNodes run in the same cluster: one is active, the others are standby; the NameNodes share access to a common storage device (e.g., an NFS mount from a NAS)

DataNode

  • the DataNode stores the actual data
  • it uses the underlying OS file system to store blocks

Client

  • talks to the NameNode first to fetch metadata
  • then talks to the DataNodes for reads/writes
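
The two-step read path above can be sketched as follows. This is an illustrative model, not the HDFS API: the client asks the NameNode (metadata only) where each block lives, then streams the blocks directly from the DataNodes.

```python
# Toy sketch of the HDFS read path. All names are illustrative.
class NameNode:
    def __init__(self):
        # Metadata only: path -> ordered list of (block_id, datanode).
        self.blockmap = {}

    def add_file(self, path, blocks):
        self.blockmap[path] = blocks

    def get_block_locations(self, path):
        return self.blockmap[path]

class DataNode:
    def __init__(self):
        # block_id -> bytes; real DataNodes store blocks as files
        # in the underlying OS file system.
        self.blocks = {}

    def read_block(self, block_id):
        return self.blocks[block_id]

def hdfs_read(namenode, path):
    # Step 1: fetch block locations from the NameNode.
    # Step 2: read each block from its DataNode and concatenate.
    return b"".join(dn.read_block(bid)
                    for bid, dn in namenode.get_block_locations(path))

nn, dn = NameNode(), DataNode()
dn.blocks = {0: b"hello ", 1: b"hdfs"}
nn.add_file("/a.txt", [(0, dn), (1, dn)])
print(hdfs_read(nn, "/a.txt"))  # b'hello hdfs'
```

Keeping bulk data transfer off the NameNode is what lets a single metadata server scale to many clients.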

4. GFS

cache strategy: neither the chunkservers nor the clients cache file data

5. IPFS