0x373 Storage
1. NFS
Filesystem (NFS (Network File System)) NFS is an open-protocol developed by Sun (only protocol, no implementation details). It was one of the first usage of distributed systems. It enables easy sharing of files across clients and enable centralized administration.
NFS's protocol is stateless (to achieve crash recovery), each call contains all the necessary information. The protocol usually contains the file handle to identify the file, the file handler can be thought as tuple of (file system identifier, file identifier), the second one can be inode.
To improve the performance, NFS caches the blocks in client memory, but it also introduces the cache consistency issue. To solve the issue, NFS
- flush-on-close: when closing file, its contents are forced into the server
- attribute-cache: check modification by using a local attribute-cache, timer is set to invalidate it.
2. AFS
cache strategy: files are cached in client local disk
3. HDFS
NameNode
- namenode stores metadata
- avoid sigle point failure after Hadoop2. See document here.
- Basically it has two or more redundant NameNodes running in the same cluster. One is active, the other are in standby. namenodes have access to shared storage device (e.g: NFS mount from NAS)
DataNode
- datanode store actual data
- use underlying OS file system to store blocks
Client
- talk with NameNode to fetch metadata first
- then talk to dataNode for read/write
4. GFS
cache strategy: no cache in chunk server and client server