0x374 Virtualization
1. Scheduler
1.2. Slurm
How to use Slurm
scontrol
- view or modify Slurm configuration and state
- return a down or drained node to service with:
  scontrol update nodename=<nodename> state=resume
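When the resume command has to be issued for many nodes, it can help to build the argument list in a script. The following is a minimal sketch, not an official Slurm client: the helper names and the node name node01 are hypothetical, and it assumes only the two scontrol subcommands used here (update and show node).

```python
import shlex
from typing import List


def scontrol_resume_cmd(nodename: str) -> List[str]:
    """Build the argv that returns a node to service (hypothetical helper)."""
    if not nodename:
        raise ValueError("nodename must be non-empty")
    return ["scontrol", "update", f"nodename={nodename}", "state=resume"]


def scontrol_show_node_cmd(nodename: str) -> List[str]:
    """Build the argv that prints the current state of a node (hypothetical helper)."""
    if not nodename:
        raise ValueError("nodename must be non-empty")
    return ["scontrol", "show", "node", nodename]


if __name__ == "__main__":
    # node01 is a placeholder node name used only for illustration.
    print(shlex.join(scontrol_resume_cmd("node01")))
    print(shlex.join(scontrol_show_node_cmd("node01")))
```

On a host where Slurm is installed, either list can be passed to subprocess.run(cmd, check=True); the sketch itself only prints the commands.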
1.3. YARN
ResourceManager
- keeps the metadata of jobs
- typically runs on a different host from the HDFS NameNode
NodeManager
- runs on each worker node, co-located with the HDFS DataNode
- manages the YARN containers on its node (resource allocation is done by the ResourceManager)
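The split of responsibilities above can be illustrated with a toy model. This is a sketch under stated assumptions, not YARN's API: the class and method names are hypothetical, memory is the only resource, and the "most free memory" placement policy is a stand-in for YARN's real pluggable schedulers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class NodeManager:
    """Per-node agent: tracks and runs the containers placed on its host."""
    host: str
    total_mb: int
    containers: Dict[str, int] = field(default_factory=dict)  # container id -> MB

    def free_mb(self) -> int:
        return self.total_mb - sum(self.containers.values())

    def launch(self, container_id: str, memory_mb: int) -> None:
        # The NodeManager does not decide placement; it runs what was granted.
        self.containers[container_id] = memory_mb


class ResourceManager:
    """Cluster-wide component: keeps job metadata and decides allocations."""

    def __init__(self, node_managers: List[NodeManager]) -> None:
        self.node_managers = list(node_managers)
        self.jobs: Dict[str, List[Tuple[str, str]]] = {}  # job -> [(container, host)]
        self._next_id = 0

    def allocate(self, job_id: str, memory_mb: int) -> Optional[str]:
        """Grant one container; return the chosen host, or None if nothing fits."""
        candidates = [nm for nm in self.node_managers if nm.free_mb() >= memory_mb]
        if not candidates:
            return None
        # Assumed policy: pick the node with the most free memory.
        chosen = max(candidates, key=lambda nm: nm.free_mb())
        self._next_id += 1
        container_id = f"container_{self._next_id}"
        chosen.launch(container_id, memory_mb)
        self.jobs.setdefault(job_id, []).append((container_id, chosen.host))
        return chosen.host
```

In this model a request that fits nowhere simply returns None; a real cluster would keep the request pending until resources free up.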
Orchestration (Kubernetes)
Borg is a container management system at Google, built to manage long-running services and batch jobs.
Inspired by Borg, Kubernetes was built to manage long-running processes and is designed to orchestrate many microservices.
Unlike HPC schedulers such as Slurm, which assume a fixed system size and an effectively infinite workload, cloud orchestration assumes:
- "infinite" resources are available
- workload is finite
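The difference between the two assumptions can be made concrete with a toy comparison. This is only a sketch: both functions are hypothetical, the fixed cluster uses a simple first-fit pass over the requests, and the elastic cluster is assumed to be able to add nodes without limit.

```python
from typing import List, Tuple

Request = Tuple[str, int]  # (job name, number of nodes requested)


def run_fixed_cluster(total_nodes: int, requests: List[Request]):
    """HPC-style: capacity is fixed, so requests that do not fit must wait."""
    free = total_nodes
    running, queued = [], []
    for name, nodes in requests:
        if nodes <= free:
            free -= nodes
            running.append(name)
        else:
            queued.append(name)  # waits until running jobs release nodes
    return running, queued


def run_elastic_cluster(initial_nodes: int, requests: List[Request]):
    """Cloud-style: capacity grows to fit the workload, so nothing waits."""
    capacity = initial_nodes
    used = 0
    running = []
    for name, nodes in requests:
        if used + nodes > capacity:
            capacity = used + nodes  # provision extra nodes on demand
        used += nodes
        running.append(name)
    return running, capacity
```

With the same three requests, the fixed cluster leaves one job queued while the elastic cluster grows from 4 to 6 nodes and runs all of them.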
Components
A Kubernetes cluster consists of:
- control plane (master node): manages the worker nodes and the Pods running on them
- worker node: runs containerized applications
The control plane has the following components:
- API server (kube-apiserver): front end of the Kubernetes control plane
- etcd: key-value store for cluster data
- scheduler (kube-scheduler): selects a node for each newly created Pod
- controller manager (kube-controller-manager): runs several controllers, such as the node controller and the job controller
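The scheduler's job of selecting a node can be sketched as a filter step followed by a score step. This is a simplified illustration, not kube-scheduler's implementation: the dictionary fields, node names, and the "most free CPU left" scoring rule are assumptions made for the example.

```python
from typing import Dict, List, Optional


def schedule(pod: Dict, nodes: List[Dict]) -> Optional[str]:
    """Pick a node for a pod, or return None if no node can host it."""
    # Filter: keep only nodes with enough free CPU and memory for the pod.
    feasible = [
        n for n in nodes
        if n["free_cpu_m"] >= pod["cpu_m"] and n["free_mem_mib"] >= pod["mem_mib"]
    ]
    if not feasible:
        return None  # the pod would stay unscheduled
    # Score: assumed rule, prefer the node with the most CPU left after placement.
    best = max(feasible, key=lambda n: n["free_cpu_m"] - pod["cpu_m"])
    return best["name"]


# Hypothetical cluster state: CPU in millicores, memory in MiB.
NODES = [
    {"name": "node-a", "free_cpu_m": 500, "free_mem_mib": 1024},
    {"name": "node-b", "free_cpu_m": 2000, "free_mem_mib": 4096},
    {"name": "node-c", "free_cpu_m": 4000, "free_mem_mib": 256},
]

if __name__ == "__main__":
    print(schedule({"cpu_m": 1000, "mem_mib": 512}, NODES))
```

Separating filtering from scoring keeps hard constraints (does the pod fit at all) apart from preferences (which fitting node is best).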
A worker node has the following components:
- kubelet: agent that runs on each node and makes sure the containers described in a Pod are running
- kube-proxy: network proxy that maintains network rules on each node
- container runtime: manages the execution and lifecycle of containers
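The kubelet's role of keeping a Pod's containers running can be pictured as a reconcile step that compares desired state with what the runtime reports. The sketch below is a toy model: FakeRuntime and sync_pod are hypothetical names, containers are identified only by name, and the real kubelet's communication with the container runtime is not modeled.

```python
from typing import Iterable, List, Set, Tuple


class FakeRuntime:
    """Stand-in for a container runtime; only tracks running container names."""

    def __init__(self, running: Iterable[str] = ()) -> None:
        self.running: Set[str] = set(running)

    def start(self, name: str) -> None:
        self.running.add(name)

    def stop(self, name: str) -> None:
        self.running.discard(name)


def sync_pod(desired_containers: Iterable[str], runtime: FakeRuntime) -> Tuple[List[str], List[str]]:
    """Drive the runtime toward the desired set; return (started, stopped)."""
    desired = set(desired_containers)
    to_start = sorted(desired - runtime.running)  # declared but not running
    to_stop = sorted(runtime.running - desired)   # running but no longer declared
    for name in to_start:
        runtime.start(name)
    for name in to_stop:
        runtime.stop(name)
    return to_start, to_stop
```

Calling sync_pod repeatedly is safe: once the runtime matches the desired set, further calls start and stop nothing.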
Reference
Borg, Omega, and Kubernetes: Lessons learned from three container-management systems over a decade