
0x374 Virtualization

1. Scheduler

1.2. Slurm

how to use slurm

scontrol

Return a node that is down or drained to service:

scontrol update nodename=<nodename> state=resume
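When the command needs to be issued from a script, it can be assembled as an argument list first. This is a minimal sketch: the helper names resume_command and resume_node and the node name node01 are hypothetical, and only the scontrol invocation itself comes from the line above.

```python
import subprocess


def resume_command(nodename: str) -> list[str]:
    """Build the scontrol argument list that returns a node to service."""
    return ["scontrol", "update", f"nodename={nodename}", "state=resume"]


def resume_node(nodename: str) -> None:
    """Run the command. Assumes scontrol is on PATH and the caller has
    sufficient Slurm privileges; raises CalledProcessError on failure."""
    subprocess.run(resume_command(nodename), check=True)


# Only print the command here, so the sketch runs without a Slurm cluster.
print(" ".join(resume_command("node01")))
```

Passing a list to subprocess.run avoids shell quoting issues if a node name ever contains unusual characters.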

1.3. YARN

ResourceManager

  • keeps track of job metadata
  • typically runs on a different host from the HDFS NameNode

NodeManager

  • runs on each node, co-located with the HDFS DataNode
  • manages the YARN containers on its node (resource allocation is done by the ResourceManager)
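The split of responsibilities can be illustrated with a toy model. This is not the YARN API: the classes below only borrow the component names, and the first-fit placement, memory sizes, and host names are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NodeManager:
    """Toy per-node agent: launches containers it is told to run."""
    host: str
    free_mb: int
    containers: list = field(default_factory=list)

    def launch(self, container_id: str, mem_mb: int) -> None:
        self.free_mb -= mem_mb
        self.containers.append(container_id)


@dataclass
class ResourceManager:
    """Toy central allocator: decides placement and keeps job metadata."""
    nodes: list
    jobs: dict = field(default_factory=dict)

    def allocate(self, job_id: str, mem_mb: int) -> Optional[str]:
        # First-fit placement (an assumption, not YARN's actual policy).
        for node in self.nodes:
            if node.free_mb >= mem_mb:
                container_id = f"{job_id}-c{len(self.jobs.get(job_id, []))}"
                node.launch(container_id, mem_mb)
                self.jobs.setdefault(job_id, []).append(container_id)
                return node.host
        return None  # no node has enough free memory


rm = ResourceManager([NodeManager("worker-1", 2048), NodeManager("worker-2", 4096)])
placements = [rm.allocate("job-a", 1536) for _ in range(3)]
print(placements)
```

The allocation decision and the job bookkeeping live in one place, while each node only tracks and runs its own containers.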

Orchestration (Kubernetes)

Borg is a container management system at Google, built to manage both long-running services and batch jobs.

Inspired by Borg, Kubernetes was built to manage long-running processes and is designed to orchestrate multiple microservices.

Unlike HPC systems such as Slurm, which assume a fixed system size and an infinite workload, cloud orchestration assumes that:

  • "infinite" resources are available
  • the workload is finite
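A toy calculation makes the two assumptions concrete. It assumes every job occupies exactly one node for one scheduling round, which is a simplification chosen for the example; the function names are hypothetical.

```python
import math


def fixed_cluster_rounds(num_jobs: int, num_nodes: int) -> int:
    """Fixed-size system: jobs queue for the nodes, so the backlog
    is worked off over several rounds."""
    return math.ceil(num_jobs / num_nodes)


def elastic_cluster_nodes(num_jobs: int) -> int:
    """Elastic system: capacity grows with the workload, so every job
    gets a node and all jobs run in a single round."""
    return num_jobs


print(fixed_cluster_rounds(num_jobs=100, num_nodes=8))
print(elastic_cluster_nodes(num_jobs=100))
```

In the fixed case the queue length is the variable; in the elastic case the node count is.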

Components

A Kubernetes cluster consists of:

  • control plane (master node): manages the worker nodes
  • worker nodes: run containerized applications


The control plane has the following components:

  • API server (kube-apiserver): frontend of the Kubernetes control plane
  • etcd: key-value store for cluster data
  • scheduler (kube-scheduler): selects a node for each newly created pod
  • controller manager (kube-controller-manager): runs several controllers, such as the node controller and the job controller
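The scheduler's job of selecting a node for a new pod can be sketched as a filter step followed by a scoring step. This is an illustration only, not kube-scheduler code: the Node and Pod classes, the resource fields, and the "most free CPU wins" scoring rule are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    free_cpu: float  # cores still unallocated
    free_mem: int    # MiB still unallocated


@dataclass
class Pod:
    name: str
    cpu: float  # requested cores
    mem: int    # requested MiB


def select_node(pod: Pod, nodes: list) -> Optional[Node]:
    """Filter out nodes that cannot fit the pod, then score the rest."""
    feasible = [n for n in nodes if n.free_cpu >= pod.cpu and n.free_mem >= pod.mem]
    if not feasible:
        return None  # nothing fits, so the pod would stay pending
    # Scoring rule (assumed): prefer the node with the most free CPU.
    return max(feasible, key=lambda n: n.free_cpu)


nodes = [Node("node-a", 1.0, 2048), Node("node-b", 4.0, 8192), Node("node-c", 8.0, 512)]
chosen = select_node(Pod("web", cpu=2.0, mem=1024), nodes)
print(chosen.name if chosen else "unschedulable")
```

Here node-a lacks CPU and node-c lacks memory, so only node-b survives the filter step.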

A worker node has the following components:

  • kubelet: agent that runs on each node and makes sure containers are running in a Pod
  • kube-proxy: network proxy that maintains network rules on the node
  • container runtime: responsible for managing the execution and lifecycle of containers
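The kubelet's role of making sure containers are running can be pictured as a reconcile step that compares desired containers with running ones and asks the runtime to close the gap. The sketch below is a simplified model under that assumption: FakeRuntime, reconcile, and sync_pod are hypothetical names, not kubelet or container runtime interfaces.

```python
def reconcile(desired: set, running: set) -> tuple:
    """Return (containers to start, containers to stop)."""
    return desired - running, running - desired


class FakeRuntime:
    """Hypothetical in-memory stand-in for a container runtime."""

    def __init__(self) -> None:
        self.running = set()

    def start(self, name: str) -> None:
        self.running.add(name)

    def stop(self, name: str) -> None:
        self.running.discard(name)


def sync_pod(desired: set, runtime: FakeRuntime) -> None:
    """One reconcile pass: start what is missing, stop what is not wanted."""
    to_start, to_stop = reconcile(desired, runtime.running)
    for name in sorted(to_start):
        runtime.start(name)
    for name in sorted(to_stop):
        runtime.stop(name)


runtime = FakeRuntime()
runtime.start("old-sidecar")             # leftover container not in the spec
sync_pod({"app", "log-agent"}, runtime)  # desired containers for the pod
print(sorted(runtime.running))
```

Running sync_pod repeatedly with the same desired set changes nothing after the first pass, which is the property a reconcile loop relies on.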

Reference

Borg, Omega, and Kubernetes: Lessons learned from three container-management systems over a decade