Saturday, 26 September 2020

Micro VMs & Unikernels

This post is a follow-up to my previous post, where we discussed Virtual Machine & Container architectures. In this post, we will discuss Micro VMs & Unikernels.


Micro VMs are lightweight VMs which provide isolation through hardware virtualization & improved security. A Micro VM has a minified kernel; it looks like a container & behaves like a VM. This might be confusing, but there is a difference: Micro VMs provide hardware-backed isolation, while containers rely on operating-system-level isolation.

Firecracker is a Virtual Machine Monitor (a.k.a. Hypervisor) which creates & manages microVMs using KVM. As we discussed in the previous post, KVM typically relies on QEMU for hardware emulation of I/O operations; Firecracker replaces QEMU with its own minimal device model.

Firecracker is written in Rust (a memory-safe language). Currently, AWS Lambda is backed by Firecracker: Lambda intercepts the user request, first spins up a Micro VM & then creates the function inside it to serve requests.

Firecracker provides strong isolation through hardware virtualization. It also jails every microVM using a Jailer program, which acts as a second line of defense and makes it more robust.

Its start-up time is around 125 ms, which drops to about 4 ms when a microVM is restored from a snapshot.
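
To make this concrete, here is a minimal Go sketch of driving Firecracker's REST API over its Unix socket to configure & boot a microVM. It assumes a `firecracker --api-sock /tmp/firecracker.sock` process is already running; the kernel & rootfs paths, vCPU count and memory size are placeholders you would substitute with real values.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"net"
	"net/http"
)

// apiPut sends a PUT request to the Firecracker API over its Unix socket.
func apiPut(client *http.Client, path, body string) error {
	req, err := http.NewRequest(http.MethodPut, "http://localhost"+path, bytes.NewBufferString(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("%s: unexpected status %s", path, resp.Status)
	}
	return nil
}

func main() {
	// An HTTP client whose connections dial the Firecracker API socket.
	sock := "/tmp/firecracker.sock"
	client := &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				return (&net.Dialer{}).DialContext(ctx, "unix", sock)
			},
		},
	}

	// Configure the machine, kernel & root drive, then start the instance.
	steps := []struct{ path, body string }{
		{"/machine-config", `{"vcpu_count": 1, "mem_size_mib": 128}`},
		{"/boot-source", `{"kernel_image_path": "./vmlinux", "boot_args": "console=ttyS0 reboot=k panic=1"}`},
		{"/drives/rootfs", `{"drive_id": "rootfs", "path_on_host": "./rootfs.ext4", "is_root_device": true, "is_read_only": false}`},
		{"/actions", `{"action_type": "InstanceStart"}`}, // boots the microVM
	}
	for _, s := range steps {
		if err := apiPut(client, s.path, s.body); err != nil {
			panic(err)
		}
	}
	fmt.Println("microVM started")
}
```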


In 2014, AWS Lambda was backed by EC2, where an EC2 instance was instantiated for every customer. In 2018, Firecracker was introduced to reduce this overhead. It is designed for short-lived processes and shines when creating a large number of instances (imagine having a full-blown VM for every instance instead). With this, Lambda serves around 1.4 trillion requests per month.

Reference: https://firecracker-microvm.github.io/



Unikernels are designed to run a single process/application with an immutable OS. A unikernel holds a minified version of the OS containing only what is required to run that specific process. It is similar to a container but can run only ONE process. Multi-threading is supported, but since there is no multi-process support, a process scheduler is not required.

Unikernels run in kernel space with a single address space model. This is the main difference between a VM and a unikernel: a VM has separate kernel & user address spaces, and the user address space (where user applications run) is translated/mapped to the kernel address space for executing privileged instructions (this prevents a user application from bringing the kernel down).

The application is compiled to build the unikernel image, whose size typically falls between 500 KB and 32 MB. Unikernels have no shell & no SYSCALLs (since the application runs directly in the kernel); only function calls are possible, which resolve to memory addresses (hard for an attacker to track).

Unikernels provide increased security through immutable images & a reduced attack surface, along with faster boot times & more room for optimization.

Reference: http://unikernel.org/blog/2017/unikernels-are-secure

There are various flavours of unikernels available: IncludeOS (C++), MirageOS, ClickOS.

OSv is an operating system specifically designed to run a single application in a VM. It supports Java, Node, C & C++ applications.

UniK is a compilation & orchestration tool for unikernels which supports Golang, Java & Node.js applications.
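
To give a feel for what "a single application" means here, below is the kind of self-contained Go service one would hand to a tool like UniK, which compiles it together with a library OS into a bootable image. The exact build & run commands vary by provider, so none are shown; the program itself is just an ordinary Go HTTP server.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

// The whole unikernel image boils down to this one process: no shell,
// no init system, no second service for a scheduler to juggle.
func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello from a single-process image")
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```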

Firecracker supports both OSv & UniK images.

Reference: https://www.mikelangelo-project.eu/technology/universal-unikernel-osv/



Summary of Various Techniques:

[Comparison table image not captured in this text version.]

Thursday, 24 September 2020

Virtual Machines & Containers

In this blog post, I will share my understanding of Virtual Machine & Container architectures. To be clear up front, the intention of this post is not to compare Virtual Machines with Containers, as they exist for different purposes.

We will start with Virtual Machines. 

Virtual Machines are spun up by a Hypervisor. A Hypervisor is software which creates & runs Virtual Machines on top of physical hardware. It allows multiple Guest OSes to be hosted on a single host machine at the same time. A traditional Hypervisor protects the hardware & BIOS and virtualizes the CPU, storage & network.

Hypervisors can be classified into two categories. 

Type 1: Native/Bare Metal Hypervisor 

This kind of Hypervisor runs directly on hardware and spawns VMs. Public cloud platforms like AWS/Azure/GCP use these Hypervisors to spin up new environments on top of their physical servers. 

e.g. Xen/KVM (AWS), Hyper-V (Azure), VMware ESXi 

We will discuss these in detail in the upcoming sections. 


Type 2: User Space Hypervisor 

This kind of Hypervisor runs on top of the host machine's OS. It is similar to any other application in the system, but with special privileges. The Guest OS is spun up on top of the Host OS, and the required memory is carved out of host memory. Say you have 8 GB RAM in the host machine: if the VM requires 4 GB, that memory is reserved from the memory available on the host. 

e.g. VirtualBox 



Let’s deep dive into Type 1 Hypervisors. Security is achieved via the CPU protection rings. There are four rings available, from Ring "0" to Ring "3", where Ring "0" has the highest level of privilege; the kernel/supervisor usually runs here. A Hypervisor manages or supervises multiple supervisors, and it runs in Ring "0". Ring "3" has the least privilege and is where user applications run. 

Since Hypervisors run in Ring "0", a Guest OS hosted by these Hypervisors would have the same privileges, which can be dangerous. A Guest OS should not have direct access to Ring "0" operations. 

This problem can be overcome by 3 different approaches: 

  • Full Virtualization – This approach emulates all Ring "0" operations. Since it is full emulation, performance is degraded. 
  • Para Virtualization – The Hypervisor provides APIs to perform Ring "0" operations, so the Guest OS does not have to run in Ring "0"; however, the Guest OS must be modified to use these APIs. Performance is better than Full Virtualization, but Guest OS modification is required. 
  • Hardware Virtualization – Intel/AMD added this feature to their CPU chips so a Guest OS can perform Ring "0" operations using special instructions, while the true Ring "0" stays protected. Since there is NO emulation, performance is great. The Guest OS runs in a virtual Ring "-1"; if it fails, it will not bring Ring "0" down. (A quick way to check your own CPU for these extensions is sketched just after this list.) 
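
On Linux, the presence of these extensions shows up as CPU flags in /proc/cpuinfo — `vmx` for Intel VT-x, `svm` for AMD-V. The small Go sketch below just scans for them; the flag names are the standard kernel ones, everything else is illustrative.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	// CPU feature flags are listed on the "flags" lines of /proc/cpuinfo.
	data, err := os.ReadFile("/proc/cpuinfo")
	if err != nil {
		panic(err)
	}
	info := string(data)
	switch {
	case strings.Contains(info, " vmx"): // Intel VT-x
		fmt.Println("Hardware virtualization supported (Intel VT-x)")
	case strings.Contains(info, " svm"): // AMD-V
		fmt.Println("Hardware virtualization supported (AMD-V)")
	default:
		fmt.Println("No hardware virtualization extensions found")
	}
}
```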


Microsoft Hyper-V: It provides isolation by creating/managing a partition for every VM. It implements a parent-child partition relationship, where Hyper-V runs in the parent partition. Only the parent partition has access to hardware. The parent partition creates child partitions, which host the Guest OSes. 

There are various components involved through which a child partition accesses resources. 

The Virtualization Service Provider (VSP) provides Hypervisor functionalities like device access requests, network, I/O, etc. 

The VM Bus is used for inter-partition communication; a child partition has to communicate with the parent partition to execute instructions. 

Azure uses Hyper-V for its Server Virtualization. This provides Hardware Virtualization with a blend of minimal emulation. 

Kernel-based VM (KVM): The kernel basically manages processes, CPU resource allocation and much more. KVM turns the Linux kernel itself into a Hypervisor. 

KVM provides hardware virtualization using the CPU virtualization chipsets. For I/O, it pairs with QEMU for device emulation, and it also supports para-virtualized (virtio) devices for faster I/O. 

AWS uses KVM in its Nitro System for virtualization with supreme performance. Earlier, Xen Hypervisors were used. 
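
To illustrate what "the kernel as a Hypervisor" means in practice, the sketch below opens /dev/kvm, the user-space door into KVM, and issues two real ioctls: one to read the API version (the stable API reports 12) and one to create an empty VM. The ioctl numbers come from <linux/kvm.h>; it needs a Linux machine with KVM enabled and permission on /dev/kvm.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

const (
	// ioctl numbers from <linux/kvm.h>: _IO(0xAE, 0x00) and _IO(0xAE, 0x01).
	kvmGetAPIVersion = 0xAE00
	kvmCreateVM      = 0xAE01
)

func ioctl(fd, req, arg uintptr) (uintptr, error) {
	r, _, errno := syscall.Syscall(syscall.SYS_IOCTL, fd, req, arg)
	if errno != 0 {
		return 0, errno
	}
	return r, nil
}

func main() {
	// /dev/kvm is the user-space entry point into the in-kernel hypervisor.
	kvm, err := os.OpenFile("/dev/kvm", os.O_RDWR, 0)
	if err != nil {
		panic(err) // requires KVM support and access to /dev/kvm
	}
	defer kvm.Close()

	version, err := ioctl(kvm.Fd(), kvmGetAPIVersion, 0)
	if err != nil {
		panic(err)
	}
	fmt.Println("KVM API version:", version) // the stable API reports 12

	vmfd, err := ioctl(kvm.Fd(), kvmCreateVM, 0)
	if err != nil {
		panic(err)
	}
	fmt.Println("created empty VM, fd:", vmfd)
	// A real VMM (QEMU, Firecracker) would now add memory regions and
	// vCPUs with further ioctls before running guest code.
}
```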


That’s it for Virtual Machines; let's now explore Containers. 


Containers can be thought of as lightweight VMs which run on top of the Host OS. A container packs everything necessary to run your application, so it can be deployed in any compatible runtime environment & run your application successfully.

To keep it simple: Containers are the outcome of OS virtualization, while VMs are the outcome of full virtualization. 


A single kernel is shared across all the containers via the Container Engine. Isolation in containers is achieved via Namespaces & C-Groups. Docker is the leading Container Engine in the market.

Namespaces are a feature of the Linux kernel. They restrict what a process can see, like the file system & networking, which helps achieve isolation. There are 7 areas where namespaces can be created/managed to restrict access: IPC, Network, Mount Points, Process IDs, Users & Groups, UTS (hostname & NIS domain name) & C-Groups. 
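
As a quick illustration (a fuller container example appears at the end of this post), this Go sketch launches a shell inside fresh UTS, PID & mount namespaces using the clone flags the kernel exposes. Run as root: inside the shell, `hostname foo` no longer affects the host, and the shell is PID 1 of its own PID namespace.

```go
package main

import (
	"os"
	"os/exec"
	"syscall"
)

func main() {
	// Start /bin/sh in new UTS, PID and mount namespaces (needs root).
	cmd := exec.Command("/bin/sh")
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}
	if err := cmd.Run(); err != nil {
		panic(err)
	}
}
```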

C-Groups are used to constrain/limit memory & CPU resources. They help allocate memory/CPU for a container process. Some of these are not hard limits; if resources are available in excess, they will be utilized. For example, you can limit a container to creating 5 processes. 
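
Here is a hedged sketch of exactly that 5-process example, using the cgroup v1 `pids` controller: create a group, write the limit to `pids.max` & move the current process into the group. The /sys/fs/cgroup layout assumed here is the v1 hierarchy (cgroup v2 uses a single unified tree), and it needs root.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

func must(err error) {
	if err != nil {
		panic(err)
	}
}

func main() {
	// Create a cgroup under the pids controller and cap it at 5 PIDs.
	dir := "/sys/fs/cgroup/pids/demo"
	must(os.MkdirAll(dir, 0755))
	must(os.WriteFile(filepath.Join(dir, "pids.max"), []byte("5"), 0644))
	// Move the current process into the group; all children inherit it.
	must(os.WriteFile(filepath.Join(dir, "cgroup.procs"),
		[]byte(strconv.Itoa(os.Getpid())), 0644))
	fmt.Println("this process tree can now hold at most 5 PIDs")
}
```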


The Container Runtime is responsible for setting up Namespaces & C-Groups & executing commands. It manages the life cycle of a container. 

Open Container Initiative (OCI) provides industry standards for container formats & runtimes. 

https://github.com/opencontainers

OCI contains two specifications: 

Runtime Spec: Provides guidelines for managing a container: create, run, delete, etc. 

Image Spec: Provides guidelines for managing images: push, pull, cache, etc. 

CRI-O is an implementation of the Kubernetes Container Runtime Interface (CRI) that enables the use of OCI-compatible runtimes. That’s how Kubernetes manages to execute Docker images. It also supports Kata Containers. 


Pseudo code to create a container (a runnable Go sketch of these steps follows the list): 

  1. Create a base filesystem (like the file system from busybox/alpine)
  2. Create namespaces for PIDs, networking & FS mount points using SYSCALLs (a SYSCALL is a request for a service from the OS kernel) 
  3. Set your executable path
  4. Use SYSCALLs to set the hostname & chroot 
  5. Optionally, set up C-Groups 
  6. Finish & hand it over to the runtime to run the container 
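
Putting these steps together, here is a minimal Go sketch in the spirit of the classic "container from scratch" demos. It assumes an extracted base filesystem (step 1) at ./rootfs containing a /proc directory, and must run as root; C-Groups (step 5) are omitted for brevity, since the earlier pids.max snippet covers that part.

```go
package main

import (
	"os"
	"os/exec"
	"syscall"
)

// Usage: sudo ./minicontainer run /bin/sh
func main() {
	if len(os.Args) < 3 {
		panic("usage: minicontainer run <cmd> [args...]")
	}
	switch os.Args[1] {
	case "run": // step 2: re-exec ourselves inside new namespaces
		cmd := exec.Command("/proc/self/exe", append([]string{"child"}, os.Args[2:]...)...)
		cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
		cmd.SysProcAttr = &syscall.SysProcAttr{
			Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
		}
		must(cmd.Run())
	case "child": // runs inside the new namespaces
		must(syscall.Sethostname([]byte("container")))    // step 4
		must(syscall.Chroot("./rootfs"))                  // step 4: jump into the base filesystem
		must(os.Chdir("/"))
		must(syscall.Mount("proc", "/proc", "proc", 0, "")) // so `ps` sees only namespace PIDs
		cmd := exec.Command(os.Args[2], os.Args[3:]...)     // step 3: the executable path
		cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
		must(cmd.Run()) // step 6: hand over & run the container process
		must(syscall.Unmount("/proc", 0))
	}
}

func must(err error) {
	if err != nil {
		panic(err)
	}
}
```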

Thankfully, all of this is taken care of by the container engine, so we can concentrate on our applications.


I hope this post has helped you with the concepts of Virtual Machines & Containers. I will write a follow-up on recent/upcoming developments in serverless computing.


