In this blog post, I will share my understanding of Virtual Machine & Container architectures. The intention of this post is not to compare Virtual Machines with Containers, as they exist for different purposes.
We will start with Virtual Machines.
Virtual Machines are created and run by a Hypervisor. Hypervisors can be classified into two categories.
Type 1: Native/Bare Metal Hypervisor
This kind of Hypervisor runs directly on the hardware and spawns VMs. Public Cloud Platforms like AWS/Azure/GCP use these Hypervisors to spin up new environments on top of their physical servers.
e.g. Xen/KVM (AWS), Hyper-V (Azure), VMware ESXi
We will discuss these in detail in the upcoming sections.
Type 2: Hosted (User Space) Hypervisor
This kind of Hypervisor runs on top of the Host Machine OS. It is similar to other applications in the system, but with special privileges. The Guest OS is spun up on top of the Host OS, and the memory it needs is carved from Host Memory. Say you have 8 GB RAM in the Host Machine: if you require 4 GB for the VM, then this memory is reserved from the memory available in the Host Machine.
e.g. VirtualBox
Let’s deep dive into the Type 1 Hypervisor. Security is achieved via the CPU Protection Rings. On x86 there are four rings, from Ring "0" to Ring "3". Ring "0" has the highest level of privilege; usually the Kernel/Supervisor runs here. A Hypervisor manages or supervises multiple supervisors, so it also runs in Ring "0". Ring "3" has the lowest level of privilege; user applications run here.
Since the Hypervisor runs in Ring "0", a Guest OS kernel that also expects to run in Ring "0" would end up with the same privileges as the Hypervisor, which can be dangerous. The Guest OS should not have direct access to Ring "0" operations.
This problem can be overcome by 3 different approaches:
- Full Virtualization – The Hypervisor emulates all the Ring "0" operations of the Guest OS in software. The Guest OS runs unmodified, but this emulation degrades performance.
- Para Virtualization – The Hypervisor provides an API to perform Ring "0" operations, so the Guest OS does not have to run in Ring "0", but it must be modified to call this API. Performance is better than Full Virtualization, at the cost of modifying the Guest OS.
- Hardware Virtualization – Intel (VT-x) and AMD (AMD-V) added CPU support for running an unmodified Guest OS. The Hypervisor runs in a new, more privileged mode, informally called Ring "-1", while the Guest OS runs in its own Ring "0". Sensitive guest operations trap to the Hypervisor through special instructions, so the true Ring "0" of the host stays protected and a failing Guest OS cannot bring the host down. Since there is almost NO emulation, performance is great.
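Whether a machine offers Hardware Virtualization can be checked from the CPU flags. The sketch below is a minimal example, assuming an x86 Linux host where `/proc/cpuinfo` lists a `flags` line; `vmx` marks Intel VT-x and `svm` marks AMD-V. The function names are mine and purely illustrative.

```python
def virtualization_extensions(cpuinfo_text):
    """Return the hardware virtualization flags found in /proc/cpuinfo-style text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags = value.split()
            if "vmx" in flags:
                found.add("vmx")  # Intel VT-x
            if "svm" in flags:
                found.add("svm")  # AMD-V
    return found


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file, returning an empty string if it is unavailable."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    extensions = virtualization_extensions(read_cpuinfo())
    print("Hardware virtualization flags:", sorted(extensions) or "none found")
```

An empty result does not always mean the CPU lacks the feature; it may simply be disabled in firmware or hidden because you are already inside a VM.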
Microsoft Hyper-V: It provides isolation by creating/managing a partition for every VM. It implements a Parent-Child Partition relationship: the Hypervisor itself runs directly on the hardware, and the virtualization stack runs in the Parent (root) Partition. Only the Parent Partition has direct access to hardware devices. The Parent Partition creates Child Partitions, which host the Guest OS.
Various components are involved in how a child partition accesses resources.
Virtualization Service Provider (VSP) runs in the parent partition and handles device access requests (network, IO, etc.) coming from child partitions.
VMBus is used for inter-partition communication; a child partition communicates with the parent partition over it to get these requests served.
Azure uses Hyper-V for its Server Virtualization. This provides Hardware Virtualization with a blend of minimal emulation.
Kernel-based VM (KVM): The kernel manages processes, allocation of CPU resources and much more. KVM is a Linux kernel module that turns the kernel into a Hypervisor.
KVM provides Hardware Virtualization using the CPU virtualization extensions. Device (IO) emulation is handled in user space by QEMU, and Para-Virtualized IO is supported through virtio drivers.
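User-space tools like QEMU talk to KVM through the `/dev/kvm` device. As a rough sketch, assuming a Linux host, the snippet below opens that device and asks for the API version using the `KVM_GET_API_VERSION` ioctl (request number `0xAE00` from `<linux/kvm.h>`). The helper name `kvm_api_version` is my own.

```python
import fcntl
import os

KVM_GET_API_VERSION = 0xAE00  # _IO(KVMIO, 0x00) from <linux/kvm.h>


def kvm_api_version(device="/dev/kvm"):
    """Return the KVM API version, or None if the device cannot be used."""
    try:
        fd = os.open(device, os.O_RDWR)
    except OSError:
        return None  # KVM not loaded, no permission, or not a Linux host
    try:
        return fcntl.ioctl(fd, KVM_GET_API_VERSION)
    except OSError:
        return None
    finally:
        os.close(fd)


if __name__ == "__main__":
    print("KVM API version:", kvm_api_version())
```

The kernel's KVM API documentation describes 12 as the stable API version, so any other number (or `None`) suggests KVM is not usable on that machine.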
AWS uses a KVM-based Hypervisor in Project Nitro for high-performance virtualization. Earlier, Xen Hypervisors were used.
That’s about Virtual Machines; let’s now explore Containers.
Containers are often described as lightweight VMs that run on top of the Host OS. A Container usually packs everything required to run your application, so it can be deployed in any compatible runtime environment & run your application successfully.
To keep it simple, Containers are the outcome of OS-level virtualization, while VMs are the outcome of hardware-level virtualization.
A single Kernel is shared across all the Containers via the Container Engine. Isolation in containers is achieved via Namespaces & C-Groups. Docker is the leading Container Engine in the market.
Namespaces are a feature of the Linux Kernel. They restrict what a process can see, like the File System & Networking, which helps to achieve isolation. There are 7 areas where namespaces can be created/managed to restrict access: IPC, Network, Mount Points, Process ID, Users & Groups, UTS (hostname & NIS Domain Name) & C-Groups. Newer kernels (Linux 5.6 onward) add an eighth, Time.
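You can see the namespaces a process belongs to under `/proc/<pid>/ns`, where every entry is a symbolic link such as `pid:[4026531836]`. The sketch below reads those links for the current process; it assumes a Linux host with `/proc` mounted, and the helper names are illustrative only.

```python
import os
import re

NS_LINK = re.compile(r"^(?P<kind>\w+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Split a link target such as 'pid:[4026531836]' into (kind, inode)."""
    match = NS_LINK.match(target)
    if match is None:
        return None
    return match.group("kind"), int(match.group("inode"))


def list_namespaces(ns_dir="/proc/self/ns"):
    """Map each namespace entry of a process to its inode number."""
    namespaces = {}
    try:
        entries = sorted(os.listdir(ns_dir))
    except OSError:
        return namespaces  # not Linux, or /proc is not mounted
    for entry in entries:
        try:
            parsed = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        except OSError:
            continue
        if parsed is not None:
            namespaces[entry] = parsed[1]
    return namespaces


if __name__ == "__main__":
    for name, inode in list_namespaces().items():
        print(f"{name:20} {inode}")
```

Two processes that show the same inode number for an entry share that namespace. Running this on the host and again inside a container should show different numbers for most entries.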
C-Groups are used to constrain/limit resources such as Memory & CPU. They help to allocate memory/CPU for a container process. Not all of these are hard limits: CPU shares, for example, are relative weights, so spare CPU can still be utilized if it is available. Others are hard caps, e.g., you can limit a container to creating at most 5 processes.
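With cgroup v2, limits are exposed as plain files inside a cgroup directory, for example `pids.max` and `memory.max`, where the literal value `max` means no limit is set. The sketch below reads those two files; the directory you pass in is an assumption about where the cgroup lives on your system, and the function names are mine.

```python
import os


def parse_limit(raw):
    """Interpret a cgroup v2 limit value; 'max' means no limit is set."""
    value = raw.strip()
    return None if value == "max" else int(value)


def read_limits(cgroup_dir, names=("pids.max", "memory.max")):
    """Read the given limit files from a cgroup v2 directory, skipping missing ones."""
    limits = {}
    for name in names:
        try:
            with open(os.path.join(cgroup_dir, name), encoding="utf-8") as handle:
                limits[name] = parse_limit(handle.read())
        except (OSError, ValueError):
            continue  # file absent, unreadable, or not a single number
    return limits
```

For a container limited to 5 processes, `read_limits` on its cgroup directory would be expected to report `pids.max` as `5`, while an unrestricted memory setting comes back as `None`.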
The Container Runtime is responsible for setting up Namespaces & C-Groups and executing commands. It manages the life cycle of a container.
Open Container Initiative (OCI) provides industry standards for container formats & runtimes.
https://github.com/opencontainers
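OCI's two core specifications are:
Runtime Spec: Defines a container's configuration and its life cycle operations like create, start, kill, delete, etc.
Image Spec: Defines the image format, i.e. the manifest, configuration & filesystem layers that make up an image. (Push & pull are covered by the newer Distribution Spec.)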
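To make the Runtime Spec a bit more concrete: an OCI runtime such as runc reads a `config.json` file that describes the process, root filesystem, namespaces and resource limits. The sketch below builds a heavily trimmed version of such a file. The field names come from the Runtime Spec, but the default values and the function name are my assumptions, and a real runtime may need more fields than shown here.

```python
import json


def minimal_runtime_config(args, rootfs="rootfs", hostname="demo", pids_limit=5):
    """Build a trimmed, illustrative OCI runtime configuration as a dict."""
    return {
        "ociVersion": "1.0.2",
        "process": {
            "user": {"uid": 0, "gid": 0},
            "args": list(args),
            "env": ["PATH=/usr/sbin:/usr/bin:/sbin:/bin"],
            "cwd": "/",
        },
        "root": {"path": rootfs, "readonly": True},
        "hostname": hostname,
        "linux": {
            # One entry per namespace the runtime should create.
            "namespaces": [
                {"type": kind}
                for kind in ("pid", "network", "mount", "ipc", "uts")
            ],
            # Resource limits are applied by the runtime through C-Groups.
            "resources": {"pids": {"limit": pids_limit}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(minimal_runtime_config(["/bin/sh"]), indent=2))
```

Notice how the Namespaces and C-Groups from the previous sections show up directly as `linux.namespaces` and `linux.resources`.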
CRI-O is an implementation of the Kubernetes Container Runtime Interface (CRI) that enables the usage of OCI-compatible runtimes. That’s how Kubernetes manages to run images built with Docker. It also supports Kata Containers as a runtime.
Pseudo Code to create a container:
- Create a base filesystem (like the file system from busybox/Alpine)
- Create Namespaces for pid, network, fs mount points using a SYSCALL (a SYSCALL is a request for a service from the kernel of the OS)
- Set your executable path
- Use SYSCALLs to set the hostname & chroot into the base filesystem
- Optionally, setup C-Groups
- Finish, hand it over to runtime to run the container
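The steps above can be sketched in Python. This is only a rough illustration, not how Docker or runc actually implement it: it assumes Linux, root privileges, Python 3.9+ and an already unpacked base filesystem, and it skips C-Groups, mounting `/proc` and network setup. The flag values are taken from `<linux/sched.h>`; `namespace_flags` and `run_container` are names I made up.

```python
import ctypes
import os
import socket

# Namespace flag values from <linux/sched.h>.
CLONE_FLAGS = {
    "mount": 0x00020000,    # CLONE_NEWNS
    "uts": 0x04000000,      # CLONE_NEWUTS
    "ipc": 0x08000000,      # CLONE_NEWIPC
    "pid": 0x20000000,      # CLONE_NEWPID
    "network": 0x40000000,  # CLONE_NEWNET
}


def namespace_flags(kinds):
    """Combine namespace names into the bitmask expected by unshare(2)."""
    flags = 0
    for kind in kinds:
        flags |= CLONE_FLAGS[kind]
    return flags


def run_container(rootfs, argv, hostname="container", kinds=("pid", "uts", "mount")):
    """Run argv in new namespaces rooted at rootfs. Needs root on Linux."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(namespace_flags(kinds)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    # A new PID namespace only applies to children, so fork before exec.
    pid = os.fork()
    if pid == 0:
        try:
            if "uts" in kinds:
                socket.sethostname(hostname)  # only visible in the new UTS namespace
            os.chroot(rootfs)
            os.chdir("/")
            os.execv(argv[0], argv)
        finally:
            os._exit(127)  # reached only if setup or exec failed
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

Something like `run_container("/tmp/busybox-rootfs", ["/bin/sh"])` would then start a shell that sees itself as PID 1 with its own hostname, where the rootfs path is just a placeholder for wherever you unpacked the base filesystem.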
Thankfully, these are taken care of by the container engine, so we can concentrate on our applications.
I hope this post has helped you with the concepts of Virtual Machines & Containers. I will write a follow-up on the recent/upcoming developments in serverless computing.





