This article was first published on the GTOC WeChat account. It summarizes the virtual-machine introduction chapter from the book Virtual Machine Systems and Processes: A General-Purpose Platform.
Abstraction Layers
The classic way to manage computer-system complexity is to divide the system into different abstraction layers through a set of well-defined interfaces. Different layers have clear responsibilities: the closer a layer is to the bottom, the more it focuses on hardware implementation; the closer it is to the top, the more it focuses on application logic.
At the same time, abstraction layers allow designers to ignore or simplify low-level implementation details, which makes higher-level component design easier.
A typical example is the operating-system abstraction of a hard disk.
A hard disk is divided into tracks and sectors. After being abstracted by the operating system (file system + disk driver), the disk as seen by applications becomes a collection of files of different sizes.
Upper-layer applications can create, read, and write files without needing to care what the disk is physically made of or how it is implemented.
The operating system abstracts a physical disk into files: applications see files, while the file system and disk driver map them onto tracks and sectors on the real disk.
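To make this layering concrete, here is a minimal Python sketch of a toy file layer sitting on top of a sector-addressed disk. The names ToyDisk and ToyFileSystem and the 16-byte sector size are illustrative assumptions, not part of any real operating system, and tracks are left out for brevity.

```python
SECTOR_SIZE = 16  # assumed toy sector size in bytes


class ToyDisk:
    """A 'physical' disk: nothing but numbered, fixed-size sectors."""

    def __init__(self, num_sectors):
        self.sectors = [bytes(SECTOR_SIZE) for _ in range(num_sectors)]

    def write_sector(self, index, data):
        if len(data) != SECTOR_SIZE:
            raise ValueError("data must be exactly one sector")
        self.sectors[index] = data

    def read_sector(self, index):
        return self.sectors[index]


class ToyFileSystem:
    """Maps file names to lists of sectors so callers never see sectors."""

    def __init__(self, disk):
        self.disk = disk
        self.free = list(range(len(disk.sectors)))  # unallocated sector numbers
        self.table = {}  # file name -> (sector numbers, length in bytes)

    def write_file(self, name, data):
        old = self.table.get(name, ([], 0))[0]
        needed = -(-len(data) // SECTOR_SIZE)  # ceiling division
        if needed > len(self.free) + len(old):
            raise OSError("disk full")
        # Sectors of a previous version of the file become reusable.
        self.free.extend(old)
        used = [self.free.pop(0) for _ in range(needed)]
        for i, sector in enumerate(used):
            chunk = data[i * SECTOR_SIZE:(i + 1) * SECTOR_SIZE]
            # Pad the last chunk so every write covers a whole sector.
            self.disk.write_sector(sector, chunk.ljust(SECTOR_SIZE, b"\0"))
        self.table[name] = (used, len(data))

    def read_file(self, name):
        used, length = self.table[name]
        raw = b"".join(self.disk.read_sector(s) for s in used)
        return raw[:length]  # drop the padding added on write
```

An application would only call write_file and read_file; which sectors end up holding the bytes is decided entirely below the interface.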
Well-defined interfaces break the computer-design problem into pieces, allowing hardware and software design to proceed more or less independently.
The instruction set architecture (ISA) is such an interface.
For example, a hardware team designs a processor based on the RISC-V instruction set, and a compiler team develops a compiler that translates high-level languages into that instruction set. As long as both sides follow the instruction-set specification, the compiled software can run correctly on that processor.
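The role of the instruction set as a contract can be sketched with a made-up three-instruction stack ISA (PUSH, ADD, MUL). This is a hypothetical teaching ISA, not RISC-V, and the function names compile_expr and run are assumptions chosen for illustration. The "compiler team" writes compile_expr, the "hardware team" writes run, and the only thing they share is the instruction definitions.

```python
import ast

# The shared specification: a hypothetical stack ISA with three instructions.
# ("PUSH", n) pushes a constant; ("ADD",) and ("MUL",) pop two values, push one.


def compile_expr(source):
    """Compiler side: translate an arithmetic expression into ISA instructions."""

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return [("PUSH", node.value)]
        if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mult)):
            op = "ADD" if isinstance(node.op, ast.Add) else "MUL"
            return emit(node.left) + emit(node.right) + [(op,)]
        raise ValueError("unsupported expression")

    return emit(ast.parse(source, mode="eval").body)


def run(program):
    """Processor side: execute ISA instructions, knowing nothing about the compiler."""
    stack = []
    for instr in program:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif instr[0] == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"illegal instruction: {instr!r}")
    return stack.pop()
```

Either side could be rewritten from scratch, for example replacing run with a faster implementation, without touching the other, as long as the instruction definitions stay the same.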
Another important standardized interface in a computer system is the operating-system interface, which is defined as a set of function calls.
Application developers do not need to care about the operating system’s internal implementation. They only need to follow the constraints of the operating-system interface to develop their own applications, which allows software and hardware to evolve and be upgraded independently.
However, well-defined interfaces also have limitations. Subsystems or components built to different specifications are hard to reuse or to combine with one another. A common example is the tight coupling between a particular operating system and a particular instruction set.
Once an application is distributed in binary form, it becomes difficult to run it on hardware platforms designed for different instruction-set specifications.
Below the hardware/software interface, hardware resources also constrain the flexibility of the software system.
For example, an operating system developed for a uniprocessor or for a shared-memory multiprocessor is usually designed to manage the hardware resources directly.
In that case, the system’s hardware resources are managed by a single operating system, which in turn limits system flexibility, including the flexibility available to upper-layer application software.
In particular, in multi-user or user-group sharing scenarios, security and fault-isolation requirements further reduce system flexibility.
Virtualization from the Perspective of Interfaces
Virtualization provides a way to relax the above constraints and increase flexibility. It also brings benefits in resource utilization, resource isolation, and security.
When a system (or subsystem), such as a processor, memory, or an input/output device, is virtualized, its interface and all resources visible through that interface are mapped onto the interface and resources of the real system that implements it.
Formally, virtualization constructs a mapping (an isomorphism) from a virtual guest system onto a real host system, so that one real host can appear as one or more guest systems. The interface a guest system presents often has the same level of detail as the host system, which is the biggest difference from abstraction.
Virtualization does not necessarily aim to simplify or hide details.
A mapping V takes guest states S_i and S_j to host states S'_i = V(S_i) and S'_j = V(S_j). For a guest operation e that changes the state from S_i to S_j, there is a corresponding host operation e' that changes S'_i to S'_j, so that e'(V(S_i)) = V(e(S_i)).
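The mapping between guest and host can be checked on a tiny example. In the usual formulation, a function V maps guest states to host states, and for each guest operation e there is a host operation e' with e'(V(S)) = V(e(S)). The sketch below assumes a guest with two registers whose values the host keeps in a flat memory array; the register names, the BASE offset, and the function name e_prime are all hypothetical.

```python
BASE = 4  # assumed offset where the host keeps this guest's registers


def V(guest):
    """Map a guest state (two registers) to a host state (flat memory)."""
    host = [0] * 8
    host[BASE], host[BASE + 1] = guest["r0"], guest["r1"]
    return host


def e(guest):
    """Guest operation: r0 <- r0 + r1."""
    return {"r0": guest["r0"] + guest["r1"], "r1": guest["r1"]}


def e_prime(host):
    """Host operation that implements e on the mapped state."""
    out = list(host)
    out[BASE] = out[BASE] + out[BASE + 1]
    return out


S_i = {"r0": 2, "r1": 5}
# Both paths around the diagram must land in the same host state.
assert e_prime(V(S_i)) == V(e(S_i))
```

If the assertion held for every guest state and every guest operation, the host would be a faithful implementation of the guest in the sense described above.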
Let us look at a virtual-disk example to understand how virtualization can provide different interfaces or resources at the same abstraction layer.
Virtual-disk software places each virtual disk on top of a large host file; the host file system translates guest read and write requests to the real disk.
In some applications, a complete large hard disk needs to be split into many small virtual disks.
Virtual-disk software uses the file abstraction provided by the operating system as an intermediate step, mapping each virtual disk to a large file on the real disk.
Each virtual disk superficially contains logical tracks and sectors. Although its capacity is smaller than that of the large disk, its details are otherwise identical. Read and write operations on the virtual disk are mapped first to file reads and writes on the host system, and then to reads and writes on the corresponding real disk.
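The two-step mapping can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular virtual-disk product works: the class name VirtualDisk, the 512-byte sector size, and the image file names are all hypothetical. Each virtual sector is simply a fixed offset inside the backing file, and the host file system handles the second step down to the real disk.

```python
import os
import tempfile

SECTOR_SIZE = 512  # assumed sector size in bytes


class VirtualDisk:
    """A small disk whose sectors live inside one ordinary host file."""

    def __init__(self, path, num_sectors):
        self.path = path
        self.num_sectors = num_sectors
        # Pre-size the backing file so every sector has a fixed place in it.
        with open(path, "wb") as f:
            f.write(bytes(num_sectors * SECTOR_SIZE))

    def _offset(self, sector):
        if not 0 <= sector < self.num_sectors:
            raise IndexError("sector out of range")
        return sector * SECTOR_SIZE

    def write_sector(self, sector, data):
        if len(data) != SECTOR_SIZE:
            raise ValueError("data must be exactly one sector")
        offset = self._offset(sector)
        with open(self.path, "r+b") as f:
            f.seek(offset)  # virtual sector -> position in the host file
            f.write(data)

    def read_sector(self, sector):
        offset = self._offset(sector)
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(SECTOR_SIZE)


def demo():
    """Create two virtual disks on one host and show they are independent."""
    with tempfile.TemporaryDirectory() as host_dir:
        disk_a = VirtualDisk(os.path.join(host_dir, "a.img"), 4)
        disk_b = VirtualDisk(os.path.join(host_dir, "b.img"), 4)
        disk_a.write_sector(2, b"A" * SECTOR_SIZE)
        size = os.path.getsize(disk_a.path)
        return disk_a.read_sector(2), disk_b.read_sector(2), size
```

Calling demo() writes to sector 2 of the first disk only; the same sector on the second disk stays zero-filled, even though both disks share one host file system.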
From Virtualization to Virtual Machines
The concept of virtualization can be applied not only to a subsystem, but also to an entire machine.
A virtual machine (VM) adds a layer of software on top of a real machine to support the desired virtual-machine architecture.
For example, the well-known VMware virtualization software can run different guest operating systems while they share hardware resources with the host.
In general, virtual machines can bypass the compatibility limits of real machines and the limits imposed by hardware resources, thereby improving software portability and flexibility.
Virtual machines have a wide range of use cases. For example, multiple VM instances can provide the operating-system environment required by an individual or a user group on a single hardware platform, while also providing resource isolation and stronger security guarantees.
A large multiprocessor server can also be divided into multiple virtual servers, so that its resources can be balanced dynamically among them.
Virtual machines can also use emulation to provide cross-platform software compatibility.
For example, a platform that implements the x86 instruction set can be turned into a virtual platform capable of running software built for the RISC-V instruction set.
This compatibility can be provided at the system level (using QEMU system emulation to run Windows/Linux) or at the program/process level (using QEMU user-mode emulation to run WPS applications or certain games).
In addition to emulation, virtual machines can also provide dynamic, online binary optimization.
Finally, through emulation, virtual machines make it possible for hardware to implement new proprietary instruction sets, such as ones based on very long instruction words (VLIWs), while still supporting programs compiled for existing standard instruction sets.
Beyond providing virtualization for real machines, virtual machines are also widely used for cross-platform support in high-level languages.
Programs written in such high-level languages are precompiled into binary code for the virtual machine, and then any real machine that implements that virtual machine can run the compiled code. Examples include the Java JVM and the Python PVM.
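CPython makes the compile-then-interpret split easy to observe. The sketch below uses only the standard compile, eval, and dis facilities; the helper name compile_to_bytecode and the sample expression are assumptions for illustration. Note that CPython bytecode is an implementation detail that differs between Python versions, so its portability is narrower than that of Java class files.

```python
import dis


def compile_to_bytecode(source):
    """Compile an expression to a code object for the CPython virtual machine."""
    return compile(source, "<demo>", "eval")


code = compile_to_bytecode("x * 2 + 1")
# The code object holds VM instructions, not host machine instructions.
opnames = [instr.opname for instr in dis.get_instructions(code)]
# The interpreter (the virtual machine) executes the compiled code object.
result = eval(code, {"x": 20})
```

Printing opnames shows the virtual-machine instructions for the expression. The exact names depend on the Python version, but none of them are x86 or RISC-V instructions.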
Conclusion
Operating-system developers, language designers, compiler developers, and hardware designers all study and build virtual machines from their own perspectives.
Although each virtual-machine application has its own unique properties, the underlying concepts and techniques have much in common across the VM space.
Because different virtual-machine architectures and supporting technologies are developed by different working groups, it is especially important to unify the knowledge system for virtual machines and to understand the common supporting technologies behind the various forms of virtualization.
In the book Virtual Machine Systems and Processes: A General-Purpose Platform, virtual-machine families are discussed using a unified approach. The book covers the common supporting technologies of virtual machines and explains their generality by exploring multiple applications.
We will continue this discussion in future articles.