Sparkler: A KVM-based Virtual Machine Manager


Serverless computing is all the rage these days, and AWS Lambda is at the forefront of it. A while ago, AWS released Firecracker, the engine behind Lambda. Unsurprisingly, it is based on Linux’s KVM (Kernel-based Virtual Machine) technology. What was surprising was how it gave up the ability to run all kinds of operating systems to become a super-sleek virtual machine monitor that can run only Linux, but can bring up a virtual machine in about 125 ms! I take a deep look at how Firecracker works, along with an analysis of what it does well and what it doesn’t, in this article. But I wanted to go a lot deeper than just looking at Firecracker’s code. How about building a tiny virtual machine monitor (VMM) and a super tiny “operating system” to understand how KVM really works? That’s exactly what we’ll be doing with Sparkler.

Sparkler: A light-weight Firecracker

While it certainly is fun reading Firecracker’s source code and figuring out what is going on under the hood, it is not as much fun as firing up your favorite editor and whipping up a lightweight virtual machine environment under Linux. With Sparkler, we will build a virtual machine monitor (VMM) that manages a virtual machine and provides the environment that virtual machine runs in. You can find Sparkler’s source code here on GitHub. We will also write a tiny “operating system” which will run inside the virtual machine. The VMM emulates some interesting hardware: a device that can read the latest tweet from Command Line Magic’s Twitter handle, a device that can get the weather for certain cities, another device that can fetch the latest air quality measurements for certain cities, and finally a console device that lets the virtual machine read the keyboard and output text to the terminal.

AWS Firecracker uses Linux’s KVM virtualization toolkit to create and run virtual machines. As we progress, we’ll see how exactly Firecracker’s awesome speed and security are achieved. But first, let’s lay down some groundwork to better understand how we can take advantage of Linux’s KVM (Kernel-based Virtual Machine) to build something like Firecracker. To demonstrate how this works, we build Sparkler, a lightweight virtual environment. The Sparkler environment, or virtual machine monitor (VMM), is written in C and uses KVM to run the virtual machine, while the “operating system” we run inside that environment is written in assembly language. The Sparkler VM has an interesting structure, unlike any other virtual machine you might know of. Born in the internet age, it is a truly native citizen.

Sparkler Architecture

The Sparkler virtual machine exposes 4 devices and here is what these “devices” do:

  • Console: This device is like a serial port. It allows the virtual machine to display information and also get user input via the keyboard where required.
  • Twitter device: Reading from this device makes available the latest tweet from one of my favorite Twitter handles, @climagic.
  • Weather Info device: Reading from this device, the virtual machine can get the latest weather forecast for 6 different cities.
  • Air Quality Info device: This device makes available air quality information for 6 different cities.
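Each of these devices is reached through I/O ports. Going by the constants in monitor.asm later in this article (weather base 0x100, air quality base 0x200) and the order of its city menu, one way the VMM side could map a port back to a city is a small lookup table. This is only a sketch: the helper name city_for_port is made up for illustration, and the assumption that city N lives at base + N is inferred from the assembly, not taken from Sparkler’s C sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Port bases as defined in monitor.asm. */
#define WEATHER_DEVICE_BASE     0x100
#define AIR_QUALITY_DEVICE_BASE 0x200
#define NUM_CITIES              6

/* Cities in menu order; index 0 corresponds to menu choice 1. */
const char *const city_names[NUM_CITIES] = {
    "Chennai", "New Delhi", "London", "Chicago", "San Francisco", "New York",
};

/* Hypothetical helper: return the city behind `port` for the device family
 * starting at `base`, or NULL if the port is outside base+1 .. base+6. */
const char *city_for_port(uint16_t port, uint16_t base)
{
    if (port <= base || port > base + NUM_CITIES)
        return NULL;
    return city_names[port - base - 1];
}
```

With this layout, city_for_port(0x103, WEATHER_DEVICE_BASE) yields "London", while the base port itself (0x100, which monitor.asm uses for the Twitter device) yields NULL.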

Here is what a session in Sparkler looks like:

A Sparkler session

Hardware virtualization background

Since around 2005, most Intel chips have had support for hardware virtualization. Before such support was available, virtual machines worked either by emulating every single instruction or, at the very least, by emulating privileged instructions, because those cause faults when run in user space. With Intel VT and AMD’s SVM technology, a new processor mode was created in which operating system code can run natively, at full speed on the real CPU, without the need to emulate or trap regular or privileged instructions. The hypervisor, or virtual machine monitor, can tell the CPU when to “exit”, that is, give control back to the hypervisor. Typical examples are when the guest accesses I/O ports with the IN or OUT instructions, when it accesses certain privileged CPU registers that are normally touched only by the operating system, or when it executes an instruction like CPUID, which provides information on the CPU (the hypervisor might want to control which CPU features the guest sees).

In this article, I use “VT” as a term that also covers AMD’s equivalent SVM technology.
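If you want to check whether your own machine has either of these extensions, Linux lists them as the vmx (Intel) and svm (AMD) flags in /proc/cpuinfo. Here is a small sketch of such a check. It is not part of Sparkler; the function names are made up for illustration, and seeing the flag does not guarantee that /dev/kvm is usable (virtualization can still be disabled in firmware).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if `flag` appears as a whole whitespace-separated word in `line`. */
bool line_has_flag(const char *line, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = line;

    while ((p = strstr(p, flag)) != NULL) {
        bool starts_word = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        bool ends_word = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';
        if (starts_word && ends_word)
            return true;
        p += len;
    }
    return false;
}

/* Scan /proc/cpuinfo for a "flags" line containing vmx or svm.
 * Returns 1 if found, 0 if not found, -1 if the file could not be opened. */
int host_has_hw_virt(void)
{
    char line[8192];
    int found = 0;
    FILE *f = fopen("/proc/cpuinfo", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        if (strncmp(line, "flags", 5) != 0)
            continue;
        if (line_has_flag(line, "vmx") || line_has_flag(line, "svm")) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

Matching whole words matters here, because other flags may contain "vmx" or "svm" as a substring.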

The Unixification of hardware virtualization

KVM has several interesting features, but we shall look at the interface it provides to Intel’s VT technology. You can program KVM using the well known UNIX file paradigm. Let’s look at some code from main.c in Sparkler.

    kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
    if (kvm == -1)
        err(1, "/dev/kvm");

    vmfd = ioctl(kvm, KVM_CREATE_VM, (unsigned long)0);
    if (vmfd == -1)
        err(1, "KVM_CREATE_VM");

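    /* Allocate 0x8000 bytes (eight 4 KiB pages) of page-aligned guest memory to hold the monitor. */
    mem = mmap(NULL, 0x8000, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)      /* mmap() reports failure with MAP_FAILED, not NULL */
        err(1, "allocating guest memory");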

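    /* Read our monitor program into RAM */
    int fd = open("monitor", O_RDONLY);
    if (fd == -1)
        err(1, "Unable to open monitor");
    struct stat st;
    if (fstat(fd, &st) == -1)
        err(1, "fstat monitor");
    if (st.st_size > 0x8000)
        errx(1, "monitor does not fit in guest memory");
    if (read(fd, mem, st.st_size) != st.st_size)
        err(1, "reading monitor");
    close(fd);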

    struct kvm_userspace_memory_region region = {
            .slot = 0,
            .guest_phys_addr = 0x1000,
            .memory_size = 0x8000,
            .userspace_addr = (uint64_t)mem,
    };
    ret = ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &region);
    if (ret == -1)
        err(1, "KVM_SET_USER_MEMORY_REGION");

    vcpufd = ioctl(vmfd, KVM_CREATE_VCPU, (unsigned long)0);
    if (vcpufd == -1)
        err(1, "KVM_CREATE_VCPU");

    /* Map the shared kvm_run structure and following data. */
    ret = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, NULL);
    if (ret == -1)
        err(1, "KVM_GET_VCPU_MMAP_SIZE");
    mmap_size = ret;
    if (mmap_size < sizeof(*run))
        errx(1, "KVM_GET_VCPU_MMAP_SIZE unexpectedly small");
    run = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, vcpufd, 0);
    if (run == MAP_FAILED)
        err(1, "mmap vcpu");

    /* Set CPUID */
    struct kvm_cpuid2 *cpuid;
    int nent = 40;
    unsigned long size = sizeof(*cpuid) + nent * sizeof(*cpuid->entries);
    cpuid = (struct kvm_cpuid2*) malloc(size);
    bzero(cpuid, size);
    cpuid->nent = nent;

    ret = ioctl(kvm, KVM_GET_SUPPORTED_CPUID, cpuid);
    if (ret < 0) {
        free(cpuid);
        err(1, "KVM_GET_SUPPORTED_CPUID");
    }

    for (int i = 0; i < cpuid->nent; i++) {
        if (cpuid->entries[i].function == 0x80000002)
            __get_cpuid(0x80000002, &cpuid->entries[i].eax, &cpuid->entries[i].ebx, &cpuid->entries[i].ecx, &cpuid->entries[i].edx);
        if (cpuid->entries[i].function == 0x80000003)
            __get_cpuid(0x80000003, &cpuid->entries[i].eax, &cpuid->entries[i].ebx, &cpuid->entries[i].ecx, &cpuid->entries[i].edx);
        if (cpuid->entries[i].function == 0x80000004)
            __get_cpuid(0x80000004, &cpuid->entries[i].eax, &cpuid->entries[i].ebx, &cpuid->entries[i].ecx, &cpuid->entries[i].edx);
    }

    ret = ioctl(vcpufd, KVM_SET_CPUID2, cpuid);
    if (ret < 0) {
        free(cpuid);
        err(1, "KVM_SET_CPUID2");
    }
    free(cpuid);

    /* Initialize CS to point at 0, via a read-modify-write of sregs. */
    ret = ioctl(vcpufd, KVM_GET_SREGS, &sregs);
    if (ret == -1)
        err(1, "KVM_GET_SREGS");
    sregs.cs.base = 0;
    sregs.cs.selector = 0;
    ret = ioctl(vcpufd, KVM_SET_SREGS, &sregs);
    if (ret == -1)
        err(1, "KVM_SET_SREGS");

    /* Initialize registers: instruction pointer for our code and the
     * initial flags required by the x86 architecture (bit 1 is always set). */
    struct kvm_regs regs = {
            .rip = 0x1000,
            .rflags = 0x2,
    };
    ret = ioctl(vcpufd, KVM_SET_REGS, &regs);
    if (ret == -1)
        err(1, "KVM_SET_REGS");

    char *latest_tweet      = NULL;
    char *weather_forecast  = NULL;
    char *aq_report         = NULL;
    int tweet_str_idx       = 0;
    int weather_str_idx     = 0;
    int aq_str_idx          = 0;

    /* Run the VM while handling any exits for device emulation */
    while (1) {
        ret = ioctl(vcpufd, KVM_RUN, NULL);
        if (ret == -1)
            err(1, "KVM_RUN");
        switch (run->exit_reason) {
            case KVM_EXIT_HLT:
                puts("KVM_EXIT_HLT");
                return 0;
            case KVM_EXIT_IO:
                if (run->io.direction == KVM_EXIT_IO_OUT) {
                    switch (run->io.port) {
                        case SERIAL_PORT:
                            putchar(*(((char *)run) + run->io.data_offset));
                            break;
                        default:
                            printf("Port: 0x%x\n", run->io.port);
                            errx(1, "unhandled KVM_EXIT_IO");
                    }
                } else {
                    /* KVM_EXIT_IO_IN */
                    switch (run->io.port) {
                        case SERIAL_PORT:
                            *(((char *)run) + run->io.data_offset) = getche();
                            break;
                        case TWITTER_DEVICE:
                            if (latest_tweet == NULL)
                                latest_tweet = fetch_latest_tweet();
                            char tweet_chr = *(latest_tweet + tweet_str_idx);
                            *(((char *)run) + run->io.data_offset) = tweet_chr;
                            tweet_str_idx++;
                            if (tweet_chr == '\0') {
                                free(latest_tweet);
                                latest_tweet = NULL;
                                tweet_str_idx = 0;
                            }
                            break;
                        case WEATHER_DEVICE_CHENNAI:
                        case WEATHER_DEVICE_DELHI:
                        case WEATHER_DEVICE_LONDON:
                        case WEATHER_DEVICE_CHICAGO:
                        case WEATHER_DEVICE_SFO:
                        case WEATHER_DEVICE_NY:
                            if (weather_forecast == NULL) {
                                char city[64];
                                if (run->io.port == WEATHER_DEVICE_CHENNAI)
                                    strncpy(city, "Chennai", sizeof(city));
                                else if (run->io.port == WEATHER_DEVICE_DELHI)
                                    strncpy(city, "New%20Delhi", sizeof(city));
                                else if (run->io.port == WEATHER_DEVICE_LONDON)
                                    strncpy(city, "London", sizeof(city));
                                else if (run->io.port == WEATHER_DEVICE_CHICAGO)
                                    strncpy(city, "Chicago", sizeof(city));
                                else if (run->io.port == WEATHER_DEVICE_SFO)
                                    strncpy(city, "San%20Francisco", sizeof(city));
                                else if (run->io.port == WEATHER_DEVICE_NY)
                                    strncpy(city, "New%20York", sizeof(city));

                                weather_forecast = fetch_weather(city);
                            }
                            char weather_chr = *(weather_forecast + weather_str_idx);
                            *(((char *)run) + run->io.data_offset) = weather_chr;
                            weather_str_idx++;
                            if (weather_chr == '\0') {
                                free(weather_forecast);
                                weather_forecast = NULL;
                                weather_str_idx = 0;
                            }
                            break;
                        case AIR_QUALITY_DEVICE_CHENNAI:
                        case AIR_QUALITY_DEVICE_DELHI:
                        case AIR_QUALITY_DEVICE_LONDON:
                        case AIR_QUALITY_DEVICE_CHICAGO:
                        case AIR_QUALITY_DEVICE_SFO:
                        case AIR_QUALITY_DEVICE_NY:
                            if (aq_report == NULL) {
                                char city[64];
                                char country[3];
                                if (run->io.port == AIR_QUALITY_DEVICE_CHENNAI) {
                                    strncpy(city, "Chennai", sizeof(city));
                                    strncpy(country, "IN", sizeof(country));
                                }
                                else if (run->io.port == AIR_QUALITY_DEVICE_DELHI) {
                                    strncpy(city, "Delhi", sizeof(city));
                                    strncpy(country, "IN", sizeof(country));
                                }
                                else if (run->io.port == AIR_QUALITY_DEVICE_LONDON) {
                                    strncpy(city, "London", sizeof(city));
                                    strncpy(country, "GB", sizeof(country));
                                }
                                else if (run->io.port == AIR_QUALITY_DEVICE_CHICAGO) {
                                    strncpy(city, "Chicago-Naperville-Joliet", sizeof(city));
                                    strncpy(country, "US", sizeof(country));
                                }
                                else if (run->io.port == AIR_QUALITY_DEVICE_SFO) {
                                    strncpy(city, "San%20Francisco-Oakland-Fremont", sizeof(city));
                                    strncpy(country, "US", sizeof(country));
                                }
                                else if (run->io.port == AIR_QUALITY_DEVICE_NY) {
                                    strncpy(city, "New%20York-Northern%20New%20Jersey-Long%20Island", sizeof(city));
                                    strncpy(country, "US", sizeof(country));
                                }
                                aq_report = fetch_air_quality(country, city);
                            }
                            char aq_chr = *(aq_report + aq_str_idx);
                            *(((char *)run) + run->io.data_offset) = aq_chr;
                            aq_str_idx++;
                            if (aq_chr == '\0') {
                                free(aq_report);
                                aq_report = NULL;
                                aq_str_idx = 0;
                            }
                            break;
                        default:
                            printf("Port: 0x%x\n", run->io.port);
                            errx(1, "unhandled KVM_EXIT_IO");
                    }
                }

                break;
            case KVM_EXIT_FAIL_ENTRY:
                errx(1, "KVM_EXIT_FAIL_ENTRY: hardware_entry_failure_reason = 0x%llx",
                     (unsigned long long)run->fail_entry.hardware_entry_failure_reason);
            case KVM_EXIT_INTERNAL_ERROR:
                errx(1, "KVM_EXIT_INTERNAL_ERROR: suberror = 0x%x", run->internal.suberror);
            default:
                errx(1, "exit_reason = 0x%x", run->exit_reason);
        }
    }

The pseudocode to create and run a VM with KVM

  • open("/dev/kvm") : Open the global KVM device
  • ioctl(KVM_CREATE_VM) : Create a virtual machine
  • mmap(size) : Create memory region for the guest to use
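  • read("monitor") : Read our operating system binary into the allocated memory
  • ioctl(KVM_SET_USER_MEMORY_REGION) : Hand the allocated memory to the virtual machine as guest physical memory
  • ioctl(KVM_CREATE_VCPU) : Create a VCPU for use in our newly created virtual machine
  • mmap(vcpufd) : Map the VCPU’s shared kvm_run structure, referred to as run below
  • ioctl(KVM_SET_REGS) : Set initial values for some registers
  • while(1)
    • ioctl(KVM_RUN) : Run the VM till there is an exit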
    • switch(run->exit_reason) : Decide based on exit reason
      • case KVM_EXIT_HLT: VM executed the halt instruction. Let’s exit.
      • case KVM_EXIT_IO: There was I/O from the VM. Handle it.

As you can see, with just simple Linux system calls like open(), read(), write(), mmap() and ioctl(), we’re able to create and run hardware virtualization-based VMs.
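One more call in the same style that is worth knowing about is KVM_GET_API_VERSION. The KVM API documentation says it returns 12 on any kernel with the stable API, and that programs should refuse to run otherwise. The listing above does not show this check, so treat the following as an optional addition and not as Sparkler’s own code; the function names are made up for illustration.

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Return the API version reported by /dev/kvm, or -1 if the device
 * cannot be opened (KVM missing, or no permission) or the ioctl fails. */
int kvm_api_version(void)
{
    int kvm = open("/dev/kvm", O_RDWR);
    if (kvm == -1)
        return -1;

    int version = ioctl(kvm, KVM_GET_API_VERSION, 0);
    close(kvm);
    return version;
}

/* Non-zero only when KVM is reachable and speaks the stable API
 * (KVM_API_VERSION is defined as 12 in <linux/kvm.h>). */
int kvm_is_usable(void)
{
    return kvm_api_version() == KVM_API_VERSION;
}
```

A VMM would typically call something like kvm_is_usable() once at startup and bail out with a clear message when it returns 0.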

Another way to handle guest I/O is via eventfd(), which is set up with the KVM_IOEVENTFD ioctl() call. This attaches an eventfd file descriptor to a port I/O or MMIO address in the guest; when the guest writes to that address, KVM signals the file descriptor instead of returning to user space with an exit. The file descriptor can then be passed to poll() or the epoll_* calls, and events can be dealt with from an event loop. This is what Firecracker does. Now, let’s look at the “operating system” that runs inside of Sparkler.
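Before moving on, here is a rough sketch of what the eventfd approach involves. Sparkler does not use it, so none of this comes from its sources: the function names are made up, and the choice of a one-byte port I/O address is an assumption made to match the devices in this article. The first function registers an eventfd with KVM; the second waits on it with poll().

```c
#include <linux/kvm.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Ask KVM to signal `efd` whenever the guest writes one byte to I/O port
 * `port`, instead of returning from KVM_RUN. Returns the ioctl() result. */
int watch_port_writes(int vmfd, uint16_t port, int efd)
{
    struct kvm_ioeventfd ioev;

    memset(&ioev, 0, sizeof(ioev));
    ioev.addr = port;
    ioev.len = 1;
    ioev.fd = efd;
    ioev.flags = KVM_IOEVENTFD_FLAG_PIO;    /* port I/O rather than MMIO */
    return ioctl(vmfd, KVM_IOEVENTFD, &ioev);
}

/* Wait up to `timeout_ms` for the eventfd to fire. Returns the number of
 * notifications accumulated since the last read, 0 on timeout, -1 on error. */
int64_t wait_for_notifications(int efd, int timeout_ms)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    uint64_t count;

    int n = poll(&pfd, 1, timeout_ms);
    if (n <= 0)
        return n;
    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return (int64_t)count;
}
```

Note that an eventfd only carries a counter, not the value the guest wrote, so this style suits “doorbell” notifications better than a byte-at-a-time console like Sparkler’s.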

Our tiny little Sparkler operating system

I had trouble calling this an operating system, so I’m calling this a monitor program, which is a very common term used in embedded systems for operating system-like programs that are not quite operating systems themselves. Intel CPUs since Westmere (introduced 2010) have supported something called unrestricted guest mode. This means essentially that the virtual CPU starts running in real mode, or 16-bit mode, much like a real PC. The operating system can then switch the CPU to 32-bit or 64-bit mode as required. Our monitor program does not switch to 32-bit or 64-bit mode, but lives its life as a 16-bit program.
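Living in 16-bit real mode means the monitor addresses memory as segment:offset pairs, where the linear address is the segment value shifted left by four bits plus the offset. The tiny helper below (a made-up name, shown only to make the arithmetic concrete) lets you check the values the monitor uses against the addresses in main.c.

```c
#include <stdint.h>

/* Real-mode address translation: linear = segment * 16 + offset. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

For example, the monitor loads 0x100 into DS, and real_mode_linear(0x100, 0) is 0x1000, the guest physical address where main.c places the binary and points RIP. Its stack uses SS = 0x120 with SP = 0x1000, so the initial stack top is real_mode_linear(0x120, 0x1000) = 0x2200, comfortably inside the 0x8000 bytes of guest memory that start at 0x1000.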

As part of the Sparkler build process, NASM turns monitor.asm into monitor, which is the binary program which we then load into guest memory from main.c. This is a file with no real structure, just raw CPU instructions and data.

Although we call this the monitor program, the sparkler program that runs and interacts with KVM is called the VMM, or virtual machine monitor. Do not confuse the two while reading this article. I’ll use the terms “sparkler” and “VMM” interchangeably to refer to the same thing.

bits 16

SERIAL_PORT             equ 0x3f8
TWITTER_DEVICE          equ 0x100
WEATHER_DEVICE_BASE     equ 0x100
AIR_QUALITY_DEVICE_BASE equ 0x200

start:
    mov ax, 0x100
    add ax, 0x20
    mov ss, ax
    mov sp, 0x1000
    cld

    mov ax, 0x100
    mov ds, ax

    mov si, welcome_msg
    call print_str

    jmp menu_loop

press_key:
    mov si, press_any_key
    call print_str
    call get_users_choice
menu_loop:
    call display_main_menu
    call get_users_choice
    cmp al, 0x31
    je .cpu_details
    cmp al, 0x32
    je .latest_tweet
    cmp al, 0x33
    je .weather
    cmp al, 0x34
    je .air_quality
    cmp al, 0x35
    je .halt

    mov si, illegal_choice
    call print_str
    jmp press_key

    .cpu_details:
        call print_cpu_details
        jmp press_key
    .latest_tweet:
        call print_latest_tweet
        call print_new_line
        jmp press_key
    .weather:
        mov si, weather_str
        call print_str
        call print_new_line
        mov si, cities_str
        call print_str
        call print_new_line
        mov si, your_choice
        call print_str
        sub ax, ax
        call get_users_choice
        sub ax, 0x30                    ; turn it from ascii to number

        cmp ax, 1
        jl  .illegal_choice
        cmp ax, 6
        jg .illegal_choice

        add ax, WEATHER_DEVICE_BASE     ; this gives us the port number for the city
        mov dx, ax
        call print_weather
        jmp press_key
    .air_quality:
        mov si, air_quality_str
        call print_str
        call print_new_line
        mov si, cities_str
        call print_str
        call print_new_line
        mov si, your_choice
        call print_str
        sub ax, ax
        call get_users_choice
        sub ax, 0x30                        ; turn it from ascii to number

        cmp ax, 1
        jl  .illegal_choice
        cmp ax, 6
        jg .illegal_choice

        add ax, AIR_QUALITY_DEVICE_BASE     ; this gives us the port number for the city
        mov dx, ax
        call print_weather
        jmp press_key

        .illegal_choice:
            call print_new_line
            mov si, illegal_choice
            call print_str
            jmp press_key
    .halt:
        hlt

data:
    welcome_msg         db `Welcome to Sparkler!\n`, 0

    ; Used by the menu system
    main_menu           db  `\nMain menu:\n==========\n`, 0
    main_menu_items     db  `1. CPU Info\n2. Latest CliMagic Tweet\n3. Get Weather\n4. Get Air Quality\n5. Halt VM\n`, 0
    your_choice         db  `Your choice: \n`, 0
    illegal_choice      db  `You entered an illegal choice!\n\n`, 0
    press_any_key       db  `Press any key to continue...\n`, 0

    ; Used by our CPU ID routines
    cpu_info_str        db  `\nHere is your CPU information:\n`, 0
    cpuid_str           db  `Vendor ID\t: `, 0
    brand_str           db  `Brand string\t: `, 0
    cpu_type_str        db  `CPU type\t: `, 0
    cpu_type_oem        db  'Original OEM Processor', 0
    cpu_type_overdrive  db  'Intel Overdrive Processor', 0
    cpu_type_dual       db  'Dual processor', 0
    cpu_type_reserved   db  'Reserved', 0
    cpu_family_str      db  `Family\t\t: `, 0
    cpu_model_str       db  `Model\t\t: `, 0
    cpu_stepping_str    db  `Stepping\t: `, 0

    ; Used by devices which fetch over the internet
    fetching_wait       db  `\nFetching, please wait...\n`, 0


    weather_str         db `\nChoose the city to get weather forecast for:`, 0
    air_quality_str     db `\nChoose the city to get air quality report for:`, 0
    ; Cities
    cities_str          db  `1. Chennai\n2. New Delhi\n3. London\n4. Chicago\n5. San Francisco\n6. New York`,0

    cpuid_function      dd  0x80000002

get_users_choice:
    mov dx, SERIAL_PORT
    in al, dx                       ; the console returns one byte per read
    ret

display_main_menu:
    mov si, main_menu
    call print_str
    mov si, main_menu_items
    call print_str
    mov si, your_choice
    call print_str
    ret

print_latest_tweet:
    mov si, fetching_wait
    call print_str
    mov dx, TWITTER_DEVICE
    .get_next_char:
        in al, dx                   ; the device returns one byte per read
        cmp al, 0                   ; a NUL byte marks the end of the string
        je .done
        call print_char
        jmp .get_next_char

    .done:
        ret

; To be called with the device port already in DX (also used for air quality)
print_weather:
    mov si, fetching_wait
    call print_str
    .get_next_char:
        in al, dx                   ; the device returns one byte per read
        cmp al, 0                   ; a NUL byte marks the end of the string
        je .done
        call print_char
        jmp .get_next_char

    .done:
        ret

print_cpu_details:
    mov si, cpu_info_str
    call print_str

    mov si, cpuid_str
    call print_str
    call print_cpuid
    call print_new_line

    call print_cpu_info

    mov si, brand_str
    call print_str
    call print_cpu_brand_string
    call print_new_line
    ret

print_cpuid:
    mov eax, 0
    cpuid
    push ecx
    push edx
    push ebx

    mov cl, 3
    .next_dword:
        pop eax
        mov bl, 4
        .print_register:
            call print_char
            shr eax, 8
            dec bl
            jnz .print_register
        dec cl
        jnz .next_dword

    ret

print_cpu_brand_string:
    mov al, '"'
    call print_char
    .next_function:
        mov eax, [cpuid_function]
        cpuid
        push edx
        push ecx
        push ebx
        push eax

    mov cl, 4
    .next_dword:
        pop eax
        mov bl, 4
        .print_register:
            call print_char
            shr eax, 8
            dec bl
            jnz .print_register
        dec cl
        jnz .next_dword

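    inc dword[cpuid_function]
    cmp dword[cpuid_function], 0x80000004
    jle .next_function

    ; Reset so the brand string prints correctly if this routine is called again
    mov dword[cpuid_function], 0x80000002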

    mov al, '"'
    call print_char
    ret

print_cpu_info:
    mov eax, 1
    cpuid

    mov si, cpu_type_str
    call print_str
    mov ecx, eax                        ; save a copy
    shr eax, 12
    and eax, 0x0003                     ; processor type is the 2-bit field in bits 13:12
    cmp al, 0
    je .type_oem
    cmp al, 1
    je .type_overdrive
    cmp al, 2
    je .type_dual
    cmp al, 3
    je .type_reserved

    .type_oem:
        mov si, cpu_type_oem
        jmp .print_cpu_type
    .type_overdrive:
        mov si, cpu_type_overdrive
        jmp .print_cpu_type
    .type_dual:
        mov si, cpu_type_dual
        jmp .print_cpu_type
    .type_reserved:
        mov si, cpu_type_reserved
        jmp .print_cpu_type

    .print_cpu_type:
    call print_str
    call print_new_line

    ; Family
    mov si, cpu_family_str
    call print_str
    mov eax, ecx
    shr eax, 8
    and ax, 0x000f

    cmp ax, 15                  ; if Family == 15, Family is derived as the
    je .calculate_family        ; sum of Family + Extended family bits

    jmp .family_done            ; else

    .calculate_family:
        mov ebx, ecx
        shr ebx, 20
        and bx, 0x00ff
        add ax, bx
    .family_done:
        call print_word_hex

    ; Model
    mov si, cpu_model_str
    call print_str
    cmp al, 6                   ; If family is 6 or 15, the model number
    je .calculate_model         ; is derived from the extended model ID bits
    cmp al, 15
    je .calculate_model

    mov eax, ecx                ; else
    shr eax, 4
    and ax, 0x000f
    jmp .model_done

    .calculate_model:
        mov eax, ecx
        mov ebx, ecx
        shr eax, 16
        and ax, 0x000f
        shl eax, 4
        shr ebx, 4
        and bx, 0x000f
        add eax, ebx
    .model_done:
        call print_word_hex

    ; Stepping
    mov si, cpu_stepping_str
    call print_str
    mov eax, ecx
    and ax, 0x000f
    call print_word_hex

    ret

print_new_line:
    push dx
    push ax
    mov dx, SERIAL_PORT
    mov al, `\n`
    out dx, al
    pop ax
    pop dx
    ret

print_char:
    push dx
    mov dx, SERIAL_PORT
    out dx, al
    pop dx
    ret

print_str:
    push dx
    push ax
    mov dx, SERIAL_PORT
    .print_next_char:
        lodsb               ; load byte pointed to by SI into AL and SI++
        cmp al, 0
        je .printstr_done
        out dx, al
        jmp .print_next_char
    .printstr_done:
        pop ax
        pop dx
        ret

; Print the 16-bit value in AX as HEX
print_word_hex:
    xchg al, ah             ; Print the high byte first
    call print_byte_hex
    xchg al, ah             ; Print the low byte second
    call print_byte_hex
    call print_new_line
    ret

; Print lower 8 bits of AL as HEX
print_byte_hex:
    push dx
    push cx
    push ax

    lea bx, [.table]        ; Get translation table address

    ; Translate each nibble to its ASCII equivalent
    mov ah, al              ; Make copy of byte to print
    and al, 0x0f            ;     Isolate lower nibble in AL
    mov cl, 4
    shr ah, cl              ; Isolate the upper nibble in AH
    xlat                    ; Translate lower nibble to ASCII
    xchg ah, al
    xlat                    ; Translate upper nibble to ASCII

    mov dx, SERIAL_PORT
    mov ch, ah              ; Make copy of lower nibble
    out dx, al
    mov al, ch
    out dx, al

    pop ax
    pop cx
    pop dx
    ret
.table: db "0123456789ABCDEF", 0

The monitor program is written in assembly language and is assembled using the venerable NASM, or Netwide Assembler. It starts and enters a loop in which it displays a menu with various options the user can choose from. This text output and user input is done via SERIAL_PORT, the “Console device” you can see in the Sparkler architecture diagram.
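The first menu option, CPU Info, is the densest part of the assembly: print_cpu_info picks apart the value CPUID leaf 1 returns in EAX. For readers who find bit-twiddling easier to follow in C, here is the same decoding as a sketch. The struct and function names are made up for illustration; the bit positions follow the layout documented for the CPUID instruction (stepping in bits 3:0, model in 7:4, family in 11:8, type in 13:12, extended model in 19:16, extended family in 27:20).

```c
#include <stdint.h>

struct cpu_signature {
    unsigned type;      /* 0 = OEM, 1 = OverDrive, 2 = dual processor, 3 = reserved */
    unsigned family;    /* display family */
    unsigned model;     /* display model */
    unsigned stepping;
};

/* Decode the EAX value returned by CPUID leaf 1. */
struct cpu_signature decode_cpu_signature(uint32_t eax)
{
    unsigned base_family = (eax >> 8) & 0xf;
    unsigned base_model  = (eax >> 4) & 0xf;
    unsigned ext_family  = (eax >> 20) & 0xff;
    unsigned ext_model   = (eax >> 16) & 0xf;
    struct cpu_signature sig;

    sig.type = (eax >> 12) & 0x3;
    sig.stepping = eax & 0xf;

    /* The extended family is only added when the base family is 15. */
    sig.family = (base_family == 15) ? base_family + ext_family : base_family;

    /* The extended model supplies the high nibble when the base family is 6 or 15. */
    sig.model = (base_family == 6 || base_family == 15)
                    ? (ext_model << 4) + base_model
                    : base_model;
    return sig;
}
```

As an example, the signature 0x000306A9 decodes to family 6, model 0x3A, stepping 9. Note that this sketch decides how to compute the model from the base family field, before any extended family is added in.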

For all devices that are available to Sparkler, the communication happens via the CPU’s IN and OUT instructions. These instructions cause a VM exit and are handled by our sparkler program, emulating these devices. Similarly, there are other devices that allow you to get the latest Tweet from a particular Twitter account, the weather for certain cities and the air quality report for certain cities.
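On the VMM side, every such exit arrives with the data sitting inside the shared kvm_run mapping, at io.data_offset bytes from its start, and with io.size and io.count saying how many bytes are involved. The sketch below wraps that pointer arithmetic in two helpers. The names are made up, and zero-filling the rest of the buffer is a defensive choice of this sketch (it keeps a guest that issues a wider IN from seeing stale bytes), not something taken from main.c.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>

/* Answer a guest IN with a single byte. Any further bytes the guest asked
 * for (io.size * io.count in total) are zeroed instead of left stale. */
void io_reply_byte(struct kvm_run *run, uint8_t value)
{
    uint8_t *data = (uint8_t *)run + run->io.data_offset;
    size_t len = (size_t)run->io.size * run->io.count;

    memset(data, 0, len);
    if (len > 0)
        data[0] = value;
}

/* Fetch the first byte the guest wrote with OUT. */
uint8_t io_written_byte(const struct kvm_run *run)
{
    return *((const uint8_t *)run + run->io.data_offset);
}
```

With helpers like these, the console’s output case in the exit loop could shrink to putchar(io_written_byte(run)), and each input case to a single io_reply_byte(run, ch) call.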

The Sparkler web service

Although we use libcurl to fetch content off the internet and use json-parser to parse JSON, doing this is a real pain from C. This is very apparent if you’re like me and you’ve been exposed to the simplicity of handling this kind of stuff with higher-level languages. And so, I wrote a quick-and-dirty Sparkler Web Service that outputs JSON that is easily parsable from C. Also, it lets you try out Sparkler in its full glory without first having to register for a Twitter developer account to access the Twitter API and fetch the tweet. This NodeJS service runs on the excellent Heroku platform for free. You can check out some JSON it outputs by clicking on these links here:

As you can see, I’ve made output from these different APIs structurally similar while removing a whole lot of JSON data we’ll never use. This lets us handle this with C fairly easily. When the monitor program requests information from the sparkler program, sparkler makes a request to the web service, parses the response and returns it to the monitor program as a simple string. Trust me, you don’t want to be parsing JSON in assembly language.
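Since an IN instruction moves only a little data at a time, that “simple string” reaches the guest one byte per read, with the terminating NUL telling the monitor to stop. main.c spells this pattern out three times, once per device. The sketch below shows one way it could be folded into a single helper. It is an illustration and not Sparkler’s code: struct stream_device, stream_device_read and copy_fetch are made-up names, and copy_fetch merely stands in for real fetchers such as fetch_weather().

```c
#include <stdlib.h>
#include <string.h>

/* A device that hands out a fetched string one byte per read. */
struct stream_device {
    char *(*fetch)(void *arg);  /* returns a malloc'd, NUL-terminated string */
    void *arg;                  /* passed to fetch, e.g. a city name */
    char *buf;                  /* string being streamed, NULL when idle */
    size_t idx;                 /* position of the next byte to hand out */
};

/* Return the next byte. A NUL ends the message and resets the device,
 * so the following read triggers a fresh fetch. */
char stream_device_read(struct stream_device *dev)
{
    if (dev->buf == NULL) {
        dev->buf = dev->fetch(dev->arg);
        dev->idx = 0;
        if (dev->buf == NULL)
            return '\0';        /* fetch failed: report an empty message */
    }

    char c = dev->buf[dev->idx++];
    if (c == '\0') {
        free(dev->buf);
        dev->buf = NULL;
        dev->idx = 0;
    }
    return c;
}

/* Stand-in fetcher for trying the helper out: returns a copy of `arg`. */
char *copy_fetch(void *arg)
{
    const char *src = arg;
    size_t n = strlen(src) + 1;
    char *s = malloc(n);

    if (s != NULL)
        memcpy(s, src, n);
    return s;
}
```

Each case in the exit loop would then reduce to storing the result of stream_device_read() into the I/O data area, and a failed fetch turns into an empty message instead of a NULL pointer dereference.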

Hope you had fun

To summarize, we looked in detail at how KVM works by building a virtual machine monitor (VMM) in C. This is the piece that interfaces with KVM to create a hardware-based virtual machine, in which we ran a small program written in assembly language that talks to the VMM through devices the VMM emulates. While there is a simple “console” device that lets the VM input and output text, there are other, more complex devices that can read a tweet and get the weather and air quality for a few cities.

About me

My name is Shuveb Hussain and I’m the author of this Linux-focused blog. You can follow me on Twitter where I post tech-related content mostly focusing on Linux, performance, scalability and cloud technologies.
