Booting Your Own Kernel on Raspberry Pi via UART

For the last few weeks, I’ve been reading a great book about building your own operating system for a Raspberry Pi from scratch. It has been a great experience, and I’ve learned a lot. One thing I didn’t like was my development workflow, which looked like this:

  1. Make a code change.
  2. Build the change.
  3. Unplug the Raspberry Pi.
  4. Remove the SD card.
  5. Plug the SD card into my laptop.
  6. Copy the newly built kernel over to the SD card.
  7. Eject the SD card.
  8. Place the SD card back in the Raspberry Pi.
  9. Connect the Raspberry Pi to my computer.
  10. Close the previous terminal, since the screen command left it pretty much unusable.
  11. Connect to the Raspberry Pi using the screen command.
  12. Test my change.

All of this was pretty time-consuming, which made the feedback loop very slow (especially when I forgot a crucial step, such as placing the SD card back in the Raspberry Pi). I searched for a way to improve my workflow and found an issue in the GitHub repo called UART boot. I read through it and decided to build UART boot myself as a learning exercise. In this blog post, I go through the process I followed to make UART boot work, including my wrong assumptions and failures and how I overcame them.

Requirements

To make this project worth my time, I decided that it needed to be able to do the following:

  1. The kernel must act as both the UART bootloader and the kernel itself (a single executable).
  2. I must be able to send a newly compiled kernel over UART, and the Raspberry Pi must boot.
  3. I must be able to have an interactive session over UART using my program (basically, replace screen).
  4. The screen replacement must leave my terminal in a working state (I assume there’s a way to make this work in screen, but I wanted to do this as a learning exercise).

Prerequisites

If you want to follow along, you’ll need:

  1. To meet all the prerequisites from the book.
  2. To have completed the first lesson.
  3. To have completed the second exercise to use UART instead of Mini UART (this might still work with Mini UART, but I haven’t tried).
  4. Python 3.7 (This is the version I used).

Note: If you haven’t completed the exercise and would still like to follow along, follow the setup instructions below. I also tagged each commit to make it easier to follow along or jump ahead.

Setup

If you already have your project, make sure that you’ve built your kernel and have it on the SD card. Otherwise, you can clone my repository and follow along:

$ git clone https://github.com/nicolasmesa/PiOS
$ cd PiOS
$ git checkout tags/uart-boot-initial

Git tags

I’ve created commit tags for each section in case you want to experiment with a specific part of the project. You can check out a tag by running the command:

$ git checkout tags/<tag-name>

I’ll reference the tag name at the end of each section.

Git tag: uart-boot-initial - link

screen replacement

To be able to send the kernel over UART, we need to establish a serial connection with the Raspberry Pi and send the kernel bytes. The screen replacement will do the same thing with the difference that it will also read from the serial connection and stdin and will write to stdout. If we’re able to make this work, sending the kernel should be easy.

Let’s get started by installing pyserial (I suggest you do this in a virtual environment):

(rpi_os) $ pip install pyserial==3.4

Let’s create a file called boot_send.py and add the following code:

import os
import select
import serial
import sys
import time
import tty

class UartConnection:

    def __init__(self, file_path, baud_rate):
        self.serial = serial.Serial(file_path, baud_rate)

    def send_string(self, string):
        return self.send_bytes(bytes(string, "ascii"))

    def send_bytes(self, bytes_to_send):
        return self.serial.write(bytes_to_send)

    def read(self, max_len):
        return self.serial.read(max_len)

    def read_buffer(self):
        return self.read(self.serial.in_waiting)

    def read_buffer_string(self):
        return self._decode_bytes(self.read_buffer())

    def start_interactive(self, input_file, output_file):
        try:
            tty.setcbreak(input_file.fileno())
            while True:
                rfd, _, _ = select.select([self.serial, input_file], [], [])

                if self.serial in rfd:
                    r = self.read_buffer_string()
                    output_file.write(r)
                    output_file.flush()

                if input_file in rfd:
                    r = input_file.read(1)
                    self.send_string(r)
        except KeyboardInterrupt:
            print("Got keyboard interrupt. Terminating...")
        except OSError as e:
            print("Got OSError. Terminating...")
        finally:
            os.system("stty sane")

    def _decode_bytes(self, bytes_to_decode):
        return bytes_to_decode.decode("ascii")

Wow, that’s a lot of code! Let’s go through it.

Constructor

The constructor (__init__) receives two arguments which we use to create a serial connection:

  • file_path: The path to the file where the UART device is (/dev/cu.SLAB_USBtoUART on my computer).
  • baud_rate: The baud rate to use in the serial connection.

Utility methods

We have a bunch of utility methods with varying levels of abstraction, but they all end up calling either send_bytes (which calls serial.write) or read (which calls serial.read).

start_interactive

The start_interactive method receives two arguments (input_file and output_file), which we use as stdin and stdout, respectively. The first thing this method does is call tty.setcbreak(input_file.fileno()) on stdin. tty.setcbreak puts the tty into cbreak mode, which lets us read characters as they are typed instead of buffering them until the user presses enter, and turns off local echo. Unlike full “raw” mode, cbreak mode still lets ctrl-c generate a KeyboardInterrupt, which is how we exit the loop.

The select function receives a set of file descriptors to monitor. When called, it blocks until one of those files has data that we can read. When the select function returns, the rfd variable has a list of file descriptors that have data available. Next, we verify if the serial file descriptor has data available, and, if it does, we read it all and send the output to our terminal (output_file). We also check if our input_file has data, and, if it does, we send it over the serial connection to the Raspberry Pi. Note that in this case, we don’t write anything to the output_file and we rely on the Raspberry Pi echoing back the character that we sent.
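To see select in isolation, here is a minimal sketch that uses two pipes as stand-ins for the serial port and stdin. The pipes and the ready_to_read helper are made up for this illustration and are not part of boot_send.py. Only the descriptor that has data shows up in the returned list:

```python
import os
import select


def ready_to_read(fds, timeout=0.0):
    """Return the subset of fds that can be read without blocking."""
    readable, _, _ = select.select(fds, [], [], timeout)
    return readable


# Two pipes stand in for the serial port and stdin (illustration only).
serial_r, serial_w = os.pipe()
stdin_r, stdin_w = os.pipe()

# Nothing has been written yet, so no descriptor is ready.
nothing_ready = ready_to_read([serial_r, stdin_r])

# Pretend the Raspberry Pi sent a byte: only the "serial" end becomes ready.
os.write(serial_w, b"H")
after_write = ready_to_read([serial_r, stdin_r])
```

In boot_send.py the call has no timeout argument, so it blocks until at least one of the two sides has data.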

The os.system("stty sane") call in the finally block resets the tty to sensible defaults, so it is no longer in cbreak mode when the script exits. This lets us keep using the same terminal (unlike the problem I had with screen).
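An alternative to shelling out to stty is to save the terminal settings before changing them and restore exactly those settings afterwards. The sketch below does that with the standard termios module. cbreak_mode is a hypothetical helper name, and it assumes a POSIX terminal (the same assumption tty.setcbreak already makes):

```python
import contextlib
import termios
import tty


@contextlib.contextmanager
def cbreak_mode(fd):
    """Put the terminal behind fd into cbreak mode, then restore it on exit."""
    saved = termios.tcgetattr(fd)  # remember the settings we started with
    try:
        tty.setcbreak(fd)
        yield
    finally:
        # Restore the saved settings even if the body raised an exception.
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)


# Possible use inside start_interactive (not run here):
#     with cbreak_mode(input_file.fileno()):
#         ...the select loop...
```

The difference from stty sane is that this puts back whatever the terminal was configured with before, instead of resetting it to generic defaults.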

def main():
    uart_connection = UartConnection(
        # Change these to match your setup
        file_path='/dev/cu.SLAB_USBtoUART',
        baud_rate=115200
    )
    time.sleep(1)
    uart_connection.start_interactive(sys.stdin, sys.stdout)

if __name__ == '__main__':
    main()

Now, let’s execute the script. Assuming everything worked, you should see something like this:

(rpi_os) $ python boot_send.py
Hello world!

You should be able to type anything you want, and the Raspberry Pi should echo it back. Press ctrl-c to exit, and your terminal should still work normally.

Note: If you’re using the git tag I provided, you’ll need to hit enter to see the “Hello world!” message. I did this to make sure that the kernel only sends the Hello world! message after boot_send has connected successfully.

Git tag: uart-boot-screen-replacement - link

Sending the kernel over UART

Sending the kernel should be no different than sending characters over serial; everything gets translated to bytes, after all.

Protocol

We’ll use the following protocol to send the kernel:

  1. The Raspberry Pi boots into the kernel and blocks while waiting to read a single line.
  2. If the line it reads matches the word kernel, the Raspberry Pi goes into UART booting mode; if not, it skips the UART boot.
  3. When in UART booting mode, the boot_send script sends an integer (4 bytes) containing the size of the kernel that we’re about to send.
  4. The Raspberry Pi takes note of this size and sends it back.
  5. The boot_send script verifies that the Raspberry Pi got the correct number.
  6. The boot_send script starts reading the kernel and sends it byte by byte over the serial connection.
  7. The Raspberry Pi receives each byte and:
     • Adds it to a running sum of all the bytes (a checksum).
     • Places that byte in memory (overwriting the current kernel).
  8. When the Raspberry Pi has received as many bytes as the kernel size (sent in step 3), it sends the checksum over UART.
  9. The boot_send script verifies that the checksums match (for error detection purposes).
  10. The Raspberry Pi responds with a line starting with "Done" to denote that it will now jump to the new kernel.
  11. The Raspberry Pi jumps back to address 0x00 to start executing the new kernel.
  12. The boot_send script starts an interactive session.

One thing that we need to keep in mind is endianness when sending the size. I decided to use big-endian for all communication.
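To make the message order concrete, here is a sketch that models the Raspberry Pi’s side of the exchange in Python, with in-memory streams standing in for the UART. The function name receive_kernel, the memory bytearray, and the 3-byte fake kernel are assumptions for illustration only; the real receiver is C code running on the Pi.

```python
import io


def receive_kernel(uart_in, uart_out, memory):
    """Model of the receiving side, starting after the "kernel" line was read."""
    # Read the 4-byte big-endian size and echo it back.
    size = int.from_bytes(uart_in.read(4), byteorder="big")
    uart_out.write(size.to_bytes(4, byteorder="big"))

    # Store each byte in "memory" and keep a running sum.
    checksum = 0
    for offset in range(size):
        byte = uart_in.read(1)[0]
        checksum = (checksum + byte) % (2 ** 32)
        memory[offset] = byte

    # Report the checksum, then announce the jump to the new kernel.
    uart_out.write(checksum.to_bytes(4, byteorder="big"))
    uart_out.write(b"Done\r\n")
    return size


# A made-up 3-byte "kernel" keeps the numbers easy to follow.
fake_kernel = bytes([0x10, 0x20, 0x30])
uart_in = io.BytesIO(len(fake_kernel).to_bytes(4, "big") + fake_kernel)
uart_out = io.BytesIO()
memory = bytearray(16)

receive_kernel(uart_in, uart_out, memory)
reply = uart_out.getvalue()  # size echo, then checksum, then the "Done" line
```

The reply is exactly what the sending script expects to read back, in order: 4 bytes of size, 4 bytes of checksum, and a line that starts with Done.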

boot_send modifications

We need to add some functionality to our boot_send script to implement the protocol:

UartConnection

Let’s add these functions to our UartConnection class:

class UartConnection:
    # ...
    def send_line(self, line):
        if not line.endswith("\n"):
            line += "\n"
        return self.send_string(line)

    def send_int(self, number):
        # The protocol sends sizes and checksums as unsigned 4-byte integers.
        if not 0 <= number <= 2 ** 32 - 1:
            raise ValueError('Number must fit in 4 unsigned bytes')
        number_in_bytes = number.to_bytes(4, byteorder='big')
        return self.send_bytes(number_in_bytes)

    def read_int(self):
        bytes_to_read = 4
        number_bytes = self.read(bytes_to_read)
        return int.from_bytes(number_bytes, byteorder='big')

    def read_line(self):
        return self._decode_bytes(self.serial.readline())

The most important thing to call out here is the byteorder which we decided to use (big stands for big-endian).
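As a quick illustration of why the byte order matters, the snippet below encodes an example size both ways and then decodes the big-endian bytes with the wrong order. The value 1991 is just a sample number, and the struct comparison is only there to show an equivalent standard-library spelling.

```python
import struct

size = 1991  # example kernel size in bytes

big = size.to_bytes(4, byteorder="big")        # b'\x00\x00\x07\xc7'
little = size.to_bytes(4, byteorder="little")  # b'\xc7\x07\x00\x00'

# struct spells the same big-endian encoding as ">I" (unsigned 32-bit).
same_as_struct = struct.pack(">I", size) == big

# Decoding with the wrong byte order silently yields a different number.
misread = int.from_bytes(big, byteorder="little")
```

If the two sides disagreed on byte order, the size confirmation step in the protocol would catch it, because the echoed number would not match.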

Kernel checksum

We compute a checksum to validate against some errors. I don’t think this is the best way to check for errors since something as simple as changing the order of two bytes would still make this checksum pass. I use it because it gives me a bit more confidence that things are working as expected.

def compute_kernel_checksum(kernel_bytes):
    num = 0
    for b in kernel_bytes:
        num = (num + b) % (2 ** 32)
    return num

We wrap around at 2^32 because the checksum is sent over UART as a 4-byte integer.
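The weakness mentioned above is easy to demonstrate. In this sketch, additive_checksum is a compact restatement of the sum used here, and zlib.crc32 from the standard library is shown only for comparison; switching to CRC would also require a matching implementation on the kernel side, which is not shown.

```python
import zlib


def additive_checksum(data):
    """Sum of all bytes, wrapped to 32 bits."""
    return sum(data) % (2 ** 32)


original = bytes([0x01, 0x02, 0x03, 0x04])
swapped = bytes([0x02, 0x01, 0x03, 0x04])  # first two bytes exchanged

# The plain sum cannot tell these apart...
sums_match = additive_checksum(original) == additive_checksum(swapped)

# ...while a CRC depends on byte positions and does tell them apart.
crcs_match = zlib.crc32(original) == zlib.crc32(swapped)
```

Here sums_match is True and crcs_match is False, which is why a CRC is a reasonable upgrade if reordered or shifted bytes are a concern.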

send_kernel

Let’s add a function to send the kernel to the Raspberry Pi following our protocol:

def send_kernel(path, uart_connection):
    with open(path, mode='rb') as f:
        uart_connection.send_line("kernel")
        kernel = f.read()
        size = len(kernel)
        checksum = compute_kernel_checksum(kernel)

        print("Sending kernel with size", size, "and checksum", checksum)
        uart_connection.send_int(size)
        time.sleep(1)
        size_confirmation = uart_connection.read_int()
        if size_confirmation != size:
            print("Expected size to be", size, "but got", size_confirmation)
            return False

        print("Kernel size confirmed. Sending kernel")
        uart_connection.send_bytes(kernel)
        time.sleep(1)

        print("Validating checksum...")
        checksum_confirmation = uart_connection.read_int()
        if checksum_confirmation != checksum:
            print("Expected checksum to be", checksum,
                  "but was", checksum_confirmation)
            return False

        line = uart_connection.read_line()
        if not line.startswith("Done"):
            print("Didn't get confirmation for the kernel. Got", line)
            return False

        return True

Our function receives a path to the kernel file and a UartConnection object. We use these to follow the protocol described above to send the kernel to the Raspberry Pi.
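One thing to be aware of: the serial port is opened without a timeout, so every read blocks until the requested bytes arrive. If the Raspberry Pi stops answering, the script waits forever. A possible guard, sketched below, is to open the port with pyserial’s timeout argument (for example serial.Serial(file_path, baud_rate, timeout=5)) and treat a short read as an error. read_exact, ShortReadError, and FakeConnection are hypothetical names; the fake connection exists only so the sketch runs without hardware.

```python
class ShortReadError(Exception):
    """Raised when fewer bytes arrive than the protocol expects."""


def read_exact(connection, count):
    """Read exactly count bytes from connection or raise ShortReadError.

    connection only needs a read(n) method that may return fewer than n
    bytes, which is how pyserial behaves once a timeout is configured.
    """
    data = connection.read(count)
    if len(data) != count:
        raise ShortReadError(f"expected {count} bytes, got {len(data)}")
    return data


class FakeConnection:
    """Stand-in for a serial port that only ever delivers two bytes."""

    def read(self, count):
        return b"\x00\x01"[:count]


try:
    read_exact(FakeConnection(), 4)
    outcome = "ok"
except ShortReadError as error:
    outcome = str(error)
```

With a guard like this, a stalled transfer would surface as an error message instead of a script that never returns.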

Calling the send_kernel function

Let’s make a small tweak to our main function.

def main():
    # ...
    uart_connection = UartConnection(...)
    time.sleep(1)
    result = send_kernel(
        path="kernel8.img",
        uart_connection=uart_connection
    )
    if result:
        print("Done!")
        uart_connection.start_interactive(sys.stdin, sys.stdout)
    else:
        print("Error sending kernel :(")

Our script sends the kernel first and then starts an interactive session.

Git tag: uart-boot-boot-send-client-side - link

Kernel side modifications

Let’s dive into the modifications needed for the kernel.

branch_to_address

The last part of our boot protocol involves jumping back to address 0x00 and starting the execution from that memory location. Let’s start by adding a function that can do that.

Open the utils.h file and add the function declaration:

extern void branch_to_address( void * );

Now, go to the utils.S file and add the definition:

.global branch_to_address
branch_to_address:
    br x0

uart_send_int / uart_read_int

We’ll need to send and read integers via UART (for the kernel size and the checksum). Let’s open the uart.h file and add the following declarations:

void uart_send_int(int number);
int uart_read_int();

Let’s implement the functions in the uart.c file:

int uart_read_int() {
    unsigned int num = 0;
    for (int i = 0; i < 4; i++) {
        // Read as unsigned so a byte >= 0x80 is never sign-extended.
        unsigned char c = uart_recv();
        num = (num << 8) | c;
    }
    return (int)num;
}

void uart_send_int(int number) {
    uart_send((char)((number >> 24) & 0xFF));
    uart_send((char)((number >> 16) & 0xFF));
    uart_send((char)((number >> 8) & 0xFF));
    uart_send((char)(number & 0xFF));
}

In uart_read_int, we read one byte at a time (most significant byte first), shifting the accumulated value left by 8 bits before adding each new byte, until we have the 4 bytes that represent the integer. In uart_send_int, we send one byte at a time (most significant byte first). Note that casting the int to a byte pointer here is not a good idea, since we shouldn’t make assumptions about how the Raspberry Pi stores ints in memory (big-endian vs. little-endian).
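The shifting is easier to follow with concrete numbers. This is a Python model of the two C functions (send_int and read_int are illustrative names, not code that runs on the Pi), traced for the example value 2077:

```python
def send_int(number):
    """Split a 32-bit value into 4 bytes, most significant byte first."""
    return bytes([
        (number >> 24) & 0xFF,
        (number >> 16) & 0xFF,
        (number >> 8) & 0xFF,
        number & 0xFF,
    ])


def read_int(four_bytes):
    """Rebuild the value by shifting left 8 bits before adding each byte."""
    num = 0
    for byte in four_bytes:
        num = (num << 8) + byte
    return num


wire = send_int(2077)  # 2077 is 0x0000081D

# Record the accumulator after each received byte.
steps = []
num = 0
for byte in wire:
    num = (num << 8) + byte
    steps.append(num)
```

The bytes on the wire are 0x00, 0x00, 0x08, 0x1D, and the accumulator goes 0, 0, 8, 2077.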

include utils.h

Open the kernel.c file and add the following #include statement:

#include "utils.h"

readline

Here we write a pretty simple readline function:

int readline(char *buf, int maxlen) {
    int num = 0;
    while (num < maxlen - 1) {
        char c = uart_recv();
        if (c == '\n' || c == '\0' || c == '\r') {
            break;
        }
        buf[num] = c;
        num++;
    }
    buf[num] = '\0';
    return num;
}

strcmp

Our protocol needs to read a line of text and compare it against the word kernel. To do that, we need a string comparison function:

int strcmp(char *str1, char *str2) {
    while (1) {
        if (*str1 != *str2) {
            return *str1 - *str2;
        }

        if (*str1 == '\0') {
            return 0;
        }

        str1++;
        str2++;
    }
}

copy_and_jump_to_kernel

Now we have all the pieces to implement the protocol we brought up before:

void copy_and_jump_to_kernel() {
    int kernel_size = uart_read_int();
    uart_send_int(kernel_size);

    char *kernel = (char *)0;
    int checksum = 0;
    for (int i = 0; i < kernel_size; i++) {
        // Unsigned, so the sum matches the Python side, which adds values 0-255.
        unsigned char c = uart_recv();
        checksum += c;
        kernel[i] = c;
    }
    uart_send_int(checksum);

    uart_send_string("Done copying kernel\r\n");
    branch_to_address((void *)0x00);
}

This function receives the kernel size and sends it back. Then, it receives each byte of the new kernel and places it in memory starting at address 0x00. Next, it sends the calculated checksum and the string "Done copying kernel". Finally, it branches to address 0x00 (where the new kernel resides) to start executing.

kernel_main

Let’s add the logic to read a line on boot, and compare it against kernel to decide if we want to boot over UART or not:

void kernel_main(void) {
    int buff_size = 100;
    uart_init();

    char buffer[buff_size];
    readline(buffer, buff_size);
    if (strcmp(buffer, "kernel") == 0) {
        copy_and_jump_to_kernel();
    }

    uart_send_string("Hello world!\r\n");
    while (1) {
        uart_send(uart_recv());
    }
}

Note that the call to copy_and_jump_to_kernel never returns.

Git tag: uart-boot-boot-send-kernel-side - link

Testing our UART boot

Build the kernel and copy it to the SD card

Here are the steps I took to build the kernel and copy it to the SD card on my Mac:

$ ./build.sh
$ cp kernel8.img /Volumes/boot/

Booting test: sending the same kernel

Now eject the SD card and put it in your Raspberry Pi.

Let’s run our boot_send script:

(rpi_os) $ python boot_send.py 
Sending kernel with size 1991 and checksum 201267
Kernel size confirmed. Sending kernel
Validating checksum...
Done!

If you see this, it means that you just sent a new kernel (the same kernel that you installed on the SD card) over UART! At this point, the new kernel is waiting in the readline function to check whether you want to send another kernel or not. Let’s press enter:

Hello world!
Got keyboard interrupt. Terminating...

Press ctrl-c to exit.

Booting test: making a small change

Sending the same kernel is not why we did all this work! Let’s make a small change to our kernel code, compile it and send it over UART (instead of putting it in the SD card). Go to the kernel_main function and change the string that we send:

void kernel_main(void) {
    // ...
    uart_send_string("Hello from a new kernel!!!\r\n");
    // ...
}

Let’s compile it and execute our python script (no need to copy anything to the SD card anymore!):

(rpi_os) $ ./build.sh
(rpi_os) $ python boot_send.py 
Sending kernel with size 2005 and checksum 202127
Kernel size confirmed. Sending kernel
Validating checksum...
Done!

Great! We sent the kernel! Now, the moment of truth! Let’s press enter:

Hello from a new kernel!!!
Got keyboard interrupt. Terminating...

Git tag: uart-boot-string-change-test-uart-boot - link

Booting test: adding a function

We did it! We’re done, right? Well, not so fast! Let’s try to make a different change. Let’s add a function and call it from kernel_main. Open the kernel.c file and add the following code:

// ...
void my_test_function(void) {
    uart_send_string("Sending a test message!\r\n");
}

void kernel_main(void) {
    // ...
    uart_send_string("Hello from a new kernel!!!\r\n");
    my_test_function();
    // ...
}

We added a new function called my_test_function, and we call it from kernel_main. Let’s compile and send this over UART:

(rpi_os) $ ./build.sh
(rpi_os) $ python boot_send.py
Sending kernel with size 2077 and checksum 208704
Kernel size confirmed. Sending kernel
Validating checksum...

It hangs at the Validating checksum… step! So, what’s going on? Let’s think for a minute about what we’re doing. The Raspberry Pi loads our kernel from the SD card at address 0x00 and starts executing code at that address. When we copy the new kernel over UART, we overwrite the kernel from the SD card while it is still running.

If the kernels are the same, it won’t make a difference since we’re effectively leaving the code as-is. Our string change didn’t affect us either, because the strings go in the .rodata (read-only data) section of the executable, which goes after the .text section (where the executable code lives), so changing the string doesn’t modify the addresses of the executable code. Adding a function, however, changes the .text layout, causing the kernel to “corrupt” itself while it copies the new kernel.
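One way to test this explanation, sketched below, is to compare two kernel images and find the first offset where they differ. If that offset falls inside the code that is running during the copy, overwriting in place is risky. The first_difference helper and the tiny byte strings are made up for illustration; with real builds you would read two kernel8.img files instead.

```python
def first_difference(old_image, new_image):
    """Return the first offset where the images differ, or None if identical."""
    for offset, (old, new) in enumerate(zip(old_image, new_image)):
        if old != new:
            return offset
    if len(old_image) != len(new_image):
        # One image is a prefix of the other; they differ where it ends.
        return min(len(old_image), len(new_image))
    return None


# Made-up images: 4 bytes of "code" followed by "string data".
old = b"\xAA\xBB\xCC\xDD" + b"Hello world!"
string_change = b"\xAA\xBB\xCC\xDD" + b"Hello from a new kernel!!!"
code_change = b"\xAA\xBB\xEE\xFF\xCC\xDD" + b"Hello world!"

string_offset = first_difference(old, string_change)  # lands in the string data
code_offset = first_difference(old, code_change)      # lands in the "code"
```

In this toy example the string change first differs at offset 10, after the 4 code bytes, while the code change differs at offset 2, inside them.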

Git tag: uart-boot-fail-by-adding-function - link

The fix

Before implementing a fix, let’s remove my_test_function to get back into a “good” state.

Git tag: uart-boot-remove-test-function - link

Options

So, how do we fix this problem? Let’s consider these two options:

Option 1: Copy the kernel to a different location and jump there

Using this option would not overwrite the kernel that is currently running, and we would jump to the new kernel at a different address. There are a few drawbacks to this approach:

  1. The kernel code will not start at address 0x00. The course might make some assumptions about the kernel being at address 0x00, and we would be breaking those assumptions.
  2. The course could later use the address range where we chose to put our new kernel.
  3. We wouldn’t be able to send a kernel multiple times over UART in the same session (once we copy over the kernel and we’re running at the new address, we wouldn’t have a new place to copy another new kernel).

Option 2: Copy the current running kernel to a new address range, jump to it, copy the new kernel to the original address range, jump back to it

To implement this option, we need to:

  1. Copy the currently running kernel to a new memory location (let’s say 0x8000).
  2. Jump to a function in the new address range.
  3. The function in the new memory location implements the protocol described above (our copy_and_jump_to_kernel function).
     • It copies the kernel over UART starting at memory location 0x00.
  4. Jump back to address 0x00.
  5. Start running the new kernel.

We treat the address range starting at 0x8000 as temporary storage that we can use for other purposes once we jump back to address 0x00.

With this implementation, we’re able to keep all the assumptions that the course might be making and we’re able to copy the kernel multiple times over the same session! This approach is called chain loading. You can read more about chain loading here.
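The memory shuffle can be modeled on a plain bytearray. This is only a simulation of where the bytes end up (nothing here executes a kernel), and chain_load, RELOCATED_BASE, and the placeholder byte strings are names invented for the sketch:

```python
RELOCATED_BASE = 0x8000  # temporary home for the running kernel


def chain_load(memory, running_size, new_kernel):
    """Model the two copies on a bytearray that stands in for RAM."""
    if len(new_kernel) > RELOCATED_BASE:
        raise ValueError("new kernel would overwrite the relocated copy")
    # Copy the running kernel from address 0x00 to the temporary range.
    memory[RELOCATED_BASE:RELOCATED_BASE + running_size] = memory[:running_size]
    # "Running" from the copy, write the new kernel over the original.
    memory[:len(new_kernel)] = new_kernel
    # The real code would now jump back to address 0x00.
    return memory


memory = bytearray(0x10000)
old_kernel = b"OLD-KERNEL-CODE"
memory[:len(old_kernel)] = old_kernel

new_kernel = b"NEW-KERNEL-WITH-AN-EXTRA-FUNCTION"
chain_load(memory, len(old_kernel), new_kernel)
```

One consequence of the addresses chosen shows up in the size check: the incoming kernel has to be smaller than 0x8000 bytes (32 KiB), or it would overwrite the relocated copy while that copy is still doing the receiving.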

Fix implementation

Position Independent Code

Currently, the compiler compiles our kernel with the assumption that its code starts at address 0x00. The compiler can then hardcode the absolute addresses of every function using this. Absolute addresses become a problem once we start executing instructions in the kernel that starts at location 0x8000 because it would still be referencing the old absolute memory locations.

We can solve this issue by using Position Independent Code, where the addresses are relative to the address held by the program counter.

To enable this, we need to tweak our Makefile to add the -fPIC flag. You can read more about this flag in this Stack Overflow post.

# -fPIC makes the addresses relative instead of absolute allowing
# us to place the kernel anywhere in memory.
COPS = -fPIC -Wall -nostdlib -nostartfiles -ffreestanding -Iinclude -mgeneral-regs-only
ASMOPS = -fPIC -Iinclude

This change is all that is needed to make our code position independent!

Determining current kernel size

The kernel needs to know its size to be able to copy itself. We use the bss_end symbol defined in the linker.ld file to determine where the kernel ends. Let’s add a reference to that symbol in our kernel.c file:

#include "utils.h"
// ...
extern char bss_end[];

We declare this as an array, even though it technically isn’t one (you can read more about why in this post). Now we can use bss_end as a pointer to the address where the kernel ends (this includes the .bss).

copy_current_kernel_and_jump

Let’s implement a function that:

  1. Copies the current kernel (starting in address 0x00) to the new address range (starting at 0x8000).
  2. Jumps to the copy_and_jump_to_kernel function, that we wrote before, in the new address range.

Note: Make sure to define this function below copy_and_jump_to_kernel.

void copy_current_kernel_and_jump(char *new_address) {
    char *kernel = (char *)0x00;
    char *end = bss_end;

    char *copy = new_address;

    while (kernel < end) {
        *copy = *kernel;
        kernel++;
        copy++;
    }

    // Cast the function pointer to char* to deal with bytes.
    char *original_function_address = (char *)&copy_and_jump_to_kernel;

    // Add the new address (we're assuming that the original kernel resides in
    // address 0). copied_function_address should now contain the address of the
    // original function but in the new location.
    char *copied_function_address =
        original_function_address + (long)new_address;

    // Cast the address back to a function and call it.
    void (*call_function)() = (void (*)())copied_function_address;
    call_function();
}

This function copies the currently running kernel to the address range starting at new_address. Then, it does some pointer math to determine the address of the copy_and_jump_to_kernel function in the new address range. Finally, it calls that function in the new address range.
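The pointer math amounts to adding the new base address to the function’s original address. Here is the arithmetic as a small Python sketch; relocated_address is a hypothetical helper, and 0x1A4 is a made-up address for copy_and_jump_to_kernel, since the real value depends on the build:

```python
def relocated_address(original_address, new_base, old_base=0x00):
    """Address of the same byte after the image moved from old_base to new_base."""
    return original_address - old_base + new_base


# 0x1A4 is a made-up address for copy_and_jump_to_kernel (illustration only).
original = 0x1A4
relocated = relocated_address(original, new_base=0x8000)
```

With these numbers the copied function lives at 0x81A4. The C code skips the subtraction because it assumes the original kernel starts at address 0.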

Call copy_current_kernel_and_jump from kernel_main

All that remains is to call copy_current_kernel_and_jump from kernel_main, and we should be ready to test:

void kernel_main(void) {
    // ...
    if (strcmp(buffer, "kernel") == 0) {
        copy_current_kernel_and_jump((char *)0x8000);
    }
    // ...
}

Git tag: uart-boot-fix-implementation - link

Testing the fix

First, we need to build our kernel and copy it to the SD card. Refer to the Build the kernel and copy it to the SD card section.

Let’s perform the same tests we ran before to make sure that we can still send the kernel over UART. Refer back to the Booting test: sending the same kernel and Booting test: making a small change sections.

Testing the previous error

Now, the moment of truth! Let’s make a change that adds a function, just like in the Booting test: adding a function section. Let’s test it out!

(rpi_os) $ ./build.sh
(rpi_os) $ python boot_send.py 
Sending kernel with size 2280 and checksum 223784
Kernel size confirmed. Sending kernel
Validating checksum...
Done!

This output is promising! At least our boot_send script didn’t hang in the Validating checksum… step like it did last time. The new kernel probably booted up and is stuck in the readline function. Let’s press enter to send an empty line:

Hello from a new kernel!!!
Sending a test message!
Got keyboard interrupt. Terminating...

We did it! We were able to add a new function to the kernel, and it was still able to boot up!

Sending the same kernel multiple times

Now, what if we try to send a kernel more than once in the same session? If that works, we can keep sending new kernels, each one handling the next UART boot, without ever having to put one on the SD card!

(rpi_os) $ python boot_send.py 
Sending kernel with size 2440 and checksum 234324
Kernel size confirmed. Sending kernel
Validating checksum...
Done!

Nothing surprising here: our kernel is waiting in the readline function again. This time, instead of pressing enter, let’s press ctrl-c to exit out of our boot_send program and rerun it.

Got keyboard interrupt. Terminating...
(rpi_os) $ python boot_send.py
Sending kernel with size 2440 and checksum 234324
Kernel size confirmed. Sending kernel
Validating checksum...
Done!

We sent the kernel again in the same session! Let’s press enter to make sure that our new kernel booted:

Hello from a new kernel!!!
Sending a test message!
Got keyboard interrupt. Terminating...

Git tag: uart-boot-succeeds-when-adding-function - link

Conclusion

In this post, we built a kernel that can be booted via UART to test, or from the SD card (for more permanent solutions). To do this, we came up with a protocol for the UART communication and implemented it. After a few tests, we noticed there was a problem when we tried to boot a kernel with an extra function. The problem was that the kernel was overwriting/corrupting its instructions. To overcome this problem, we made the kernel first copy itself to a new address range and perform the UART copy while executing from that address range. After the kernel finishes copying the new kernel over UART, it can jump back to the original start address (0x00) and execute the new kernel.

Improvements

Here is a small list of improvements that you can make to the project:

  • Add arguments to the boot_send script. I did this in my project, but this post is already long as it is.
  • Clean the kernel.c file and extract the functions to a helper file.
  • Use the C preprocessor to exclude the UART boot section if you don’t pass a UARTBoot flag.
  • Make UART boot work with multiple CPUs. I implemented this as well in a somewhat hacky way (I might write a post about it later).
  • Add a better checksum (CRC, for example).
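
For the first item, a minimal sketch with argparse could look like this. The flag names (--device, --baud-rate, --kernel) and the build/kernel8.img path are placeholders chosen for this example, not necessarily the ones used in the repository; the defaults mirror the values hardcoded earlier in this post.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options for a boot_send-style script."""
    parser = argparse.ArgumentParser(
        description="Send a kernel over UART and open an interactive session."
    )
    parser.add_argument("--device", default="/dev/cu.SLAB_USBtoUART",
                        help="path to the UART device file")
    parser.add_argument("--baud-rate", type=int, default=115200,
                        help="baud rate of the serial connection")
    parser.add_argument("--kernel", default="kernel8.img",
                        help="path to the kernel image to send")
    return parser.parse_args(argv)


# Passing a list makes the example runnable without a real command line.
args = parse_args(["--kernel", "build/kernel8.img"])
```

main could then pass args.device, args.baud_rate, and args.kernel to UartConnection and send_kernel instead of the hardcoded values.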