Booting Your Own Kernel on Raspberry Pi via UART
For the last few weeks, I’ve been reading a great book about building your own operating system for a Raspberry Pi from scratch. It has been a great experience, and I’ve learned a lot. One thing I didn’t like was my development workflow, which looked like this:
- Make a code change.
- Build the change.
- Unplug the Raspberry Pi.
- Remove the SD card.
- Plug the SD card into my laptop.
- Copy the newly built kernel over to the SD card.
- Eject the SD card.
- Place the SD card back in the Raspberry Pi.
- Connect the Raspberry Pi to my computer.
- Close the previous terminal, since the screen command had made it pretty useless.
- Connect to the Raspberry Pi using the screen command.
- Test my change.
All of this was pretty time-consuming and made the feedback loop very slow (especially when I forgot a crucial step, such as placing the SD card back in the Raspberry Pi). I searched for a way to improve my workflow and found an issue in the GitHub repo called UART boot. I read through it and decided to build the UART boot myself as a learning exercise. In this blog post, I go through the process that I took to make UART boot work, including my wrong assumptions/failures and how I overcame them.
Requirements
To make this project worth my time, I decided that it needed to be able to do the following:
- The kernel must act as both the UART bootloader and the kernel itself (a single executable).
- I must be able to send a newly compiled kernel over UART, and the Raspberry Pi must boot.
- I must be able to have an interactive session over UART using my program (basically, replace screen).
- The screen replacement must leave my terminal in a working state (I assume there’s a way to make this work in screen, but I wanted to do this as a learning exercise).
Prerequisites
If you want to follow along, you’ll need:
- To meet all the prerequisites from the book.
- To have completed the first lesson.
- To have completed the second exercise to use UART instead of Mini UART (this might still work with Mini UART, but I haven’t tried).
- Python 3.7 (This is the version I used).
Note: If you haven’t completed the exercise and would still like to follow along, follow the setup instructions below. I also tagged each commit to make it easier to follow along or jump ahead.
Setup
If you already have your project, make sure that you’ve built your kernel and have it on the SD card. Otherwise, you can clone my repository and follow along:
$ git clone https://github.com/nicolasmesa/PiOS
$ cd PiOS
$ git checkout tags/uart-boot-initial
Git tags
I’ve created commit tags for each section in case you want to experiment with a specific part of the project. You can checkout the tags by running the command:
$ git checkout tags/<tag-name>
I’ll reference the tag name at the end of each section.
Git tag: uart-boot-initial
screen replacement
To be able to send the kernel over UART, we need to establish a serial connection with the Raspberry Pi and send the kernel bytes. The screen replacement will do the same thing, with the difference that it will also read from the serial connection and stdin and will write to stdout. If we’re able to make this work, sending the kernel should be easy.
Let’s get started by installing pyserial (I suggest you do this in a virtual environment):
(rpi_os) $ pip install pyserial==3.4
Let’s create a file called boot_send.py and add the following code:
import os
import select
import serial
import sys
import time
import tty


class UartConnection:
    def __init__(self, file_path, baud_rate):
        self.serial = serial.Serial(file_path, baud_rate)

    def send_string(self, string):
        return self.send_bytes(bytes(string, "ascii"))

    def send_bytes(self, bytes_to_send):
        return self.serial.write(bytes_to_send)

    def read(self, max_len):
        return self.serial.read(max_len)

    def read_buffer(self):
        # Read whatever is already waiting in the serial input buffer.
        return self.read(self.serial.in_waiting)

    def read_buffer_string(self):
        return self._decode_bytes(self.read_buffer())

    def start_interactive(self, input_file, output_file):
        try:
            # Deliver keystrokes immediately instead of line by line.
            tty.setcbreak(input_file.fileno())
            while True:
                rfd, _, _ = select.select([self.serial, input_file], [], [])
                if self.serial in rfd:
                    r = self.read_buffer_string()
                    output_file.write(r)
                    output_file.flush()
                if input_file in rfd:
                    r = input_file.read(1)
                    self.send_string(r)
        except KeyboardInterrupt:
            print("Got keyboard interrupt. Terminating...")
        except OSError as e:
            print("Got OSError. Terminating...")
        finally:
            # Put the terminal back into a usable state.
            os.system("stty sane")

    def _decode_bytes(self, bytes_to_decode):
        return bytes_to_decode.decode("ascii")
Wow, that’s a lot of code! Let’s go through it.
Constructor
The constructor (__init__) receives two arguments which we use to create a serial connection:
- file_path: The path to the file where the UART device is (/dev/cu.SLAB_USBtoUART on my computer).
- baud_rate: The baud rate to use in the serial connection.
Utility methods
We have a bunch of utility methods with varying levels of abstraction, but they all end up calling either send_bytes (which calls serial.write) or read (which calls serial.read).
start_interactive
The start_interactive method receives two arguments (input_file and output_file). We will use these arguments as stdin and stdout, respectively. The first thing this method does is call tty.setcbreak(input_file.fileno()) on the stdin file. tty.setcbreak puts the tty into cbreak mode (often loosely called “raw” mode), which allows us to read the characters as they are typed instead of buffering them until the user presses enter. Unlike true raw mode, cbreak mode leaves signal handling on, which is why ctrl-c still raises a KeyboardInterrupt. It also turns off the terminal’s local echo.
The select function receives a set of file descriptors to monitor. When called, it blocks until one of those files has data that we can read. When the select function returns, the rfd variable has a list of file descriptors that have data available. Next, we verify if the serial file descriptor has data available, and, if it does, we read it all and send the output to our terminal (output_file). We also check if our input_file has data, and, if it does, we send it over the serial connection to the Raspberry Pi. Note that in this case, we don’t write anything to the output_file and we rely on the Raspberry Pi echoing back the character that we sent.
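To get a feel for how select behaves without any hardware attached, here is a minimal sketch that uses two pipes as stand-ins for the serial port and stdin. The helper name ready_to_read is made up for this example, and pipes only work with select on Unix-like systems (macOS, Linux):

```python
import os
import select


def ready_to_read(fds, timeout=0.0):
    """Return the file descriptors in fds that select() reports as readable."""
    readable, _, _ = select.select(fds, [], [], timeout)
    return readable


# Two pipes stand in for the serial port and stdin (assumption for the demo).
serial_r, serial_w = os.pipe()
stdin_r, stdin_w = os.pipe()

# Only the "serial" side has data, so only serial_r is reported as readable.
os.write(serial_w, b"Hello world!\r\n")
print(ready_to_read([serial_r, stdin_r]) == [serial_r])
```

In start_interactive we pass no timeout, so select blocks until either side has something to read.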
The os.system("stty sane") call resets our tty to sane default settings, so it is no longer in cbreak mode. This call allows us to keep using the same terminal (instead of the problem I had with screen).
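An alternative to shelling out to stty is to save the terminal attributes before changing them and restore exactly those afterwards. The sketch below is not what boot_send.py does; the cbreak_mode helper is a hypothetical name, and it only relies on the standard termios and tty modules (Unix only):

```python
import contextlib
import termios
import tty


@contextlib.contextmanager
def cbreak_mode(fd):
    """Put the terminal behind fd in cbreak mode and restore it on exit."""
    saved = termios.tcgetattr(fd)  # remember the current settings
    try:
        tty.setcbreak(fd)
        yield
    finally:
        # Restore the exact settings we started with, even after an exception.
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
```

With this helper, the body of start_interactive could be wrapped in "with cbreak_mode(input_file.fileno()):" instead of relying on the finally block.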
def main():
    uart_connection = UartConnection(
        # Change these to match your setup
        file_path='/dev/cu.SLAB_USBtoUART',
        baud_rate=115200
    )
    time.sleep(1)
    uart_connection.start_interactive(sys.stdin, sys.stdout)


if __name__ == '__main__':
    main()
Now, let’s execute the script. Assuming everything worked, you should see something like this:
(rpi_os) $ python boot_send.py
Hello world!
You should be able to type anything you want, and the Raspberry Pi should echo it back. Press ctrl-c to exit, and your terminal should still work normally.
Note: If you’re using the git branch I provided, you’ll need to hit enter to see the “Hello world!” message. I did this to make sure that the kernel only sends the Hello world! message after boot_send has connected successfully.
Git tag: uart-boot-screen-replacement
Sending the kernel over UART
Sending the kernel should be no different than sending characters over serial; everything gets translated to bytes, after all.
Protocol
We’ll use the following protocol to send the kernel:
- The Raspberry Pi boots into the kernel and blocks while waiting to read a single line.
- If the line it reads matches the word kernel, then we go into UART booting mode; if not, we skip the UART boot.
- When in UART booting mode, the boot_send script sends an integer (4 bytes) containing the size of the kernel that we’re about to send.
- The Raspberry Pi takes note of this size and sends it back.
- The boot_send script verifies that the Raspberry Pi got the correct number.
- The boot_send script starts reading the kernel and sends it byte by byte over the serial connection.
- The Raspberry Pi receives each byte and…
  - Keeps a running sum of all the bytes (a checksum).
  - Places that byte in memory (overwriting the current kernel).
- When the Raspberry Pi receives the number of bytes equal to the size of the kernel (sent in a previous step), it sends the checksum over UART.
- The boot_send script verifies that the checksums match (for error detection purposes).
- The Raspberry Pi responds with the string "Done" to denote that it will now jump to the new kernel.
- The Raspberry Pi jumps back to address 0x00 to start executing in the new kernel.
- The boot_send script starts an interactive session.
One thing that we need to keep in mind is endianness when sending the size. I decided to use big-endian for all communication.
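Before touching any hardware, it can help to model the protocol entirely in memory. The sketch below is a simplified simulation, not the real implementation: pi_side and host_message are hypothetical helpers, and the exchange is flattened so that the host sends everything up front and reads all the replies at the end (the real script reads the size confirmation before sending the kernel):

```python
import io


def host_message(kernel):
    """Everything boot_send transmits: the trigger line, the size, the kernel."""
    return b"kernel\n" + len(kernel).to_bytes(4, byteorder="big") + kernel


def pi_side(stream_in):
    """Model of the Raspberry Pi's half of the protocol.

    Returns (bytes sent back to the host, bytes written to memory or None).
    """
    line = b""
    while True:
        c = stream_in.read(1)
        if c in (b"\n", b""):
            break
        line += c
    if line != b"kernel":
        return b"", None  # not a UART boot request, skip it

    size = int.from_bytes(stream_in.read(4), byteorder="big")
    reply = bytearray(size.to_bytes(4, byteorder="big"))  # echo the size

    memory = bytearray()
    checksum = 0
    for _ in range(size):
        byte = stream_in.read(1)[0]
        checksum = (checksum + byte) % (2 ** 32)  # running sum of all bytes
        memory.append(byte)                       # "place the byte in memory"

    reply += checksum.to_bytes(4, byteorder="big")
    reply += b"Done\r\n"
    return bytes(reply), bytes(memory)


fake_kernel = bytes([1, 2, 3, 250])
reply, memory = pi_side(io.BytesIO(host_message(fake_kernel)))
print(memory == fake_kernel)
print(int.from_bytes(reply[4:8], byteorder="big"))  # checksum: 1 + 2 + 3 + 250 = 256
```

Both integers travel most significant byte first, which is the big-endian choice described above.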
boot_send modifications
We need to add some functionality to our boot_send script to implement the protocol:
UartConnection
Let’s add these functions to our UartConnection class:
class UartConnection:
    # ...
    def send_line(self, line):
        if not line.endswith("\n"):
            line += "\n"
        return self.send_string(line)

    def send_int(self, number):
        # Raise a real exception (raising a plain string is a TypeError in
        # Python 3) and reject negative numbers as well.
        if number < 0 or number > 2 ** 32 - 1:
            raise ValueError('Number can only be 4 bytes long')
        number_in_bytes = number.to_bytes(4, byteorder='big')
        return self.send_bytes(number_in_bytes)

    def read_int(self):
        bytes_to_read = 4
        number_bytes = self.read(bytes_to_read)
        return int.from_bytes(number_bytes, byteorder='big')

    def read_line(self):
        return self._decode_bytes(self.serial.readline())
The most important thing to call out here is the byteorder which we decided to use (big stands for big-endian).
Kernel checksum
We compute a checksum to validate against some errors. I don’t think this is the best way to check for errors since something as simple as changing the order of two bytes would still make this checksum pass. I use it because it gives me a bit more confidence that things are working as expected.
def compute_kernel_checksum(kernel_bytes):
    num = 0
    for b in kernel_bytes:
        num = (num + b) % (2 ** 32)
    return num
We wrap around at 2^32 because the checksum is sent over UART as a 4-byte unsigned integer, so it has to stay below that value.
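The weakness mentioned above is easy to demonstrate. In this small sketch, additive_checksum is a stand-in for compute_kernel_checksum, and zlib.crc32 from the standard library is used purely as a point of comparison (the kernel side in this post does not implement CRC):

```python
import zlib


def additive_checksum(data):
    """Sum of all bytes modulo 2**32, equivalent to compute_kernel_checksum."""
    return sum(data) % (2 ** 32)


original = bytes([0x10, 0x20, 0x30, 0x40])
swapped = bytes([0x20, 0x10, 0x30, 0x40])  # first two bytes exchanged

# The additive checksum cannot tell the two apart...
print(additive_checksum(original) == additive_checksum(swapped))  # True
# ...while a CRC does notice the reordering.
print(zlib.crc32(original) == zlib.crc32(swapped))  # False
```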
send_kernel
Let’s add a function to send the kernel to the Raspberry Pi following our protocol:
def send_kernel(path, uart_connection):
    with open(path, mode='rb') as f:
        uart_connection.send_line("kernel")
        kernel = f.read()
        size = len(kernel)
        checksum = compute_kernel_checksum(kernel)
        print("Sending kernel with size", size, "and checksum", checksum)
        uart_connection.send_int(size)
        time.sleep(1)
        size_confirmation = uart_connection.read_int()
        if size_confirmation != size:
            print("Expected size to be", size, "but got", size_confirmation)
            return False
        print("Kernel size confirmed. Sending kernel")
        uart_connection.send_bytes(kernel)
        time.sleep(1)
        print("Validating checksum...")
        checksum_confirmation = uart_connection.read_int()
        if checksum_confirmation != checksum:
            print("Expected checksum to be", checksum,
                  "but was", checksum_confirmation)
            return False
        line = uart_connection.read_line()
        if not line.startswith("Done"):
            print("Didn't get confirmation for the kernel. Got", line)
            return False
        return True
Our function receives a path to the kernel file and a UartConnection object. We use these to follow the protocol described above to send the kernel to the Raspberry Pi.
Calling the send_kernel function
Let’s make a small tweak to our main function.
def main():
    # ...
    uart_connection = UartConnection(...)
    time.sleep(1)
    result = send_kernel(
        path="kernel8.img",
        uart_connection=uart_connection
    )
    if result:
        print("Done!")
        uart_connection.start_interactive(sys.stdin, sys.stdout)
    else:
        print("Error sending kernel :(")
Our script sends the kernel first and then starts an interactive session.
Git tag: uart-boot-boot-send-client-side
Kernel side modifications
Let’s dive into the modifications needed for the kernel.
branch_to_address
The last part of our boot protocol involves jumping back to address 0x00 and starting the execution from that memory location. Let’s start by adding a function that can do that.
Open the utils.h file and add the function declaration:
extern void branch_to_address( void * );
Now, go to the utils.S file and add the definition:
.global branch_to_address
branch_to_address:
br x0
uart_send_int / uart_read_int
We’ll need to send and read integers via UART (for the kernel size and the checksum). Let’s open the uart.h file and add the following declarations:
void uart_send_int(int number);
int uart_read_int();
Let’s implement the functions in the uart.c file:
int uart_read_int() {
    int num = 0;
    for (int i = 0; i < 4; i++) {
        // unsigned char makes sure bytes >= 0x80 are never sign-extended.
        unsigned char c = uart_recv();
        num = num << 8;
        num += (int)c;
    }
    return num;
}

void uart_send_int(int number) {
    uart_send((char)((number >> 24) & 0xFF));
    uart_send((char)((number >> 16) & 0xFF));
    uart_send((char)((number >> 8) & 0xFF));
    uart_send((char)(number & 0xFF));
}
In uart_read_int, we read one byte at a time (most significant byte first) and keep shifting those bytes to the left until we get the 4 bytes that represent the integer. In uart_send_int, we send one byte at a time (most significant byte first). Note that using pointers here is not a good idea since we shouldn’t make assumptions on how the Raspberry Pi stores ints in memory (big-endian vs. little-endian).
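To convince ourselves that the shifting on the kernel side matches what boot_send does with to_bytes and from_bytes, we can mirror the two C functions in Python. The names pack_be32 and unpack_be32 are made up for this sketch:

```python
def pack_be32(number):
    """Mirror of uart_send_int: emit 4 bytes, most significant first."""
    return bytes([
        (number >> 24) & 0xFF,
        (number >> 16) & 0xFF,
        (number >> 8) & 0xFF,
        number & 0xFF,
    ])


def unpack_be32(data):
    """Mirror of uart_read_int: shift left by 8 and add each new byte."""
    num = 0
    for byte in data[:4]:
        num = (num << 8) + byte
    return num


size = 1991  # the kernel size from the sample output in this post
print(pack_be32(size) == size.to_bytes(4, byteorder="big"))  # True
print(unpack_be32(pack_be32(size)) == size)                  # True
```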
include utils.h
Open the kernel.c file and add the following #include statement:
#include "utils.h"
readline
Here we write a pretty simple readline function:
int readline(char *buf, int maxlen) {
    int num = 0;
    while (num < maxlen - 1) {
        char c = uart_recv();
        if (c == '\n' || c == '\0' || c == '\r') {
            break;
        }
        buf[num] = c;
        num++;
    }
    buf[num] = '\0';
    return num;
}
strcmp
Our protocol needs to read a line of text and compare it against the word kernel. To do that, we need a string comparison function:
int strcmp(char *str1, char *str2) {
    while (1) {
        if (*str1 != *str2) {
            return *str1 - *str2;
        }
        if (*str1 == '\0') {
            return 0;
        }
        str1++;
        str2++;
    }
}
copy_and_jump_to_kernel
Now we have all the pieces to implement the protocol we brought up before:
void copy_and_jump_to_kernel() {
    int kernel_size = uart_read_int();
    uart_send_int(kernel_size);
    char *kernel = (char *)0;
    int checksum = 0;
    for (int i = 0; i < kernel_size; i++) {
        // unsigned char keeps the sum identical to the one boot_send computes.
        unsigned char c = uart_recv();
        checksum += c;
        kernel[i] = c;
    }
    uart_send_int(checksum);
    uart_send_string("Done copying kernel\r\n");
    branch_to_address((void *)0x00);
}
This function receives the kernel size and sends it back. Then, it receives each byte from the new kernel and places it in memory starting at address 0x00. Next, it sends the calculated checksum and a string saying "Done copying kernel". Finally, it branches to start executing at address 0x00 (where the new kernel resides).
kernel_main
Let’s add the logic to read a line on boot and compare it against kernel to decide if we want to boot over UART or not:
void kernel_main(void) {
    int buff_size = 100;
    uart_init();
    char buffer[buff_size];
    readline(buffer, buff_size);
    if (strcmp(buffer, "kernel") == 0) {
        copy_and_jump_to_kernel();
    }
    uart_send_string("Hello world!\r\n");
    while (1) {
        uart_send(uart_recv());
    }
}
Note that the call to copy_and_jump_to_kernel never returns.
Git tag: uart-boot-boot-send-kernel-side
Testing our UART boot
Build the kernel and copy it to the SD card
Here are the steps I took to build the kernel and copy it to the SD card on my Mac:
$ ./build.sh
$ cp kernel8.img /Volumes/boot/
Booting test: sending the same kernel
Now eject the SD card and put it in your Raspberry Pi.
Let’s run our boot_send script:
(rpi_os) $ python boot_send.py
Sending kernel with size 1991 and checksum 201267
Kernel size confirmed. Sending kernel
Validating checksum...
Done!
If you see this, it means that you just sent a new kernel (the same kernel that you installed on the SD card) over UART! At this point, the kernel is stuck in the readline function to check whether you want to send a new kernel or not. Let’s press enter:
Hello world!
Got keyboard interrupt. Terminating...
Press ctrl-c to exit.
Booting test: making a small change
Sending the same kernel is not why we did all this work! Let’s make a small change to our kernel code, compile it, and send it over UART (instead of putting it on the SD card). Go to the kernel_main function and change the string that we send:
void kernel_main(void) {
    // ...
    uart_send_string("Hello from a new kernel!!!\r\n");
    // ...
}
Let’s compile it and execute our python script (no need to copy anything to the SD card anymore!):
(rpi_os) $ ./build.sh
(rpi_os) $ python boot_send.py
Sending kernel with size 2005 and checksum 202127
Kernel size confirmed. Sending kernel
Validating checksum...
Done!
Great! We sent the kernel! Now, the moment of truth! Let’s press enter:
Hello from a new kernel!!!
Got keyboard interrupt. Terminating...
Git tag: uart-boot-string-change-test-uart-boot
Booting test: adding a function
We did it! We’re done, right? Well, not so fast! Let’s try to make a different change. Let’s add a function and call it from kernel_main. Open the kernel.c file and add the following code:
// ...
void my_test_function(void) {
uart_send_string("Sending a test message!\r\n");
}
void kernel_main(void) {
// ...
uart_send_string("Hello from a new kernel!!!\r\n");
my_test_function();
// ...
}
We added a new function called my_test_function, and we call it from kernel_main. Let’s compile and send this over UART:
(rpi_os) $ ./build.sh
(rpi_os) $ python boot_send.py
Sending kernel with size 2077 and checksum 208704
Kernel size confirmed. Sending kernel
Validating checksum...
It hangs in the Validating checksum… part! So, what’s going on? Let’s think for a minute about what we’re doing. The Raspberry Pi loads our kernel from the SD card onto address 0x00 and starts executing code at that address. When we copy the new kernel over UART, we overwrite the kernel from the SD card with the new kernel. If the kernels are the same, it won’t make a difference since we’re effectively leaving the code as-is. Our string change didn’t affect us either because the strings go in the .rodata (read-only data) section of the executable, which goes after the .text section (where the executable code lives), so changing the string doesn’t modify the addresses of the executable code. Adding a function, however, changes the .text layout, so the instructions that the running kernel is executing get replaced by different instructions in the middle of the copy, causing the kernel to “corrupt” itself.
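A rough way to see which kind of change we made is to compare the old and new kernel images on the host and find the first byte that differs. The helper below, first_difference, is a hypothetical diagnostic and not part of boot_send.py; a low offset suggests that the code itself changed, although the offset alone does not prove that an in-place overwrite is safe:

```python
def first_difference(old_image, new_image):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (old_byte, new_byte) in enumerate(zip(old_image, new_image)):
        if old_byte != new_byte:
            return offset
    if len(old_image) != len(new_image):
        # One image is a prefix of the other; they differ where it ends.
        return min(len(old_image), len(new_image))
    return None


# Toy images: the first 4 bytes play the role of code, the rest of a string.
old_build = b"CODEHello world!"
string_change = b"CODEHello kernel"
code_change = b"CxDEHello world!"

print(first_difference(old_build, string_change))  # 10
print(first_difference(old_build, code_change))    # 1
```

For real builds, the two byte strings would come from reading the previous and the new kernel8.img files.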
Git tag: uart-boot-fail-by-adding-function
The fix
Before implementing a fix, let’s remove my_test_function to get back into a “good” state.
Git tag: uart-boot-remove-test-function
Options
So, how do we fix this problem? Let’s consider these two options:
Option 1: Copy the kernel to a different location and jump there
Using this option would not overwrite the kernel that is currently running, and we would jump to the new kernel at a different address. There are a few drawbacks to this approach:
- The kernel code will not start at address 0x00. The course might make some assumptions about the kernel being at address 0x00, and we would be breaking those assumptions.
- The course could use the address range that we choose for our new kernel.
- We wouldn’t be able to send a kernel multiple times over UART in the same session (once we copy over the kernel and we’re running in the new address, we wouldn’t have a new place to copy another new kernel).
Option 2: Copy the current running kernel to a new address range, jump to it, copy the new kernel to the original address range, jump back to it
To implement this option, we need to:
- Copy the currently running kernel to a new memory location (let’s say 0x8000).
- Jump to a function in the new address range.
- The function in the new memory location implements the protocol described above (our copy_and_jump_to_kernel function).
  - It copies the kernel over UART starting at memory location 0x00.
- Jump back to address 0x00.
- Start running the new kernel.
We treat the address range starting at 0x8000 as temporary storage that we can use for other purposes once we jump back to address 0x00.
With this implementation, we’re able to keep all the assumptions that the course might be making and we’re able to copy the kernel multiple times over the same session! This approach is called chain loading. You can read more about chain loading here.
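The steps of option 2 can be modelled on the host with a bytearray standing in for the Raspberry Pi’s memory. This is only a toy model under that assumption: chain_load and RELOCATION_BASE are names invented for the sketch, and the size check reflects a limitation of the model (a new kernel larger than 0x8000 bytes would run into the relocated copy):

```python
RELOCATION_BASE = 0x8000  # the temporary address used in the text


def chain_load(memory, running_size, new_kernel, base=RELOCATION_BASE):
    """Model the chain-loading steps on a bytearray standing in for RAM.

    Returns the relocated copy of the old kernel as it looks after the new
    kernel has been written, so we can check that it survived the copy.
    """
    if len(new_kernel) > base:
        raise ValueError("new kernel would overwrite the relocated copy")
    # Copy the running kernel to the temporary address range.
    memory[base:base + running_size] = memory[0:running_size]
    # The real kernel would now jump to the copy and keep running from there.
    # Receive the new kernel byte by byte, starting at address 0x00.
    for offset, byte in enumerate(new_kernel):
        memory[offset] = byte
    return bytes(memory[base:base + running_size])


ram = bytearray(0x10000)
old_kernel = b"OLD KERNEL CODE"
new_kernel = b"A NEWER AND LONGER KERNEL IMAGE"
ram[:len(old_kernel)] = old_kernel

survivor = chain_load(ram, len(old_kernel), new_kernel)
print(survivor == old_kernel)                    # True: the copy is untouched
print(bytes(ram[:len(new_kernel)]) == new_kernel)  # True: new kernel at 0x00
```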
Fix implementation
Position Independent Code
Currently, the compiler compiles our kernel with the assumption that its code starts at address 0x00. The compiler can then hardcode the absolute addresses of every function based on this assumption. Absolute addresses become a problem once we start executing instructions in the copy of the kernel that starts at location 0x8000, because that copy would still be referencing the old absolute memory locations.
We can solve this issue by using Position Independent Code, where the addresses are relative to the address held by the program counter.
To enable this, we need to tweak our Makefile to add the -fPIC flag. You can read more about this flag in this Stack Overflow post.
# -fPIC makes the addresses relative instead of absolute allowing
# us to place the kernel anywhere in memory.
COPS = -fPIC -Wall -nostdlib -nostartfiles -ffreestanding -Iinclude -mgeneral-regs-only
ASMOPS = -fPIC -Iinclude
This change is all that is needed to make our code position independent!
Determining current kernel size
The kernel needs to know its size to be able to copy itself. We use the bss_end symbol defined in the linker.ld file to determine the size of the kernel. Let’s add a reference to that symbol in our kernel.c file:
#include "utils.h"
// ...
extern char bss_end[];
We define this as an array, even though it technically isn’t (you can read more about why in this post). Now we can use bss_end as a pointer to the address where the kernel ends (this includes the .bss).
copy_current_kernel_and_jump
Let’s implement a function that:
- Copies the current kernel (starting at address 0x00) to the new address range (starting at 0x8000).
- Jumps to the copy_and_jump_to_kernel function that we wrote before, in the new address range.
Note: Make sure to define this function below copy_and_jump_to_kernel.
void copy_current_kernel_and_jump(char *new_address) {
    char *kernel = (char *)0x00;
    char *end = bss_end;
    char *copy = new_address;
    while (kernel <= end) {
        *copy = *kernel;
        kernel++;
        copy++;
    }
    // Cast the function pointer to char* to deal with bytes.
    char *original_function_address = (char *)&copy_and_jump_to_kernel;
    // Add the new address (we're assuming that the original kernel resides in
    // address 0). copied_function_address should now contain the address of the
    // original function but in the new location.
    char *copied_function_address =
        original_function_address + (long)new_address;
    // Cast the address back to a function and call it.
    void (*call_function)() = (void (*)())copied_function_address;
    call_function();
}
This function copies the currently running kernel to the address range starting at new_address. Then, it does some pointer math to determine the address of the copy_and_jump_to_kernel function in the new address range. Finally, it calls that function in the new address range.
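The pointer math boils down to keeping the function’s offset from the start of the kernel and applying it to the new base address. Here is the same arithmetic in Python; relocated_address is a hypothetical helper, and 0x1a4 is a made-up function address used only for illustration:

```python
def relocated_address(original_address, new_base, old_base=0x00):
    """Address of the same function inside a copy that starts at new_base."""
    offset = original_address - old_base  # distance from the kernel's start
    return new_base + offset


# If copy_and_jump_to_kernel lived at 0x1a4 in the original kernel (made-up
# value), its copy in the range starting at 0x8000 would live at 0x81a4.
print(hex(relocated_address(0x1a4, 0x8000)))  # 0x81a4
```

Because the original kernel starts at address 0x00, the C code can skip the subtraction and simply add new_address.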
Call copy_current_kernel_and_jump from kernel_main
All that remains is to call copy_current_kernel_and_jump from kernel_main, and we should be ready to test:
void kernel_main(void) {
// ...
if (strcmp(buffer, "kernel") == 0) {
copy_current_kernel_and_jump((char *)0x8000);
}
// ...
}
Git tag: uart-boot-fix-implementation
Testing the fix
First, we need to build our kernel and copy it to the SD card. Refer to the Build the kernel and copy it to the SD card section.
Let’s perform the same tests we ran before to make sure that we can still send the kernel over UART. Refer back to the following sections:
- Booting test: sending the same kernel: This will help us make sure that the boot process works end-to-end.
- Booting test: making a small change: This test will help us make sure that everything is working at least as well as it was working before the fix.
Testing the previous error
Now, the moment of truth! Let’s make a change that adds a function just like in the Adding a function section. Let’s test it out!
(rpi_os) $ ./build.sh
(rpi_os) $ python boot_send.py
Sending kernel with size 2280 and checksum 223784
Kernel size confirmed. Sending kernel
Validating checksum...
Done!
This output is promising! At least our boot_send script didn’t hang in the Validating checksum… step like it did last time. The new kernel probably booted up and is stuck in the readline function. Let’s press enter to send an empty line:
Hello from a new kernel!!!
Sending a test message!
Got keyboard interrupt. Terminating...
We did it! We were able to add a new function to the kernel, and it was still able to boot up!
Sending the same kernel multiple times
Now, what if we try to send the same kernel over and over again? We would be able to send a new kernel that handles the next UART boot differently without having to put it in the SD card!
(rpi_os) $ python boot_send.py
Sending kernel with size 2440 and checksum 234324
Kernel size confirmed. Sending kernel
Validating checksum...
Done!
Nothing surprising here: our kernel is stuck in the readline function again. This time, instead of pressing enter, let’s press ctrl-c to exit out of our boot_send program and rerun it.
Got keyboard interrupt. Terminating...
(rpi_os) $ python boot_send.py
Sending kernel with size 2440 and checksum 234324
Kernel size confirmed. Sending kernel
Validating checksum...
Done!
We sent the kernel again in the same session! Let’s press enter to make sure that our new kernel booted:
Hello from a new kernel!!!
Sending a test message!
Got keyboard interrupt. Terminating...
Git tag: uart-boot-succeeds-when-adding-function
Conclusion
In this post, we built a kernel that can be booted via UART to test, or from the SD card (for more permanent solutions). To do this, we came up with a protocol for the UART communication and implemented it. After a few tests, we noticed there was a problem when we tried to boot a kernel with an extra function. The problem was that the kernel was overwriting/corrupting its own instructions. To overcome this problem, we made the kernel first copy itself to a new address range and perform the UART copy while executing from that address range. After the kernel finishes copying the new kernel over UART, it can jump back to the original start address (0x00) and execute the new kernel.
Improvements
Here is a small list of improvements that you can make to the project:
- Add arguments to the boot_send script. I did this in my project, but this post is already long as it is.
- Clean the kernel.c file and extract the functions to a helper file.
- Use the C preprocessor to exclude the UART boot section if you don’t pass a UARTBoot flag.
- Make UART boot work with multiple CPUs. I implemented this as well in a somewhat hacky way (I might write a post about it later).
- Add a better checksum (CRC, for example).
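As a starting point for the first improvement, here is one possible sketch of command-line arguments using argparse from the standard library. The flag names (--device, --baud-rate, --kernel) are my own choice for this example and may not match what the project actually uses; the defaults are the values hardcoded earlier in this post:

```python
import argparse


def parse_args(argv=None):
    """Parse boot_send options; argv=None means use the real command line."""
    parser = argparse.ArgumentParser(
        description="Send a kernel over UART and start an interactive session."
    )
    parser.add_argument("--device", default="/dev/cu.SLAB_USBtoUART",
                        help="path to the UART device file")
    parser.add_argument("--baud-rate", type=int, default=115200,
                        help="baud rate of the serial connection")
    parser.add_argument("--kernel", default="kernel8.img",
                        help="path to the kernel image to send")
    return parser.parse_args(argv)


args = parse_args(["--kernel", "build/kernel8.img"])
print(args.device, args.baud_rate, args.kernel)
```

The main function could then pass args.device and args.baud_rate to UartConnection, and args.kernel to send_kernel.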
Links / References
- Learning operating system development using Linux kernel and Raspberry Pi (very recommended course!)
- Chain loading
- BSS
- Endianness
- Position Independent Code
- Position Independent Code Stack Overflow explanation
- Python virtual environments
- Referencing linker script defined variable
- Setting terminal to “raw” mode
- UART Boot Github Issue
- CRC