Troubleshooting Development Errors#
Debugging Tools#
Clean Rebuild#
Occasionally builds will fail, especially after adding / deleting files. Deleting the build folder and clearing ccache should fix it.
For Qemu:
rm -r -f qemu-build
ccache -C
mkdir qemu-build
cd qemu-build
../init-build.sh -DAARCH64=TRUE -DPLATFORM=qemu-arm-virt -DSIMULATION=TRUE
For Odroid-D4:
rm -r -f odroid-build
ccache -C
mkdir odroid-build
cd odroid-build
../init-build.sh -DPLATFORM=odroidc4
Debug Prints#
The debug options for the GPI server are located in projects/sel4-gpi/libsel4gpi/include/sel4gpi/debug.h
:
OSDB_TOPIC
sets which components are allowed to print.OSDB_LEVEL
sets the level of verbosity for the enabled topics.
Qemu with GDB#
From the Qemu build folder, run
./simulate --extra-qemu-args "-S -s"
.Open a new terminal window in the Qemu build folder, run
gdb-multiarch <image_path>
, replacing<image_path>
with the location of the image you want to debug.Root task image:
gdb-multiarch apps/sel4test-driver/sel4test-driver
Test process image:
apps/sel4test-driver/sel4test-tests/sel4test-tests
Other app images are in the
apps/sel4test-driver/
directory
Once GDB starts up, connect to Qemu with
target remote :1234
.
Loading debug symbols from musl libc#
By default, debug symbols are not loaded from musl libc. If you want to set a breakpoint inside malloc.c
, for example, you will need to extract the symbols.
Enable musl libc debug: in
/projects/musllibc/makefile
line 57, change${ENABLE_DEBUG}
to--enable-debug
.I have not figured out how to properly set the flag, as build errors occured when I tried to set it from the cmake file.
Extract the desired object file: eg. to extract malloc, run
ar -xv apps/sel4test-driver/musllibc/build-temp/stage/lib/libc.a malloc.o
from the build folder.Now the symbols should be loaded automatically in GDB.
addr2line#
If the system fails with a page fault, it will print the image name and PC where the fault occurred. Use addr2line to find the file & line where the error occurred:
addr2line -e apps/sel4test-driver/<image_path> 0x<PC>
If this returns ???
, the fault may have occurred in a library call, eg. memset
. Another way to determine where the fault occurred is using objdump
:
<aarch64-none-elf_toolchain_path>/bin/aarch64-none-elf-objdump -DS apps/sel4test-driver/<image_path> | less
You will need to determine the location of the correct objdump depending on where you installed your
aarch64-none-elf
toolchain.Piping to
less
allows you to search with/
, then input the PC value, and determine what function the fault occurred in.
Common development pitfalls#
Server error when contacting the ADS component#
If an “invalid request” error is returned while attempting to attach or remove a VMR from an ADS, double-check that the VMR RDE for the ADS endpoint is being used to send the request. See the ADS capability quirks section for more details.
Insufficient untyped memory after resources are freed#
The VKA allocator may throw a bunch of errors about “insufficient memory to allocate untypeds” despite it being extremely unlikely for the system to have run out of memory (e.g. after something has just been freed). This is caused by freeing an object before all descendant capabilities to the object have been deleted. Freeing an object causes the underlying untyped memory to be returned to the VKA allocator’s free memory pool. If a capability to the memory when it was typed still exists, attempts to retype the untyped memory will fail, causing the VKA allocator to think it is out of memory, and returning the misleading “insufficient memory” error.
This may not always be the cause of this error, but a good way to tell is by using the seL4_DebugCapIsLastCopy
syscall. This will return true if there exists a copy of a given capability from the current CSpace in any existing CSpace.
Potential solutions include:
Revoking the capability before freeing - this is a quick and easy (from the developer’s perspective, not in terms of the kernel’s efforts) to deal with the error, but leaves references to the revoked caps invalid.
For CellulOS tracked resources, toggling the
GPI_DEBUG
log topic, and ensuring that reference counts for the problem capability increase/decrease according to expectation.Checking both CNodes and TCBs as potential capability containers. TCBs are often forgotten about as containers that capabilities may need to be freed from.
Strange Behaviour During Root Task IPCs#
One known source of strange and unexpected behaviour when the Root Task handles API requests is when the RPC_MSG_MAX_SIZE value is not large enough to hold the largest nanopb message. The RPC_MSG_MAX_SIZE
is used to allocate static buffers to copy message contents from nanopb, and if not large enough, will result in a buffer overflow.