Kernel load address randomization
In the previous part, we finally left the setup code and reached the Linux kernel itself. We explored the last steps of the early boot process - from the kernel decompression to the hand-off to the Linux kernel entrypoint (the startup_64 function). You may think this is the end of the set of posts about the Linux kernel booting process, but I'd like to come back one more time to the early setup code and look at one more important part of it - KASLR or Kernel Address Space Layout Randomization.
As you may remember from the previous parts, the entry point of the Linux kernel is the startup_64 function defined in arch/x86/kernel/head_64.S. Normally, the kernel is loaded at the fixed, well-known address given by the CONFIG_PHYSICAL_START configuration option. The description and the default value of this option are defined in arch/x86/Kconfig:
config PHYSICAL_START
hex "Physical address where the kernel is loaded" if (EXPERT || CRASH_DUMP)
default "0x1000000"
help
This gives the physical address where the kernel is loaded.

However, modern systems rarely stick to predictable memory layouts, for security reasons. Knowing the fixed address where the kernel is loaded makes it easier for attackers to guess the location of kernel structures, which can be exploited in various ways. To make such attacks harder, the Linux kernel supports an address space layout randomization mechanism.
To enable this mechanism, the CONFIG_RANDOMIZE_BASE kernel configuration option should be enabled. If this mechanism is enabled, the kernel will not be decompressed and loaded at the given fixed address. Instead, on each boot the kernel image is placed at a randomly chosen physical address.
In this part, we will look at how this mechanism works.
Choose random location for kernel image
Before we start to investigate the kernel's code, let's recall where we were and what we have seen.
In the previous part, we followed the kernel decompression code and the transition to long mode. The kernel's decompressor entrypoint is the extract_kernel function defined in arch/x86/boot/compressed/misc.c. At this point, the kernel image is about to be decompressed into a specific location in memory.
Before the kernel's decompressor actually begins to decompress the kernel image, it needs to decide where that image should be placed in memory. While we were going through the kernel's decompression code in extract_kernel, we skipped the following function call:
choose_random_location((unsigned long)input_data, input_len,
(unsigned long *)&output,
needed_size,
&virt_addr);

This function is defined in arch/x86/boot/compressed/kaslr.c and does nothing if the nokaslr option is passed on the kernel command line.
Otherwise, it selects a randomized address where the kernel image should be decompressed.
As we can see, this function takes five parameters:
input - beginning address of the compressed kernel image
input_size - size of the compressed kernel image
output - physical address where the kernel should be decompressed
output_size - size of the decompressed kernel image
virt_addr - virtual address where the kernel should be decompressed
The extract_kernel function receives the output parameter from the code that prepares the decompressor:
If you read the previous chapters, you may remember that the starting address where the kernel image should be decompressed was calculated and stored in the rbp register.
The source of the values for the input, input_size, and output_size parameters is quite interesting. These values come from a little program called mkpiggy.
If you've ever tried compiling the Linux kernel yourself, you can find the output generated by this program in the arch/x86/boot/compressed/piggy.S assembly file, which contains all the parameters needed for decompression. In my case, this file looks like this:
At build time, the kernel's vmlinux image is compressed into a vmlinux.bin.{ALGO} file. The small mkpiggy program gathers information about the compressed kernel image and generates this assembly file using the following code:
That is where the kernel setup code obtains the values of these parameters.
The last parameter of the choose_random_location function is the virtual base address for the decompressed kernel image. At this point during early boot it is set to the physical load address:
Why is a virtual address initialized with the value of the physical address? The answer is simple and can be found in the previous chapters. During decompression, the early boot-time page tables are set up as an identity map. In other words, for this early stage, we have each virtual address equal to a physical address.
The value of LOAD_PHYSICAL_ADDR is the aligned value of the CONFIG_PHYSICAL_START configuration option, which we already saw at the beginning of this chapter:
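As a minimal sketch of what "aligned value" means here, the following user-space C rounds a start address up to a power-of-two boundary. The names align_up, example_load_physical_addr, EXAMPLE_PHYSICAL_START and EXAMPLE_PHYSICAL_ALIGN are hypothetical and exist only for illustration; the start value is the default quoted from arch/x86/Kconfig above, while the 2 MiB alignment is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* 0x1000000 is the PHYSICAL_START default quoted earlier in this chapter. */
#define EXAMPLE_PHYSICAL_START 0x1000000ULL
/* Assumption for illustration only: a 2 MiB alignment requirement. */
#define EXAMPLE_PHYSICAL_ALIGN 0x200000ULL

/* Round addr up to the next multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + (align - 1)) & ~(align - 1);
}

/* Hypothetical stand-in for the aligned load address described in the text. */
static uint64_t example_load_physical_addr(void)
{
    return align_up(EXAMPLE_PHYSICAL_START, EXAMPLE_PHYSICAL_ALIGN);
}
```

With these example values the start address is already aligned, so the result equals the configured start. An unaligned start would be rounded up to the next boundary.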
At this point, we have examined all the parameters passed to the choose_random_location function. Now it is time to look inside the function.
As mentioned above, the first thing this function does is check whether KASLR is disabled using the nokaslr option on the kernel command line:
If this option is specified in the kernel command line, the function does nothing, and the kernel is decompressed at the fixed address. In this chapter, however, we focus on the case where this option is not provided, as that is the main topic under discussion. If the nokaslr option is not present, the function proceeds to find a random location in memory to decompress the kernel.
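The decision described above can be sketched in plain user-space C. This is not the kernel's own command-line parser: cmdline_has_option and kaslr_requested are hypothetical helpers, and the sketch assumes options are separated by single spaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if `option` appears in `cmdline`
 * as a whole space-delimited word.
 */
static bool cmdline_has_option(const char *cmdline, const char *option)
{
    size_t len = strlen(option);
    const char *p = cmdline;

    assert(len > 0);
    while ((p = strstr(p, option)) != NULL) {
        bool starts_word = (p == cmdline) || p[-1] == ' ';
        bool ends_word = p[len] == '\0' || p[len] == ' ';

        if (starts_word && ends_word)
            return true;
        p += 1; /* keep searching after this partial match */
    }
    return false;
}

/* Returns true when randomization should be attempted. */
static bool kaslr_requested(const char *cmdline)
{
    return !cmdline_has_option(cmdline, "nokaslr");
}
```

The whole-word match matters: a parameter whose value merely contains the string nokaslr should not switch randomization off.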
The very first step is to set a mark in the boot parameters that ASLR is enabled. This is done by setting a specific flag in the kernel’s boot header:
After marking that ASLR is enabled, the next task is to determine the upper memory limit that the system can use:
Since we consider only x86_64 systems, the memory limit is MAXMEM, which is a macro defined in arch/x86/include/asm/pgtable_64_types.h:
where MAX_PHYSMEM_BITS depends on whether 5-level paging is enabled. We will consider only 4-level paging, so in our case MAXMEM expands to 1 << 46 bytes, which is 64 TiB.
With the mem_limit value set, the decompressor and kernel code responsible for the address randomization know how far they can safely go when calculating an address for the kernel image. But before a random address for the kernel image can be chosen, the kernel needs to make sure it does not overwrite something important.
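To make the size of this limit concrete, here is a small sketch that computes it from the number of physical address bits. The helper maxmem_for_bits and the two EXAMPLE_ constants are hypothetical names; the value 46 comes from the text above, and 52 for 5-level paging is stated here as an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_PHYSMEM_BITS_4LEVEL 46 /* value used in this chapter */
#define EXAMPLE_PHYSMEM_BITS_5LEVEL 52 /* assumption: value with 5-level paging */

/* Memory limit in bytes for a given number of physical address bits. */
static uint64_t maxmem_for_bits(unsigned int bits)
{
    assert(bits < 64);
    return 1ULL << bits;
}
```

For 46 bits the limit is 64 TiB, so on typical machines the real constraint is the amount of installed memory rather than this ceiling.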
Avoiding reserved memory ranges
The next step in the randomization process is to build a map of forbidden memory regions to prevent the kernel image from overwriting memory areas that are already in use. These may include, for example, the initial ramdisk or the kernel command line. To gather this information, we use this function:
It collects the forbidden memory regions into the mem_avoid array, whose elements have the mem_vector type:
Currently, the randomization code avoids the memory regions enumerated by mem_avoid_index:
Let's look at the implementation of the mem_avoid_init function. As we know, the main goal of this function is to store information about reserved memory regions so that they can be avoided when choosing a random address for the kernel image. There are no complex calculations in this function, and most of the reserved memory areas are known, as they are set by the bootloader or were already calculated in previous steps of the kernel setup. A typical example of gathering information about the reserved memory regions looks like this:
In the code above, the start address of the initial ramdisk and its size are stored in the mem_avoid array. The same pattern repeats for other important memory areas, for example:
the setup header
the decompressor itself
the compressed kernel image
After the mem_avoid_init function is executed, the decompressor code has a complete picture of the system’s reserved memory zones and avoids them when selecting a random address to load the kernel image.
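The idea of checking a candidate region against a list of forbidden regions can be sketched as follows. This is an illustrative user-space version, not the kernel's code: the struct layout (a start address plus a size) and the helper overlaps_any are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout for illustration: a region is a start address plus a size. */
struct mem_vector {
    uint64_t start;
    uint64_t size;
};

/* True if the two half-open ranges [start, start + size) intersect. */
static bool mem_overlaps(const struct mem_vector *one,
                         const struct mem_vector *two)
{
    if (one->start + one->size <= two->start)
        return false; /* one lies entirely before two */
    if (one->start >= two->start + two->size)
        return false; /* one lies entirely after two */
    return true;
}

/* Hypothetical helper: does the candidate touch any region to avoid? */
static bool overlaps_any(const struct mem_vector *candidate,
                         const struct mem_vector *avoid, size_t count)
{
    assert(candidate != NULL && (avoid != NULL || count == 0));
    for (size_t i = 0; i < count; i++) {
        /* Empty entries (size 0) describe nothing and are skipped. */
        if (avoid[i].size != 0 && mem_overlaps(candidate, &avoid[i]))
            return true;
    }
    return false;
}
```

A candidate placement for the kernel image is acceptable only if overlaps_any returns false for it. Regions that merely touch at a boundary do not count as overlapping.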
Now we can return to the choose_random_location function and finally see the process of the address randomization.
Physical address randomization
The whole process of finding a suitable random address to load the kernel image consists of two parts:
Find a random physical address
Find a random virtual address
You may remember that at this point, the kernel uses identity-mapped page tables. With this in mind, you may ask why two different addresses are calculated if there is a 1:1 mapping anyway. The answer is that these two random addresses have different purposes. The physical address determines where the kernel image is loaded in memory. The virtual address determines the kernel's address in the virtual address space. Although the decompressor code currently runs with identity mapping, all the symbol references in the kernel image are patched during the relocation process with a random virtual address and offset. If it turns out that there is no mapping between the newly chosen physical and virtual addresses in the current page tables, the page fault interrupt handler builds a new identity mapping. You can find more information in the previous chapter.
Before generating any random offset, the decompressor determines the lowest possible base address that the kernel can use:
This address is the smaller of 512 megabytes and the starting address of the output buffer passed to the extract_kernel function, aligned to the kernel's alignment requirement. After obtaining this value, the kernel calls the next function, which returns a random physical address:
The find_random_phys_addr function is defined in the same arch/x86/boot/compressed/kaslr.c source code file as the choose_random_location function. This function starts with sanity checks. The first check ensures that the kernel image will not extend beyond the memory limit:
The next check verifies that the number of memory regions specified via the memmap kernel command line option is not excessive:
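The lowest-base calculation described above can be sketched like this. The function lowest_base and the constant EXAMPLE_LIMIT_512M are hypothetical names, and the alignment is passed in as a parameter because its concrete value depends on the kernel configuration.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_LIMIT_512M (512ULL << 20) /* 512 megabytes in bytes */

/*
 * Take the smaller of the output buffer address and 512 MiB,
 * then round it up to `align` (which must be a power of two).
 */
static uint64_t lowest_base(uint64_t output, uint64_t align)
{
    uint64_t min_addr = output < EXAMPLE_LIMIT_512M ? output
                                                    : EXAMPLE_LIMIT_512M;

    assert(align != 0 && (align & (align - 1)) == 0);
    return (min_addr + (align - 1)) & ~(align - 1);
}
```

With an output buffer at 16 MiB, the result stays at 16 MiB. With an output buffer above 512 MiB, the result is capped at 512 MiB.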
After these sanity checks, the decompressor code begins scanning the system's available memory regions to find suitable candidates for the randomized address to decompress the kernel image. This is done with the help of the following functions:
The scanning consists of three potential stages:
Scan the memory regions that are not preserved by KHO (Kexec HandOver).
Scan the memory regions presented by the EFI memory map.
Fall back to scanning the memory regions reported by the e820 BIOS service.
All the memory regions that were found and accepted as suitable will be stored in the slot_areas array represented by the following structure:
The kernel selects a random slot from this array as the place to decompress the kernel image. The selection of the random slot happens in the slots_fetch_random function:
The main goal of the slots_fetch_random function is to select a random memory slot from the list of possible locations that were gathered into the slot_areas array. Each entry of this array represents a contiguous free region of memory and the number of possible aligned kernel placements that fit in it.
To select a random address, this function generates a random number which is limited to the total number of the available slots. The random value is produced by the kaslr_get_random_long function which is defined in the same file. As its name suggests, this function returns a random unsigned long value, obtained using whatever entropy sources are available on the system. Depending on the hardware and the kernel configuration it can be:
the CPU’s Time Stamp Counter
the rdrand instruction
After obtaining the random value, the code goes through the slot_areas array to find the memory region that contains the selected slot. The address of that slot within the region is used as the random physical address for decompressing the kernel image.
The kernel checks the result of the find_random_phys_addr function and prints a warning message if this operation was not successful. Otherwise, it assigns the obtained address to output:
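The walk over the slot areas can be sketched in user-space C as follows. This is a simplified illustration under stated assumptions: the random value is supplied by the caller instead of coming from kaslr_get_random_long, pick_slot is a hypothetical name, the struct layout (base address plus slot count) is assumed, and EXAMPLE_ALIGN is an assumed 2 MiB slot granularity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_ALIGN 0x200000ULL /* assumed distance between slots (2 MiB) */

/* Assumed layout: a free region's base address and how many slots fit in it. */
struct slot_area {
    uint64_t addr;
    unsigned long num;
};

/*
 * Map a random value onto one slot across all areas and return its address.
 * Returns 0 when there are no slots at all.
 */
static uint64_t pick_slot(const struct slot_area *areas, size_t count,
                          unsigned long random_value)
{
    unsigned long total = 0;
    unsigned long slot;
    size_t i;

    assert(areas != NULL || count == 0);
    for (i = 0; i < count; i++)
        total += areas[i].num;
    if (total == 0)
        return 0;

    slot = random_value % total; /* global slot index in [0, total) */
    for (i = 0; i < count; i++) {
        if (slot >= areas[i].num) {
            slot -= areas[i].num; /* not in this area, move on */
            continue;
        }
        return areas[i].addr + (uint64_t)slot * EXAMPLE_ALIGN;
    }
    return 0; /* not reached when total > 0 */
}
```

Because every slot has the same chance of being chosen, larger free regions are proportionally more likely to host the kernel image than small ones.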
At this point, the kernel has successfully picked a random physical address. The final step is to obtain a random virtual address.
Virtual address randomization
With the physical address chosen, the decompressor now knows where to decompress the kernel image. Once the decompressed kernel starts running, it switches from the early-boot page tables to the full paging setup. The next and last step is to randomize the virtual base address:
The function find_random_virt_addr is located in the same source code file and looks like this:
As we can see, this function uses the same kaslr_get_random_long call to get a random memory slot.
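A minimal sketch of this kind of slot-based selection for the virtual address is shown below. It is not the kernel's function: pick_virt_addr is a hypothetical name, the random value is passed in by the caller, and the end of the allowed virtual range (range_end) and the alignment are parameters chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count how many `align`-sized steps fit between `minimum` and the last
 * position where an image of `image_size` still ends at or before
 * `range_end`, then pick one of them using `random_value`.
 */
static uint64_t pick_virt_addr(uint64_t minimum, uint64_t image_size,
                               uint64_t range_end, uint64_t align,
                               uint64_t random_value)
{
    uint64_t slots;

    assert(align != 0 && minimum + image_size <= range_end);
    slots = 1 + (range_end - minimum - image_size) / align;
    return (random_value % slots) * align + minimum;
}
```

For example, with a 16 MiB minimum, a 32 MiB image, a 1 GiB range, and 2 MiB alignment, there are 489 possible slots, and the highest one still leaves room for the whole image.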
At this point, both the physical and virtual base addresses are determined: randomized, aligned, and guaranteed to fit in available memory.
Conclusion
This is the end of the sixth part about Linux kernel insides. If you have questions or suggestions, feel free to ping me on X - 0xAX, drop me an email, or just create an issue.
The next chapter will be about kernel initialization, and we will study the first steps taken in the Linux kernel initialization code.