Kernel booting process series. In the previous part we took a look at the final stages of the Linux kernel boot process, but we skipped some important, more advanced parts.
start_kernel function defined in the main.c source code file. This function is executed at the address stored in
LOAD_PHYSICAL_ADDR, which depends on the
CONFIG_PHYSICAL_START kernel configuration option, which is
CONFIG_RANDOMIZE_BASE kernel configuration option should be enabled during kernel configuration.
CONFIG_RANDOMIZE_BASE option is enabled and the load address of the kernel image is randomized for security reasons.
input is just the
input_data parameter of the
extract_kernel function from the arch/x86/boot/compressed/misc.c source code file, cast to
input_data is generated by the little mkpiggy program. If you've tried compiling the Linux kernel yourself, you may find the output generated by this program in the
linux/arch/x86/boot/compressed/piggy.S source code file. In my case this file looks like this:
z_output_len are the sizes of the compressed and uncompressed
vmlinux.bin.gz archive. The third is our
input_data parameter, which points to the Linux kernel image's raw binary (stripped of all debugging symbols, comments and relocation information). The last parameter,
input_data_end, points to the end of the compressed Linux image.
choose_random_location function is the pointer to the compressed kernel image that is embedded into the
choose_random_location function are the address of the decompressed kernel image and its length respectively. The decompressed kernel's address came from the arch/x86/boot/compressed/head_64.S source code file and is the address of the
startup_32 function aligned to a 2 megabyte boundary. The size of the decompressed kernel is given by
z_output_len which, again, is found in
choose_random_location function is the virtual address of the kernel load address. As can be seen, by default, it coincides with the default physical load address:
choose_random_location's parameters, so let's look at its implementation. This function starts by checking the
nokaslr option in the kernel command line:
nokaslr to the kernel command line and the
CONFIG_RANDOMIZE_BASE kernel configuration option is enabled. In this case we add the
kASLR flag to the kernel load flags:
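As a rough illustration of the two steps described above, the sketch below scans a command line for a whole-word option and sets a flag bit only when that option is absent. The names has_boot_option, update_load_flags and SKETCH_KASLR_FLAG are hypothetical and exist only for this example; they are not the kernel's own helpers, and the bit value is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bit for this sketch; the value is an assumption. */
#define SKETCH_KASLR_FLAG (1u << 1)

/* Return true if opt appears in cmdline as a whole space-separated word. */
static bool has_boot_option(const char *cmdline, const char *opt)
{
    size_t len = strlen(opt);
    const char *p = cmdline;

    if (len == 0)
        return false;

    while ((p = strstr(p, opt)) != NULL) {
        bool starts_word = (p == cmdline) || (p[-1] == ' ');
        bool ends_word = (p[len] == '\0') || (p[len] == ' ');

        if (starts_word && ends_word)
            return true;
        p++; /* keep searching after a partial match such as "nokaslrx" */
    }
    return false;
}

/* Leave the flags untouched when randomization is disabled on the
 * command line; otherwise mark the load flags as randomized. */
static unsigned int update_load_flags(unsigned int flags, const char *cmdline)
{
    if (has_boot_option(cmdline, "nokaslr"))
        return flags;
    return flags | SKETCH_KASLR_FLAG;
}
```

Matching on whole words matters here: a plain substring search would also treat an unrelated option that merely contains the text nokaslr as a request to disable randomization.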
initialize_identity_maps function is defined in the arch/x86/boot/compressed/kaslr_64.c source code file. This function starts by initializing an instance of the
4G. This won't do since we might generate a randomized address outside of the 4 gigabyte range. So, the
initialize_identity_maps function initializes the memory for a new page table entry. First, let's take a look at the definition of the
alloc_pgt_page is a callback function that is called to allocate space for a page table entry. The
context field is an instance of the
alloc_pgt_data structure. We use it to track allocated page tables. The
kernpg_flag fields are page flags. The first represents flags for
kernpg_flag field represents overridable flags for kernel pages. The
direct_gbpages field is used to check if huge pages are supported and the last field,
offset, represents the offset between the kernel's virtual addresses and its physical addresses up to the
alloc_pgt_page callback just checks that there is space for a new page, allocates it in the
pgt_buf field of the
alloc_pgt_data structure and returns the address of the new page:
alloc_pgt_data structure looks like:
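Since the callback described above boils down to a bump allocator over a fixed buffer, a minimal sketch may help. The field names follow the ones mentioned in this part (pgt_buf, pgt_buf_size, pgt_buf_offset), but the structure name sketch_pgt_data, the function name sketch_alloc_pgt_page, the field types and the 4096-byte page size are assumptions made for illustration, not the kernel's definitions.

```c
#include <stddef.h>
#include <string.h>

/* Assumed page size for this sketch. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical bookkeeping for a pool of page-table pages. */
struct sketch_pgt_data {
    unsigned char *pgt_buf;   /* start of the pool */
    size_t pgt_buf_size;      /* total bytes in the pool */
    size_t pgt_buf_offset;    /* bytes handed out so far */
};

/* Hand out the next zeroed page from the pool, or NULL when the pool
 * has no room left. The context pointer plays the role of the callback
 * context described in the text. */
static void *sketch_alloc_pgt_page(void *context)
{
    struct sketch_pgt_data *pages = context;
    void *entry;

    /* Check that a whole page still fits (offset never exceeds size). */
    if (pages->pgt_buf_size - pages->pgt_buf_offset < SKETCH_PAGE_SIZE)
        return NULL;

    entry = pages->pgt_buf + pages->pgt_buf_offset;
    memset(entry, 0, SKETCH_PAGE_SIZE);
    pages->pgt_buf_offset += SKETCH_PAGE_SIZE;

    return entry;
}
```

Pages are never freed individually in this scheme, which is acceptable for a short-lived early boot stage where the whole pool is abandoned at once.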
initialize_identity_maps function is to initialize
pgt_buf_offset. As we are only in the initialization phase, the
pgt_data.pgt_buf_size will be set to
69632 depending on which boot protocol was used by the bootloader (64-bit or 32-bit). The same is done for
pgt_data.pgt_buf. If a bootloader loaded the kernel at
pgt_data.pgt_buf will point to the end of the already initialized page table in the arch/x86/boot/compressed/head_64.S source code file:
_pgtable points to the beginning of this page table. On the other hand, if the bootloader used the 64-bit boot protocol and loaded the kernel at
startup_64, the early page tables should already be built by the bootloader itself and
_pgtable will just point to those instead:
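The choice between the two cases above can be sketched as a small initialization routine. This is only an illustration under stated assumptions: the structure and function names are hypothetical, the sizes are passed in as parameters rather than taken from kernel constants, and the caller is assumed to already know whether the early page tables inside the buffer are in use.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for a pool of page-table pages. */
struct sketch_pgt_data {
    unsigned char *pgt_buf;
    size_t pgt_buf_size;
    size_t pgt_buf_offset;
};

/* Set up the pool inside a buffer of total_size bytes.
 * If the first init_size bytes already hold live page tables, the pool
 * starts right after them; otherwise the whole buffer is available.
 * Assumes init_size <= total_size. */
static void sketch_init_pgt_data(struct sketch_pgt_data *data,
                                 unsigned char *pgtable,
                                 size_t total_size,
                                 size_t init_size,
                                 bool early_tables_in_use)
{
    data->pgt_buf_offset = 0;

    if (early_tables_in_use) {
        data->pgt_buf = pgtable + init_size;
        data->pgt_buf_size = total_size - init_size;
    } else {
        data->pgt_buf = pgtable;
        data->pgt_buf_size = total_size;
    }
}
```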
mem_avoid_init function will help us do this:
mem_avoid_init function. The main goal of this function is to store information about reserved memory regions with descriptions given by the
mem_avoid_index enum in the
mem_avoid array and to create new pages for such regions in our new identity mapped buffer. The
mem_avoid_init function does the same thing for all elements in the
mem_avoid_index enum, so let's look at a typical example of the process:
mem_avoid_init function first tries to avoid memory regions currently used to decompress the kernel. We fill an entry from the
mem_avoid array with the start address and the size of the relevant region and call the
add_identity_map function, which builds the identity mapped pages for this region. The
add_identity_map function is defined in the arch/x86/boot/compressed/kaslr_64.c source code file and looks like this:
round_down functions are used to align the start and end addresses to a 2 megabyte boundary.
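The alignment step can be sketched in a few lines of C. The helpers below assume a power-of-two alignment; the names sketch_round_down, sketch_round_up, sketch_region and sketch_map_bounds are hypothetical and only mirror the idea of widening a region so that it starts and ends on 2 megabyte boundaries.

```c
#include <stdint.h>

/* 2 megabytes, the boundary mentioned in the text. */
#define SKETCH_ALIGN (UINT64_C(2) << 20)

/* Hypothetical description of a memory region. */
struct sketch_region {
    uint64_t start;
    uint64_t size;
};

/* Both helpers assume that align is a power of two. */
static uint64_t sketch_round_down(uint64_t x, uint64_t align)
{
    return x & ~(align - 1);
}

static uint64_t sketch_round_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Widen the region so that the range to be mapped is 2 megabyte aligned. */
static void sketch_map_bounds(const struct sketch_region *region,
                              uint64_t *start, uint64_t *end)
{
    *start = sketch_round_down(region->start, SKETCH_ALIGN);
    *end = sketch_round_up(region->start + region->size, SKETCH_ALIGN);
}
```

For example, a region that starts at 0x2ff000 and is 0x2000 bytes long straddles a boundary, so the mapped range becomes 0x200000 to 0x400000.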
kernel_ident_mapping_init function from the arch/x86/mm/ident_map.c source code file and passes the previously initialized
mapping_info instance, the address of the top level page table and the start and end addresses of the memory region for which a new identity mapping should be built.
kernel_ident_mapping_init function sets default flags for new pages if they were not already set:
Page Global Directory for the given address. If the entry's address is greater than the
end of the given memory region, we set its size to
end. After this, we allocate a new page with the
x86_mapping_info callback that we looked at previously and call the
ident_p4d_init function will do the same thing, but for the lower level page directories (
mem_avoid_init function, but the rest is similar. It builds pages for the initrd and the kernel command line, among other things.
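The walk over Page Global Directory entries described above for kernel_ident_mapping_init follows a common pattern: split the range into pieces that each fit inside one top level entry, and clamp the last piece to the end of the range. The sketch below shows only that pattern. The 39-bit shift assumes four-level paging, and the function and type names are hypothetical; the real function also allocates and fills the lower level tables.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed span of one top level entry (four-level paging). */
#define SKETCH_PGDIR_SHIFT 39
#define SKETCH_PGDIR_SIZE  (UINT64_C(1) << SKETCH_PGDIR_SHIFT)
#define SKETCH_PGDIR_MASK  (~(SKETCH_PGDIR_SIZE - 1))

/* Hypothetical callback invoked once per piece of the range. */
typedef void (*sketch_chunk_fn)(uint64_t start, uint64_t end, void *ctx);

/* Visit [start, end) one top level entry at a time and return the
 * number of entries touched. */
static unsigned int sketch_walk_top_level(uint64_t start, uint64_t end,
                                          sketch_chunk_fn fn, void *ctx)
{
    unsigned int chunks = 0;
    uint64_t addr = start;

    while (addr < end) {
        /* First address covered by the next top level entry. */
        uint64_t next = (addr & SKETCH_PGDIR_MASK) + SKETCH_PGDIR_SIZE;

        /* Clamp to the end of the range (and guard against wrap-around). */
        if (next > end || next <= addr)
            next = end;

        if (fn != NULL)
            fn(addr, next, ctx); /* lower levels would be built here */

        chunks++;
        addr = next;
    }
    return chunks;
}
```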
mem_avoid array and identity mapped pages are built for them, we select the region with the lowest available address to decompress the kernel to:
512 megabytes. A limit of
512 megabytes was selected to avoid unknown things in lower memory.
process_efi_entries function is to find all suitable memory ranges in fully accessible memory to load the kernel. If the kernel is compiled and run on a system without EFI support, we continue to search for such memory regions in the e820 regions. All memory regions found will be stored in the
slots_fetch_random function. The main goal of the
slots_fetch_random function is to select a random memory range from the
slot_areas array via the
kaslr_get_random_long function is defined in the arch/x86/lib/kaslr.c source code file and, as its name suggests, returns a random number. Note that the random number can be generated in a number of ways depending on the kernel configuration and the features present in the system (for example, using the time stamp counter, rdrand, or some other method).
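To make the selection step more concrete, here is a minimal sketch of picking one slot out of several areas with a single random number. It assumes that each area records a base address and the number of aligned slots it can hold; the structure layout, the field names addr and num, and the function name are assumptions for this example. The random value is passed in as a parameter so that the logic stays deterministic and easy to test.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one usable area. */
struct sketch_slot_area {
    uint64_t addr; /* base address of the area */
    uint64_t num;  /* number of aligned slots that fit in it */
};

/* Map a random number to the address of one slot.
 * Returns 0 when there are no slots at all. */
static uint64_t sketch_pick_slot(const struct sketch_slot_area *areas,
                                 size_t count, uint64_t align,
                                 uint64_t random)
{
    uint64_t total = 0;
    uint64_t slot;
    size_t i;

    for (i = 0; i < count; i++)
        total += areas[i].num;

    if (total == 0)
        return 0;

    /* Reduce the random value to a global slot index. */
    slot = random % total;

    /* Find the area that contains this global index. */
    for (i = 0; i < count; i++) {
        if (slot >= areas[i].num) {
            slot -= areas[i].num;
            continue;
        }
        return areas[i].addr + slot * align;
    }

    return 0; /* not reached when total matches the areas */
}
```

Counting slots rather than areas gives every aligned position the same chance of being chosen, so a large area is proportionally more likely to receive the kernel than a small one.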
output will store the base address of the memory region where the kernel will be decompressed. Currently, we have only randomized the physical address. We can randomize the virtual address as well on the x86_64 architecture:
x86_64, the randomized physical and virtual addresses are the same. The
find_random_virt_addr function calculates the number of virtual memory ranges needed to hold the kernel image. It calls the
kaslr_get_random_long function, which we have already seen being used to generate a random
*output) and virtual (
*virt_addr) addresses for the decompressed kernel.
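The virtual address calculation can be sketched in the same spirit: count how many aligned positions the image could occupy inside a window of virtual address space, then use a random number to pick one of them. The function name, the explicit window and align parameters, and the fallback to the minimum address are assumptions for this sketch; in the kernel the corresponding limits come from configuration constants.

```c
#include <stdint.h>

/* Pick an aligned virtual address in [minimum, window) such that an
 * image of image_size bytes still fits below window.
 * Assumes align is a power of two and minimum is already aligned. */
static uint64_t sketch_random_virt_addr(uint64_t minimum,
                                        uint64_t image_size,
                                        uint64_t window,
                                        uint64_t align,
                                        uint64_t random)
{
    uint64_t slots;

    /* Round the image size up so that slots are easy to count. */
    image_size = (image_size + align - 1) & ~(align - 1);

    /* No room to move: fall back to the minimum address. */
    if (minimum + image_size > window)
        return minimum;

    /* Number of aligned start positions that keep the image in range. */
    slots = (window - minimum - image_size) / align + 1;

    return (random % slots) * align + minimum;
}
```

With a 16 megabyte minimum, a 24 megabyte image, a 1 gigabyte window and 2 megabyte alignment there are 493 possible positions, and the highest one places the end of the image exactly at the top of the window.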