This part begins with the start_kernel function from init/main.c. The start_kernel function is the entry point of the generic, architecture-independent kernel code, although we will return to the arch/ folder many times. If you look inside the start_kernel function, you will see that it is very big: at the moment it contains about 86 function calls. Of course this part will not cover everything that happens in this function; here we only start on it. This part and all the following parts of the Kernel initialization process chapter will cover it.
The start_kernel function has to finish the kernel initialization process and launch the first init process. Before the first process is started, start_kernel must do many things, such as: enable the lock validator, initialize the processor id, enable the early cgroups subsystem, set up per-cpu areas, initialize different caches in vfs, and initialize the memory manager, rcu, vmalloc, scheduler, IRQs, ACPI and many more. Only after these steps will we see the launch of the first init process, in the last part of this chapter. So much kernel code awaits us, let's start.
The parts of the Linux Kernel initialization process chapter will not cover anything about debugging. There will be a separate chapter about kernel debugging tips.
The start_kernel function is defined in init/main.c. This function is defined with the __init attribute and, as you may already know from other parts, all functions defined with this attribute are needed only during kernel initialization.
After the initialization process has finished, the kernel releases the memory occupied by these functions with a call to the free_initmem function. Note also that __init is defined with two attributes: __cold and notrace. The purpose of the first, the cold attribute, is to mark that the function is rarely used, so the compiler should optimize it for size. The second, notrace, is defined as the no_instrument_function attribute, which tells the compiler not to generate profiling function calls.
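To make the two attributes more concrete, here is a minimal user-space sketch, not the kernel's actual definition. The macro names my_cold and my_notrace and the function my_early_setup are hypothetical; only the cold and no_instrument_function attributes come from the text, and the sketch assumes GCC or Clang.

```c
/* Hypothetical stand-ins for the kernel's __cold and notrace macros. */
#define my_cold    __attribute__((cold))
#define my_notrace __attribute__((no_instrument_function))

/*
 * A rarely executed setup routine. "cold" hints that it is unlikely to run,
 * and "no_instrument_function" keeps -finstrument-functions from adding
 * profiling hooks to it. It returns how many subsystems it pretended to set up.
 */
static my_cold my_notrace int my_early_setup(int subsystems)
{
    int done = 0;

    for (int i = 0; i < subsystems; i++)
        done++;

    return done;
}
```

The attributes change how the compiler treats the function, not what it computes, so my_early_setup(3) still returns 3.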
In the definition of the start_kernel function you can also see the __visible attribute, which expands to the externally_visible attribute. externally_visible tells the compiler that something uses this function or variable, to prevent it from being marked as unusable. You can find the definitions of this and other macro attributes in include/linux/init.h.
At the beginning of start_kernel you can see the definitions of two variables, command_line and after_dashes. The first is a pointer to the kernel command line, and the second will hold the result of the parse_args function, which parses an input string with parameters in the form name=value, looking for specific keywords and invoking the right handlers. We will not go into the details of these two variables now, but we will see them in the next parts. In the next step we can see a call to the set_task_stack_end_magic function. This function takes the address of init_task and sets STACK_END_MAGIC (0x57AC6E9D) as a canary for it. init_task represents the initial task structure and has the type task_struct.
task_struct stores all the information about a process. I will not explain this structure in this book because it is very big. You can find its definition in include/linux/sched.h. At the moment task_struct contains more than 100 fields! Although you will not see an explanation of task_struct in this book, we will use it very often, since it is the fundamental structure which describes a process in the Linux kernel. I will describe the meaning of the fields of this structure as we meet them in practice.
init_task is initialized by the INIT_TASK macro. This macro is from include/linux/init_task.h and it just fills init_task with the values for the first process. For example it sets:
the init process state to zero, or runnable. A runnable process is one which is waiting only for a CPU to run on;
the init process flags to PF_KTHREAD, which means kernel thread;
the init process stack, which comes from init_thread_union. init_thread_union has the type thread_union, which contains thread_info and the process stack.
Every process has its own stack, and it is 16 kilobytes on x86_64. Note that the stack is defined as an array of unsigned long. The thread_info structure contains architecture-specific information on the thread. We know that on x86_64 the stack grows down, and thread_union.thread_info is stored at the bottom of the stack in our case. So the process stack is 16 kilobytes and thread_info is at the bottom. The remaining stack size is 16 kilobytes - 52 bytes = 16332 bytes. Note that thread_union is a union and not a structure, which means that thread_info and the stack share the same memory.
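The sharing of memory between thread_info and the stack can be sketched in user space as follows. This is only an illustration: demo_thread_info and its fields are hypothetical and much smaller than the real structure, and DEMO_THREAD_SIZE assumes the 16 kilobyte stack mentioned above.

```c
#include <stddef.h>

#define DEMO_THREAD_SIZE (16 * 1024)   /* assumed 16 KiB, as on x86_64 */

/* Hypothetical, heavily reduced stand-in for the kernel's thread_info. */
struct demo_thread_info {
    unsigned int flags;
    unsigned int status;
};

/* Both members start at offset 0, so they overlap in memory. */
union demo_thread_union {
    struct demo_thread_info thread_info;
    unsigned long stack[DEMO_THREAD_SIZE / sizeof(unsigned long)];
};

/* Bytes of the stack area that are not covered by thread_info. */
static size_t demo_usable_stack(void)
{
    return sizeof(union demo_thread_union) - sizeof(struct demo_thread_info);
}
```

Because this is a union, its size is the size of the larger member (the stack), and thread_info occupies the lowest addresses of that same area.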
So the INIT_TASK macro fills these task_struct fields and many more. As I already wrote above, I will not describe all the fields and values in the INIT_TASK macro, but we will see them soon.
Now let's go back to the set_task_stack_end_magic function. set_task_stack_end_magic gets the end of the stack for the given task_struct with the end_of_stack function. Earlier (and still for all architectures besides x86_64) the thread_info structure was located on the stack. Where the end of a process stack is depends on the CONFIG_STACK_GROWSUP configuration option. As we have learned, on the x86_64 architecture the stack grows down, so the end of the process stack is the first unsigned long right after the thread_info structure.
Here task_thread_info just returns the stack which we filled with the INIT_TASK macro. In newer kernels the thread_info structure may contain only flags, while the stack pointer resides in the task_struct structure, which represents a thread in the Linux kernel. This depends on the CONFIG_THREAD_INFO_IN_TASK kernel configuration option, which is enabled by default for x86_64. You can verify this by looking at the init/Kconfig build configuration file.
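As we got the end of the init process stack, we write STACK_END_MAGIC there. After the canary is set, it can be checked later by comparing the value at the end of the stack with STACK_END_MAGIC.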
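Setting and checking the canary can be sketched like this. It is a simplified user-space model, not the kernel code: struct demo_task and the demo_* helpers are hypothetical, and the sketch assumes a downward-growing stack whose end is simply the lowest word of the stack array (no thread_info in front of it).

```c
#define DEMO_STACK_END_MAGIC 0x57AC6E9DUL   /* value quoted in the text */
#define DEMO_STACK_WORDS     2048

/* Hypothetical task with an embedded stack area. */
struct demo_task {
    unsigned long stack[DEMO_STACK_WORDS];
};

/* The stack grows down, so its end is the lowest address of the area. */
static unsigned long *demo_end_of_stack(struct demo_task *task)
{
    return &task->stack[0];
}

/* Write the magic value at the end of the stack. */
static void demo_set_stack_end_magic(struct demo_task *task)
{
    *demo_end_of_stack(task) = DEMO_STACK_END_MAGIC;
}

/* Non-zero if something overwrote the magic value, i.e. the stack overflowed. */
static int demo_stack_end_corrupted(struct demo_task *task)
{
    return *demo_end_of_stack(task) != DEMO_STACK_END_MAGIC;
}
```

If a deep call chain ever writes past the end of the stack, the magic value is the first thing it clobbers, so the check detects the overflow after the fact.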
The next function after set_task_stack_end_magic is smp_setup_processor_id. This function has an empty body for x86_64. The next function in start_kernel is debug_objects_early_init. The implementation of this function is almost the same as lockdep_init, but it fills hashes for object debugging. As I wrote above, we will not explain this and other debugging functions in this chapter.
After the debug_objects_early_init function we can see the call to the boot_init_stack_canary function, which sets the canary value for the -fstack-protector gcc feature. This function depends on the CONFIG_CC_STACKPROTECTOR configuration option: if this option is disabled, boot_init_stack_canary does nothing; otherwise it generates a random number based on the random pool and the TSC.
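The idea of combining the random pool with the TSC can be sketched as below. This is only an illustration under assumptions: demo_make_canary is a hypothetical helper, both inputs are passed in as plain numbers instead of being read from the random pool and the rdtsc instruction, and the exact mixing step (adding the TSC and the TSC shifted left by 32 bits) is an assumption about how such a value could be built.

```c
#include <stdint.h>

/*
 * Mix a value taken from the random pool with a time stamp counter reading.
 * Unsigned arithmetic wraps around on overflow, which is fine for a canary.
 */
static uint64_t demo_make_canary(uint64_t pool_random, uint64_t tsc)
{
    uint64_t canary = pool_random;

    canary += tsc + (tsc << 32);

    return canary;
}
```

Adding the TSC helps early in boot, when the random pool may not have collected much entropy yet.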
After the canary is set, we disable local and early boot IRQs and register the bootstrap CPU in the CPU maps. We disable local IRQs (interrupts for the current CPU) with the local_irq_disable macro, which expands to a call to the arch_local_irq_disable function from arch/x86/include/asm/irqflags.h. On x86_64 this comes down to the cli instruction. As interrupts are disabled, we can register the current CPU with the given ID in the CPU bitmap.
The next function called from start_kernel is boot_cpu_init. This function initializes various CPU masks for the bootstrap processor. First of all it gets the bootstrap processor id with a call to smp_processor_id.
If the CONFIG_DEBUG_PREEMPT configuration option is disabled, smp_processor_id just expands to a call to raw_smp_processor_id, which in turn expands to this_cpu_read(cpu_number).
this_cpu_read, like many other functions of this kind (this_cpu_add and so on), is defined in include/linux/percpu-defs.h and represents a this_cpu operation. These operations provide an optimized way to access the per-cpu variables associated with the current processor. In our case it is this_cpu_read, which receives cpu_number from raw_smp_processor_id. Now let's look at the __pcpu_size_call_return macro which implements it.
First of all we can see the definition of the pscr_ret__ variable with the int type. Why int? Because the variable is cpu_number, and it was declared as a per-cpu int variable.
In the next step we call __verify_pcpu_ptr with the address of cpu_number. __verify_pcpu_ptr is used to verify that the given parameter is a per-cpu pointer. After that we set the pscr_ret__ value, which depends on the size of the variable. Our cpu_number variable is an int, so it is 4 bytes in size. It means that we will get the result of this_cpu_read_4 in pscr_ret__. At the end of __pcpu_size_call_return this value becomes the result of the macro.
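The size-based dispatch described above can be sketched in plain C. This is not the kernel macro: demo_read_sized and demo_size_call_return are hypothetical names, and ordinary memory reads stand in for the per-cpu accessors.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pick the read width from the size of the variable, like the 1/2/4/8 cases. */
static uint64_t demo_read_sized(const void *addr, size_t size)
{
    switch (size) {
    case 1: { uint8_t v;  memcpy(&v, addr, sizeof(v)); return v; }
    case 2: { uint16_t v; memcpy(&v, addr, sizeof(v)); return v; }
    case 4: { uint32_t v; memcpy(&v, addr, sizeof(v)); return v; }
    case 8: { uint64_t v; memcpy(&v, addr, sizeof(v)); return v; }
    default: return 0;   /* unsupported size in this sketch */
    }
}

/* sizeof(var) is known at compile time, so only one case is ever taken. */
#define demo_size_call_return(var) demo_read_sized(&(var), sizeof(var))
```

For an int variable such as cpu_number, sizeof gives 4, so the 4-byte case is the one that runs.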
this_cpu_read_4 is a macro which calls percpu_from_op and passes the mov instruction and the per-cpu variable to it. percpu_from_op expands to an inline assembly statement.
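Let's try to understand how it works. The gs segment register contains the base of the per-cpu area. Here we just copy cpu_number, which is in memory, to the result variable with the movl instruction. In other words, we read the value stored at the offset of cpu_number from the base held in gs. At this point we get zero as the result of smp_processor_id. With the processor id known, boot_cpu_init marks the given CPU as online, active, present and possible with the set_cpu_online, set_cpu_active, set_cpu_present and set_cpu_possible functions.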
All of these functions work with CPU masks. cpu_possible is the set of CPU IDs which can be plugged in at any time during the life of that system boot. cpu_present represents which CPUs are currently plugged in. cpu_online represents a subset of cpu_present and indicates the CPUs which are available for scheduling. These masks depend on the CONFIG_HOTPLUG_CPU configuration option: if this option is disabled, possible == present and active == online. The implementations of all of these functions are very similar. Every function checks the second parameter. If it is true, it calls cpumask_set_cpu, otherwise cpumask_clear_cpu. For example, let's look at set_cpu_possible. As we passed true as the second parameter, cpumask_set_cpu will be called.
First of all let's try to understand the to_cpumask macro. This macro casts a bitmap to a struct cpumask *. CPU masks provide a bitmap suitable for representing the set of CPUs in a system, one bit position per CPU number. A CPU mask is declared with the DECLARE_BITMAP macro, which expands to an array of unsigned long. Now let's look at how the to_cpumask macro is implemented.
The condition of the ternary operator in it is true every time, so why is __check_is_bitmap there? It's simple, let's look at it.
It just returns 1 every time. Actually we need it here for only one purpose: at compile time it checks that the given bitmap is a bitmap, or in other words that the given bitmap has the type unsigned long *. So we just pass cpu_possible_bits to the to_cpumask macro to convert the array of unsigned long to a struct cpumask *. Now we can call the cpumask_set_cpu function with the cpu, which is 0, and the struct cpumask * made from cpu_possible_bits. This function makes only one call, to the set_bit function, which sets the bit for the given cpu in the cpumask. All of these set_cpu_* functions work on the same principle.
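The whole chain, from the bitmap through the type-checking cast to setting one bit, can be sketched in user space. All demo_* names and DEMO_NR_CPUS are hypothetical, and the plain bit operations here are not atomic, unlike the kernel's set_bit; the sketch only mirrors the structure described above.

```c
#include <limits.h>
#include <stdbool.h>

#define DEMO_NR_CPUS          8   /* assumed small CPU count */
#define DEMO_BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_BITS_TO_LONGS(n) (((n) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* A CPU mask is just a bitmap: one bit per CPU number. */
struct demo_cpumask {
    unsigned long bits[DEMO_BITS_TO_LONGS(DEMO_NR_CPUS)];
};

static unsigned long demo_cpu_possible_bits[DEMO_BITS_TO_LONGS(DEMO_NR_CPUS)];

/* Never really called; it only forces a compile-time check of the type. */
static inline int demo_check_is_bitmap(const unsigned long *bitmap)
{
    (void)bitmap;
    return 1;
}

/* The condition is always true; the other branch exists for the type check. */
#define demo_to_cpumask(bitmap) \
    ((struct demo_cpumask *)(1 ? (bitmap) \
                               : (void *)sizeof(demo_check_is_bitmap(bitmap))))

static void demo_cpumask_set_cpu(unsigned int cpu, struct demo_cpumask *mask)
{
    mask->bits[cpu / DEMO_BITS_PER_LONG] |= 1UL << (cpu % DEMO_BITS_PER_LONG);
}

static void demo_cpumask_clear_cpu(unsigned int cpu, struct demo_cpumask *mask)
{
    mask->bits[cpu / DEMO_BITS_PER_LONG] &= ~(1UL << (cpu % DEMO_BITS_PER_LONG));
}

static bool demo_cpumask_test_cpu(unsigned int cpu, const struct demo_cpumask *mask)
{
    return (mask->bits[cpu / DEMO_BITS_PER_LONG] >> (cpu % DEMO_BITS_PER_LONG)) & 1UL;
}

/* Same shape as the set_cpu_* helpers: the second parameter picks set or clear. */
static void demo_set_cpu_possible(unsigned int cpu, bool possible)
{
    if (possible)
        demo_cpumask_set_cpu(cpu, demo_to_cpumask(demo_cpu_possible_bits));
    else
        demo_cpumask_clear_cpu(cpu, demo_to_cpumask(demo_cpu_possible_bits));
}
```

Passing something that is not an unsigned long pointer to demo_to_cpumask makes the compiler complain about the argument of demo_check_is_bitmap, even though that call sits inside sizeof and is never executed.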
Now that we have activated the first CPU, we can go ahead in start_kernel. Next comes page_address_init, but this function does nothing in our case, because it does work only when not all of RAM can be mapped directly.
The next call is pr_notice, which just expands to a printk call. At this point pr_notice is used to print the Linux banner.
The next step is architecture-specific initialization, which is done by the setup_arch function. This is a very big function, like start_kernel, and we do not have time to consider all of its implementation in this part. Here we'll only start on it and continue in the next part. As it is architecture-specific, we need to go to the arch/ directory again. The setup_arch function is defined in the arch/x86/kernel/setup.c source code file and takes only one argument: the address of the kernel command line.
This function starts by reserving a memory block for the kernel _text and _data, which starts at the _text symbol (you may remember it from arch/x86/kernel/head_64.S) and ends before __bss_stop. We use memblock to reserve the memory block.
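The memblock_reserve function takes two parameters: the base physical address of a memory block and its size. We can get the base physical address of the _text symbol with the __pa_symbol macro. First of all it calls the __phys_reloc_hide macro on the given parameter. The __phys_reloc_hide macro does nothing for x86_64 and just returns the given parameter. The implementation of the __phys_addr_symbol macro is easy. It just subtracts the base virtual address of the kernel text mapping (you may remember that it is __START_KERNEL_map) from the symbol address and adds phys_base, which accounts for the physical address at which the kernel was loaded. After we got the physical address of the _text symbol, memblock_reserve can reserve a memory block which starts at _text and has the size __bss_stop - _text.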
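The address arithmetic can be sketched as follows. This is a user-space illustration with hypothetical helper names; DEMO_START_KERNEL_MAP assumes the usual x86_64 value of __START_KERNEL_map, and the symbol addresses are passed in as plain numbers.

```c
#include <stdint.h>

/* Assumed x86_64 base virtual address of the kernel text mapping. */
#define DEMO_START_KERNEL_MAP 0xffffffff80000000ULL

/* Virtual address of a kernel symbol -> physical address. */
static uint64_t demo_phys_addr_symbol(uint64_t symbol_vaddr, uint64_t phys_base)
{
    return symbol_vaddr - DEMO_START_KERNEL_MAP + phys_base;
}

/* Size of the block to reserve: everything from _text up to __bss_stop. */
static uint64_t demo_kernel_image_size(uint64_t text_vaddr, uint64_t bss_stop_vaddr)
{
    return bss_stop_vaddr - text_vaddr;
}
```

For example, a symbol at virtual address 0xffffffff81000000 with phys_base equal to 0 maps to physical address 0x1000000.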
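The next step after reserving space for the kernel text and data is reserving space for the initrd. We will not look at the details of the initrd in this post; you just need to know that it is a temporary root file system stored in memory and used by the kernel during its startup. The early_reserve_initrd function does all the work. First of all this function gets the base address of the RAM disk, its size and its end address.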
All of these parameters are taken from boot_params. If you have read the chapter about the Linux Kernel Booting Process, you will remember that we filled the boot_params structure during boot time. The kernel setup header contains a couple of fields which describe the ramdisk.
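So we can get all the information that interests us from boot_params. For example let's look at get_ramdisk_image. It takes the low 32 bits of the address from the setup header and combines them with ext_ramdisk_image shifted left by 32 bits, so we get a 64-bit address in ramdisk_image and we return it. get_ramdisk_size works on the same principle as get_ramdisk_image, but it uses ramdisk_size and ext_ramdisk_size instead of ramdisk_image and ext_ramdisk_image. After we have the ramdisk's size, base address and end address, we check that the bootloader provided a ramdisk.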
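Combining the low and high halves of the ramdisk address and size can be sketched like this. The struct demo_boot_params is a hypothetical, flattened stand-in for the relevant boot_params fields, and demo_has_ramdisk is a simplified version of the check (it only looks at the address and the size).

```c
#include <stdint.h>

/* Hypothetical flat view of the ramdisk-related boot parameters. */
struct demo_boot_params {
    uint32_t ramdisk_image;      /* low 32 bits of the address  */
    uint32_t ramdisk_size;       /* low 32 bits of the size     */
    uint32_t ext_ramdisk_image;  /* high 32 bits of the address */
    uint32_t ext_ramdisk_size;   /* high 32 bits of the size    */
};

static uint64_t demo_get_ramdisk_image(const struct demo_boot_params *bp)
{
    uint64_t image = bp->ramdisk_image;

    image |= (uint64_t)bp->ext_ramdisk_image << 32;
    return image;
}

static uint64_t demo_get_ramdisk_size(const struct demo_boot_params *bp)
{
    uint64_t size = bp->ramdisk_size;

    size |= (uint64_t)bp->ext_ramdisk_size << 32;
    return size;
}

/* Simplified check: a ramdisk exists if both address and size are non-zero. */
static int demo_has_ramdisk(const struct demo_boot_params *bp)
{
    return demo_get_ramdisk_image(bp) != 0 && demo_get_ramdisk_size(bp) != 0;
}
```

The cast to a 64-bit type before the shift matters: shifting a 32-bit value left by 32 bits would not produce the intended high half.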
We started to dive into the generic kernel code from the start_kernel function in this part and stopped at the architecture-specific initialization in setup_arch. In the next part we will continue with the architecture-dependent initialization steps.