Start to dive into interrupts
We saw some theory about interrupts and exception handling in the previous part and, as I mentioned there, we will now start to dive into interrupts and exceptions within the Linux kernel source code. As in the other chapters, we begin with the initialization of the basic components. We will not walk through the Linux kernel source code from its very first lines, as an earlier chapter already did that. Instead, we start with the earliest sections of the Linux kernel source code that are related to interrupts and exceptions.
If you've read the previous parts, you may remember that the earliest interrupt-related place in the x86_64 architecture-specific source code of the Linux kernel is the first setup of the Interrupt Descriptor Table. It occurs right before the transition into protected mode, in the go_to_protected_mode function, which calls setup_idt:
The setup_idt function is defined in the same source code file as the go_to_protected_mode function and just loads the address of a NULL interrupt descriptor table:
where gdt_ptr describes the contents of the special 48-bit GDTR register, which must contain the limit and the base address of the Global Descriptor Table:
Of course, in our case gdt_ptr does not represent the GDTR register but the IDTR register, since we are setting up the Interrupt Descriptor Table. You will not find an idt_ptr structure in the Linux kernel source code, because it would be the same as gdt_ptr under a different name, and it would make no sense to keep two structures that differ only in their names. Note that we do not fill the Interrupt Descriptor Table with entries here, because it is too early to handle any interrupts or exceptions at this point. That's why we just load a NULL IDT.
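To make the 48-bit layout concrete, here is a minimal user-space sketch. The names desc_ptr48, null_idt and desc_ptr_is_null are hypothetical and only mirror the idea of the boot code's gdt_ptr; the real code hands such a structure to the lidt instruction, which cannot be done from user space.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the boot code's descriptor-table pointer. */
struct desc_ptr48 {
	uint16_t len;	/* table limit in bytes */
	uint32_t ptr;	/* 32-bit linear base address of the table */
} __attribute__((packed));

/* 16 + 32 bits: without the packed attribute the compiler would pad this to 8 bytes. */
static_assert(sizeof(struct desc_ptr48) == 6,
	      "descriptor pointer must be exactly 48 bits");

/* A "NULL IDT": zero limit and zero base, so no entry is usable. */
static const struct desc_ptr48 null_idt = { 0, 0 };

/* Returns non-zero when the pointer describes an empty table. */
static int desc_ptr_is_null(const struct desc_ptr48 *d)
{
	return d->len == 0 && d->ptr == 0;
}
```

The packed attribute is the important design choice here: the processor reads exactly six bytes, so any padding between the two fields would corrupt the base address.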
where in_pm32 contains a jump to the 32-bit entry point:
arch/x86/boot/compressed/head_32.S;
arch/x86/boot/compressed/head_64.S.
From this we can understand that MSR_GS_BASE defines the number of the model specific register. Since the cs, ds, es and ss registers are not used in 64-bit mode, their fields are ignored, but we can still access memory through the fs and gs registers. The model specific register provides a back door to the hidden parts of these segment registers and allows the use of a 64-bit base address for the segments addressed by fs and gs. So MSR_GS_BASE is the hidden part, and this part is mapped onto the GS.base field. Let's look at initial_gs:
where:
Now we know about initial_gs, so let's look at the code:
As you can see, the only difference is the name of the array of interrupt handler entry points. Now it is early_idt_handler_array:
where NUM_EXCEPTION_VECTORS and EARLY_IDT_HANDLER_SIZE are defined as:
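As a rough illustration of how these two constants shape the array, the sketch below models it as a two-dimensional array of bytes with one fixed-size slot per exception vector. The value 9 matches the nine-byte stride mentioned in this part; 32 is the number of exception vectors on x86. The names early_handlers and early_handler_offset are hypothetical, and the real stubs are generated in assembly rather than declared in C like this.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_EXCEPTION_VECTORS	32
#define EARLY_IDT_HANDLER_SIZE	9

/* Stand-in for the assembly stubs: one 9-byte slot per exception vector. */
static const char early_handlers[NUM_EXCEPTION_VECTORS][EARLY_IDT_HANDLER_SIZE];

/* Byte offset of the stub for a given vector from the start of the array. */
static size_t early_handler_offset(unsigned int vector)
{
	return (size_t)(early_handlers[vector] - early_handlers[0]);
}
```

Because every stub has the same size, the entry point for vector i is simply the array base plus i * EARLY_IDT_HANDLER_SIZE, which is why a plain array index is enough when filling the early IDT.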
and write the canary value to the irq_stack_union with the this_cpu_write macro:
If you set the CONFIG_DEBUG_LOCKDEP kernel configuration option, the lockdep_stats_debug_show function will write all tracing information to /proc/lockdep:
and you can see its result with the:
Here we can see calls of three different functions:
set_intr_gate_ist
set_system_intr_gate_ist
set_intr_gate
GATE_INTERRUPT
GATE_TRAP
GATE_CALL
GATE_TASK
and set the present bit for the given IDT entry:
After this we write the just-filled interrupt gate to the IDT with the write_idt_entry macro, which expands to native_write_idt_entry and just copies the interrupt gate into the idt_table table at the given index:
where idt_table is just an array of gate_desc:
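The flow described here (check the vector number, fill a clean gate, set the present bit, copy the gate into the table) can be sketched in user space as follows. This is only an illustration under stated assumptions: gate64, idt_sketch, pack_gate64 and set_gate_sketch are hypothetical names, and the layout follows the 16-byte x86_64 gate descriptor format rather than the kernel's own gate_desc definition.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GATE_INTERRUPT	0xE	/* 64-bit interrupt gate type */

/* Hypothetical 16-byte gate, laid out like an x86_64 IDT entry. */
struct gate64 {
	uint16_t offset_low;	/* handler address bits 0..15 */
	uint16_t segment;	/* code segment selector */
	uint8_t  ist;		/* bits 0..2: Interrupt Stack Table index */
	uint8_t  type_attr;	/* bits 0..3 type, 5..6 DPL, 7 present */
	uint16_t offset_middle;	/* handler address bits 16..31 */
	uint32_t offset_high;	/* handler address bits 32..63 */
	uint32_t reserved;
} __attribute__((packed));

static struct gate64 idt_sketch[256];

/* Fill a clean gate with the handler address, IST index, DPL and type. */
static void pack_gate64(struct gate64 *g, unsigned type, uint64_t handler,
			unsigned dpl, unsigned ist, uint16_t seg)
{
	memset(g, 0, sizeof(*g));
	g->offset_low    = (uint16_t)(handler & 0xffff);
	g->offset_middle = (uint16_t)((handler >> 16) & 0xffff);
	g->offset_high   = (uint32_t)(handler >> 32);
	g->segment       = seg;
	g->ist           = (uint8_t)(ist & 0x7);
	/* 0x80 is the present bit. */
	g->type_attr     = (uint8_t)((type & 0xf) | ((dpl & 0x3) << 5) | 0x80);
}

/* Reject vectors above 0xff, then copy the packed gate into the table. */
static int set_gate_sketch(unsigned n, unsigned type, uint64_t handler,
			   unsigned dpl, unsigned ist, uint16_t seg)
{
	struct gate64 g;

	if (n > 0xff)
		return -1;
	pack_gate64(&g, type, handler, dpl, ist, seg);
	memcpy(&idt_sketch[n], &g, sizeof(g));
	return 0;
}
```

The only thing that distinguishes a "system" gate in this sketch is the dpl argument: passing 3 instead of 0 allows the gate to be invoked from user mode.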
That's all. The second function, set_system_intr_gate_ist, has only one difference from set_intr_gate_ist:
Do you see it? Look at the fourth parameter of _set_gate. It is 0x3, whereas in set_intr_gate_ist it was 0x0. We know that this parameter represents the DPL, or descriptor privilege level. We also know that 0 is the highest privilege level and 3 is the lowest. Now we know how set_system_intr_gate_ist, set_intr_gate_ist and set_intr_gate work, and we can return to the early_trap_init function. Let's look at it again:
We set two IDT entries: one for the #DB exception and one for int3. These functions take the same set of parameters:
vector number of an interrupt;
address of an interrupt handler;
interrupt stack table index.
That's all. You will learn more about interrupts and handlers in the next parts.
This is the end of the second part about interrupts and interrupt handling in the Linux kernel. We saw some theory in the previous part and started to dive into interrupt and exception handling in the current part. We started from the earliest parts of the Linux kernel source code that are related to interrupts. In the next part we will continue to dive into this interesting theme and learn more about the interrupt handling process.
After the setup of the Interrupt Descriptor Table, the Global Descriptor Table and other stuff, we jump into protected mode. You can read more about it in the part which describes the transition to protected mode.
The entry to protected mode is located in boot_params.hdr.code32_start and is passed together with boot_params to the protected_mode_jump function at the end of the go_to_protected_mode function:
The protected_mode_jump function receives these two parameters in the ax and dx registers, following one of the x86 calling conventions:
As you may remember, the 32-bit entry point is in the arch/x86/boot/compressed/head_64.S assembly file, although it contains _64 in its name. We can see two similar files in the arch/x86/boot/compressed directory:
But in our case the 32-bit mode entry point is in the second file. The first file is not even compiled for x86_64. Let's look at the Makefile:
We can see here that head_* depends on the $(BITS) variable, which is based on the architecture. The variable is defined as follows:
Now that we have jumped into startup_32, we will not encounter anything related to interrupt handling here. The code inside startup_32 makes the necessary preparations before transitioning into long mode with a direct jump. The long mode entry is located in startup_64, which makes arrangements for the kernel decompression that occurs in the decompress_kernel function. After the kernel is decompressed, we jump into the startup_64 of the decompressed kernel. There we start to build identity-mapped pages, check the NX bit, set up the Extended Feature Enable Register (see the links) and update the early Global Descriptor Table with the lgdt instruction. We then proceed to set up the gs register with the following code:
We already saw this code in the previous part. First of all, pay attention to the last wrmsr instruction. This instruction writes data from the edx:eax registers to the model specific register specified by the ecx register. We can see that ecx contains $MSR_GS_BASE, which is declared as follows:
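The way wrmsr takes its operands can be sketched in plain C. The instruction itself is privileged, so this user-space sketch only shows how a 64-bit value is split across the register pair; the wrmsr_args structure and the pack_wrmsr helper are hypothetical names introduced for illustration, while 0xc0000101 is the architectural number of the GS base register.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_GS_BASE	0xc0000101u	/* MSR holding the 64-bit GS base */

/* Hypothetical image of the registers that wrmsr consumes. */
struct wrmsr_args {
	uint32_t ecx;	/* MSR number */
	uint32_t eax;	/* low 32 bits of the value */
	uint32_t edx;	/* high 32 bits of the value */
};

/* Split a 64-bit value into the edx:eax pair for the given MSR. */
static struct wrmsr_args pack_wrmsr(uint32_t msr, uint64_t value)
{
	struct wrmsr_args a = {
		.ecx = msr,
		.eax = (uint32_t)value,
		.edx = (uint32_t)(value >> 32),
	};
	return a;
}
```

In the assembly discussed in this part, the same split is done by loading the two halves of initial_gs into eax and edx before executing wrmsr.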
We pass the irq_stack_union symbol to the INIT_PER_CPU_VAR macro, which just concatenates the init_per_cpu__ prefix with the given symbol. In our case we get the init_per_cpu__irq_stack_union symbol. Let's look at the linker script. There we can see the following definition:
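The prefix concatenation itself is ordinary preprocessor token pasting. A minimal sketch, assuming a hypothetical plain int variable in place of the real per-cpu object (the read_via_macro helper is also made up for the example):

```c
#include <assert.h>

/* Paste the fixed init_per_cpu__ prefix onto the given symbol name. */
#define INIT_PER_CPU_VAR(var)	init_per_cpu__##var

/* Hypothetical variable, named the way the macro will spell it. */
static int init_per_cpu__irq_stack_union = 42;

/* INIT_PER_CPU_VAR(irq_stack_union) expands to the variable above. */
static int read_via_macro(void)
{
	return INIT_PER_CPU_VAR(irq_stack_union);
}
```

The macro does not create anything by itself; it only produces a name, and the object behind that name has to be provided elsewhere, which is what the definition discussed next takes care of.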
It tells us that the address of init_per_cpu__irq_stack_union will be irq_stack_union + __per_cpu_load. Now we need to understand where init_per_cpu__irq_stack_union and __per_cpu_load are and what they mean. The first, irq_stack_union, is declared with the DECLARE_INIT_PER_CPU macro, which expands to a call of the init_per_cpu_var macro:
If we expand all macros, we will get the same init_per_cpu__irq_stack_union as we got after expanding the INIT_PER_CPU_VAR macro, but note that it is not just a symbol, but a variable. Let's look at the typeof(per_cpu_var(var)) expression. Our var is irq_stack_union, and the per_cpu_var macro is defined as follows:
So, we are accessing gs:irq_stack_union and getting its type, which is irq_stack_union. OK, we have defined the first variable and know its address; now let's look at the second symbol, __per_cpu_load. There are a couple of per-cpu variables located after this symbol. The __per_cpu_load symbol is defined as follows:
and represents the base address of the per-cpu variables in the data area. So, we know the addresses of irq_stack_union and __per_cpu_load, and we know that init_per_cpu__irq_stack_union must be placed right after __per_cpu_load. And we can see that this is the case:
Here we specify a model specific register with MSR_GS_BASE, put the contents of initial_gs into the edx:eax pair and execute the wrmsr instruction to fill the gs register with the base address of init_per_cpu__irq_stack_union, which will be at the bottom of the interrupt stack. After this we jump to the C code in the x86_64_start_kernel function. In x86_64_start_kernel we do the last preparations before we jump into the generic, architecture-independent kernel code, and one of these preparations is filling the early Interrupt Descriptor Table with the interrupt handler entries, or early_idt_handlers. You may remember it, if you have read the part about early interrupt and exception handling, and may remember the following code:
but I wrote the Early interrupt and exception handling part when the Linux kernel version was 3.18. At the time of writing, the current version of the Linux kernel is 4.1.0-rc6+, and Andy Lutomirski has sent a patch that changes the behaviour of early_idt_handlers and will soon be in the mainline kernel. NOTE: While I was writing this part, the patch was merged into the Linux kernel source code. Let's look at it. Now the same part looks like:
So, early_idt_handler_array is an array of interrupt handler entry points and contains one entry point every nine bytes. The early_idt_handler_array is defined in the same source code file as the previous early_idt_handlers:
It fills early_idt_handler_array with the .rept NUM_EXCEPTION_VECTORS directive and contains the entry of the early_make_pgtable interrupt handler (you can read more about its implementation in the part about early interrupt and exception handling). For now, we have reached the end of the x86_64 architecture-specific code, and the next part is the generic kernel code. You probably already know that we will return to the architecture-specific code in the setup_arch function and other places, but this is the end of the x86_64 early code.
The next stop after the architecture-specific early code is the big start_kernel function. If you've read the previous parts about the Linux kernel initialization process, you must remember it. This function does all the initialization work before the kernel launches the first init process with PID 1. The first thing related to interrupt and exception handling is the call of the boot_init_stack_canary function.
This function sets the canary value that protects against interrupt stack overflow. We already saw a few details about the implementation of boot_init_stack_canary in the previous part; now let's take a closer look at it. The implementation of this function depends on the CONFIG_CC_STACKPROTECTOR kernel configuration option. If this option is not set, the function does nothing:
If the CONFIG_CC_STACKPROTECTOR kernel configuration option is set, the boot_init_stack_canary function starts with a check that the stack_canary field of irq_stack_union, which represents the interrupt stack, is located at an offset of forty bytes:
As we read in the previous part, irq_stack_union is represented by the following union:
We know that a union in the C programming language is a data structure whose fields share the same memory, so only one of them holds a value at a time. We can see here that the structure has a first field, gs_base, which is 40 bytes in size and represents the bottom of the irq_stack. So, our check with the BUILD_BUG_ON macro should succeed (you can read the first part about Linux kernel initialization if you're interested in the BUILD_BUG_ON macro).
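The offset check can be reproduced in user space with a compile-time assertion. The sketch below is an assumption-laden stand-in: the union name, the IRQ_STACK_SIZE_SKETCH constant and its value of 16384 bytes are made up for the example, and static_assert plays the role that BUILD_BUG_ON plays in the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define IRQ_STACK_SIZE_SKETCH	16384	/* assumed size, for illustration only */

/* Hypothetical copy of the layout described above. */
union irq_stack_union_sketch {
	char irq_stack[IRQ_STACK_SIZE_SKETCH];
	struct {
		char gs_base[40];		/* bottom of the stack area */
		unsigned long stack_canary;	/* must sit right after gs_base */
	};
};

/* Fails at compile time if the canary is not at offset 40. */
static_assert(offsetof(union irq_stack_union_sketch, stack_canary) == 40,
	      "stack_canary must be at offset 40");
```

The fixed offset matters because the compiler-generated stack protector code reads the canary through the gs segment at a hard-coded displacement of 40 bytes.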
After this we calculate a new canary value based on a random number and the Time Stamp Counter:
You can read more about the this_cpu_* operations in the Linux kernel documentation.
The next step in start_kernel that is related to interrupts and interrupt handling, after we have set the canary value for the interrupt stack, is the call of the local_irq_disable macro.
This macro is defined in a header file and, as you can guess, lets us disable interrupts on the current CPU. Let's look at its implementation. First of all, note that it depends on the CONFIG_TRACE_IRQFLAGS_SUPPORT kernel configuration option:
They are similar and, as you can see, have only one difference: the local_irq_disable macro contains a call of trace_hardirqs_off when CONFIG_TRACE_IRQFLAGS_SUPPORT is enabled. The lockdep subsystem has a special feature, irq-flags tracing, for tracing hardirq and softirq state. In our case the lockdep subsystem can give us interesting information about the hard/soft irq on/off events that occur in the system. The trace_hardirqs_off function is defined as follows:
and just calls the trace_hardirqs_off_caller function. The trace_hardirqs_off_caller function checks the hardirqs_enabled field of the current process and increments redundant_hardirqs_off if the call of local_irq_disable was redundant, or hardirqs_off_events if it was not. These two fields and the other lockdep statistics fields are located in the lockdep_stats structure:
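The bookkeeping described above can be modelled with two counters and one flag. This is a simplified sketch, not the kernel's code: lockdep_stats_sketch, stats and trace_hardirqs_off_sketch are hypothetical names, and a single global flag stands in for the per-process hardirqs_enabled field.

```c
#include <assert.h>

/* Hypothetical subset of the lockdep statistics. */
struct lockdep_stats_sketch {
	int hardirqs_off_events;	/* calls that really turned irqs off */
	int redundant_hardirqs_off;	/* calls made while irqs were already off */
};

static struct lockdep_stats_sketch stats;
static int hardirqs_enabled = 1;	/* stand-in for the per-process field */

/* Record one "interrupts off" event and classify it. */
static void trace_hardirqs_off_sketch(void)
{
	if (hardirqs_enabled) {
		hardirqs_enabled = 0;
		stats.hardirqs_off_events++;
	} else {
		stats.redundant_hardirqs_off++;
	}
}
```

Calling the function twice in a row yields one real event followed by one redundant event, which is exactly the distinction the two counters are meant to expose.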
OK, now we know a little about tracing; more information will follow in a separate part about lockdep and tracing. You can see that both local_irq_disable macros share the same part: raw_local_irq_disable. This macro expands to a call of:
You should already remember that the cli instruction clears the IF flag, which determines whether the processor responds to maskable hardware interrupts. Besides local_irq_disable, as you may already know, there is an inverse macro, local_irq_enable. This macro has the same tracing mechanism and is very similar to local_irq_disable but, as its name suggests, it enables interrupts with the sti instruction:
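Since cli and sti are privileged, their effect is easiest to show on a simulated flags register. In the sketch below the flags_sketch variable and the three helper functions are hypothetical; only the position of the IF flag (bit 9, mask 0x200) comes from the x86 architecture.

```c
#include <assert.h>
#include <stdint.h>

#define X86_EFLAGS_IF	0x200u	/* interrupt flag, bit 9 of the flags register */

/* Simulated flags register; interrupts start out enabled. */
static uint64_t flags_sketch = X86_EFLAGS_IF;

/* Model of cli: clear IF so maskable interrupts are not delivered. */
static void cli_sketch(void)
{
	flags_sketch &= ~(uint64_t)X86_EFLAGS_IF;
}

/* Model of sti: set IF so maskable interrupts are delivered again. */
static void sti_sketch(void)
{
	flags_sketch |= X86_EFLAGS_IF;
}

/* Non-zero when the simulated IF flag is clear. */
static int irqs_disabled_sketch(void)
{
	return !(flags_sketch & X86_EFLAGS_IF);
}
```

Both operations touch a single bit of one processor's flags register, which is why the corresponding macros only affect the CPU that executes them.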
Now we know how local_irq_disable and local_irq_enable work. This was the first call of the local_irq_disable macro, but we will meet these macros many times in the Linux kernel source code. For now we are in the start_kernel function and have just disabled local interrupts. Why local, and why did we do it? Previously the kernel provided a method to disable interrupts on all processors, and it was called cli. This function was removed, and now we have local_irq_{enable,disable} to disable or enable interrupts on the current processor. After we've disabled interrupts with the local_irq_disable macro, we set:
The early_boot_irqs_disabled variable is defined as follows:
and is used in different places. For example, it is used in the smp_call_function_many function to check for a possible deadlock when interrupts are disabled:
The next functions after local_irq_disable are boot_cpu_init and page_address_init, but they are not related to interrupts and exceptions (you can read more about these functions in the chapter about Linux kernel initialization). Next is the setup_arch function. As you may remember, this function performs the initialization of many different architecture-dependent things. The first interrupt-related function that we see in setup_arch is early_trap_init. This function fills the Interrupt Descriptor Table with a couple of entries:
All of these functions do similar, but not identical, things. The first, set_intr_gate_ist, inserts a new interrupt gate into the IDT. Let's look at its implementation:
First of all, we can see the check that n, which is the vector number of the interrupt, is not greater than 0xff, or 255. We need this check because we remember from the previous part that the vector number of an interrupt must be between 0 and 255. In the next step we can see the call of the _set_gate function, which sets the given interrupt gate in the IDT table:
Here we start from the pack_gate function, which takes a clean IDT entry represented by the gate_desc structure and fills it with the base address and limit, the Interrupt Stack Table index, the privilege level and the type of the interrupt, which can be one of the following values:
If you have any questions or suggestions write me a comment or ping me at .
Please note that English is not my first language, and I am really sorry for any inconvenience. If you find any mistakes, please send me a PR.