clocksource framework in the previous part. We started to consider this framework because it is closely related to the special counters provided by the Linux kernel. One of these counters, which we already saw in the first part of this chapter, is jiffies. As I already wrote in the first part of this chapter, we will consider time management related stuff step by step during the Linux kernel initialization. The previous step was the call of the register_refined_jiffies function, which sets up the refined_jiffies clock source for us. Recall that this function is called from the setup_arch function, which is defined in the arch/x86/kernel/setup.c source code file and executes architecture-specific (x86_64 in our case) initialization. Look at the implementation of setup_arch and you will note that the call of register_refined_jiffies is the last step before the setup_arch function finishes its work.
Many x86_64 specific things are already configured after the end of the setup_arch execution. For example, some early interrupt handlers are already able to handle interrupts, memory space is reserved for the initrd, DMI is scanned, the Linux kernel log buffer is already set up (which means that the printk function is able to work), e820 is parsed and the Linux kernel already knows about available memory, and there are many other architecture-specific things (if you are interested, you can read more about the setup_arch function and the Linux kernel initialization process in the second chapter of this book).
Now setup_arch has finished its work and we can go back to the generic Linux kernel code. Recall that the setup_arch function was called from the start_kernel function, which is defined in the init/main.c source code file. So, we shall return to this function. You can see that many different functions are called right after the setup_arch function inside the start_kernel function, but since our chapter is devoted to timers and time management related stuff, we will skip all code that is not related to this topic. The first function related to time management in the Linux kernel is tick_init, which initializes:
tick broadcast framework related data structures;
full tickless mode related data structures.
We have not seen anything related to the tick broadcast framework in this book so far and do not know anything about tickless mode in the Linux kernel. So, the main point of this part is to look at these concepts and understand what they are.
Let's start with the tick_init function. As I already wrote, this function is defined in the kernel/time/tick-common.c source code file and consists of calls to two functions: tick_broadcast_init and tick_nohz_init.
We are interested only in the tick_broadcast_init function for now. This function is defined in the kernel/time/tick-broadcast.c source code file and initializes the tick broadcast framework related data structures. Before we look at the implementation of the tick_broadcast_init function and try to understand what it does, we need to know about the idle task. We already saw a little about this in the last part of the Linux kernel initialization process. When the Linux kernel finishes all initialization steps in the start_kernel function from the init/main.c source code file, it calls the rest_init function from the same source code file. The main point of this function is to launch the kernel init thread and the kthreadd thread, to call the schedule function to start task scheduling, and to go to sleep by calling the cpu_idle_loop function, which is defined in the kernel/sched/idle.c source code file.
The cpu_idle_loop function represents an infinite loop that checks the need for rescheduling on each iteration. After the scheduler finds something to execute, the idle process finishes its work and control moves to a new runnable task.
We will not consider the full implementation of the cpu_idle_loop function and the details of the idle state in this part, because it is not related to our topic. But there is one interesting point for us. We know that a processor can execute only one task at a time. How does the Linux kernel decide to reschedule and stop the idle process if the processor executes an infinite loop in cpu_idle_loop? The answer is system timer interrupts. When an interrupt occurs, the processor stops the idle thread and transfers control to an interrupt handler. After the system timer interrupt has been handled, need_resched will return true, and the Linux kernel will stop the idle process and transfer control to the current runnable task. But handling system timer interrupts is not efficient for power management, because if a processor is in the idle state, there is little point in sending it a system timer interrupt.
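The interplay between the idle loop and the timer interrupt can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions: need_resched_sim, timer_interrupt_sim and idle_loop_sim are hypothetical names invented here, and the "interrupt" is called synchronously from the loop, whereas a real timer interrupt arrives asynchronously.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's "reschedule needed" flag. */
static volatile bool resched_flag = false;
static unsigned long ticks_seen = 0;

/* Plays the role of need_resched(): was a reschedule requested? */
static bool need_resched_sim(void)
{
    return resched_flag;
}

/* Plays the role of the system timer interrupt handler: on the tick
 * numbered wake_at_tick it decides that some task became runnable. */
static void timer_interrupt_sim(unsigned long wake_at_tick)
{
    ticks_seen++;
    if (ticks_seen >= wake_at_tick)
        resched_flag = true;
}

/* Simplified idle loop: spin until a tick requests rescheduling.
 * Returns the number of iterations spent idle. */
static unsigned long idle_loop_sim(unsigned long wake_at_tick)
{
    unsigned long idle_iterations = 0;

    resched_flag = false;
    ticks_seen = 0;
    while (!need_resched_sim()) {
        idle_iterations++;
        /* Assumption: one simulated tick per loop iteration. */
        timer_interrupt_sim(wake_at_tick);
    }
    return idle_iterations;
}
```

Calling idle_loop_sim(3) spins for three iterations, and every one of them costs a tick even though only the last one had any work to announce. That wasted work is exactly what the tickless modes try to avoid.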
This behavior corresponds to the CONFIG_HZ_PERIODIC kernel configuration option, which tells the kernel to handle each interrupt of the system timer. To solve this problem, the Linux kernel provides two additional ways of managing scheduling-clock interrupts:
The first way is enabled with the CONFIG_NO_HZ_IDLE kernel configuration option. This option allows the Linux kernel to avoid sending timer interrupts to idle processors. In this case periodic timer interrupts are replaced with on-demand interrupts. This mode is called dyntick-idle mode. But if the kernel does not handle interrupts of a system timer, how can the kernel decide if the system has nothing to do?
The periodic tick is disabled with the call of the tick_nohz_idle_enter function, which is defined in the kernel/time/tick-sched.c source code file, and enabled again with the call of the tick_nohz_idle_exit function. There is a special concept in the Linux kernel called clock event devices, which are used to schedule the next interrupt. This concept provides an API for devices that can deliver interrupts at a specific time in the future and is represented by the clock_event_device structure in the Linux kernel. We will not dive into the implementation of the clock_event_device structure now. We will see it in the next part of this chapter. But there is one interesting point for us right now.
The second way is to omit scheduling-clock ticks on processors that are either in the idle state or have only one runnable task, in other words on a busy processor. We can enable this feature with the CONFIG_NO_HZ_FULL kernel configuration option, and it allows reducing the number of timer interrupts significantly.
Besides cpu_idle_loop, an idle processor can be in a sleeping state. The Linux kernel provides the special cpuidle framework. The main point of this framework is to put an idle processor into sleeping states. These states are called C-states. But how will a processor be woken if its local timer is disabled? The Linux kernel provides the tick broadcast framework for this. The main point of this framework is to assign a timer that is not affected by the C-states. This timer will wake a sleeping processor.
Now we can return to the implementation. Recall that the tick_init function just calls two functions: tick_broadcast_init and tick_nohz_init.
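The call sequence can be sketched in user space as follows. The two callees here are stubs written for illustration that only record that they ran; the real functions live in kernel/time/tick-broadcast.c and kernel/time/tick-sched.c and do the actual initialization described in this part. The call_log buffer is a hypothetical helper, not something from the kernel.

```c
#include <string.h>

/* Hypothetical log used only to make the call order observable. */
static char call_log[64];

/* Stub: stands in for the tick broadcast data structure setup. */
static void tick_broadcast_init(void)
{
    strcat(call_log, "broadcast;");
}

/* Stub: stands in for the NO_HZ data structure setup. */
static void tick_nohz_init(void)
{
    strcat(call_log, "nohz;");
}

/* Mirrors the shape of tick_init: two calls, in this order. */
static void tick_init(void)
{
    tick_broadcast_init();
    tick_nohz_init();
}
```

After one call to tick_init, call_log holds "broadcast;nohz;", which is the order in which this part walks through the two functions.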
Let's consider the first one. The tick_broadcast_init function is defined in the kernel/time/tick-broadcast.c source code file and initializes the tick broadcast framework related data structures. Let's look at the implementation of the tick_broadcast_init function.
cpumask with certain flags with the help of the
Now let's look at the cpumasks that will be initialized in the tick_broadcast_init function. As we can see, the tick_broadcast_init function initializes six cpumasks, and the initialization of the last three cpumasks depends on the CONFIG_TICK_ONESHOT kernel configuration option.
tick_broadcast_mask - the bitmap that represents the list of processors that are in a sleeping mode;
tick_broadcast_on - the bitmap that stores the numbers of processors that are in a periodic broadcast state;
tmpmask - a bitmap for temporary usage.
The next three cpumasks depend on the CONFIG_TICK_ONESHOT kernel configuration option. Actually, each clock event device can be in one of two modes:
periodic - clock events devices that support periodic events;
oneshot - clock events devices that are capable of issuing events that happen only once.
tick_broadcast_oneshot_mask - stores the numbers of processors that must be notified;
tick_broadcast_pending_mask - stores the numbers of processors that have a pending broadcast;
tick_broadcast_force_mask - stores the numbers of processors with enforced broadcast.
We have initialized six cpumasks in the tick broadcast framework, and now we can proceed to the implementation of this framework.
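To make the idea of "a bitmap that stores numbers of processors" concrete, here is a toy cpumask limited to 64 processors. It is a sketch, not the kernel's cpumask API: the toy_ prefixed type and helpers are hypothetical names chosen for this example, and the real cpumask_t supports an arbitrary number of CPUs and may be allocated off-stack.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy cpumask: bit N set means "processor N is in this set".
 * Assumption: at most 64 processors, so one 64-bit word is enough. */
typedef struct {
    uint64_t bits;
} toy_cpumask_t;

/* Add a processor to the set. */
static void toy_cpumask_set_cpu(unsigned int cpu, toy_cpumask_t *mask)
{
    mask->bits |= UINT64_C(1) << cpu;
}

/* Remove a processor from the set. */
static void toy_cpumask_clear_cpu(unsigned int cpu, toy_cpumask_t *mask)
{
    mask->bits &= ~(UINT64_C(1) << cpu);
}

/* Is the processor in the set? */
static bool toy_cpumask_test_cpu(unsigned int cpu, const toy_cpumask_t *mask)
{
    return (mask->bits >> cpu) & 1u;
}

/* Is the set empty? */
static bool toy_cpumask_empty(const toy_cpumask_t *mask)
{
    return mask->bits == 0;
}
```

With such a mask, marking processor 3 as sleeping is a single toy_cpumask_set_cpu(3, &mask) call, and checking whether anybody needs a broadcast at all is a single toy_cpumask_empty(&mask) call.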
The Linux kernel uses special clock source devices that can raise an interrupt at a specified time. We already know that such timers are called clock events devices in the Linux kernel. Besides clock events devices, each processor in the system has its own local timer, which is programmed to issue an interrupt at the time of the next deferred task. These timers can also be programmed to do a periodic job, like updating jiffies. These timers are represented by the tick_device structure in the Linux kernel. This structure is defined in the kernel/time/tick-sched.h header file.
The tick_device structure contains two fields. The first field, evtdev, is a pointer to the clock_event_device structure, which is defined in the include/linux/clockchips.h header file and represents the descriptor of a clock event device. A clock event device allows registering an event that will happen in the future. As I already wrote, we will not consider the clock_event_device structure and related API in this part, but will see it in the next part.
The second field of the tick_device structure represents the mode of the tick_device. As we already know, the mode can be either periodic or oneshot.
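The two fields described above can be sketched as follows. The structure and enum are written from the description in this part and follow the kernel's naming as far as I recall it, so treat the exact spelling as an assumption. The clock_event_device type is left opaque, and tick_device_mode_name is a hypothetical helper added only for illustration.

```c
#include <stddef.h>

/* Left opaque on purpose: the descriptor of a clock event device
 * is covered in the next part. */
struct clock_event_device;

/* The two modes a tick device can be in. */
enum tick_device_mode {
    TICKDEV_MODE_PERIODIC,
    TICKDEV_MODE_ONESHOT,
};

/* A tick device: the underlying clock event device plus its mode. */
struct tick_device {
    struct clock_event_device *evtdev;
    enum tick_device_mode mode;
};

/* Hypothetical helper: human-readable name of the current mode. */
static const char *tick_device_mode_name(const struct tick_device *td)
{
    return td->mode == TICKDEV_MODE_ONESHOT ? "oneshot" : "periodic";
}
```

A zero-initialized tick_device has no clock event device attached and is in periodic mode, since TICKDEV_MODE_PERIODIC is the first enumerator.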
Each clock events device in the system registers itself by calling the clockevents_config_and_register function during the initialization process of the Linux kernel. During the registration of a new clock events device, the Linux kernel calls the tick_check_new_device function, which is defined in the kernel/time/tick-common.c source code file and checks whether the given clock events device should be used by the Linux kernel. After all checks, the tick_check_new_device function calls the tick_install_broadcast_device function, which checks whether the given clock event device can be a broadcast device and installs it if so. Let's look at the implementation of the tick_install_broadcast_device function.
This function calls the tick_check_broadcast_device function, which checks whether a given clock events device can be used as a broadcast device. The main point of the tick_check_broadcast_device function is to check the value of the features field of the given clock events device. As we can understand from the name of this field, the features field contains the features of a clock event device. Available values are defined in the include/linux/clockchips.h header file; one of them is CLOCK_EVT_FEAT_PERIODIC, which represents a clock events device that supports periodic events, and so on. So, the tick_check_broadcast_device function checks the features field for CLOCK_EVT_FEAT_DUMMY and other flags and returns false if the given clock events device has one of these features. Otherwise, the function compares the ratings of the given clock event device and the current clock event device and returns the best.
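The check described above (reject devices with disqualifying feature flags, otherwise prefer the higher rating) can be sketched like this. Everything prefixed with toy_ or TOY_ is hypothetical: the flag values are illustrative and are not copied from include/linux/clockchips.h, only the dummy flag is modeled as disqualifying, and the sketch reports whether the new device is preferable rather than returning a device.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative feature bits; the real values live in
 * include/linux/clockchips.h and may differ. */
#define TOY_FEAT_PERIODIC 0x1u
#define TOY_FEAT_ONESHOT  0x2u
#define TOY_FEAT_DUMMY    0x4u

/* Minimal stand-in for a clock event device descriptor. */
struct toy_clock_event_device {
    unsigned int features;
    int rating;
};

/* Returns true if newdev should replace curdev as broadcast device.
 * curdev may be NULL when no broadcast device is installed yet. */
static bool toy_check_broadcast_device(const struct toy_clock_event_device *curdev,
                                       const struct toy_clock_event_device *newdev)
{
    /* A dummy device can never act as a broadcast device. */
    if (newdev->features & TOY_FEAT_DUMMY)
        return false;

    /* Otherwise the better rated device wins. */
    return curdev == NULL || newdev->rating > curdev->rating;
}
```

Note that a high rating does not help a device that carries a disqualifying flag: the feature check runs first, and the rating comparison is reached only by devices that pass it.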
After the tick_check_broadcast_device function, we can see the call of the try_module_get function, which checks the module owner of the clock events device. We need to do this to be sure that the given clock events device was correctly initialized. The next step is the call of the clockevents_exchange_device function, which is defined in the kernel/time/clockevents.c source code file and releases the old clock events device and replaces the previous functional handler with a dummy handler.
In the last step of the tick_install_broadcast_device function we check that the tick_broadcast_mask is not empty and start the given clock events device in periodic mode with the call of the tick_broadcast_start_periodic function. The tick_broadcast_mask is filled in the tick_device_uses_broadcast function, which checks a clock events device during the registration of this device.
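The tick_broadcast_start_periodic function checks the given clock event device and sets up periodic mode for it. As part of this, the tick_set_periodic_handler function takes a parameter that represents the broadcast state (on or off) and sets the event handler depending on its value.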
The hpet_interrupt_handler gets the IRQ-specific data and checks the event handler of the clock event device. Recall that we just set it in the tick_set_periodic_handler function. So the tick_handle_periodic_broadcast function will be called at the end of the high precision event timer interrupt handler.
The tick_handle_periodic_broadcast function builds a temporary cpumask of the processors that need the broadcast and then delivers the tick to each of them.
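The broadcast step itself, walking a mask and notifying every processor in it, can be sketched in user space as below. All toy_ names are hypothetical. Intersecting the broadcast mask with a mask of online processors is an assumption of this sketch, and the per-processor "notification" is just a counter, whereas the kernel has to reach the other processors by other means.

```c
#include <stdint.h>

#define TOY_NR_CPUS 8

/* How many broadcast ticks each simulated processor has received. */
static unsigned int toy_tick_count[TOY_NR_CPUS];

/* Stand-in for delivering the tick to one processor. */
static void toy_tick_handler(unsigned int cpu)
{
    toy_tick_count[cpu]++;
}

/* Notify every processor that is both online and waiting for a
 * broadcast. Returns the number of processors notified. */
static unsigned int toy_do_broadcast(uint32_t online_mask, uint32_t broadcast_mask)
{
    /* Temporary mask: only processors present in both sets. */
    uint32_t tmpmask = online_mask & broadcast_mask;
    unsigned int notified = 0;

    for (unsigned int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
        if (tmpmask & (UINT32_C(1) << cpu)) {
            toy_tick_handler(cpu);
            notified++;
        }
    }
    return notified;
}
```

For example, with processors 0 to 3 online (mask 0x0F) and processors 1, 3 and 4 waiting for a broadcast (mask 0x1A), only processors 1 and 3 receive a tick.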
That is the end of our look at the tick broadcast framework in the Linux kernel. We have missed some aspects of this framework, for example the reprogramming of a clock event device and broadcast with the oneshot timer. But the Linux kernel is very big, and it is not realistic to cover all aspects of it. I think it will be interesting to dive into it yourself.
We started this part with the tick_init function. We have just considered the tick_broadcast_init function and the related theory, but the tick_init function contains another function call, and this function is tick_nohz_init. Let's look at the implementation of this function.
We already saw some theory about the dyntick concept in this part, and we know that this concept allows the kernel to disable system timer interrupts in the idle state. The tick_nohz_init function initializes the different data structures related to this concept. This function is defined in the kernel/time/tick-sched.c source code file and starts by checking the value of the tick_nohz_full_running variable, which represents the state of the tick-less mode for the idle state and for the state in which system timer interrupts are disabled while a processor has only one runnable task.
If this mode is not running, we call the tick_nohz_init_all function, which is defined in the same source code file, and check its result. The tick_nohz_init_all function tries to allocate the tick_nohz_full_mask with the call of alloc_cpumask_var, which allocates space for this cpumask. The tick_nohz_full_mask will store the numbers of processors that have full NO_HZ enabled. After successful allocation of the tick_nohz_full_mask we set all bits in the tick_nohz_full_mask, set tick_nohz_full_running and return the result to the tick_nohz_init function.
The housekeeping_mask cpumask will store the numbers of processors used for housekeeping; in other words, we need at least one processor that will not be in NO_HZ mode, because it will do timekeeping and so on. After this we check the result of the architecture-specific arch_irq_work_has_interrupt function. This function checks the ability to send an inter-processor interrupt on the given architecture. We need to check this because the system timer of a processor will be disabled during NO_HZ mode, so there must be at least one online processor that can send an inter-processor interrupt to wake up an offline processor. This function is defined in the arch/x86/include/asm/irq_work.h header file for x86_64 and just checks via CPUID that a processor has an APIC.
If a processor does not have an APIC, the Linux kernel prints a warning message, clears the tick_nohz_full_mask cpumask, copies the numbers of all possible processors in the system to the housekeeping_mask, and resets the value of the tick_nohz_full_running variable. Next, we get the number of the current processor with smp_processor_id and check this processor in the tick_nohz_full_mask. If the tick_nohz_full_mask contains the given processor, we clear the appropriate bit in the tick_nohz_full_mask, because this processor will be used for timekeeping. After that, the housekeeping_mask is filled with the processors that are in the cpu_possible_mask and not in the tick_nohz_full_mask, so the housekeeping_mask will contain the processors that are left for timekeeping and other housekeeping work. In the last step of the tick_nohz_init function, we go through all processors that are set in the tick_nohz_full_mask and call the context_tracking_cpu_set function for each processor.
The context_tracking_cpu_set function is defined in the kernel/context_tracking.c source code file, and the main point of this function is to set the context_tracking.active percpu variable to true. When the active field is set to true for a certain processor, the Linux kernel context tracking subsystem tracks transitions between kernel and user mode on this processor.
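The mask arithmetic behind the NO_HZ setup can be worked through with plain integers. This is a sketch under stated assumptions: toy_nohz_setup and struct toy_nohz_masks are hypothetical names, masks are 32-bit words instead of cpumasks, and "set all bits" is modeled as "all possible processors".

```c
#include <stdint.h>

/* Result of the setup: which processors run tickless and which
 * stay behind for housekeeping. */
struct toy_nohz_masks {
    uint32_t nohz_full;
    uint32_t housekeeping;
};

/* possible_mask: one bit per possible processor.
 * boot_cpu: the processor that runs the initialization. */
static struct toy_nohz_masks toy_nohz_setup(uint32_t possible_mask, unsigned int boot_cpu)
{
    struct toy_nohz_masks m;

    /* Start with every possible processor in full NO_HZ mode. */
    m.nohz_full = possible_mask;

    /* The current processor is kept for timekeeping, so it must
     * leave the nohz_full set. */
    m.nohz_full &= ~(UINT32_C(1) << boot_cpu);

    /* Housekeeping: possible processors that are not nohz_full. */
    m.housekeeping = possible_mask & ~m.nohz_full;

    return m;
}
```

With four possible processors (mask 0x0F) and processor 0 doing the initialization, nohz_full becomes 0x0E and housekeeping becomes 0x01: one processor keeps its tick so that the other three can drop theirs.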
This is the end of the tick_nohz_init function. After this, the NO_HZ related data structures are initialized. We have not seen the API of the NO_HZ mode yet, but will see it soon.
In the previous part we got acquainted with the clocksource concept in the Linux kernel, which represents a framework for managing different clock sources in an interrupt and hardware characteristics independent way. We continued to look at the Linux kernel initialization process in a time management context in this part and got acquainted with two concepts that are new to us: the tick broadcast framework and tick-less mode. The first concept helps the Linux kernel deal with processors that are in deep sleep, and the second concept represents the mode in which the kernel may work to improve power management of