Limits on resources in Linux



Each process in the system uses a certain amount of different resources, such as files, CPU time, memory and so on.

Such resources are not infinite, and we need a way to manage how much of them each process may use. Sometimes it is useful to know the current limit for a certain resource or to change its value. In this post we will consider the tools that allow us to get information about the limits of a process and to increase or decrease them.

We will start with the userspace view and then look at how it is implemented in the Linux kernel.

There are three main system calls to manage the resource limits of a process:

  • getrlimit

  • setrlimit

  • prlimit

The first two allow a process to read and set limits on a system resource. The last one is an extension of the previous functions: prlimit allows reading and setting the resource limits of a process specified by its PID. The definitions of these functions look like this:

The getrlimit function:

int getrlimit(int resource, struct rlimit *rlim);

The setrlimit function:

int setrlimit(int resource, const struct rlimit *rlim);

And the prlimit function:

int prlimit(pid_t pid, int resource, const struct rlimit *new_limit,
            struct rlimit *old_limit);

In the first two cases, the functions take two parameters:

  • resource - represents the resource type (we will see the available types later);

  • rlim - a combination of soft and hard limits.

There are two types of limits:

  • soft

  • hard

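The first is the actual limit enforced for a resource of a process. The second is a ceiling for the soft limit. An unprivileged process may set its soft limit to any value up to the hard limit, and it may irreversibly lower its hard limit. Only a privileged process (one with the CAP_SYS_RESOURCE capability) can raise the hard limit. So the soft limit can never exceed the corresponding hard limit.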

Both these values are combined in the rlimit structure:

struct rlimit {
    rlim_t rlim_cur;
    rlim_t rlim_max;
};

The last function looks a little more complex and takes four arguments. Besides the resource argument, it takes:

  • pid - specifies the ID of the process on which prlimit should operate (0 means the calling process);

  • new_limit - provides the new limit values if it is not NULL;

  • old_limit - the current soft and hard limits will be placed here if it is not NULL.

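The ulimit utility uses exactly the prlimit function. We can verify this with the help of the strace utility. For example: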

~$ strace ulimit -s 2>&1 | grep rl

prlimit64(0, RLIMIT_NPROC, NULL, {rlim_cur=63727, rlim_max=63727}) = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=4*1024}) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0

Here we see prlimit64 rather than prlimit, because strace shows the underlying system call instead of the library call.

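Now let's look at the list of available resources:

| Resource | Description |
|---|---|
| RLIMIT_CPU | CPU time limit given in seconds |
| RLIMIT_FSIZE | the maximum size of files that a process may create |
| RLIMIT_DATA | the maximum size of the process's data segment |
| RLIMIT_STACK | the maximum size of the process stack in bytes |
| RLIMIT_CORE | the maximum size of a core file |
| RLIMIT_RSS | the number of bytes that can be allocated for a process in RAM |
| RLIMIT_NPROC | the maximum number of processes that can be created by a user |
| RLIMIT_NOFILE | the maximum number of file descriptors that can be opened by a process (one greater than the highest usable descriptor number) |
| RLIMIT_MEMLOCK | the maximum number of bytes of memory that may be locked into RAM by mlock |
| RLIMIT_AS | the maximum size of virtual memory in bytes |
| RLIMIT_LOCKS | the maximum combined number of flock locks and fcntl leases |
| RLIMIT_SIGPENDING | the maximum number of signals that may be queued for a user of the calling process |
| RLIMIT_MSGQUEUE | the number of bytes that can be allocated for POSIX message queues |
| RLIMIT_NICE | a ceiling on the nice value that a process can set (the actual ceiling is 20 - rlim_cur) |
| RLIMIT_RTPRIO | the maximum real-time priority value |
| RLIMIT_RTTIME | the maximum number of microseconds that a process may be scheduled under a real-time scheduling policy without making a blocking system call |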

If you look into the source code of open source projects, you will notice that reading or updating a resource limit is quite a widely used operation. For example, in systemd:

/* Don't limit the coredump size */
(void) setrlimit(RLIMIT_CORE, &RLIMIT_MAKE_CONST(RLIM_INFINITY));

Or in haproxy:

getrlimit(RLIMIT_NOFILE, &limit);
if (limit.rlim_cur < global.maxsock) {
	Warning("[%s.main()] FD limit (%d) too low for maxconn=%d/maxsock=%d. Please raise 'ulimit-n' to %d or more to avoid any trouble.\n",
		argv[0], (int)limit.rlim_cur, global.maxconn, global.maxsock, global.maxsock);
}

We've just seen a little about resource limits in userspace; now let's look at the same system calls in the Linux kernel.

Limits on resources in the Linux kernel

The implementations of these system calls are defined in the kernel/sys.c kernel source code file. The getrlimit and setrlimit system calls look similar. Both call the do_prlimit function, which is the core implementation of the prlimit system call, and copy the given rlimit to or from userspace:

The getrlimit system call:

SYSCALL_DEFINE2(getrlimit, unsigned int, resource, struct rlimit __user *, rlim)
{
	struct rlimit value;
	int ret;

	ret = do_prlimit(current, resource, NULL, &value);
	if (!ret)
		ret = copy_to_user(rlim, &value, sizeof(*rlim)) ? -EFAULT : 0;

	return ret;
}

And the setrlimit system call:

SYSCALL_DEFINE2(setrlimit, unsigned int, resource, struct rlimit __user *, rlim)
{
	struct rlimit new_rlim;

	if (copy_from_user(&new_rlim, rlim, sizeof(*rlim)))
		return -EFAULT;
	return do_prlimit(current, resource, &new_rlim, NULL);
}

First of all, the do_prlimit function checks that the given resource is valid:

if (resource >= RLIM_NLIMITS)
	return -EINVAL;

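and returns the -EINVAL error if it is not. If this check passes and the new limits were passed as a non-NULL value, two more checks follow:

if (new_rlim) {
	if (new_rlim->rlim_cur > new_rlim->rlim_max)
		return -EINVAL;
	if (resource == RLIMIT_NOFILE &&
			new_rlim->rlim_max > sysctl_nr_open)
		return -EPERM;
}

These check that the given soft limit does not exceed the hard limit. When the given resource is the maximum number of file descriptors, they also check that the hard limit is not greater than the sysctl_nr_open value. The value of sysctl_nr_open can be found via procfs:

~$ cat /proc/sys/fs/nr_open
1048576

After all of these checks we lock the task list. This ensures that signal handler related structures will not be destroyed while we are updating the limits for the given resource:

read_lock(&tasklist_lock);
...
...
...
read_unlock(&tasklist_lock);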

We need to do this because the prlimit system call allows us to update the limits of another task given its PID. While the task list is locked, we take the rlimit instance that is responsible for the given resource limit of the given process:

rlim = tsk->signal->rlim + resource;

where tsk->signal->rlim is just an array of struct rlimit with one element per resource. If new_rlim is not NULL, we update the limit with its value. If old_rlim is not NULL, we fill it with the previous limits:

if (old_rlim)
    *old_rlim = *rlim;

That's all.

Conclusion

This is the end of the part that describes the implementation of the resource limit system calls in the Linux kernel. If you have questions or suggestions, ping me on Twitter (0xAX), drop me an email, or just create an issue.

Please note that English is not my first language and I am really sorry for any inconvenience. If you find any mistakes please send me a PR to linux-insides.

Links

  • system calls
  • PID
  • ulimit
  • strace
  • POSIX message queues