How Tech - Systems Programming

cgroups and memcg: How the Kernel Enforces Memory Limits

Jun 11, 2026

Part I


Most engineers set container memory limits and assume the enforcement is simple — process hits limit, gets killed. What actually happens involves three distinct threshold tiers, a per-page charge mechanism that tracks kernel objects alongside heap, and an OOM killer that defaults to shooting the wrong process. Here is what is happening under the hood.

Why ulimits Fall Short

setrlimit(RLIMIT_AS) limits virtual address space, not physical memory consumption. A process that mmaps 8GB and touches only 400MB of it consumes 400MB of physical pages, yet a 600MB RLIMIT_AS rejects the mapping outright, because the limit counts reserved address space rather than resident pages. RLIMIT_DATA covers the data segment and the brk-managed heap, and since Linux 4.7 private writable mmap regions as well, but it too counts virtual size rather than resident pages. Neither limit tracks page cache, shared file mappings, or kernel objects (socket buffers, dentries, inodes) allocated on your process's behalf, and both apply per process rather than to a group of processes. cgroups track all of it.
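To see the gap between reserved address space and resident pages, here is a minimal Python sketch, assuming a Linux system with /proc mounted. The helper names (parse_status_kb, map_and_touch) and the default sizes are made up for illustration. It maps a large anonymous region, touches a fraction of it, and reports how much VmSize (virtual) and VmRSS (resident) grew.

```python
import mmap


def parse_status_kb(status_text, field):
    """Return the kB value of a field such as 'VmSize' or 'VmRSS'
    from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith(field + ":"):
            return int(line.split()[1])
    raise KeyError(field)


def read_self_status():
    # Linux-only: the kernel exposes per-process memory counters here.
    with open("/proc/self/status") as f:
        return f.read()


def map_and_touch(map_bytes=1 << 30, touch_bytes=64 << 20):
    """Map `map_bytes` of anonymous memory, write to the first `touch_bytes`,
    and return (virtual_growth_kb, resident_growth_kb)."""
    before = read_self_status()
    region = mmap.mmap(-1, map_bytes,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    # Writing one byte per page forces a page fault for that page only.
    for offset in range(0, touch_bytes, mmap.PAGESIZE):
        region[offset] = 1
    after = read_self_status()
    region.close()
    virt = parse_status_kb(after, "VmSize") - parse_status_kb(before, "VmSize")
    rss = parse_status_kb(after, "VmRSS") - parse_status_kb(before, "VmRSS")
    return virt, rss
```

Calling map_and_touch() on a typical Linux box should show VmSize growing by roughly the mapped size while VmRSS grows by roughly the touched size. RLIMIT_AS sees only the first number; the memory cgroup counts the pages behind the second, plus page cache and kernel objects charged to the group.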

How memcg Charges Every Byte

Every physical page charged to a cgroup carries a pointer to its owning memcg. When a page fault brings in an anonymous page (heap allocation, stack growth, copy-on-write trigger), mem_cgroup_charge() fires before the page is mapped into the process. It walks up the cgroup hierarchy and increments usage counters at each level. The charge lives on the page, not the process: if the process exits but another process in the same cgroup still maps that page, the charge stays until the page is freed or reclaimed.
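The hierarchical walk can be pictured with a toy model. This is a simplified Python sketch, not kernel code: the Cgroup and Page classes, the charge and uncharge functions, and the fixed 4096-byte page size are all assumptions made for illustration. In particular, the real kernel attempts reclaim before failing a charge, which this model skips.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model


class Cgroup:
    def __init__(self, name, parent=None, limit=None):
        self.name = name
        self.parent = parent
        self.limit = limit  # None means "no limit at this level"
        self.usage = 0

    def self_and_ancestors(self):
        node = self
        while node is not None:
            yield node
            node = node.parent


class Page:
    def __init__(self):
        self.memcg = None  # the charge is recorded on the page itself


def charge(page, cgroup):
    """Charge one page to `cgroup` and every ancestor.
    Returns False, changing nothing, if any level would exceed its limit."""
    for node in cgroup.self_and_ancestors():
        if node.limit is not None and node.usage + PAGE_SIZE > node.limit:
            return False  # the real kernel would try reclaim first
    for node in cgroup.self_and_ancestors():
        node.usage += PAGE_SIZE
    page.memcg = cgroup
    return True


def uncharge(page):
    """Release the charge when the page is freed, regardless of which
    process originally faulted it in."""
    if page.memcg is None:
        return
    for node in page.memcg.self_and_ancestors():
        node.usage -= PAGE_SIZE
    page.memcg = None
```

A short usage example: with a root group, a child `pod` limited to two pages, and a grandchild `app` with no limit of its own, the third charge against `app` fails because the ancestor's limit applies to everything beneath it.

```python
root = Cgroup("root")
pod = Cgroup("pod", parent=root, limit=2 * PAGE_SIZE)
app = Cgroup("app", parent=pod)

pages = [Page(), Page(), Page()]
print([charge(p, app) for p in pages])  # third charge is refused by pod's limit
```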
