How Tech - Systems Programming

How Tech - Systems Programming

Post-Mortem Analysis: Core Dump Generation and Inspection in High-Availability Systems

Jan 25, 2026
∙ Paid

Your service just segfaulted at 3 AM. The process is gone, monitoring shows nothing useful, and you need to know what happened. This is where core dumps become critical—but only if they’re actually being generated.

Why Core Dumps Fail in Production

Most production systems have RLIMIT_CORE set to zero. Check yours right now: ulimit -c probably returns 0. This means when your process crashes, the kernel skips dump generation entirely. Even if you set it to unlimited, a 16GB process creates a 16GB+ core file. If you’re on a filesystem that’s 90% full, the write fails mid-dump and you get nothing.

The second killer is container environments. Your Kubernetes pod writes the core to /cores, but that path isn’t mounted to persistent storage. The pod restarts, ephemeral storage gets wiped, and your evidence disappears. I’ve spent entire incident responses debugging phantom crashes because no one configured a PersistentVolume for cores.

User's avatar

Continue reading this post for free, courtesy of Systems.

Or purchase a paid subscription.
© 2026 Sumedh S · Privacy ∙ Terms ∙ Collection notice
Start your SubstackGet the app
Substack is the home for great culture