Symptoms and causes

A program or process hangs when it waits for a condition that never materializes, leaving it in a state where it may stop responding permanently. Hangs are challenging because, unlike crashes, they might not come with a backtrace or core file that tells us the affected code path. The first order of business is to identify the thread(s) stuck waiting for the condition and then figure out why the condition remains unfulfilled.

Pattern to debug a hang

In this section, we’ll discuss a general pattern, that is, an ordered sequence of steps, for debugging hangs.

Step 1: Identify the stuck thread(s)

The first step is identifying the thread(s) waiting for the condition. The ideal information here is a memory dump of the process taken while it is stuck. Some platforms and programming languages make this information readily available. In other cases, we need a memory dump or a thread dump containing the backtraces of all threads at the time the dump was taken. These backtraces tell us what every thread in the process was doing, which could be anything: a thread could be in the middle of an operation or waiting for something. We are interested in the threads that are waiting for something.
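
As an illustration of what a thread dump contains, here is a minimal Python sketch that collects the backtrace of every live thread from inside the process. The helper name `dump_all_threads` is hypothetical and not part of any library. It relies on `sys._current_frames()`, a CPython helper intended for debugging, and it assumes thread names are unique.

```python
import sys
import threading
import traceback


def dump_all_threads():
    """Return {thread name: formatted backtrace} for every live thread.

    Hypothetical helper for illustration; assumes thread names are unique.
    """
    # Map thread identifiers to the names given at thread creation.
    names = {t.ident: t.name for t in threading.enumerate()}
    dump = {}
    # sys._current_frames() maps each thread id to its innermost frame.
    for ident, frame in sys._current_frames().items():
        name = names.get(ident, f"unknown-{ident}")
        # format_stack walks from the outermost call down to `frame`.
        dump[name] = "".join(traceback.format_stack(frame))
    return dump
```

Printing each entry of the returned dictionary shows where every thread currently is. A thread blocked on a condition variable, for example, would typically end in a `wait` frame from `threading.py`. The standard library's `faulthandler.dump_traceback(all_threads=True)` writes similar information directly to a file such as `sys.stderr`.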

A thread can end up in the waiting state for various conditions. Some examples are as follows:

  • In a thread pool waiting for work on a condition variable

  • Waiting for the completion of an I/O operation in a poll or select operation

  • Waiting to acquire a mutex or a semaphore in a lock or acquire operation

  • Intentionally waiting on a timed wait or a sleep

  • Waiting for a signal or event from an external source, like another thread or a process
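
To make these waiting states concrete, the following Python sketch starts one thread for four of the cases above (condition variable, mutex, timed sleep, and external signal) and reports which threads have not finished after a short timeout. The function names `start_waiters` and `still_waiting` are hypothetical and chosen only for this illustration. The I/O case is omitted to keep the sketch short.

```python
import threading
import time


def start_waiters(sleep_seconds=1.0):
    """Start one daemon thread per kind of wait; return (threads, release)."""
    cond = threading.Condition()
    lock = threading.Lock()
    event = threading.Event()
    state = {"work": False}

    def wait_on_condition():
        # Thread-pool style: wait on a condition variable for work.
        with cond:
            cond.wait_for(lambda: state["work"])

    def wait_on_mutex():
        # Blocks until the current holder releases the lock.
        with lock:
            pass

    def wait_on_sleep():
        # Intentional timed wait.
        time.sleep(sleep_seconds)

    def wait_on_event():
        # Waits for a signal from another thread.
        event.wait()

    lock.acquire()  # the caller holds the mutex, so wait_on_mutex blocks
    targets = (wait_on_condition, wait_on_mutex, wait_on_sleep, wait_on_event)
    threads = [
        threading.Thread(target=f, name=f.__name__, daemon=True) for f in targets
    ]
    for t in threads:
        t.start()

    def release():
        # Fulfil every condition the threads are waiting for.
        with cond:
            state["work"] = True
            cond.notify_all()
        lock.release()
        event.set()

    return threads, release


def still_waiting(threads, timeout=0.1):
    """Return the names of threads that have not finished within `timeout` seconds."""
    deadline = time.monotonic() + timeout
    stuck = []
    for t in threads:
        t.join(max(0.0, deadline - time.monotonic()))
        if t.is_alive():
            stuck.append(t.name)
    return stuck
```

Calling `still_waiting(threads)` right after `start_waiters()` should list all four threads, and calling `release()` lets them finish. Note that `is_alive()` only shows that a thread has not finished; confirming that it is actually blocked, and on what, still requires its backtrace.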
