Motivation

Understand and appreciate the role of debugging in the software engineering life cycle.

Why learn debugging?

A typical software product’s development process will entail the following:

  1. Requirements for the product are finalized and handed over to the development teams responsible for various components within the product.

  2. Within each team, various members are responsible for subcomponents within the component. Individual members divide the work further among themselves.

  3. The engineers write the components they are responsible for and independently test whether their features function correctly (commonly called dev testing). They then test whether their features work with other components (integration testing).

  4. Development teams then hand over the product to the Quality Assurance (QA) teams. QA teams run tests to verify that the product functions correctly and reliably and is ready for customer use. They will report bugs that the dev teams must fix.

  5. The product is shipped to the customer once the bugs get fixed.

Except for step one and parts of step two, all the steps above involve figuring out why the code produced so far does not do what it is supposed to, a process called debugging. Therefore, software engineers typically spend more time dealing with bugs than writing new code.

[Figure: Programmer's time]

Internal vs. external bugs

For bugs found by the QA team, the development team usually has the luxury of accessing the QA setup where the bug is reproducible, which makes debugging easier in most cases. With this setup, the developer can build a patch that adds diagnostic information or other code changes that aid debugging. Such patches are called debug patches.

Nonproduction debug patches are allowed to alter the product’s acceptable behavior. It is common to release a debug patch in an internal or nonproduction environment and collect more information. Even if this patch adversely affects the nonproduction setup, the setup can be recreated. By contrast, no customer (especially an enterprise customer) will agree to run builds that come with such warnings in their production environment.

Programmers look forward to this phase of the software development process less than the design and coding phase for many reasons. The debugging stage offers the least scope for learning anything “cool.” Besides, a bug means mistakes in the work done so far, which can feel discomfiting.

[Figure: Bugs life cycle]

The responsibility for bugs continues after the product is shipped to the customer. Many bugs go undetected during the testing phase and manifest on a released product. Now, a customer who has paid for the software cannot use it. There are many reasons bugs can get shipped to a customer, for example:

  • Abnormalities in the customer’s environment that trigger the bug are seen only by the customer. For example, usage patterns particular to a customer (such as network traffic patterns), hardware features, or hardware failures can cause the code to take a path not tested in-house, leading to the bug.

  • Other third-party software may be affecting the operation of our product (inadvertently, in most cases).

  • The planning phase for testing this product may have overlooked a use case.

Sometimes, the reasons above can be so severe that the bug might never be reproducible in-house, even if the nature of the problem is fully known. It is still the development team’s responsibility to fix these bugs. Now, however, they do not have the luxury of accessing the environment where the bug is seen and will have to rely on the available diagnostic information.

Debugging bugs that are seen only in a customer’s environment is very demanding. Developers have to rely on diagnostic information, like the logs already collected in the product’s support bundle, to investigate the bug. If we cannot make progress with this information, we have to identify what additional information is required to diagnose the issue, and a new build or patch will have to be installed in the customer’s environment to collect it.

Bugs found in the customer’s environment are usually called escalations. Cracking bugs at this stage is more critical than before. If escalations are not root-caused and fixed fast, the customer can move on to the product’s competitor, affecting the product’s future and, ultimately, the company itself. Also, time spent on escalations is time away from working on new features that can add more value to the product, so it is better to minimize this time.

[Figure: Debugging at the heart of software development]

Debugging: The unsung life skill of the developer

Teams tasked with going after escalations must be well-armed with tools and techniques to solve bugs. Yet, debugging is primarily self-taught. There are hardly any books on debugging compared to the number on programming. How many academic courses on debugging are out there? How many conferences on debugging? There have been recent developments in monitoring and observability that deal with bottlenecks and issues at a high level. The goals of monitoring and observability are to indicate when something is wrong and, ideally, to pinpoint the area where issues arise. The tools used there won’t help debug lower-level code issues. Nevertheless, we’ll cover this topic in detail.

Even the name “debugging” is quite outdated and uninviting (yet, unfortunately, we will use it throughout this course)! Root-causing a bug involves:

  • Asking the right questions

  • Using the right tools

  • Searching in the right place at the right time

  • Showing grit

  • A lot of critical thinking

The results are pretty rewarding. Solving a customer’s issue and saving the day makes an engineer more of a hero than shipping a product does. Debugging is, in fact, more akin to solving a crime: one starts with a victim (the bug symptom), collects information from the crime scene (logs, support bundle), compiles a (bug) report, recreates the scene with the collected information (reproduces the bug in-house), explores motives and suspects (the root cause), and finally solves it (puts out a fix and verifies it). So, it is hardly akin to terminating bugs.