A Commitment to Reliability

Context

I am currently taking an embedded systems class, which is naturally quite lab-heavy. We are using STM32L4xx microcontrollers and programming them in C. The other day, I noticed some rather unsafe practices in people’s code. These are certainly acceptable in a lab (where the goal is to finish the assignment and leave as soon as possible), but the issue arises when these practices become ingrained in one’s mind.

I spent a while debating what exactly this post should be about. Initially it was going to be “A Coding Practice that Needs to Die”. Then I figured that something more general – perhaps “A List of Coding Practices to Avoid” or “Never Leave Things in an Undefined State” – may be more beneficial.

Eventually I realized the true point I wish to convey: as engineers, it is nothing less than our responsibility to commit to designing reliable products. What do I mean by this? Broadly speaking, it is somewhat obvious: we must create things that do not break. And if they do, they should do so somewhat predictably.

Why?

At the very worst, it could mean the difference between somebody’s life and death, even if you are creating something that seems dissociated from human lives. Suppose you are writing code for the infotainment system of a car. It’s not directly related to the operation of the car, so it is classified as a non-critical system and is not reviewed/tested thoroughly. Now suppose your code is buggy and the Bluetooth connection to the user’s phone drops randomly – infrequently enough that most users just pick up their phone and turn Bluetooth off and back on. Eventually, someone is going to do this while driving and get in a car accident because of it.

You may argue that this is the user’s fault, which is technically true. The example is rather dramatic, and in most cases, unreliable systems are nothing more than a small inconvenience. But errors can propagate. Bad code in the hands of a fool can (apparently still) be pushed to millions of users. The key point is that this can all be prevented by designing reliable systems in the first place.

Furthermore, remember that low-level code for hardware is more often written by electrical engineers than by pure software engineers, because a strong understanding of hardware – as well as a system-wide perspective of the project – is necessary. This type of work is inherently unforgiving of bugs, and a great deal of care must be taken.

Example: Undefined States

For some reason, in the introductory C++ class at UCSB we were taught to write for loops as shown below. Can you guess why this may be problematic?

int i;
for (i = 0; i < val; i++) {
    // do something
}
// dummy code below, just an example of how we may use i after the for loop
if (i % 2 == 0) {
    // do something
}
else {
    // do something
}

The problem is that i is initially in an undefined state, since it is declared as int i; without an initial value. You may think “Who cares? i is set to zero in the for loop in the very next line.”, but I encourage you to think more carefully. Suppose you come back to this code segment two weeks later, realizing that the for loop only needs to run under certain conditions. You throw an if condition around it:

int i;
if (need_to_run_for_loop) {
    for (i = 0; i < val; i++) {
        // do something
    }
}
// erroneously assume that i = 0 if the loop was NOT run
if (i % 2 == 0) {
    // do something
}
else {
    // do something
}

But remember that in C/C++, there is a very direct connection between your code and the memory in hardware. Declaring a local variable without an initial value only reserves space for it. (Globals and other variables with static storage duration are zero-initialized, but locals like i are not.) Until a value is assigned, that space could contain anything, and reading it is undefined behavior! Sometimes it may be zero, so the code will sporadically work as intended. These types of bugs are especially frustrating and difficult to locate. The solution is to always initialize variables upon declaration (unless you are working with a class, in which case you should initialize member variables in the constructor).
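For reference, here is a minimal sketch of the fix applied to the snippet above. The function names (count_iterations, counter_is_even) and the choice to wrap the logic in functions are my own assumptions for illustration; the only change that matters is that i gets a value on the same line where it is declared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: returns the value of the loop counter after the
 * (optional) for loop. Names are illustrative, not from any real codebase. */
int count_iterations(bool need_to_run_for_loop, int val)
{
    int i = 0; /* initialized at declaration: defined on every code path */

    if (need_to_run_for_loop) {
        for (i = 0; i < val; i++) {
            /* do something */
        }
    }

    /* Whether or not the loop ran, i now holds a known, non-negative value. */
    assert(i >= 0);
    return i;
}

/* The "dummy code" that uses i after the loop is now safe to run. */
bool counter_is_even(bool need_to_run_for_loop, int val)
{
    return count_iterations(need_to_run_for_loop, val) % 2 == 0;
}
```

With this version, skipping the loop leaves i at zero by construction rather than by luck, so the check that follows behaves the same way on every run.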

As another example, don’t enable microcontroller peripherals before setting the relevant configuration registers! In general, never put yourself in a position where you lack absolute certainty of what is happening.
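To make the peripheral point concrete, here is a rough sketch of the "configure first, enable last" ordering. Everything in it is made up for illustration: fake_uart_t, its baud and ctrl fields, and the FAKE_UART_ENABLE bit are stand-ins, not the real STM32L4 register layout, so check the reference manual for the actual registers and the required sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register block. A real peripheral has a different layout. */
typedef struct {
    volatile uint32_t baud; /* made-up configuration register */
    volatile uint32_t ctrl; /* made-up control register, bit 0 = enable */
} fake_uart_t;

#define FAKE_UART_ENABLE (1u << 0)

/* Bring the peripheral from an unknown state to a known, enabled state. */
void fake_uart_init(fake_uart_t *uart, uint32_t baud_divisor)
{
    assert(uart != 0);
    assert(baud_divisor != 0u); /* refuse to run with a nonsensical config */

    uart->ctrl = 0u;                /* 1. disable, whatever state it was in */
    uart->baud = baud_divisor;      /* 2. write the configuration */
    uart->ctrl |= FAKE_UART_ENABLE; /* 3. only now turn the peripheral on */
}
```

The first write matters as much as the last one: it does not assume the control register starts at its reset value, so the function ends in the same state no matter what ran before it.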

To prevent such mistakes, you must train yourself to become extremely uncomfortable and alarmed whenever anything is in an undefined state. I feel as though I am sitting at a desk on a conveyor belt on a cliff. If something is in an undefined state, the conveyor belt moves towards the edge of the cliff. Naturally, I immediately fix the error so that the conveyor belt moves backwards.

The desk/cliff/conveyor belt situation – generated by ChatGPT.

Conclusion

Ultimately, everyone makes mistakes. Learning from them is what matters. I do not intend to instill any sort of fear or stress in anybody who reads this, but I do believe in the importance of a mindset that at least keeps reliability in view.

To develop such a mentality, there is absolutely no substitute for thinking carefully, critically, and independently. Instead of blindly assuming something will work as intended, seek a deep level of understanding when you encounter something unfamiliar. Ask “Why does this work?” or “What if someone changed x to y?” And try to answer the question yourself.

Being inquisitive is not easy, and it certainly takes time. But I think it’s worth it. You’ll be a better engineer. It’s a lifelong journey that I have only recently begun.