Bug-free programming, what could we do for it?

Modern programming communities appear superstitious when it comes to writing bug-free or safe computer software. For example the search by "safe programming" gives you Toby Ho's blog post (2008) containing fairly traditional viewpoints within programming communities:

First, let me define what I mean by safe: the earlier a programming language catches a programming error for you, the safer it is. Haskell is extremely safe, whereas php is extremely unsafe.

The ideal appears to be that the programming error gets catched when it is being made. Why? Wouldn't it be much more important to catch the errors in the first place?

This bothers me because I see that dynamically typed programming environments have a lot of potential that remains unsought. I do believe that in the future the mainstream programming languages are going to be entirely dynamic, although the programs commonly run are going to be statically typed.

Additionally I would argue that the "type safety" is an oxymoron term for there is no safety in types. Likewise the type theory or even theorem proving does not ensure that your programs are correct. The research of study of these things is worthwhile, but the things produced there are merely tools and do not ensure correctness in any amount unless they are applied in the right place.

The idea that a good type system, or a theorem prover alone can make programming safe resembles the practice of medieval medicine.

We need a some kind of model to study this problem in more detail. Otherwise it'll be static vs. dynamic typing rhetoric pingpong all the way down.

The model for writing safe software

We want to really understand what a software bug or a programming error is. To do it we have to make a definition. It must be logically valid definition that allows analytic conclusions. Otherwise we might not retrieve any new insights about the model we are studying.

A programming error has occured when the software does not work like the software author intended it to work.

The ideal situation for us would be that the author would always write bug-free software. Aside some simple examples this will not happen. Though it would be useful to get close to the ideal.

Tolerances are a common concept at engineering. Tools available to us cannot produce two entirely identical parts. Electronics and parts have tolerances on dimensions, aging, electrical properties and on to temperatures where they stay undamaged. Everything works as long as the tolerances are satisfied.

Therefore it makes sense to define safe software to mean the kind of software that you can trust to be bug-free within some given tolerances.

The tolerance should be a familiar concept for a programming, already for the simple reason that many useful computations are approximations and come with limits on tolerances and precision.

We obtain two concepts:

The program is likely to have some behavior that we do not care about. In this sense both the model and computer program may be with either open or closed specification. Here's an illustration of the system:

illustration 0

The first observation

The program is an expression of the model that exists in the programmer's head.

Observe that the separation between the model in author's mind and the computer program cannot be left out. Due to limitations in bandwidth of how well you can translate your ideas to programs, It is likely that these two will never be the exactly same.

Note that it is very difficult to increase the bandwidth by which ideas can be turned into programs. If you had infinite bandwidth then you could transfer everything that you know into the program.

The bandwidth is loosely dependent on how fast you can think and write. It is directly associated to how costly it is to produce computer programs.

Note that it is relatively straightforward to create systems where we eat bandwidth to increase safety. Ideally we would want to find an approach where we get high productivity with safety.

The second observation

How do you know that the program is correct and that it works like the author intended it to work?

You can only know what you can observe, therefore if you directly observe a bug that violates the model, then you immediately know that the program is not correct.

Therefore, if you can never observe an event that violates the model, then you have safe software.

This may seem useless at first, after all at first sight you are not able to verify the absence of bugs here. Also nothing is said that we don't already know.

What is nice though, is that this definition of the bug does not change. It does not matter in which way you observe that the event can occur during evaluation.

The third observation

Observe that aside the computer program being incorrect, it is possible for the idea to be invalid as well.

Therefore there appears to be two categories of bugs identified by this model.

The first kind of bugs represent direct failures that you immediately can know to be incorrect. eg. Something like 1+1=3.

The second kind of bugs represent failures of the mind. eg. You have some idea of how to implement security, but in the end someone is still able to retrieve the root credentials.

illustration 1

To fix the first kind of bugs, you have to fix the software. To fix the second kind of bugs, you have to fix your idea of the software.

In the illustration I also drew the environment. This is an important detail because programs are commonly interactive with their environment. The events produced by the environment change the events that you observe.

Fourth observation

There are things that you know, and things that you do not know.

To produce safe software within this kind of model, you have to ensure that:

If the two conditions above succeed for some set of tolerances, then you have safe software that works within those tolerances.

Can it be done?

I think for evaluating feasibility of obtaining safe software, we can ignore the first category of bugs entirely if we focus on the second category bugs immediately. This may even make sense because the first category bugs only occur when the program does not match with the idea in the head.

The bugs in the second category are far much more interesting because they require you to identify that your understanding of how the program works is invalid.

Given that you know the event of the bug, you may conduct a static analysis that determines how likely it is that the event will not occur. To make use of this you have to identify all the bugs that you want to avoid in your program.

The three open, interesting, questions

There are several questions that I didn't manage to satisfiably answer.

illustration 2

How to find all the potential occurrences of events that prove that your idea of the program is invalid?

Given your idea of how the program should work, how to reliably enumerate all the bugs that may occur in the program?

Additionally have we already maximized how much program safety we could get without sacrificing other good qualities of our programming languages?

I do think that an answer to writing safe software would include answers to some variety of these questions. Without having those answers, you cannot prove that the software is bug free, even if you could prove some static qualities about the program.