Tuesday, 12 January 2010

Verification controversy

My colleague, Vijay D'Silva, has an exceptional capacity for keeping abreast of literature, debate, and “who's who” in formal verification. He frequently tips off our formal verification group about interesting developments.

Yesterday, he alerted me to a wonderful piece of controversy in the verification world. Moshe Vardi, Editor in Chief of Communications of the ACM, recently published an editorial encouraging wider debate in Computing Science. His editorial starts with a reference to a famous paper by Richard A. De Millo, Richard J. Lipton and Alan J. Perlis: “Social Processes and Proofs of Theorems and Programs”, which Communications published in 1979. “Social Processes...” takes the view that formal verification is doomed to fail, for a variety of well-argued reasons. Vardi's view is that the article, 30 years on, looks rather misguided.

This editorial has certainly fulfilled Vardi's aim of sparking some debate, though on Richard Lipton's blog rather than in Communications: the surviving authors of “Social Processes...” have not taken his comments well!

I hadn't read this famous article, and did so today. It's wonderful; if you haven't read it, then please do so now! While I disagree that the article was misguided, Vardi is right that the verification community has moved on, and some of the arguments in “Social Processes...” do not apply to what we now think of as formal verification. This is not because De Millo et al. were wrong; it is because the goals of formal verification have changed.

I thought I'd discuss two particular points in this area that are important to me, as a proponent of pragmatic formal verification.

What we want to verify has changed

Back in 1979, the goal of verification was to show that a program “is correct”, i.e. meets its specification. The idea was that the requirements a program should satisfy would be formalised, then the program, as a monolithic unit, would be shown logically to satisfy the formal properties, via a formal proof.

There are many practical problems associated with doing such a proof, whether manually or automatically, even if the program's requirements have been formalised. However, the fundamental problem is the difficulty of formalising whole-program requirements for large pieces of software. This simply cannot be done for large applications like compilers or word processors. For simpler programs, a formalisation of requirements may be possible, but the formalisation might be extremely complex and hard to understand. It may be necessary to use temporal logic to state properties, in which case, in my opinion, the game is up: even moderately complex formulae in temporal logics are virtually impossible to understand intuitively.

The modern, pragmatic standpoint on this issue is much simpler: limit the scope of formal verification. Instead of aiming to prove whole-program correctness, just try to determine whether an assertion at a particular program point can be violated. The crucial point here is that an assertion states something unambiguous about a low-level, but important, detail of a program: for example, that a value should remain within a certain range, or that two boolean flags should never simultaneously be true. Modern programmers write assertions all the time. Of course, assertions cannot be used to specify whole-program correctness for complex systems, but as De Millo et al. rightly point out, the notion of whole-program correctness is a dream anyway. Unfortunately, deciding whether an assertion can be violated is still undecidable in general: no algorithm can answer the question for every program. Nevertheless, automated techniques can go a long way in this area.
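To make the two example assertions concrete, here is a minimal sketch in Python. The `Heater` class, its fields and its temperature limits are all invented for illustration; they do not come from any real system. The verification question is then narrow and unambiguous: is there any sequence of calls to `step` that makes one of the assertions in `check` fail?

```python
class Heater:
    """Hypothetical temperature controller, invented for illustration."""

    MIN_TEMP = 0
    MAX_TEMP = 100

    def __init__(self):
        self.temp = 20
        self.heating = False
        self.cooling = False

    def check(self):
        # Assertion 1: a value must remain within a certain range.
        assert self.MIN_TEMP <= self.temp <= self.MAX_TEMP, "temp out of range"
        # Assertion 2: two boolean flags must never both be true.
        assert not (self.heating and self.cooling), "heating and cooling both on"

    def step(self, target):
        # Clamp the requested target; without this line a target of 500
        # would eventually drive temp past MAX_TEMP and violate assertion 1.
        target = max(self.MIN_TEMP, min(self.MAX_TEMP, target))
        self.heating = self.temp < target
        self.cooling = self.temp > target
        if self.heating:
            self.temp += 1
        elif self.cooling:
            self.temp -= 1
        self.check()
```

Running the code only exercises the assertions on the inputs one happens to try. A verifier would instead try to show that no input at all can violate them.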

What we expect a verifier to tell us has changed

In the “Disbelieving Verifications” part of the paper, the authors argue that, if the programmer gets the message 'VERIFIED' from an automatic verification tool (if such a tool could indeed be built), this “mystical stamp of approval” would tell them very little. The authors also note that, in practice, the programmer would never get this answer: for, say, a 20,000 line program, they would always get 'NOT VERIFIED', since no sizeable program is ever perfect.

The techniques of 1979 did indeed aim simply to decide whether or not a program was correct. Many modern techniques do more than this, particularly model checking. The strength of a model checker is that, when asked, say, whether an assertion in a C program can ever be false, it does not just answer “yes it can”: it also says “and here's how: ...”, and reports an execution sequence illuminating the bug. Unlike a mechanical correctness proof, these counterexamples can be understood by programmers, who can use them to debug their program.
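The “and here's how” part can be sketched with a toy explicit-state checker in Python. This is only an illustration of the idea, not how any particular model checker for C works: real tools use far more sophisticated techniques to cope with large state spaces. The two-process protocol below (each process looks at the other's flag, then sets its own) and all the names are invented for the example. The assertion is that the two flags are never both true.

```python
from collections import deque


def find_counterexample(initial, successors, assertion):
    """Breadth-first search over the reachable states.

    Returns None if the assertion holds in every reachable state.
    Otherwise returns a shortest list of (action, state) steps leading
    from the initial state to a state where the assertion is false.
    """
    parent = {initial: None}  # state -> (previous state, action taken)
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not assertion(state):
            # Walk the parent links back to the start to rebuild the trace.
            trace = []
            cur = state
            while parent[cur] is not None:
                prev, action = parent[cur]
                trace.append((action, cur))
                cur = prev
            trace.reverse()
            return trace
        for action, nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = (state, action)
                queue.append(nxt)
    return None


# Hypothetical protocol state: (pc0, pc1, flag0, flag1).
INITIAL = (0, 0, False, False)


def successors(state):
    """Yield (action, next_state) for each process that can take a step."""
    pcs, flags = state[:2], state[2:]
    for i in (0, 1):
        other = 1 - i
        pc, fl = list(pcs), list(flags)
        if pcs[i] == 0:
            if flags[other]:
                continue  # blocked while the other flag is set
            pc[i] = 1
            action = f"p{i}: sees other flag clear"
        elif pcs[i] == 1:
            fl[i] = True
            pc[i] = 2
            action = f"p{i}: sets own flag"
        else:
            fl[i] = False
            pc[i] = 0
            action = f"p{i}: clears own flag"
        yield action, (pc[0], pc[1], fl[0], fl[1])


def flags_exclusive(state):
    """The assertion under test: the two flags are never both true."""
    return not (state[2] and state[3])
```

Calling `find_counterexample(INITIAL, successors, flags_exclusive)` returns a four-step interleaving in which both processes see the other's flag clear before either sets its own. A programmer can read such a trace step by step, which is exactly what a bare 'NOT VERIFIED' does not offer.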

If a programmer uses a verifier to reveal bug after bug in their program, via counterexamples, and then, after 20 bugs have been fixed, the verifier finally reports 'VERIFIED', then this stamp of approval does not seem so mystical. The programmer's confidence in the program has increased due to all the fixed bugs. The verification tool proved its worth by managing to find those bugs, and the fact that it cannot find any more suggests that something is right about the program!

Counterexamples relate to another major point in “Social Processes...”: that when a mathematician does a proof, they leap up and say “I've got it!”, and run and tell their colleagues, to whom they try to justify the proof. This leads to discussion, perhaps to a refutation of the proof, or increased confidence in it. This discussion and excitement is at the heart of the social process of mathematics. However, “formal verifications”, i.e. the sequences of logical steps used to prove a program correct, do not generate such excitement and do not form part of a social process: the results of one formal verification are of no interest to software practitioners in general, and one verification result provides no general insight into the correctness of whole classes of programs, nor even of syntactically similar programs.

However, one can get excited about counterexamples. When I take the time to understand a counterexample, the counterexample surprises me. I think: “Ah, I see!!!”. My assumptions about aspects of my algorithm are shattered and need to be re-thought. Sometimes I am so excited that I do run and tell my colleagues. (Sometimes I get so excited that I recount stories of nasty bugs I have found to my wife, who unfortunately has no idea what I'm talking about and quickly tries to change the subject!) From a given counterexample, one can think of a simpler counterexample representing the same bug. Then one can extract the essence of the bug which the counterexample represents. If this class of bugs proves very important, this might lead to designers of verification tools tuning their algorithms to be super-aware of bugs of this nature, so that their tools can quickly report easy-to-understand counterexamples. This is a social process.
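The step from a counterexample to a simpler one showing the same bug can be partly mechanised. The sketch below, in Python, is a deliberately naive one-step-at-a-time shrinker, not a full delta-debugging algorithm. The buggy counter in `fails` and its operation names are invented for illustration.

```python
def fails(ops):
    """Hypothetical buggy counter: returns True if the sequence of
    operations ever violates the assertion count <= 3."""
    count = 0
    for op in ops:
        if op == "inc":
            count += 1
        elif op == "dec":
            count = max(0, count - 1)
        # any other operation (e.g. "noop") leaves count unchanged
        if count > 3:
            return True
    return False


def shrink(ops, fails):
    """Greedily drop single operations while the failure still occurs.

    The result is 1-minimal: removing any one remaining operation
    makes the failure disappear.
    """
    ops = list(ops)
    changed = True
    while changed:
        changed = False
        for i in range(len(ops)):
            candidate = ops[:i] + ops[i + 1:]
            if fails(candidate):
                ops = candidate
                changed = True
                break
    return ops
```

Given a failing sequence padded with irrelevant `noop` and `dec` steps, `shrink` keeps only the operations that matter, which makes the essence of the bug much easier to see and to discuss with colleagues.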

In summary, it is true that formal verification has had success over the last 30 years, particularly due to advances in model checking. However, I think it's a massive leap for Vardi to suggest that formal verification has been so successful that it rebuts the many compelling arguments in De Millo, Lipton and Perlis's paper. It's fairer to say that the goals of verification have changed. And we in the verification community should not get too carried away with what we have achieved. Software developers are not yet using verification tools on a regular basis, and such tools are not built into mainstream compilers like GCC. There is a long way to go.


  1. Thanks for your comments. Thanks for reading the paper--I think we did guide rather than misguide, but thanks anyway.

  2. "I think it's a massive leap for Vardi to suggest that formal verification has been so successful that it rebuts the many compelling arguments in De Millo, Lipton and Perlis's paper."

    Please do not attribute to me claims I did not make. Note that I did not conduct a thorough analysis of the 1979 article. Many of the arguments offered in the article are indeed as compelling today as they were 30 years ago. The major thrust of the article is, however, that formal verification, as an activity, and by implication, as a research direction, is futile. It is this thrust that I commented on.

    See also comments I posted on DeMillo's and Lipton's blogs.

  3. "The major thrust of the article is, however, that formal verification, as an activity, and by implication, as a research direction, is futile."

    I certainly agree with you that research into formal verification has been far from futile, and is still very fruitful.

    I'm really pleased that your editorial comment kicked off this discussion - it is very interesting; thank you!