CRELLVM: Verified Credible Compilation for LLVM

Jeehoon Kang∗ Yoonseung Kim† Youngju Song∗
{jeehoon.kang, yoonseung.kim, youngju.song}@sf.snu.ac.kr
Seoul National University, Korea

Sungkeun Cho Joonwon Choi
skcho@ropas.snu.ac.kr joonwconc@mit.edu
Seoul National University, Korea MIT, USA

Juneyoung Lee Sanghoon Park
{juneyoung.lee, sanghoon.park}@sf.snu.ac.kr
{dongyeon.shin, yonghyun.kim}@sf.snu.ac.kr
Seoul National University, Korea

Chung-Kil Hur† Kwangkeun Yi
gil.hur@sf.snu.ac.kr kwang@ropas.snu.ac.kr
Seoul National University, Korea

Abstract

Production compilers such as GCC and LLVM are large complex software systems, for which achieving a high level of reliability is hard. Although testing is an effective method for finding bugs, it alone cannot guarantee a high level of reliability. To provide a higher level of reliability, many approaches that examine compilers’ internal logics have been proposed. However, none of them have been successfully applied to major optimizations of production compilers.

This paper presents Crellvm: a verified credible compilation (or equivalently, verified translation validation) framework for LLVM, which can be used as a systematic way of providing a high level of reliability for major optimizations in LLVM. Specifically, we augment an LLVM optimizer to generate translation results together with their correctness proofs, which can then be checked by a proof checker formally verified in Coq. As case studies, we applied our approach to two major optimizations of LLVM: register promotion (mem2reg) and global value numbering (gvn), having found four miscompilation bugs (two in each). This result is notable because, to the best of our knowledge, no previous systematic approaches including random testing have found any bugs in the mem2reg and gvn passes. Moreover, except for the two bugs we have reported, we found only one confirmed miscompilation bug for mem2reg in the LLVM bug tracker history.

1 Introduction

Production compilers such as GCC and LLVM are large complex software systems, for which achieving a high level of reliability is hard. Their complexity comes in two fold. First, to generate efficient target code, they perform various complex optimizations. Second, to consume less time and memory during compilation, they are usually written in C/C++ using sophisticated data structures. Due to such complexity, it is hard to make mainstream compilers very reliable.

Although testing is an effective method for finding bugs, it alone cannot guarantee a high level of reliability. For example, recent random testing tools such as Csmith [41] and EMI [16] have shown their effectiveness by finding hundreds of bugs in GCC and LLVM. However, they are hardly sufficient to provide a high level of reliability guarantee since they treat compilers as black boxes without examining their internal logics.

In order to provide a higher level of reliability, many approaches that examine compilers’ internal logics have been proposed, none of which, however, have successfully applied to major optimizations of production compilers. For example, verified compilers such as CompCert [17] are written and verified inside a proof assistant such as Coq and thus guarantee their formal correctness. However, this approach is not readily applicable to production compilers since they are mostly written in C/C++, not in the language of a proof assistant. As another example, Alive [20] is a domain-specific language (DSL) in which one can manually write a compiler’s optimization logic and automatically verify its correctness or else generate a counter example. Though this approach has been successfully applied to LLVM, its application is limited to peephole optimizations because it is hard to faithfully translate the implementation of complex optimizations into Alive and, more importantly, Alive does not support cyclic control flows such as loop. As the last example, the credible compilation [8, 23, 24, 32] and verified translation validation [7, 11, 31, 38–40] approaches augment compilers to generate translation results together with their correctness proofs, which can then be checked by a (verified) proof checker. Since a correctness proof is generated and checked at each compilation time, it provides a formal correctness guarantee for the particular translation or else find a bug (either in the compiler code or in the proof generation code). However, there has been only a preliminary attempt to apply these approaches to production compilers so far. (See §8 for detailed comparison.)

This paper presents Crellvm: a verified credible compilation framework for LLVM, which can be used as a systematic
way of providing a high level of reliability for major optimizations in LLVM. Specifically:

1. We design and develop a logic and its proof checker for reasoning about LLVM optimizations, called Extensible Relational Hoare Logic (ERHL), in the proof assistant Coq.
2. We fully verify a semantics preservation result for our proof checker in the style of CompCert using the Coq formalization of LLVM IR (Intermediate Representation) from the Vellvm project [43].
3. As case studies, we wrote proof generation codes (213 and 413 SLOC\(^1\) in C++) for two major optimizations: register promotion (\texttt{mem2reg} pass) and global value numbering (\texttt{gvn} pass). Then we performed validation of the two optimizations for standard benchmarks and five large open-source projects.
4. As a result, we found four miscompilation bugs (two in each optimization). This result is notable because no previous systematic approaches including random testing have found any bugs in the \texttt{mem2reg} and \texttt{gvn} passes. Moreover, except for the two bugs we have reported, we found only one confirmed miscompilation bug for \texttt{mem2reg} in the LLVM bug tracker history\(^2\), which was reported in 2005 — in the early days of LLVM. Also, one of the bugs we found in \texttt{mem2reg} had not been found for 7 years even though it could miscompile realistic programs (see Appendix B\(^3\)).

1.1 Overview of Crellvm

\textbf{Framework} The framework of Crellvm works as follows. First, as shown in Fig. 1, we separate the compilation and validation phases. For compilation, as depicted in the left side of Fig. 1, we use the original optimizer to translate the source IR code \texttt{src.ll} to the target IR code \texttt{tgt.ll}. After the compilation, we can conduct validation, as depicted in the right side of Fig. 1. For this, we first run the optimizer extended with a proof generation code that produces the target \texttt{tgt'.ll} together with the proof \texttt{Proof}. Then the proof checker validates \texttt{Proof} to see whether \texttt{src.ll} is correctly translated to \texttt{tgt'.ll}. If the validation fails, we can see a logical reason for the failure, with which we can find a bug either in the compiler or in the proof generation code. If the validation succeeds, we finally compare \texttt{tgt'.ll} and \texttt{tgt'.ll} using the LLVM IR comparison tool \texttt{llvm-diff}.

There are two points to note about the framework. First, we explicitly distinguish the original compiler from the proof generating one because the former is what the users will use and the latter is what the compiler developers will use to increase the reliability of the former. Second, \texttt{llvm-diff} essentially performs alpha-equivalence checking, which is necessary because while \texttt{tgt'.ll} may have unnamed IR registers, \texttt{tgt'.ll} has explicit names for all registers for a proof generation purpose.

\textbf{ERHL and Proof Checker} For validation in Crellvm, we develop ERHL, which is a variant of relational Hoare logic [8] specialized for LLVM IR. The logic and its proof checker is extensible because (i) the logic can be extended with any custom inference rules and (ii) the proof checker can be extended with any custom automation functions that try to fill in the gaps in incomplete proofs by automatically finding appropriate inference rules, like the auto tactic in Coq.

\textbf{Verification of Proof Checker} In the Crellvm framework, the TCB (Trusted Computing Base) includes only the proof checker, the equality checker (\texttt{llvm-diff}) and custom inference rules. In particular, the proof generation code in the compiler is not a part of TCB because any incorrect proof would be invalidated by the proof checker.

We further remove the proof checker and inference rules from the TCB by implementing and verifying them in Coq. Though we currently use the (unverified) standard \texttt{llvm-diff} tool for comparing IR programs, it would be also possible to implement and verify it in Coq.

Note that verification of the proof checker and inference rules matters in practice. First, we found various corner case bugs in our proof checker during its verification. Second, we also found one of our two \texttt{mem2reg} bugs during the verification of inference rules. See the example below.

\begin{verbatim}
p := alloc(); r := *p
foo(r) ~ foo(1 / ((int)G - (int)G))
*p := 1 / ((int)G - (int)G)
\end{verbatim}

Here \texttt{G} is the constant address of a global variable.

To see why this translation is incorrect, suppose that the function \texttt{foo(r)} ignores \texttt{r} and repeatedly prints out 0 without returning to the caller. Then division-by-zero never happens in the source program, while it does in the target. The problem here is that the \texttt{mem2reg} pass assumes that constant expressions never raise any undefined behavior such as division-by-zero, which is not true since 1 / ((int)G - (int)G) forms a valid constant expression in LLVM. Following the logic of \texttt{mem2reg}, we also added such a custom inference rule, which we found unsound during the verification of the rule.

\footnotesize
\begin{itemize}
\item SLOC stands for significant lines of code i.e., ignoring spaces and comments.
\item We examined all miscompilation reports in https://bugs.llvm.org.
\item This technical appendix is submitted as supplementary material.
\end{itemize}

\normalsize

Figure 1. The Crellvm Framework
It is important to note that all the programs in this paper represent LLVM IR programs and we just use C syntax to help understanding. For example, the source program in the above transformation is undefined as a C program but well-defined as an IR program. Thus, the transformation is only unsound as an IR-to-IR transformation. The LLVM community considers such an IR-to-IR miscompilation as a definite bug even when it does not cause any C-to-Assembly miscompilation since it can potentially cause an end-to-end miscompilation for other source languages such as Swift and Rust.

**Results** We wrote proof generation codes for Register Promotion in the `mem2reg` pass and for GVN-PRE in the `gvn` pass; and also partly for Loop Invariant Code Motion in the `licm` pass, and 139 micro-optimizations in the `instcombine` pass in order to demonstrate the generality of ERHL. We then conducted validation of the optimizations for the SPEC CINT2006 C Benchmarks [4], LLVM nightly test suite, and five open-source projects: `sendmail`, `emacs`, `python`, `gimp`, and `ghostscript`, in total 5.3 million LOC in C. The validation failed at only those miscompilation caused by the four bugs we found. We present the details of the `mem2reg` validation in §3 and `gvn` validation in Appendix C.

1.2 Advantages of CReLVM over Testing

CReLVM checks whether optimizations are performed by correct reasoning, while testing simply checks results of the test programs. This can make difference as follows.

First, an optimization performed by incorrect reasoning may be still correct for most programs including all the test programs. In this case, testing cannot uncover the bug, while CReLVM can because it checks the underlying reasoning. For example, we found our first `mem2reg` bug in this situation, which had not been found for 7 years.

Specifically, the following optimization shows the bug.

```
p := alloca()
loop {
    r := *p; foo(r); *p := 42 => foo(undef)
}
```

In the source program, at the first iteration, the register `r` is assigned the `undef` value because the memory cell `*p` is uninitialized, while at the second iteration, `r` is assigned 42. On the other hand, in the target program, `r` is always replaced by `undef`. Of course, this is performed due to faulty reasoning of `mem2reg`. However, it can be correct for most programs because `r` can be sometimes `undef` and thus `foo(r)` is likely to ignore `r` (e.g., by taking an operation like `r & 0x0`). Indeed this is what happened in the SPEC benchmark and why the bug had been hidden for such a long time. Note that it does not mean that this bug does not matter in practice because it could well cause trouble in the real world (see Appendix B).

Second, a potential flaw introduced by miscompilation may not be exploited by the current compiler and silently disappear during the compilation. Also in this case, CReLVM can detect the bug because it checks the underlying reasoning. For example, we found the two `gvn` bugs in this situation, which had not been found for 8 years. Note that the two bugs are caused by the same reason but we counted them as two because they appear in two separate places.

Specifically, the following optimization shows the bug.

```
q1 := *(p + 10) inbounds q1 := (p + 10) inbounds
q2 := *(p + 10) ~> bar(q1, q2)
```

In the source program, `(p + 10) inbounds` is defined to be `undef` when the index 10 is out of the bounds of `p`, while `(p + 10)` is always defined to be the computed address. Thus replacing `q2` with `q1` introduces more undefinedness, which is incorrect because it can be potentially exploited by subsequent optimizations. However, so far the LLVM compiler has not exploited such undefinedness, thereby causing no observable misbehaviors. Indeed this miscompilation happened many times during validation of the standard benchmarks but testing has failed to detect it.

2 Overview

In this section, we give a more detailed overview of how CReLVM works using the `assoc-add` optimization of the `instcombine` pass as a motivating example.

2.1 Translation Example

We first give an example translation of the `assoc-add` optimization, which is shown in the shaded part of Fig. 2. Here `y := add x 2` is replaced by `y := add a 3` at line 20. This translation can be beneficial because after it, the register `x` may no longer be used and thus `x := add a 1` at line 10 may be eliminated later. This translation is also sound because (i) the assertion “`x = add a 1`” holds throughout lines 10-20, since the registers `a` and `x` are not redefined between line 10 and 20 thanks to the Static Single Assignment (SSA) property [10]; and (ii) from this, we can infer that `add x 2 = add (add a 1) 2 = add a 3` holds at line 20.

2.2 Proof Validation

We now construct a proof for the `assoc-add` translation example and validate it in ERHL.
Figure 2. Validation of an assoc–add translation in ERHL

**ERHL Proof** A formal proof of the translation is given in the box of Fig. 2. Specifically, the proof consists of a set of assertions and a list of inference rules at each line. For example, at line 20, the set of assertions is \( \{ \text{MD}(0) \} \) and the list of inference rules is \( \{ \text{assoc}_\text{add}(x_{\text{src}}, y_{\text{src}}, a_{\text{src}}, 1, 2), \text{reduce}_\text{maydiff}(y) \} \).

This ERHL proof captures the assertion and the inference step of the intuitive reasoning above. First, the assertion \( \text{MD}(0) \) at every line states that every register contains the same value in the source and target program states. Second, the additional assertion \( x_{\text{src}} = \text{add} a_{\text{src}} 1 \) between line 10 and line 20 states that in the source state, the value of the register \( x \) is equal to the result of \( \text{add} a 1 \). Finally, the inference rules \( \text{assoc}_\text{add}(x_{\text{src}}, y_{\text{src}}, a_{\text{src}}, 1, 2) \) and \( \text{reduce}_\text{maydiff}(y) \) at line 20 are those that need to be applied for validation at line 20. The details of the rules will be given later when we discuss the validation process.

**ERHL Assertions** Before we proceed to the validation of the proof, we discuss ERHL assertions in more details. An ERHL assertion is a triple \( (S, T, M) \), where \( S \) is a set of assertions that should hold for the source state; \( T \) is for the target state; and \( M \) is an assertion relating the source and target states.

First, the source and target assertions, \( S \) and \( T \), can contain various forms of predicates. For example, \( x_{\text{src}} = \text{add} a_{\text{src}} 1 \) is a source assertion and \( x_{\text{tgt}} = \text{add} a_{\text{tgt}} 3 \) is a target assertion. Here and henceforth, \( x_{\text{src}} \) and \( x_{\text{tgt}} \) represent the values of the register \( x \) in the source and target states, respectively.

Though we only use the equality predicate for assoc–add, we will introduce various other predicates later. It is important to note that we do not allow any general assertion relating the source and target states such as \( x_{\text{src}} = y_{\text{tgt}} + 1 \).

Second, the relational assertion \( M \) is a set of registers, called maydiff set, that may contain different values in the source and target states. In other words, all the registers not in \( M \) should have the same value in the source and target states, which we denote by \( \text{MD}(M) \):

\[
\text{MD}(M) \iff \forall x \notin M. x_{\text{src}} = x_{\text{tgt}}.
\]

Note that the maydiff set is the only form of relational assertion relating the source and target states.

Finally, every ERHL assertion implicitly imposes that the public parts of the source and target memories should be equivalent. More precisely, we use the CompCert-style memory injection relation [19]. Later we introduce predicates that allow private memory allocations that do not belong to the public part of the memory (see §3.2).

**Proof Validation** The gray text in Fig. 2 shows the validation process performed by the ERHL proof checker, which proceeds as follows.

First, the proof checker checks that the initial assertion holds for all possible initial states. It accepts the initial assertion \( \{ \text{MD}(0) \} \) in Fig. 2 since the source and target states are initially equivalent.

Second, the proof checker checks whether the Hoare triple \( (P) I_{\text{src}} \sim I_{\text{tgt}} (Q) \) at each line is valid. This means that the assertion \( Q \) after the line holds for all program states resulted by executing the source and target instructions \( I_{\text{src}} \) and \( I_{\text{tgt}} \) at the line under any program states satisfying the assertion \( P \) before the line. In Fig. 2, we only explain validations at lines 10 and 20 because the others are trivial.

At line 10, the proof checker first computes a strong post-assertion, \( \{ x_{\text{src}} = \text{add} a_{\text{src}} 1, x_{\text{tgt}} = \text{add} a_{\text{tgt}} 1, \text{MD}(0) \} \), using our post-assertion computation algorithm. Here, the algorithm simply adds the equality predicates corresponding to the executed instructions. Then, the assertion after line 10, \( \{ x_{\text{src}} = \text{add} a_{\text{src}} 1, \text{MD}(0) \} \), follows from the computed strong post-assertion by a simple inclusion check.

At line 20, the checker also computes a strong post-assertion, \( \{ x_{\text{src}} = \text{add} a_{\text{src}} 1, y_{\text{src}} = \text{add} a_{\text{src}} 2, y_{\text{tgt}} = \text{add} a_{\text{tgt}} 3, \text{MD}(y) \} \). Here, the post-assertion computation adds the equality predicates corresponding to the executed instructions and also adds the register \( y \) to the maydiff set because the executed source and target instructions are not identical. Then, the
proof checker applies the inference rules given by the proof. The rule \text{assoc-add}(x, y, a, 1, 2) derives \( y = \text{add} a \) from \( x = \text{add} a \) and \( y = \text{add} x \) by associativity:
\[
\frac{x = \text{add} a \ C_1 \quad y = \text{add} x \ C_2}{C = C_1 + C_2}
\]
The rule \text{reduce-maydiff}(y, e) removes the register \( y \) from the maydiff set because \( y = \text{add} a \) and \( a \) is not in the maydiff set:
\[
y = \text{maydiff}(y, e) \quad e_{y_{gt}} = y_{tgt} \quad \text{no registers in} \ e \ \text{are in the maydiff set}
\]
remove \( y \) from the maydiff set

Then, the assertion after line 20, \{ MD(\emptyset) \}, easily follows by a simple inclusion check.

Finally, the proof checker checks whether the same observable events (i.e., the same sequence of system calls) are produced at each line. It is the case in Fig. 2 because at line 20, no observable events are produced; and at the other lines, the source and target instructions are identical and the maydiff sets are empty implying that the source and target states are equivalent. In particular, at line 21, the proof checker explicitly checks that the same value is passed to the function \text{foo} because the function may produce observable events.

### 2.3 Proof Generation

Now we explain how we generate proofs for \text{assoc-add}.

\noindent \textbf{Algorithm} Algorithm 1 shows the \text{assoc-add} optimization algorithm implemented in LLVM’s \text{instcombine} pass, which is presented in a rather functional style for presentation purposes. Specifically, \text{AssocAdd}(F) optimizes each function definition \( F \) as follows (ignore the boxes for now, which are the proof generation code).

- **[Line A1]:** Find an instruction of the form \( l_2: y := \text{add} x C_2 \) with \( C_2 \) constant. In Fig. 2, 20: \( y := \text{add} x 2 \) can be picked.
- **[Line A2]:** Check if \( x \) is defined by an instruction of the form \( l_1: x := \text{add} a C_1 \) with \( C_1 \) constant. Here, \text{FindDef}(F, x)\footnote{The instruction that defines \( x \) is unique thanks to the SSA property.} finds the instruction that defines the register \( x \).\footnote{The instruction that defines \( x \) is unique thanks to the SSA property.} In Fig. 2, 10: \( x := \text{add} a 1 \) is picked. **[Lines A3-A4]:** If it is the case, compute the constant \( C = C_1 + C_2 \) and replace the instruction at \( l_2 \) with \( y := \text{add} a C \). In Fig. 2, the instruction at line 20 is replaced by \( y := \text{add} 3 \).

\noindent \textbf{Proof Generation} Once we understand the \text{assoc-add} optimization algorithm, it is quite straightforward to write the proof generation code given in the boxes of Algorithm 1.

- **[Line A5]:** Add the assertion \( x_{src} = \text{add} a_{src} C_1 \) at every line between \( l_1 \) and \( l_2 \). In Fig. 2, the assertion \( x_{src} = \text{add} a_{src} 1 \) is added at every line between 10 and 20. **[Line A6]:** Add the inference rule \text{assoc-add}(x, y, a, src, C, 1, 2) at line \( l_2 \). In Fig. 2, the rule \text{assoc-add}(x, y, a, src, 1, 2) is added at line 20. **[Line A9]:** Enable the custom automation function named \text{reduce-maydiff}, which tries to find and insert appropriate \text{reduce-maydiff} rules when necessary. In Fig. 2, it figures out that \text{reduce-maydiff}(y) is needed at line 20.

\noindent \textbf{Automation} An automation function works as follows. Given a pair \( (Q, Q') \) of ERHL assertions, it examines the assertions \( Q \) and \( Q' \), and tries to find inference rules \( I \) such that the result of applying \( I \) to \( Q \) trivially implies \( Q' \).

It is important to note that automation functions do not need to be verified (i.e., not a part of TCB) because all they do is inserting inference rules, which is a part of proof construction, not that of proof checking.

### 3 Register Promotion

Register promotion optimization, the \text{mem2reg} pass of LLVM, transforms memory accesses to locally allocated memory locations into register accesses, provided that the memory location is only used for loads and stores (i.e., never copied or escaped). This translation is important because register accesses are cheaper than memory accesses, and are subject to further optimizations.

The optimization also performs the SSA transformation so that the target program has the SSA property. This transformation is necessary because there can be statically multiple stores to a single location, and just transforming them to writes to a single register would break the SSA property.

In this section, we show how we generate and validate proofs for the \text{mem2reg} optimization.

#### 3.1 Translation Example

The shaded part of Fig. 3 shows an example translation of the \text{mem2reg} optimization, where all memory accesses via \( p \) is promoted to register accesses to \( p \) and uses of 42 and \( x \). Note that \( c, x, \) and \( q \) are the function parameters.

More specifically, the allocation, load and store instructions to \( p \) are removed (ignore \text{nop} for now), and every use of the result of a load from \( p \) is replaced by the value stored in \( *p \) at the time of the load. For example, in Fig. 3, the compiler figures out that \( *p \) contains 42 at line 20 (as long as the register \( a \) due to the store of 42 in \( *p \) at line 11, and thus replaces the use of a with 42 at line 21. This translation is
We show how to turn the intuition for soundness into a formal proof in order to state relational properties (i.e., relating the source and target states) in a proof. For example, in Fig. 3, we need to state \( p_{src} = p_{tgt} \) before line 40, which relates a value in the source \( p_{src} \) with that in the target \( p_{tgt} \).

Though not directly supported in ERHL, such relational properties can be encoded using ghost registers. Specifically, we can encode \( e_{src} = e'_{tgt} \) using a fresh ghost register \( \hat{g} \):

\[
\{ e_{src} = \hat{g}, e_{tgt} = e'_{tgt} \mid \text{MD}(M) \} \quad \text{with} \quad \hat{g} \notin M
\]

Since the ghost register \( \hat{g} \) is not in the maydiff set, we have \( \hat{g}_{src} = \hat{g}_{tgt} \), which, by transitivity, implies \( e_{src} = e'_{tgt} \). For example, in Fig. 3, the assertion \( \{ p_{src} = p_{tgt}, \hat{g}_{tgt} = p_{tgt} \} \) before line 40 effectively states \( p_{src} = p_{tgt} \).

Note that 1nop is logical because it is absent from the real IR code and used only for validation purposes. During validation, it is interpreted as doing nothing (i.e., no-op).

**Ghost Registers for Relational Assertions** For complex optimizations, we often need to state relational properties (i.e., relating the source and target states) in a proof. For example, in Fig. 3, we need to state \( p_{src} = p_{tgt} \) before line 40, which relates a value in the source \( p_{src} \) with that in the target \( p_{tgt} \).

Though not directly supported in ERHL, such relational properties can be encoded using ghost registers. Specifically, we can encode \( e_{src} = e'_{tgt} \) using a fresh ghost register \( \hat{g} \):

\[
\{ e_{src} = \hat{g} \mid \text{MD}(M) \} \quad \text{with} \quad \hat{g} \notin M
\]

Since the ghost register \( \hat{g} \) is not in the maydiff set, we have \( \hat{g}_{src} = \hat{g}_{tgt} \), which, by transitivity, implies \( e_{src} = e'_{tgt} \). For example, in Fig. 3, the assertion \( \{ p_{src} = p_{tgt}, \hat{g}_{tgt} = p_{tgt} \} \) before line 40 effectively states \( p_{src} = p_{tgt} \).

Note that the ghost register \( \hat{g} \) has nothing to do with the physical register \( p \) and we use \( \hat{g} \) for ghost registers to distinguish them from the physical ones.

Ghost registers are logical ones that do not exist in physical program states. Instead, they are existentially quantified in the semantics of ERHL assertions. More specifically, a pair of source and target states \( (\sigma_{src}, \sigma_{tgt}) \) satisfies an ERHL assertion \( P \), if there exists a pair of source and target ghost register files \( (r_{src}, r_{tgt}) \) such that the pair of \( \sigma_{src} \) extended with \( r_{src} \) and \( \sigma_{tgt} \) extended with \( r_{tgt} \) satisfies \( P \).

Taking ghost registers into account, the proof in Fig. 3 has five relational assertions: \( p_{src} = 42_{tgt} \) between line 11 and the end of \( B_{left} \), \( a_{src} = 42_{tgt} \) between line 20 and line 21, \( p_{src} = x_{tgt} \) between line 30 and the end of \( B_{right} \), \( p_{src} = p_{tgt} \) between the beginning of \( B_{right} \) and line 41, and \( b_{src} = p_{tgt} \) between line 40 and line 41. It is easy to see that these assertions correctly capture the relational properties caused by executing different instructions in the source and target.

**Uniqueness Predicate for Isolation** We can use the predicate \( \text{Uniq} \) in order to state that an address is completely isolated. For example, in Fig. 3, we have \( \text{Uniq}(p_{src}) \) at every line. It means that in the source, if \( p \) contains an address \( \ell \), (i) \( \ell \) is not aliased with any address stored in the other registers and the memory (i.e., they point to disjoint memory blocks); and (ii) \( \ell \) is private (i.e., it is not in the public part of the memory injection) meaning that it has no corresponding equivalent address in the target. In other words, the address contained in \( p \) should point to a completely isolated block.

Note that ERHL also supports memory predicates weaker than \( \text{Uniq}(p) \): (i) the private predicate, \( \text{Priv}(p) \), which states that the address in \( p \) is private; and (ii) the noalias predicate, \( p \perp q \), which states that the addresses in \( p \) and \( q \) point to disjoint memory blocks.

**Logical No-Ops for Instruction Alignment** Logical Noops, denoted 1nop, can be inserted as a part of proof in order to align source and target instructions when their alignment is broken by a translation. For example, in Fig. 3, 1nop is inserted at lines 10, 11, 20, 30, 40 because the instructions there are removed by mem2reg.
Maydiff Sets  Finally, we have MD({p, p1, a, b}) at every line because these registers are removed or introduced so that they have different values in the source and target.

3.3 Proof Validation

We show how our proof checker validates the ERHL proof.

Entry  The proof checker checks that the entry assertion, \{(Uniq(p_{src}), MD\{p, p1, a, b\})\}, holds for initial states. It accepts the assertion Uniq(p_{env}) since p is a local register and thus contains the undef value initially, which is not an address. It also accepts every maydiff set since the source and target registers initially contain equivalent values.

Allocation of the Promoted Location  At line 10, the proof checker allows an allocation, p := alloc(a), in the source and 1nop in the target. In this case, it computes a post-assertion from the pre-assertion by (i) removing all assertions containing p_{src} because p_{src} is updated, (ii) adding \{Uniq(p_{src}), *p_{src} = undef\} because p contains a newly allocated address, and then (iii) adding p to the maydiff set. Thus, we have \{Uniq(p_{src}), *p_{src} = undef, MD\{p, p1, a, b\}\}, from which the assertion after line 10 trivially follows.

Stores to the Promoted Location  At line 30 (and similarly at line 11), the proof checker allows a store, *p := x, in the source and 1nop in the target because *p_{src} is private (i.e., has no corresponding target address) due to Uniq(p_{src}) in the pre-assertion. In this case, it computes a post-assertion by (i) removing all and only the assertions containing *p_{src} because *p_{src} is updated and p_{src} has no alias with any other address due to Uniq(p_{src}), and then (ii) adding \{*p_{src} = x_{src}\}. Thus, we have \{Uniq(p_{src}), *p_{src} = x_{src}, MD\{p, p1, a, b\}\}.

At this point, the proof gives the rule intro_ghost(\hat{\beta}, \hat{x}) which first makes \hat{\beta} fresh by removing all assertions about \hat{\beta} and removing \hat{x} from the maydiff set and then adds \{x_{src} = \hat{\beta}_{src}, \hat{\beta}_{tgt} = x_{tgt} \} when x is not in the maydiff set. Thus, we have \{Uniq(p_{src}), *p_{src} = x_{src}, \hat{x}_{src} = \hat{\beta}_{src}, \hat{\beta}_{tgt} = x_{tgt}, MD\{p, p1, a, b\}\}. Then, the proof gives the rule transitivity(*p_{src}, x_{src}, \hat{\beta}_{src}), which derives *p_{src} = \hat{\beta}_{src} from *p_{src} = x_{src} and x_{src} = \hat{\beta}_{src}. Then the assertion after line 30 trivially follows. (See Appendix I for the definitions of intro_ghost and transitivity.)

Phinodes  At the phinode of B_{exit}, the proof checker validates the assertion separately for each incoming block. For the incoming block B_{left}, the proof checker computes a post-assertion by (i) removing all assertions containing p1_{tgt} because p1_{tgt} is updated, (ii) adding 42 = p1_{tgt} because p1 := 42 is executed in the target when control comes from B_{left}, and then (iii) adding p1 to the maydiff set. Then the proof gives the inference rule transitivity(\hat{\beta}_{tgt}, 42, p1_{tgt}) which derives \hat{\beta}_{tgt} = p1_{tgt} from which the assertion after the phinode follows trivially. For the incoming block B_{right}, validation succeeds similarly, where the proof gives the inference rule transitivity(\hat{\beta}_{tgt}, x_{tgt}, p1_{tgt}).

Note that for presentation purposes here we simplified the post-assertion computation for phinodes. ERHL performs a more general version to handle cyclic control flows (see §4).

 Loads from the Promoted Location  At line 40 (and similarly at line 20), the proof checker allows a load, b := *p, in the source and 1nop in the target. In this case, it computes a post-assertion by (i) removing all assertions containing b_{src} because b_{src} is updated, (ii) adding b_{src} = *p_{src} and then (iii) adding b to the maydiff set. Thus, we have \{Uniq(p_{src}), *p_{src} = \hat{\beta}_{src}, \hat{\beta}_{tgt} = p1_{tgt}, b_{src} = *p_{src}, MD\{p, p1, a, b\}\}.

At this point, the proof gives the rule intro_ghost(\hat{\beta}, \hat{\rho}) which adds \{\hat{\rho}_{src} = b_{src}, \hat{\rho}_{tgt} = \hat{\beta}_{tgt}\} because \hat{\rho} is not in the maydiff set. Then the proof gives appropriate transitivity rules, which derives b_{src} = *p_{src} = \hat{\beta}_{src} = b_{src} and b_{tgt} = \hat{\beta}_{tgt} = p1_{tgt}, from which the assertion after line 40 trivially follows.

Equivalence Checking  At lines 21, 31, and 41, the proof checker checks that the behaviors of the source and target instructions are equivalent. Specifically, it checks that equivalent values are passed to the same function (at line 21) and stored at equivalent public locations (at lines 31, 41) because these can be observed by other functions. These checks succeed thanks to the relational assertions \{(a_{src} = a_{src}, a_{tgt} = 42) at line 21, (b_{src} = b_{src}, b_{tgt} = p1_{tgt}) at line 41\}.

Alias Checking  At lines 21, 31, and 41, the proof checker computes post-assertions using memory alias information. In general, for a function call or store instruction, since it updates the public part of the memory, the proof checker removes all assertions about values stored in memory locations p (i.e., those including *p) unless (i) p is in the private part of the memory (i.e., Priv(p) or Uniq(p)), or (ii) p is not aliased with q (i.e., p \perp q) in case q is updated by the store instruction. At lines 21, 31, and 41, thanks to Uniq(p_{src}), the assertions about *p_{src} are preserved.

Note that in the example of Fig. 3, it suffices to use Priv(p_{src}) instead of Uniq(p_{src}). However, in general when more than one locations are promoted, we need to know that those promoted locations are not aliased with each other, which follows from Uniq(p_{src}) for each promoted location p. Also for the sake of performance, we use Uniq instead of introducing \perp between each pair of promoted locations.

3.4 Proof Generation

LLVM’s mem2reg pass consists of three algorithms: the general register promotion algorithm, a specialized algorithm for the case that the promotable location is stored at most once, and another for the case that the location is used only within a single block. In this section we explain the general algorithm and its proof generation code. Note that we also validate the two specialized algorithms in the same way since they are just degenerate cases.

Algorithm 2 shows the general algorithm implemented in LLVM’s mem2reg pass and the proof generation code, given
Algorithm 2 RegisterPromotion(F:Function)

```plaintext
for \( l_a \): p := al1oca(f) in F if \( p \)'s uses are loads/stores only do
  InsertEmptyPhinodesFor(F, \( p \))
  \( // Add the phinodes to the maydiff set globally \)
  \( \text{Remove}(l_a, \text{Nop}(l_a, tgt), \text{Assn}(\text{Uniq}(p_{src}), \text{MD}(p), \text{global})) \)
  \( \text{Inf}(\text{intro_ghost}(\hat{p}, \text{undef}), l_a) \)
  \( \text{WL} := [(\text{Entry}(F), \text{undef}, [l_a]), \text{MarkVisited}(\text{Entry}(F))] \)
while \( \text{NonEmpty}(\text{WL}) \) do
  \( (B, v, \{ l \} :: \text{WL} := \text{WL}) \)
  \( \text{for } (l_i : i) \in \text{Instr}(B) \text{ do} \)
  if \( i \) is a store instruction (*\( p := w \) then
    \( \text{Remove}(l_i, \text{Nop}(l_i, tgt), \text{Inf}(\text{intro_ghost}(\hat{p}, w), l_i)) \)
    \( v := w, l := l_i \)
  else if \( i \) is a load instruction (*\( x := p \) then
    \( \text{Assn}((\*p_{src} = \hat{p}_{src}, \hat{p}_{tgt} = \text{undef}, l, l_i)) \)
    \( \text{for } (l_j : j) \in \text{Use}(x) \text{ do} \)
      \( \text{Replace}(F, l_j, x, v), \text{Assn}((x_{src} = x_{src}, x_{tgt} = \text{undef}, l, l_j)) \)
    end for
    \( \text{Remove}(l_i, \text{Nop}(l_i, tgt), \text{Assn}(\text{MD}(x), \text{global})) \)
    \( \text{Inf}(\text{intro_ghost}(\hat{x}, \hat{p}), l_i) \)
  end if
  end for
  \( \text{for } B' \in \text{Successor}(B) \text{ do} \)
  if \( B' \) has a phinode \( (z := \hat{\phi}(\cdots)) \) inserted at line \( A2 \) then
    \( z[B] := v, \text{Assn}((\*p_{src} = \hat{p}_{src}, \hat{p}_{tgt} = \text{undef}, l, \text{End}(B))) \)
  if not IsVisited(B') then \( \text{WL} := (B', z, \text{Begin}(B')) :: \text{WL} \)
  else
    if not IsVisited(B') then \( \text{WL} := (B', v, \{ l \} :: \text{WL} \)
  end if
  MarkVisited(B')
  end for
end while
\( \text{Auto(transitivity)} \)
```

in the box, that we inserted. Note that we do not modify the existing compiler code at all and only add the proof generation code. In detail, the overall algorithm including proof generation works as follows.

Promotable Allocation [Line A1]: We find a promotable allocation \( p \) at line \( l_a \). [Line A2]: Then we insert empty phinodes wherever needed\(^{10}\), and add them to the maydiff set globally (i.e., at each line). [Line A3]: We also remove the allocation, insert \( \text{lnop} \) at that line, and add \( \text{Uniq}(p_{src}) \) and \( \text{MD}(p) \) globally. [Line A4]: In addition, we add the rule \( \text{intro_ghost}(\hat{p}, \text{undef}) \) because the initial value \text{undef} in \(*p\) may be used by some load from \(*p\) (though it does not happen in Fig. 3). In that case, the code at line A13 would introduce \( \hat{p}_{src} = \hat{p}_{src}, \hat{p}_{tgt} = \text{undef} \) at line \( l_a \), which will be inferred with the help of \( \text{intro_ghost}(\hat{p}, \text{undef}) \).

For example, in Fig. 3, the empty phinode \( p_1 := \phi(-, -) \) is inserted in \( B_{\text{exit}} \) and \( p_1 \) is added to the maydiff set globally; and then the allocation at line 10 is removed, \( \text{lnop} \) is inserted, \( \text{Uniq}(p_{src}) \) is added and \( p \) is added to the maydiff set globally; and finally \( \text{intro_ghost}(\hat{p}, \text{undef}) \) is added at line 10.

Block Traversal [Lines A5-A7]: We traverse the blocks in DFS order starting from the entry block using the worklist \( WL \). An element of \( WL \) consists of triple \( (B, v, l) \), where \( B \) is the block to visit, \( v \) is the value in \(*p\) at the beginning of \( B \), and \( l \) is the line number where the value \( v \) is stored in \(*p\).

[Line A5]: Initially, we put \( \text{Entry}(F), \text{undef}, l \) in \( WL \) and mark the entry block \( \text{Entry}(F) \) as visited. [Lines A6-A7]: Then we process the blocks in \( WL \) one by one. For example, in Fig. 3, \( B_{\text{entry}}, B_{\text{left}}, B_{\text{exit}}, \) and \( B_{\text{right}} \) are visited in order.

Instruction Traversal [Line A8]: Given a work \( (B, v, l) \), we traverse each instruction \( (l_i : i) \) in the block \( B \) as follows.

Store Instructions [Lines A9-A11]: If \( i \) is a store instruction \(*p := w \) (line A9), then we remove the instruction (line A10) and update \( v \) with the stored value \( w \) (line A11). The proof generation code inserts \( \text{lnop} \), adds \( \text{intro_ghost}(\hat{p}, w) \) (line A10) and updates \( l \) with the store location \( l_i \) (line A11).

For example, in Fig. 3, when \( i \) is 11: \(*p := 42 \), the store \( i \) is replaced by \( \text{lnop} \); \( \text{intro_ghost}(\hat{p}, 42) \) is added at line 11; and \( v \) and \( l \) are updated to be 42 and line 11.

Load Instructions [Lines A12-A18]: If \( i \) is a load instruction \( x := \*p \) (line A12), then we replace all the uses of \( x \) with the current value \( v \) (lines A14-A16), and remove the load instruction (line A17). The proof-generation code adds the relational assertion \(*p_{src} = \text{undef} \) from the store site \( l \) to the load site \( l_i \) (line A13). Then it adds \( x_{src} = \text{undef} \) from the load site \( l_i \) to every use site \( l_j \) (line A15). It also inserts \( \text{lnop} \), adds \( x \) to the maydiff set and adds the rule \( \text{intro_ghost}(\hat{x}, \hat{p}) \) (lines A17-A18).

For example, in Fig. 3, when \( i \) is 20: \( a := \*p \), the load \( i \) is replaced by \( \text{lnop} \); the use of \( a \) is replaced by the current value \( 42 \) at line 21; \(*p_{src} = \text{undef} \) is added from 11 to 20; \( a_{src} = \text{undef} \) is added from 20 to 21; \( a \) is added to the maydiff set globally; and \( \text{intro_ghost}(\hat{a}, \hat{p}) \) is added at line 20.

Successors [Lines A21-A28]: Now we handle the successor (i.e., outgoing) blocks of the current block \( B \). [Line A21]: We traverse each successor block \( B' \) as follows.

- If \( B' \) has a phinode \( (z := \hat{\phi}(\cdots)) \) that is inserted by the code at line A2 (line A22), then we update the phinode \( z \)'s component for the incoming block \( B \) with the value \( v \) of \(*p \) at the end of \( B \) (line A23), and if \( B' \) has not been visited yet, add \( (B', z, \text{Begin}(B')) \) to the worklist \( WL \) (line A24). Since the value \( v \) is used at the phinode \( z \), we add \(*p_{src} = \text{undef} \) from store location \( l \) to the end of \( B \) (line A23).

\(^{10}\)The optimization uses the "dominance frontier" algorithm [10] in order to list up the blocks that require a phinode. We omit the details for brevity.
For example, in Fig. 3, when \((B, B')\) is \((B_{\text{left}}, B_{\text{exit}})\), the phinode \(p1 := \phi(-, -)\) is updated to \(p1 := \phi(42, -)\) and \((B_{\text{exit}}, p1, \text{Begin}(B_{\text{exit}}))\) is added to the worklist \(WL\). Also *\(p_{\text{src}} = 42_{\text{tgt}}\) is added from line 11 to the end of \(B_{\text{left}}\).

- If \(B'\) has no such phinode (line A25), then we simply add \((B', v, l)\) to the worklist \(WL\) if \(B'\) has not been visited yet (line A26). For example, when \((B, B')\) is \((B_{\text{entry}}, B_{\text{right}})\), the triple \((B_{\text{right}}, 42, \text{line 11})\) is added to the worklist.

[Line A28]: Finally the successor \(B'\) is marked as visited.

**Inference Rules** As shown in §3.3, the complete proof for mem2reg contains two inference rules, intro_ghost and transitivity. The intro_ghost rules are explicitly added by the proof generation code shown in Algorithm 2, while the transitivity rules are automatically added by the automation function transitivity (line A32).

## 4 Reasoning about Cyclic Data Dependence

In this section, we show how to reason about cyclic data dependence via phinodes.

**Fold-Phi Optimization** Consider the translation below performed by the fold-phi optimization of instcombine, and its ERHL proof. This translation is performed only when \(x\) and \(y\) are no longer used in the target so that they can be optimized away by subsequent optimizations.

\[
\begin{align*}
B_{1} & \quad \{ \\
10: & \quad x := a + 1 \quad \sim \quad x := a + 1 \quad \text{MD}(t) \}
\end{align*}
\]

\[
\begin{align*}
B_{2} & \quad \{ \\
20: & \quad \text{nop} \quad \sim \quad t := \phi(a, z) \quad \text{MD}(t, z) \}
\end{align*}
\]

\[
\begin{align*}
\{ & \quad z := \phi(x, y) \quad \sim \quad z := \phi(42, z) \quad \text{MD}(t, z) \}
\end{align*}
\]

Note that a set of phinodes can appear at the beginning of a block and are executed simultaneously. For example, in the program source above, when control flows from \(B_{2}\) to itself, the phinodes \(z\) and \(w\) are set to the old values of \(y\) and \(z\) before executing the phinodes, respectively. Also note that there is a cyclic dependence between \(x\) and \(y\).

The key point of the above ERHL proof is that we temporarily lose the equivalence of \(z_{\text{src}}\) and \(z_{\text{tgt}}\) after the phinodes of \(B_{2}\) and recover it after line 20. The proof makes this point in two steps. It first shows that \(z_{\text{src}} = t_{\text{tgt}} + 1\) holds after the phinodes, and then using it, shows that \(z_{\text{src}} = z_{\text{tgt}}\) holds after line 20. The second step holds trivially and the first step holds as follows. When control comes from \(B_{1}\), after the phinodes, we have \(z_{\text{src}} = \bar{x}_{\text{src}} = \delta_{\text{src}} + 1 = \tilde{a}_{\text{tgt}} + 1 = t_{\text{tgt}} + 1\), where \(r_{\text{src}}\) denotes the current value of the register \(r\) after the phinodes and \(r_{\text{src}}\) the old value of \(r\) before the phinodes in the source program; and similarly for \(r_{\text{tgt}}\) and \(r_{\text{tgt}}\).

When control comes from \(B_{2}\), after the phinodes, we have \(z_{\text{src}} = \bar{y}_{\text{src}} = \tilde{z}_{\text{src}} + 1 = \tilde{z}_{\text{tgt}} + 1 = t_{\text{tgt}} + 1\).

**Proof Validation** We show how the ERHL proof checker formally validates the above proof at the most interesting case: the phinodes of \(B_{2}\) when control comes from itself.

First of all, we need to take into account the old values of registers, as discussed above. For this, we use ghost registers again. Specifically, we reserve a set of ghost registers, denoted \(\bar{r}\) and called old registers, for all registers \(r\) to represent the old value of \(r\). Note, however, that old registers are just normal ghost registers and technically have nothing to do with physical old values of the corresponding registers.

Now we see how the proof checker uses those old registers at the phinodes of \(B_{2}\) when control comes from itself. First, it computes a post-assertion from the pre-assertion \(\{ y_{\text{src}} = z_{\text{src}} + 1, \text{MD}(t, \bar{t}) \}\) as follows.

1. It completely removes all old registers from the pre-assertion and copies assertions about current registers into those about old ones. Thus we have \(\{ y_{\text{src}} = z_{\text{src}} + 1, \bar{y}_{\text{src}} = \tilde{z}_{\text{src}} + 1, \text{MD}(t, \bar{t}) \}\).  
2. It computes a post-condition from this new assertion as if the assignments \(z := \bar{y}, w := \bar{z}\) are executed in the source and \(t := \bar{z}, w := \bar{z}\) in the target. Specifically, it (i) removes source assertions about \(z, w\) and target ones about \(t, w\) because those registers are updated; (ii) adds \(t, z\) to the maydiff set because they are updated differently in the source and target (note that \(w\) is updated equivalently since \(\bar{z}\) is not in the maydiff set); and (iii) adds the equalities corresponding to the executed assignments. Thus we have \(\{ \bar{y}_{\text{src}} = \tilde{z}_{\text{src}} + 1, \bar{y}_{\text{src}} = \bar{y}_{\text{tgt}}, w_{\text{src}} = \tilde{z}_{\text{src}}, t_{\text{tgt}} = \tilde{t}_{\text{tgt}}, w_{\text{tgt}} = \bar{z}_{\text{src}}, \text{MD}(t, \bar{z}, t) \}\).

Then the proof gives the rule in intro_ghost \(t, (\bar{z}, \bar{z} + 1)\), which adds \(\{ z_{\text{src}} + 1 = \tilde{z}_{\text{src}}, \tilde{z}_{\text{tgt}} = \tilde{z}_{\text{tgt}} + 1 \}\) because \(\bar{z}\) is not in the maydiff set. Then the automation function adds \(\{ z_{\text{src}} = \tilde{z}_{\text{src}}, \tilde{z}_{\text{tgt}} = \bar{z}_{\text{tgt}} + 1 \}\) by transitivity: \(z_{\text{src}} = \bar{y}_{\text{src}} = z_{\text{src}} + 1 = \tilde{z}_{\text{src}}\) and \(\bar{z}_{\text{tgt}} = \tilde{z}_{\text{tgt}} + 1 = t_{\text{tgt}} + 1\). Finally, all the old registers are completely removed and the assertion after the phinodes \(\{ z_{\text{src}} = \tilde{z}_{\text{src}}, t_{\text{tgt}} = \bar{t}_{\text{tgt}} + 1, \text{MD}(t, z) \}\) trivially follows.

## 5 ERHL Proof Checker and Logic

In this section, we explain the proof checker in terms of the ERHL logic, and describe the soundness of the proof checker using the semantic interpretation of the logic. All our results are formally verified in Coq (see Appendix H for details).

**Proof Rules** The checker is based on the proof rules presented in Fig. 4. The checker is given the source and target programs \(Prg_{\text{src}}, Prg_{\text{tgt}}\) and a translation proof \(\Phi\), and tries to deduce \(Pr_{\text{src}} \sim Pr_{\text{tgt}}\) using the (Sim) rule. Here, \(Entry(F)\) denotes the entry block of the function \(F\); \(Prg[F].\xi[B, l]\)
the $i$-th instruction of the block $B$ in $F$; and $Prg[F],\phi[B,B']$ the assignment instructions of the phinodes of $B'$ when control comes from $B$ (e.g., in the source program in §4), $Prg[F],\phi[B_1,B_2] = \{ x := x, t := 42 \}$. Also, $\Phi[F],a[B,i]$ denotes the assertion in the proof $\Phi$ just before the $i$-th instruction of $B$ in $F$ (it denotes the last assertion when $i = -1$).

The checker first checks if $Prg_{src}$ and $Prg_{tgt}$ have the identical CFG (CheckCFG), the assertion in the entry is satisfied by the initial states for each function (CheckInit), and the Hoare triple $\{ P \} I_{src} \rightarrow I_{tgt} \{ Q \}$ is valid for all matching intra-block commands $I_{src}$ and $I_{tgt}$ and their pre- and post-assertions $P$ and $Q$ given by $\Phi$. For example, in Fig. 2, it checks at line 20 if $\{ x_{src} = a_{src} + 1, MD(0) \} y := x + 2 \land y := a + 3 \{ MD(0) \}$ is valid. It also checks for each inter-block edge from $B$ to $B'$ that $\{ P \} Prg_{src},\phi[B,B'] \rightarrow Prg_{tgt},\phi[B,B'] \{ Q \}$ is valid, where $P$ is the last assertion in $B$ and $Q$ is the first assertion in $B'$.

To validate a Hoare triple $\{ P \} I_{src} \rightarrow I_{tgt} \{ Q \}$, the checker first computes a post-assertion $Q_0$ with $\{ P \} I_{src} \rightarrow I_{tgt} \{ Q_0 \}$ using the rule PostAssn (see Appendix H for the definition of CheckEquivBeh and CalcPostAssn). Then it suffices to validate $Q_0 \Rightarrow Q$ by the rule CONSEQUENCE.

Then, using the rules APPLYINF and TRANS, the checker iteratively applies a sequence of inference rules $rule_1, \ldots, rule_n$ (either given by $\Phi$ or generated by automation function) and deduces $Q_0 \Rightarrow Q_n$, where $Q_i = ApplyInf(rule_{i+1} \cdot Q_{i-1})$.

Finally, the checker validates $Q_n \Rightarrow Q$ using the rule INCL, where CheckIncl performs a simple inclusion check.

**Semantic Interpretation** For the soundness of the proof checker, we give the semantic interpretation of the top-level judgment as semantics preservation, or behavior refinement:

$$\| Prg_{src} \sim Prg_{tgt} \| \overset{(Sim)}{=} \text{Beh}(Prg_{src}) \supseteq \text{Beh}(Prg_{tgt}) .$$

The soundness of (Sim) is proved using a local simulation in the style of [14], which is a simplification of parametric bisimulation [13]. First, we show that CheckInit($\Phi$) implies:

$$\forall \sigma_{src}, \alpha. \text{Finit}(\sigma_{src}) \land \text{Finit}(\sigma_{tgt}) \implies [\{ P \}, \alpha(\sigma_{src}, \sigma_{tgt})] .$$

Here, $\text{Finit}(\sigma)$ means $\sigma$ is a possible initial state of a function call, $[\{ P \}]$ is the semantic interpretation of the assertion $P$ (see Appendix G for details), and $\alpha$ is a CompCert-style memory injection [19], which basically maps a memory block in the source to an equivalent one in the target.

Second, we give the semantic interpretation of the Hoare triple for non-call instructions $I_{src}, I_{tgt}$ as a simulation step:

$$\| \{ P \} I_{src} \rightarrow I_{tgt} \{ Q \} \| \overset{(App)}{=} \forall \sigma_{src}, \sigma_{tgt}. \text{Instr}(\sigma_{src}) \rightarrow I_{src} \implies \sigma_{src} \rightarrow \sigma_{src}' \implies [\sigma_{src}', \sigma_{tgt}] .$$

where, $\text{Instr}(\sigma)$ is the next instruction to execute in the program state $\sigma$, and $\sigma \rightarrow \sigma'$ means the state $\sigma$ steps to $\sigma'$ emitting an observable event $\epsilon$. Also, $\subseteq$ is the extension relation of memory injection.

For call instructions $I_{src}, I_{tgt}$, $\| \{ P \} I_{src} \rightarrow I_{tgt} \{ Q \} \|$ basically states that $Q$ is satisfied by all possible equivalent returns states when an arbitrary function is called from states satisfying $P$ (see Appendix H for details). We followed the basic approach of parametric bisimulation [13].

The semantic interpretation of $\Rightarrow$ is as follows:

$$\| Q \Rightarrow Q' \| \overset{(Trans)}{=} \forall \sigma_{src}, \sigma_{tgt}. [\| Q \|, \sigma_{src}, \sigma_{tgt}] \implies [\| Q' \|, \sigma_{src}', \sigma_{tgt}'] \land \alpha \subseteq \alpha' .$$

For the soundness of (APPLYINF), every custom rule should satisfy that $\| Q \Rightarrow ApplyInf(rule, Q) \|$ holds for all $Q$.

6 Implemention
We developed the CRELVM framework for LLVM 3.7.1.

**Proof-Generation Code** We wrote proof-generation codes for register promotion (mem2reg) and GVN-PRE (gvn). To demonstrate the generality of ERHL logic and proof checker, we also wrote proof-generation codes for a part of loop invariant code motion (11c) and 139 micro-optimizations of instruction combining (instcombine), see Appendix D for the list of the supported micro-optimizations.

We explicitly mark as "not supported" for translations using operations not supported by Vellvm, or relying on deep analyses such as division-by-zero and alias analyses.

Fig. 5 shows the SLOC in C++ of the compiler and proof generation code for each pass. The SLOC ratio of the proof generation code to that of the corresponding compiler code is 37.5% for mem2reg, 40.2% for gvn, 40.5% for 11c, and 193.3% for instcombine. The CRELLVM infrastructure for proof generation consists of 1,708 lines for common library and 15,980 lines for JSON serialization library, of which 72.2% is automatically generated from 2,079 SLOC in a simple DSL.

**Inference Rules** In the proof checker we installed 221 custom inference rules, of which 202 are arithmetic rules like assoc_add. All 9 non-arithmetic rules used for mem2reg, gvn,
and licm, including transitivity and intro_ghost, are formally verified in Coq (see Appendix I for details).

**Verification of Proof Checker** In order to reduce TCB, we formally verified the soundness of the proof checker in Coq (see §5). It is worth noting that we achieved the same level of guarantee as CompCert for the translations that are validated by the proof checker using only verified inference rules.

We used the formal semantics of LLVM IR from the Velllvm project [43], but significantly upgraded the semantics in various ways. In particular, Velllvm used the CompCert memory model [19] version 1.9 and we upgraded it to version 2.4 in order to use the notion of permission in the LLVM semantics; and added the switch instruction to the formalization of LLVM IR. Note that Velllvm has a simpler memory model than the LLVM’s informal official one (e.g., pointer equality tests and pointer-integer casts are more undefined).

In total, our Coq development consists of 25,970 SLOC. The proof checker is 2,987 SLOC, and its verification is 18,934 SLOC. The 221 inference rules are 2,193 SLOC, and the verification of 9 rules took 1,856 SLOC. Note that the underlying semantics model of Velllvm consists of 39,307 SLOC.

**Experience** Writing proof-generation code was an iterative process: we had to repeat bug-fix processes many times. When proof checking fails, it tells us a logical reason for the failure so that we could easily identify the bug in proof-generation code (or else in the compiler). We believe the iteration could be shortened if we collaborated with LLVM developers.

Custom functions for automatically finding inference rules are greatly helpful for developing proof-generation algorithms. Using such automation, we could develop much simpler proof-generation algorithms for mem2reg and gvn, compared to our initial development, by making the code size less than half and speeding up more than twice.

**Crellvm** is less cost-effective for peephole optimizations in instcombine. We had to write 1.9 lines of proof-generation code for each line of the corresponding compiler code, and we did not verify arithmetic inference rules. Even though Crellvm achieves higher level of reliability, we think more automated approaches such as Alive [20] is more cost-effective for peephole optimizations.

### 7 Experiment

**Benchmarks** Using Crellvm, we validated the compilation of the SPEC CINT2006 C Benchmarks [4], LLVM nightly test suite, and five open-source projects written in C (the biggest benchmarks used in [27]11), totaling 5.3 million LOC in C. We omitted 4 files from the benchmarks because 3 of them contain instructions currently not supported by Velllvm, including the indirectbr instruction, and the other is too big (151MB) for our proof checker to handle.

Fig. 6 summarizes the validation results and the time spent on running the proof-generation codes and the proof checker for each optimization pass. In the experiment, we compiled each benchmark program with the -O2 flag, and validated the intermediate translations with the generated proofs. For more detailed results, see Appendix A.

We show the total number of translation steps (#V), the number of not-supported translations (#NS), and the number of translations failed at validation (#F). The rest of the translations (i.e., #V – #F – #NS) succeeded in validation. Also, all the successful translations were shown to be equivalent to the original translations using the llvm-diff tool. During the experiment, we also found and reported a bug in llvm-diff, which has been confirmed and fixed.

Out of 2,195K validations in total, 1,623K (73.9%) are successfully validated. All 305 (0.01%) failures are due to compiler bugs: 295 are due to the two bugs we found in gvn, and 10 are due to a register promotion bug we found (see §1.2). After fixing the bugs we found, the compiler no longer failed at validation. 572.0K (26.1%) translations are currently not supported in our validator, among which 555.7K (97.1%) use instructions not supported by Velllvm: vector operations 515.0K (92.7%), aggregate type operations 30.4K (5.47%), debug attributes 8,673, (1.56%) and atomic operations 1,714 (0.31%). 13,013 (2.27%) use the alias and division-by-zero instructions not supported in our validator, among which 11,988 (92.6%) use instructions not supported by Velllvm: vector operations 11,353 (94.2%), aggregate type operations 545 (4.70%), debug attributes 8,673, (1.56%) and atomic operations 1,714 (0.31%). 13,013 (2.27%) use the alias and division-by-zero instructions not supported in our validator, among which 11,988 (92.6%) use instructions not supported by Velllvm: vector operations 11,353 (94.2%), aggregate type operations 545 (4.70%), debug attributes 8,673, (1.56%) and atomic operations 1,714 (0.31%).

We measured the time spent on performing each optimization in the original compiler (Orig); on performing each optimization and calculate validation proofs in the modified compiler (PCal); on writing and reading the source and target programs with the proofs via files (I/O); and on validating the proofs by the proof checker (PCheck). We measured the time taken by each job, and simply summed them up to compute the total time taken.

In the experiment, we embarrassingly parallelized compilation and validation jobs and fully utilized the 96 hardware threads from four identical workstations with Intel Xeon E5-2630 CPU (2.6GHz, 12 cores, 2 hardware threads per core),

---

11We omitted Linux, since it is currently not compiled with LLVM (see [3]).
128GB RAM, and 1TB SSD (Samsung 850 PRO). The whole experiment took about three hours in wall clock.

**Validating Randomly Generated Programs** We randomly generated 1,000 C programs using Csmith [41], compiled them with -O2 flag, and validated the intermediate translations with the generated proofs. All 50,064 mem2reg validations are successfully validated, except for one due to the mem2reg bug we found (see §1.2). Out of 42,584 GVN-PRE validations, 11,816 (27.7%) are currently not supported, and the other 30,768 (72.3%) are all successfully validated.

**Performance** Proof checking takes much more time than regular compilation, but we believe it is still reasonable for compiler writers to use Crellvm for stabilizing compilers. Also, as we have shown in the experiment, compiler writers can further reduce runtime by checking proofs in parallel. Furthermore, there is still a large room for performance improvement as we have not done any serious performance analysis and tuning for the proof checker. In particular, we believe we can significantly reduce I/O time, which is the current bottleneck, by writing proofs in binary format rather than in plain-text JSON format. Currently, the proof size per source file is about 9.3 MB in total (5.4 MB for mem2reg, 0.4 MB for gvn, 2.2 MB for licm, 1.3 MB for instcombine).

8 Related Work

A large number of prior work on improving reliability of compiler are roughly classified into the following categories.

**Credible Compilation** Rinard et al. [32], who coined the term credible compilation, proposed the framework of credible compilation and presented a relational Hoare logic, in which one can reason about register allocation and instruction scheduling optimizations in the presence of pointer aliasing. Independently, Benton [8] proposed a relational Hoare logic for a functional language. However, their logics are designed for simple languages, and the framework has not been implemented and applied to compilers.

Namjoshi et al. [23, 24] presented a “proof of concept” implementation of credible compilation (or a witnessing compiler in their terminology) for LLVM optimizations such as constant propagation, deadcode elimination, and LICM. However, the work can be seen as rather preliminary for the following reasons. First, their proof checker supports a small subset of LLVM IR, most notably ignoring memory operations. Second, it assumes that main functions of the compiler are correct. For example, it assumes that the constant folding function of LLVM is correct.

Verified translation validation is similar to verified credible compilation but differs in that it develops a verified validator specialized for a particular optimization, rather than developing a proof checker for a general logic. Various verified translation validators have been developed for CompCert: instruction scheduling [38], lazy code motion [39], software pipelining [40]; register allocation [31]; SSA transformation [7]; and GVN and sparse conditional constant propagation (SCCP) [11].

(Foundational) proof carrying code (PCC) [6, 25] is similar to (verified) credible compilation, but it employs a (verified) unary logic for validating safety properties of the generated target program.

**Translation Validation** Translation validators check equivalence (precisely, refinement) between any two given source and target programs. Since they are agnostic to the compiler’s internal logic and regard the target optimization as a black box, they are inherently incomplete. Therefore, a variety of translation validators with different heuristics and trade-offs were proposed [12, 26, 28, 29, 33–35, 37, 42, 45, 46]. In particular, Tristan et al. [37] and Stepp et al. [34] developed translation validators for LLVM optimization passes, including dead code elimination, GVN-PRE, constant propagation, and LICM. However, they failed at about 20% of the validations, most of which are likely to be false alarms.

**Compiler Verification** Verified compilers provide the highest level of reliability by proving the semantics preservation property for all input programs in a proof assistant. CompCert [17, 18] is the most sophisticated formally verified optimizing C compiler, whose correctness is proved in Coq [1], and CakeML [15] is an optimizing ML compiler formally verified in the HOL4 theorem prover [2]. However, verifying a full-fledged compiler is highly costly and verified compilers are usually much less performant than production compilers.

Zhao et al. [43, 44] implemented and verified the vmem2reg pass for LLVM in Coq, but its algorithm is significantly simplified compared to that in LLVM. Their simplified algorithm is based on a rewriting logic in which each rewriting step preserves semantics and each intermediate program is type-checked. On the other hand, LLVM’s register promotion algorithm temporarily breaks semantics preservation property and even the intermediate programs are not type-checked, because ill-formed empty phinodes are inserted in the middle and their arguments are filled later. According to the authors, this renders the formal verification hard for the register promotion implementation in LLVM.

**DSL for Optimizations** Lopes et al. [20–22] presented Alive, a DSL for writing peephole optimizations using the SMT solver Z3 [5]. With Alive, one can either prove the correctness of an optimization or find a counterexample. They ported 300 micro-optimizations of instcombine to Alive, and in doing so they found 8 bugs in instcombine. However, Alive is limited to supporting only peephole optimizations that do not involve reasoning about cyclic control flows. In addition, Alive makes simplifying assumptions on the LLVM semantics, and their encoding of an optimization into SMT queries is a part of TCB. Tatlock and Lerner [36] also presented a DSL for writing CompCert optimizations based on
a rewriting logic, but it is not general enough to support GVN-PRE and register promotion.

**Compiler Testing** Random testing tools such as CSmith [9, 30, 41] and EMI [16] have been very successful. They have found hundreds of bugs in GCC and LLVM. However, most of them are found in the instcombine pass and none of them in mem2reg and gvn.

### 9 Conclusion

We have demonstrated that the credible compilation approach scales to the production compiler LLVM by developing our Crellvm framework. We also empirically show that Crellvm can be an effective tool for achieving high reliability of major optimizations by discovering four long-standing bugs in the mem2reg and gvn passes, for which to the best of our knowledge, no previous systematic approaches have found bugs.
References


A Experimental Results

Fig. 7 and Fig. 8 shows the validation results and the time spent on running the proof-generation codes and the proof checker.

B Miscompilation of a Realistic Program due to a Mem2reg Bug

One of the bug we found in mem2reg miscompiles the following program:

```c
#include <stdio.h>

int sqr(int i, int prev, int cur) {
    return cur * cur;
}

int diffSqr(int i, int prev, int cur) {
    if (i==0) return 0; else return (cur-prev) * (cur-prev);
}
```

Figure 7. Validation Results

<table>
<thead>
<tr>
<th>LOC</th>
<th>mem2reg</th>
<th>gvn</th>
<th>lcim</th>
<th>instcombine</th>
</tr>
</thead>
<tbody>
<tr>
<td>Total</td>
<td>5289.18</td>
<td>76.62K</td>
<td>10</td>
<td>8.58K</td>
</tr>
</tbody>
</table>

Figure 8. Time Spent on Running the Proof-Generation Codes and the Proof Checker
void foo(int arr[], int n) {
    int sqrsum = 0, diffsqrs = 0;
    int i, prev, cur;
    for (i = 0; i < n; ++i) {
        prev = cur = arr[i];
        sqrsum += sqr(i, prev, cur);
        diffsqrs += diff(i, prev, cur);
    }
    printf("square sum=%d, diff sqr sum=%d \n", sqrsum, diffsqrs);
}

int main () {
    int a[3] = {1, 2, 5};
    foo(a, 3);
    return 0;
}

The function foo() takes an array a and its size n and prints the sum of the squares of the numbers in a, and the sum of the squares of the differences of adjacent numbers in a, i.e.,:
\[
\sum_{i=0}^{n-1} a[i]^2 \quad \text{and} \quad \sum_{i=1}^{n-1} (a[i] - a[i-1])^2 .
\]

The function foo() calculates the two summations in a single loop. Note that the function diff(i, prev, cur) returns (cur - prev)^2 if i > 0, and zero otherwise.

Clang 3.7.1 with -O2 flag miscompiles this program. The compilation result prints out 30 and 0 instead of the correct answer 30 (= 1^2 + 2^2 + 5^2) and 10 (= (2 − 1)^2 + (5 − 2)^2). This is due to the mem2reg bug we found on the special case that all stores to a promotable location is in a single block. In essence, the loads from the local variable prev are illegally promoted to undef, the function call to diff() is inlined, and exploiting the undef semantics, the result of the inlined call is optimized to 0.

Even though the program invokes undefined behavior according to the C standard, we believe it is still likely to be written in the real-world: a programmer may logically conclude that the undef value is never used, and the program is safe.

C Global Value Numbering with Partial Redundancy Elimination

Global Value Numbering optimization (GVN), which is implemented in the gvn pass of LLVM, detects and removes redundant instructions. GVN algorithm first groups expressions and values into equivalent classes and assigns a unique “value number” to each class. Then, a leader value is chosen for each class, and all the non-leader, redundant, instructions are substituted with the leader value. The gvn pass also does Partial Redundancy Elimination optimization (PRE), which eliminates instructions that are partially redundant depending on the control flow.

In this section, we show how we generate and validate proofs for GVN-PRE optimization. It is worth noting that even though the GVN and PRE algorithms are separately written in the gvn pass, their validation logics are so similar that the resemblance of their validation logics enabled us to write a single proof generation code that uniformly works for both GVN and PRE.

C.1 Translation Example

We use a PRE translation example because it is more interesting than a GVN example. The shaded part of Fig. 9 shows an example translation of the PRE optimization, where the registers n and c1 are the function parameters.

First, by analyzing the source program, GVN assigns unique value numbers to the classes of equivalent values and expressions. For example, in Fig. 9, GVN constructs the following tables, VT, ET and LT.

\[
\begin{align*}
VT & = \{x_1, x_2 \mapsto 1; \ y_1, y_2, y_3 \mapsto 2\} \\
ET & = \{n - 2 \mapsto 1; \ 1 + 1 \mapsto 2\} \\
LT & = \{\ldots; \ B_{\text{empty}} \mapsto [1 \mapsto x_1; \ 2 \mapsto 10]; \ B_{\text{right}} \mapsto [1 \mapsto x_2; \ 2 \mapsto y_2]; \ \ldots\}
\end{align*}
\]

Originally, The gvn pass also assigns value numbers to singleton classes, but in this example, we only consider those classes with more than one value for brevity.

Here, the value table VT assigns value number 1 to x1 and x2, and 2 to y1, y2 and y3; and the expression table ET assigns value number 1 to the expression n - 2, and 2 to 1 + 1. This means that, at any point of execution, there exist some values for 1 and 2 such that x1, x2 and n - 2 evaluate to the value 1 and y1, y2, y3 and 1 + 1 evaluate to the value 2 whenever
We can turn our intuition for soundness into a formal proof. The unshaded part of Fig. 9 shows the proof, each assertion of which can be split into three parts.

In the example, PRE detects partial redundancies of the instruction \( y_3 := x_1 + 1 \) in \( B_{exit} \) because \( y_3 \) belongs to \( \ominus \) and \( B_{exit} \)'s incoming blocks have a leader value for \( \ominus \): 10 in block \( B_{empty} \) and \( y_2 \) in \( B_{right} \). In other words, depending on the control flow, the expression \( y_3 \) is equivalent to either 10 or \( y_2 \). To eliminate the redundant instruction, PRE (i) inserts the phinode \( y_4 := \phi(10, y_2) \) at the beginning of \( B_{exit} \); (ii) replaces all uses of \( y_3 \) with \( y_4 \) at line 41; and (iii) eliminates \( y_3 := x_1 + 1 \) at line 40. This translation is beneficial because the inserted phinode is compiled down to a move instruction, which is more cost-effective than the eliminated addition instruction.

C.2 ERHL Proof

We can turn our intuition for soundness into a formal proof. The unshaded part of Fig. 9 shows the proof, each assertion of which can be split into three parts.

**Expression Assertions** The red equalities of the form \( e_{src} = \hat{v}_{src} \) or \( \hat{v}_{tgt} = e_{tgt} \) relate expressions with their value numbers according to the expression table \( ET \). For example, the assertion after line 21 states that there exist some values \( \hat{v}_1 \) and \( \hat{v}_2 \) (for \( \oplus \) and \( \ominus \)) such that \( \hat{v}_1 = n - 2 \) and \( \hat{v}_2 = 1 + \hat{v}_1 \) hold for both the source and target states.

**Value Assertions** The green equalities of the form \( x_{src} = \hat{v}_{src} \) or \( \hat{v}_{tgt} = x_{tgt} \) relate values with their value numbers according to the value table \( VT \). For example, the assertion after line 21 also states that \( x_1 \) has the value \( \hat{v}_1 \) in the source and \( y_1 \) has the value \( \hat{v}_2 \) in the target.

**Branching Assertions** The blue equalities show properties derived from branching conditions. For example, the equality \( c_{tgt} = (y_{tgt} == 10) \) at line 21 is derived from the definition of the branching register \( c_2 \), and the equality \( \hat{v}_2 = 10 \) in the block \( B_{empty} \) is derived from the fact that \( c_2 \) must be true in \( B_{empty} \) and thus \( y_1 \) is 10 and so is its associated value \( \hat{v}_2 \).
C.3 Proof Validation

The proof in Fig. 9 is validated as follows.

Adding Value Assertions  At line 40 (and similarly at line 30), the proof checker computes a post-assertion by adding \( y_3_{src} = x_1_{src} + 1 \), and deduces, by applying the inference rules from the proof, that:

\[
\begin{align*}
y_3_{src} &= x_1_{src} + 1 \\
&= \nu_1_{src} + 1 \\
&= 1 + \nu_1_{src} \\
&= \nu_2_{src}
\end{align*}
\]

(post-assertion)

(substitution(x_1_{src} + 1, x_1_{src}, \nu_1_{src}))

(commutativity_add(\nu_1_{src}, 1))

(expression assertion)

from which the next assertion trivially follows.

Adding Expression Assertions  At line 31 (and similarly at lines 10 and 20), the proof checker computes a post-assertion by adding \( y_2_{src} = x_2_{src} + 1, x_2_{tgt} + 1 = y_2_{tgt} \). Then the proof gives the inference rule intro_ghost(1 + \nu_1, \nu_2), which adds \( 1 + \nu_1_{src} = \nu_2_{src}, \nu_2_{tgt} = 1 + \nu_1_{tgt} \). Then the proof checker deduces \( \nu_2_{tgt} = y_2_{tgt} \), similarly as in line 40.

Adding Phinode Assertions  At the phinode of B_{exit}, the assertion is separately validated for each incoming block. For the incoming block B_{left} (and similarly for B_{right}), the proof checker computes a post-assertion by adding \( y_1 = y_4_{tgt} \) and adding \( y \) to the maydiff set, and derives \( \nu_2_{tgt} = 10 = y_4_{tgt} \) by applying the transitivity rule from the proof.

Adding Branching Assertions  At line 21, the proof checker computes a post-assertion by adding \( c_2_{src} = (y_1_{tgt} = 10), c_2_{tgt} = (y_1_{tgt} = 10) \), from which the next assertion trivially follows.

Using Branching Assertions  At the beginning of B_{empty}, the proof checker computes a post-assertion by adding \( \text{true} = c_2_{src}, \text{true} = c_2_{tgt} \) from the branching condition, and deduces, by applying the inference rules from the proof, that \( \text{true} = c_2_{tgt} = (y_1_{tgt} = 10) \) and

\[
\begin{align*}
\nu_2_{tgt} &= y_1_{tgt} \\
&= 10
\end{align*}
\]

(value assertion)

(from true = (y_1_{tgt} = 10) by icmp_to_eq(true,y_1_{tgt},10))

from which the next assertion trivially follows.

C.4 Proof Generation

Now we explain how we generate proofs for the GVN-PRE optimization.

We first add codes to GVN and PRE algorithms that generate auxiliary data, \( RET, RPT \) and \( BCT \). For example, we compute the following for Fig. 9:

\[
\begin{align*}
RET &= [x_1, x_2 \mapsto \{ n_{src} - 2 = \nu_1_{src}, \nu_1_{tgt} = n_{tgt} - 2 \}; \\
y_1, y_2, y_3, y_4 \mapsto \{ n_{src} - 2 = \nu_1_{src}, \nu_1_{tgt} = n_{tgt} - 2, 1 + \nu_1_{src} = \nu_2_{src}, \nu_2_{tgt} = 1 + \nu_1_{tgt} \}] \\
RPT &= [x_1, x_2 \mapsto (\_, \_); \ y_1 \mapsto (\_, 1); \ y_2, y_3 \mapsto (3, \_); \ y_4 \mapsto (2, 2)] \\
BCT &= [(10, 2) \mapsto \{ (B_{left}, \text{true}, y_1) \}].
\end{align*}
\]

The register expression table \( RET \) contains sufficient expression assertions for reasoning about each register. For example, \( RET(x_1) \) contains \( \{ n_{src} - 2 = \nu_1_{src}, \nu_1_{tgt} = n_{tgt} - 2 \} \), which is sufficient for reasoning about \( x_1 \). The register parameter table \( RPT \) contains the value numbers of the instruction parameters for each register. For example, \( RPT[y_1] \) contains \( (\_, 1) \), which means that the second parameter \( x_1 \) of the instruction 20: \( y_1 := 1 + x_1 \) has the value number \( 1 \). The branching condition table \( BCT \) maps pairs of constants/function parameters and their value numbers to their associated branching information. For example, \( BCT[(10, 2)] \) contains \( (B_{left}, \text{true}, y_1) \), which means that \( (i) \ y_1 = 10 \) holds when control flows from B_{left} with the branching condition being true \( (i.e., c_2 = \text{true}) \), and \( (ii) \ y_1 \text{'s value number is 2} \). We can easily construct \( RET, RPT, \) and \( BCT \) following the construction of \( VT \) and \( ET \).

Algorithm 3 presents the common proof-generation code that generate assertions for both GVN and PRE, which is given in a rather functional style for presentation purposes. Specifically, \( \text{ProofGen}(F, r, v, VT, RET, RPT, BCT) \) generates a proof for the replacement of \( r \) with \( v \) in function \( F \). In essence, the proof generation code adds assertions, starting from those for \( r_{src} \) and \( v_{tgt} \) at \( r \text{'s definition point and recursively down to the arguments, using a worklist and auxiliary data } RET, RPT, \text{ and } BCT \). The recursion stops when the target value is the only value of its value number. More concretely, the algorithm works for the example of Fig. 9 as follows.
Algorithm 3 ProofGen(F: Function, r: Register, v: Value, VT, RET, RPT, BCT)

A1: (l0: r := _) := FindDef(F, r)
A2: WL := [(l0, r, VT[r], src), (l0, v, VT[r], tgt)]
A3: while NonEmpty(WL) do
   A4: ((l0, x, ň, side) := WL) := WL
   A5: match FindDef(F, x) with
      A6: if (l0: x := e) ⇒
         A7: Assn(RET[x], l0, l0), Assn(x[side] = ň[side], l0)
         A8: for (l': y, m) in MatchExpr(e, RPT[x]) do
            A9: WL := (l', y, m, side) :: WL
         A10: end for
      A11: | Some (ConstantOrParameter(C)) ⇒
         A12: match FindBranchingCondition(BCT[|C, ň|], l0, F) with
            A13: if FindDef(F, e) = Some (l0: c := e) then Assn(c[side] = e[side], l0, l0) end if
         A14: Inf(icmp_to_eq(v1tgt, C), l0)
         A15: Assn(RET[y], l0, l0), Assn(C[side] = ň[side], l0, l0)
         A16: WL := (l, y, ň, side) :: WL
      A17: end match
   A18: end match
A19: end while
A20: Auto(GVN_PRE)

Initialize Worklist The code finds where r is defined (line A1), and add a work for the source and another for the target to the worklist (line A2). A work (l0, x, ň, side) means that (i) x’s value number is ň, and (ii) line l0 needs the assertion x[side] = ň[side] (where side is either src or tgt) and the expression assertions. For the translation in Fig. 9, x = y3 is defined at line 40 in the source, so the initial works are (40, y3, v2, src) and (40, y4, v2, tgt). The code processes each work (l0, x, ň, side) as follows (lines A3-A4).

Processing Registers If x is a register defined as e at l (line A6), the code adds the expression assertions RET[x] for x from l to l0, and the value assertion x = ň at side (line A7). Consider the work (40, y4, v2, tgt), for example. The register y4 is defined by the phinode of Bexit, so l is the phinode of Bexit and e = φ(10, y2). Hence RET[y4] and v2tgt = y4tgt are inserted from the phinode of Bexit to line 40. For the work (40, y3, v2, src), since y3 is defined at line 40, no assertions are inserted.

In order to justify the inserted assertions at line l, the code adds sub-works for the values in e (lines A8-A9). Note that MatchExpr(e, RPT[x]) matches e against the register parameter table in order to know which value, say y, should have which value number, say m, at line l’ (line A8). For example, in order to deduce y3src = v2src after line 40, x1src = v1src should hold before the line, so the code adds (40, x1, v1, src) to the worklist; for justifying v2tgt = y4tgt after the phinode of Bexit, the code adds (edge from Bempty to Bexit), 10, v2, tgt) and (edge from Bright to Bexit), y2, v2, tgt) to the worklist.

Processing Constants and Parameters If x is a constant or parameter C (line A11), the code looks for a branching condition in BCT{|C, ň|}, which justifies that C has the value number ň at l0 (line A12). The result (v, y, c, l) means y = C holds at l thanks to the branching condition c being equal to v.

The code adds the branching assertion when necessary (line A14), and also adds an icmp_to_eq inference rule (line A15). Also, similarly to the case of registers, the code adds expression and branching assertions (line A16) to the proof, and adds sub-works for the value y (line A17).

For example, consider the work (Ee, 10, v2, tgt), where Ee is the edge from Bempty to Bexit. The code finds a branching condition in BCT{|10, v2|} which guarantees that 10 = v2 holds at Ee (line A12). Block Bleft has such a branch when c2 equals to true (v, y, c, l) = (true, y1, c2, Ei) at line 13 where Ei is the edge Bleft to Bempty. Hence the code adds c2tgt = (y1tgt == 10) from line 21 to Ei (line A14) and adds the inference rule icmp_to_eq(true,y1tgt,10) at Ei (line A15). Then the code adds the expression assertions RET[y1] for y1 and the branching assertion v2tgt = 10 from Ei to Ee (line A16), and adds the work (Ei, y1, v2, tgt) to the worklist (line A16).

Automation Function Throughout the proof we use the GVN_PRE automation function, which adds intro_ghost, commutativity, and substitution in a specific way for GVN-PRE, and adds transitivity and reduce_maydiff in the same way as for assoc_add and mem2reg.
D Micro-Optimizations in instcombine

We validated the following 139 micro-optimizations in instcombine:

- add-comm-sub, add-const-not, add-dist-sub, add-mask, add-onebit, add-or-and, add-select-zero, add-shift,
- add-signbit, add-sub, add-xor-and, add-zext-bool, and-de-morgan, and-mone, and-not, and-or-const2,
- and-or-not1, and-or, same, and-undef, and-xor-const, and-zero, bitcast-bitcast, bitcast-fpext,
- bitcast-fptosi, bitcast-fptoui, bitcast-fptrunc, bitcast-inttoptr, bitcast-ptrtoint, bitcast-sametype,
- bitcast-sext, bitcast-sitofp, bitcast-trunc, bitcast-uitofp, bitcast-zext, bop-associativity,
- dead-code-elim, dead-store-elim, fold-phi-bin-const, fold-phi-bin, fpext-bitcast, fptext-fpext,
- fptosi-bitcast, fptoui-bitcast, fptoui-fpext, fptrunc-bitcast, fptrunc-fpext, icmp-eq-add-add,
- icmp-eq-srem, icmp-eq-sub-sub, icmp-eq-xor, icmp-ne-const, icmp-ne-add-add, icmp-ne-srem,
- icmp-ndef-sub-sub, icmp-ne-sub-xor, icmp-ne-xor, icmp-sge-or-not, icmp-sgt-and-not,
- icmp-sle-or-not, icmp-slt-and-not, icmp-swap, icmp-uge-or-not, icmp-ugt-and-not, icmp-ule-or-not,
- icmp-ult-and-not, inttoptr-bitcast, inttoptr-ptrtoint, mul-boof, mul-mone, mul-mneg, mul-shl, or-and-xor,
- or-and, or-mone, or-or, or-or2, or-or, or-or2, or-or, or-or2, or-or3, or-xor4, or-xor, or-zero,
- ptrtoint-bitcast, ptrtoint-inttoptr, sdiv-mone, select-bop-fold, select-icmp-eq-ior1, select-icmp-eq-ior2,
- select-icmp-eq, select-icmp-gt-const, select-icmp-lt-const, select-icmp-eq-ior1, select-icmp-eq-ior2,
- select-icmp-eq, select-icmp-gt-const, select-icmp-gt-const, select-icmp-xor-ior1, select-icmp-xor-ior2,
- select-icmp-xor-ior1, select-icmp-xor-ior2, sext-bitcast, sext-sext, sext-trunc-ashr, sext-zext,
- shift-undef1, shift-undef2, shift-zero1, shift-zero2,
- sitofp-bitcast, sitofp-sext, sitofp-zext, sub-add, sub-const-add, sub-const-not, sub-mone, sub-onebit,
- sub-or-xor, sub-remove, sub-shl, sub-sub, trunc-bitcast, trunc-onebit, trunc-slt, trunc-trunc, trunc-zext,
- uopf-bitcast, uopf-sext, xor-same, xor-undef, xor-zero, ext-bitcast, ext-trunc-and-xor, ext-trunc-and,
- zext-xor, zext-zext

Note that we gave these names and they are not officially used in LLVM.

E Program Points between Two Lines

Assertion(\(P, l_1, l_2\)) in the proof generation code means predicate \(P\) should be added to the assertions between \(l_1\) and \(l_2\). More specifically, the proposition \(P\) should be added at every program point appearing in a path from \(l_1\) to \(l_2\) that does not visit \(l_1\) but may visit \(l_2\) in-between. Since \(l_1\) is the source of the proposition \(P\) so that we can get \(P\) as a post-assertion every time we visit \(l_1\), there is no need to add \(P\) along a path from \(l_1\) to \(l_2\). For example, consider the following program.

![Diagram of a program with program points labeled]

The marked area between \(l_1\) and \(l_2\) is where we should add the proposition \(P\).

Thanks to the SSA property [10], we can efficiently calculate the program points between \(l_1\) and \(l_2\). First, we can assume that \(l_1\) dominates \(l_2\) (i.e., one should have visited \(l_1\) to reach \(l_2\) from the entry point), since the proposition \(P\) created at \(l_1\) should hold at \(l_2\). Then a program point \(i\) is on a path from \(l_1\) to \(l_2\) that does not visit \(l_1\) in-between if and only if (i) \(l_1\) dominates \(l\) and (ii) \(l_2\) is reachable from \(l\) without visiting \(l_1\). We efficiently check the first condition using the dominator tree [10] and the second condition by a backward BFS search from \(l_2\).

In the above example, any program point in the marked area is such that \(l_1\) dominates it and \(l_2\) is reachable from it without visiting \(l_1\). For example, \(l_2\) is not reachable from the block \(B_4\) without visiting \(l_1\), and \(l_1\) does not dominate the block \(B_5\).

Note that this algorithm is not a part of TCB validation may fail but cannot succeed incorrectly due to bugs in this algorithm.

F Lessdef Predicates

LLVM and CompCert has the notion of undef value, which is designated as the result of erroneous operations. The compilers are allowed to replace undef by an arbitrary value. For example, LLVM’s InstCombine performs the following translation, where the register \(y\) is replaced by 1, presumably because \(y_{src} = a_{src} - x_{src} = a_{src} - (a_{src} - 1) = 1\) holds for any integer value of...
a_{src}:

\[
\begin{align*}
x & := a - 1; \\
y & := a - x; \quad \leadsto y := a - x; \\
z & := y + 1; \\
\end{align*}
\]

However, if \(a_{src}\) is \texttt{undef}, then we have \(z_{src} = (a_{src} - (a_{src} - 1)) + 1 = \texttt{undef} \) \texttt{undef} is propagated in the arithmetic. Thus the equation \(z_{src} = z_{tgt}\) no longer holds, breaking the equality relationship between the source and target values.

In order to reason such optimizations, we use the CompCert-style lessdef relation [19] throughout this work instead of the equality as assertion predicates. Concretely, \(x\) is less defined than \(y\), denoted \(x \sqsubseteq y\), if \(x\) is \texttt{undef} or it equals to \(y\). For example, we have \(y_{src} = a_{src} - x_{src} = a_{src} - (a_{src} - 1) \sqsubseteq 1\) and thus \(z_{src} = y_{src} + 1 \sqsubseteq 1 + 1 = z_{tgt}\) regardless of whether \(a = \texttt{undef}\) or not, which justifies the above translation.

So far we used the equality instead of lessdef relation for simplicity of presentation. However, all the assertions presented in this paper works even when equalities are replaced by lessdef relations. Note that the post-assertion generator should be adapted to lessdef relations so that they introduce both \(x \sqsubseteq e\) and \(e \sqsubseteq x\) after \(x := e\).

### G Semantic Interpretation of Assertions

The syntax of assertions is as follows:

\[
\begin{align*}
\text{Reg} & \ni r ::= \cdots \\
\text{Const} & \ni c ::= \cdots \\
\text{Typ} & \ni \text{typ} ::= \text{int} | \ast \text{typ} | \cdots \\
\text{SVal} & \ni v ::= r | c \\
\text{Tag} & \ni \text{tag} ::= \text{Phy} | \text{Ghost} | \text{Old} \\
\text{Reg} \times \text{Tag} & \ni rT ::= (r, \text{tag}) \\
\text{SVal} \times \text{Tag} & \ni vT ::= (v, \text{tag}) \\
\text{Expr} & \ni e ::= \text{add} vT vT | \text{load} vT \text{typ} \text{align} | \cdots \\
\text{Pred} & \ni \text{pred} ::= e \sqsubseteq e | \text{Uniq}(r) | \text{Priv}(rT) | vT \bot vT \\
\text{MD} & \ni M ::= \{rT, \cdots\} \\
\text{AssnU} & \ni S, T ::= \{\text{pred}, \cdots\} \\
\text{Assn} & \ni P, Q ::= (S, T, M)
\end{align*}
\]

\text{Reg}, \text{Const}, \text{and} \text{Typ} are the types of LLVM registers, constants, and types. A (static) value is either a register or a constant. Registers and values may be tagged (see /coq/def/Exprs.v:106): the tag \text{Phy} means physical registers, \text{Ghost} means regular ghost registers (§3.2), and \text{Old} means old ghost registers (§4). We write \(r\) for \((r, \text{Phy})\), \(\hat{r}\) for \((r, \text{Ghost})\), and \(\overline{r}\) for \((r, \text{Old})\).

An expression is the right-hand side of a side-effect-free LLVM instructions with operand values being tagged (see /coq/def/Exprs.v:366 in the Coq development\footnote{We submitted the Coq proof scripts as supplementary material.}). Notice that load is side-effect-free (except for undefined behavior) so there are load expressions, while store is side-effectful and there are no store expressions. Recall that \(e_1 \sqsubseteq e_2\) means either \(e_1 = e_2\) or \(e_1 = \texttt{undef}\), and the predicate \(vT_1 \bot vT_2\) means the addresses in \(vT_1\) and \(vT_2\) point to disjoint memory blocks. A unary assertion is a set of predicates (see /coq/def/Hints.v:34), and a maydiff set is a set of tagged registers. An assertion consists of a unary assertion for source, another for target, and a maydiff set (see /coq/def/Hints.v:41).

We use the following semantic domains:

\[
\begin{align*}
\text{Val} & \ni V ::= \cdots \\
\text{RF} & \ni rs ::= \{r \mapsto V, \cdots\} \\
\text{State} & \ni \sigma ::= \cdots \\
\text{StateT} & \ni \sigma T ::= (\sigma, rs, rs) \\
\text{Meminj} & \ni a ::= \cdots
\end{align*}
\]
Let $Val$ be the set of (dynamic) values, $RF$ be the set of register files, $State$ be the set of states, $StateT$ be the set of extended states, which are tuples of a state, a register file for ghost registers, and another for old ghost registers. $Meminj$ is the set of CompCert-style memory injections, which basically maps a source memory block to the equivalent target block [19].

The semantics of assertions is as follows:

$$\begin{align*}
[r],\{e\},\{v\} & : State \rightarrow Val \overset{\text{def}}{=} \cdots \\
[rT],\{vT\} & : StateT \rightarrow Val \overset{\text{def}}{=} \cdots \\
\{pred\} & : 2^{Blk} \rightarrow StateT \rightarrow \mathbb{P} \\
[e_1 \equiv e_2](priv,\sigma T) & \overset{\text{def}}{=} [e_1](\sigma T) \supseteq [e_2](\sigma T) \\
[\text{Uniq}(r)](priv,\sigma T) & \overset{\text{def}}{=} \forall b,b',o,o'. [r](\sigma T) = (b,o) \wedge (b',o') \in \sigma(T \setminus r) \implies b \neq b' \\
[\text{Priv}(rT)](priv,\sigma T) & \overset{\text{def}}{=} \forall b,o.[rT](\sigma T) = (b,o) \implies b \in priv \\
[vT_1 \perp vT_2](priv,\sigma T) & \overset{\text{def}}{=} \forall b_1,b_2,o_1,o_2. [vT_1](\sigma T) = (b_1,o_1) \wedge [vT_2](\sigma T) = (b_2,o_2) \implies b_1 \neq b_2 \\
[M](\alpha,\sigma_T,\sigma_{T_{igt}}) & \overset{\text{def}}{=} \forall rT \notin M. [rT](\sigma_T) \sim_{\alpha} [rT](\sigma_{T_{igt}}) \\
[S](priv,\sigma T) & \overset{\text{def}}{=} \forall \text{pred} \in S. \{\text{pred}\}(priv,\sigma T) \\
[(S,T,M)](\alpha,\sigma_T,\sigma_{T_{igt}}) & \overset{\text{def}}{=} \exists \sigma_T,\sigma_{T_{igt}}. [M](\alpha,\sigma_T,\sigma_{T_{igt}}) \wedge \\
& \sigma_T : 0 = \sigma_{T_{igt}} : 0 = \sigma_{T_{igt}} : [T](\text{priv}_{igt}(\alpha),\sigma_{T_{igt}}) \\
& \sigma_{T_{igt}} : 0 = \sigma_{T_{igt}} : [T](\text{priv}_{igt}(\alpha),\sigma_{T_{igt}}) \\
\end{align*}$$

Registers, constants, static values have obvious semantics that maps a state to a dynamic value. Tagged registers and values map an extended state to a dynamic value. The semantics of $\text{pred}$ is a predicate over a set of blocks, which represents the set of private blocks, and an extended state (see /coq/proof/InvState.v:346). In particular, $[\text{Uniq}(r)](priv,\sigma T)$ means $r$ is not aliased with any other values in $rT$ (see /coq/proof/InvState.v:314), and $[\text{Priv}(rT)](priv,\sigma T)$ means if $rT$ represents a pointer, it is not in $priv$ (see /coq/proof/InvState.v:332). $[vT_1 \perp vT_2]$ means if two values are pointers, they points to different memory block ($b$ for block, $o$ for offset, see /coq/proof/InvState.v:244). The semantics of a maydiff set is that all corresponding values are injected except for those in the maydiff set. The semantics of an assertion is that there exist ghost and old register files of the source and target states, satisfy the semantics of the source and target assertions and the maydiff set (see /coq/proof/InvState.v:430). Here, $\text{priv}_{src}(\alpha)$ is the set of those source blocks that are not mapped to a target block, and $\text{priv}_{igt}(\alpha)$ is the set of those target blocks that are not mapped from a source block.

H Details of ERHL Proof Checker

H.1 Semantic Interpretation of Hoare triples for call instructions

We give the semantic interpretation of the Hoare triple for call instructions $I_{src} = (x_{src} := \text{call } f_{src} \text{ args}_{src})$ and $I_{igt} = (x_{igt} := \text{call } f_{igt} \text{ args}_{igt})$ as follows:

$$\begin{align*}
\{P\} I_{src} \sim_{igt} \{Q\} & \overset{\text{def}}{=} \forall \sigma_{src},\sigma_{igt},\alpha. [P]\alpha(\sigma_{src},\sigma_{igt}) \wedge \text{Instr}(\sigma_{src}) = I_{src} \wedge \text{Instr}(\sigma_{igt}) = I_{igt} \implies \\
& ([P] \sim_{igt} f_{igt}) \wedge (\text{args}_{src} \sim_{\alpha} \text{args}_{igt}) \wedge \\
& \forall v_{src},v_{igt},\sigma'_{src},\sigma'_{igt},\alpha' \overset{\text{def}}{=} \alpha'. [T]\alpha'(\sigma'_{src},\sigma'_{igt}) \wedge v_{src} \sim_{\alpha} v_{igt} \wedge \sigma_{src} \overset{\text{v}_{igt}}{\Rightarrow} \sigma'_{src} \wedge \sigma_{igt} \overset{\text{v}_{igt}}{\Rightarrow} \sigma'_{igt} \implies \\
& \exists \alpha'' \overset{\text{def}}{=} \alpha'. [Q]\alpha''(\sigma'_{src},\sigma'_{igt}) .
\end{align*}$$

where, $v_{src} \sim_{\alpha} v_{igt}$ means $v_{src}$ is injected into $v_{igt}$ via memory injection $\alpha$, $\top$ is the assertion with no predicates and the full maydiff set (i.e., the values of every register may differ), and $\sigma \Rightarrow \sigma'$ means $\sigma$ is about to call a function, and $\sigma'$ is a possible return state after the function call for the case that the callee returns $v$. 

22
Algorithm 4 CheckEquivBeh(\(P, I_{\text{src}}, I_{\text{tgt}}\); Command)

1: \textbf{match} \(I_{\text{src}}, I_{\text{tgt}}\) with
2: \quad // call
3: \quad \mid (x_{\text{src}} := \text{call}_{\text{src}} \text{args}_{\text{src}}), (x_{\text{tgt}} := \text{call}_{\text{tgt}} \text{args}_{\text{tgt}}) \Rightarrow \text{return} (f_{\text{src}} \sim_p f_{\text{tgt}}) \land (\text{args}_{\text{src}} \sim_p \text{args}_{\text{tgt}})
4: \quad \mid (\_ := \text{call} \_\_), \_ \mid (\_ := \text{call} \_\_) \Rightarrow \text{return} \ false
5: \quad // alloca
6: \quad \mid (p_{\text{src}} := \text{alloca} \ x_{\text{src}}), (p_{\text{tgt}} := \text{alloca} \ x_{\text{tgt}}) \Rightarrow \text{return} \ x_{\text{src}} \sim_p x_{\text{tgt}}
7: \quad \mid (p_{\text{src}} := \text{alloca} \ x_{\text{src}}, \text{lnop}) \Rightarrow \text{return} \ true
8: \quad \mid (\_ := \text{alloca} \_), \_ \mid (\_ := \text{alloca} \_) \Rightarrow \text{return} \ false
9: \quad // store
10: \quad \mid (\text{store} p_{\text{src}} v_{\text{src}}), (\text{store} p_{\text{tgt}} v_{\text{tgt}}) \Rightarrow \text{return} \ (p_{\text{src}} \sim_p p_{\text{tgt}}) \land (v_{\text{src}} \sim_p v_{\text{tgt}})
11: \quad \mid (\text{store} p_{\text{src}} v_{\text{src}}), \text{lnop} \Rightarrow \text{return} \ \text{Priv}(p_{\text{src}}) \in P
12: \quad \mid (\text{store} \_), \_ \mid (\text{store} \_) \Rightarrow \text{return} \ false
13: \quad // target is load
14: \quad \mid (v_{\text{src}} := \text{load} p_{\text{src}}), (v_{\text{tgt}} := \text{load} p_{\text{tgt}}) \Rightarrow \text{return} \ p_{\text{src}} \sim_p p_{\text{tgt}}
15: \quad \mid \_, (v_{\text{tgt}} := \text{load} p_{\text{tgt}}) \Rightarrow \text{return} \ false
16: \quad // target is div
17: \quad \mid (\_, := \text{div} \_ \ b_{\text{src}}), (\_, := \text{div} \_ \ b_{\text{tgt}}) \Rightarrow \text{return} \ b_{\text{src}} \sim_p b_{\text{tgt}}
18: \quad \mid \_, (\_, := \text{div} \_ \ b_{\text{tgt}}) \Rightarrow \text{return} \ \text{IsNonzero}(P, b_{\text{tgt}})
19: \quad // misc.
20: \quad \_, \_ \Rightarrow \text{return} \ true
21: \textbf{end match}

Algorithm 5 CalcPostAssnCmd(\(P\); Assn, \(I_{\text{src}}, I_{\text{tgt}}\); Command): Assn

1: \(P' := \text{Prune}(P, I_{\text{src}}, I_{\text{tgt}})\)
2: \((P''_{\text{src}}, P''_{\text{tgt}}, M'') := \text{AddMemoryPreds}(I_{\text{src}}, I_{\text{tgt}}, P')\)
3: \(P'''_{\text{src}} := \text{AddLessdefPreds}(I_{\text{src}}, P''_{\text{src}})\)
4: \(P'''_{\text{tgt}} := \text{AddLessdefPreds}(I_{\text{tgt}}, P''_{\text{tgt}})\)
5: \textbf{return} \text{ReduceMaydiff}(P'''_{\text{src}}, P'''_{\text{tgt}}, M'')

The semantic interpretation means that the source and the target is about to call equivalent functions with equivalent arguments, and for all future extension \(\alpha'\) of the current memory injection \(\alpha\), if the return values and return states are related by \(\alpha'\), then there exists a future extension \(\alpha''\) of \(\alpha'\) for which \([Q]\) is satisfied for the return states.

In order to prove a Hoare triple, we need to guarantee that the source and target call equivalent functions with equivalent arguments, and in turn, we can rely on the fact that the callees return equivalent states. Using this semantics interpretation of calls, we proved semantics preservation of programs using the basic approach of parametric bisimulation [13]. See \(\text{/coq/proof/SimulationLocal.v:164}\) and \(\text{/coq/proof/AdequacyLocal.v:243}\) for more details.

H.2 Post-Assertion Computation for Commands

CheckEquivBeh Algorithm 4 is the CheckEquivBeh() algorithm for commands (see \(\text{/coq/def/Postcond.v:769}\)). Here, \(x_{\text{src}} \sim_p y_{\text{tgt}}\) means one of the followings holds: (i) \(x = y\) and \(x\) is not in the maydiff set; (ii) \((x_{\text{src}} \equiv y_{\text{src}}) \in P_{\text{src}}\) and \(y\) is not in the maydiff set; or (iii) \(x\) is not in the maydiff set and \((x_{\text{tgt}} \equiv y_{\text{tgt}}) \in P_{\text{tgt}}\). Basically, this implies \(x_{\text{src}} \sim_a y_{\text{tgt}}\) holds for all \(\alpha\) that is compatible with \(P\). Also, \(e_{\text{src}} \sim_p e_{\text{tgt}}\) means \(e\) and \(e'\) are of the same expression kind, e.g., they are both add, and for all matching operands \(x, x'\), \(x_{\text{src}} \sim_p x'_{\text{tgt}}\) holds. \text{IsNonzero()} performs an analysis for proving that the value is nonzero. For simplicity, we omit the details.

In essence, if the target instruction may emit an event or invoke undefined behavior, the source instruction should be similar to that. Note that we allow source load instruction with target \text{lnop} instruction, which indeed occurs in the validation of \text{mem2reg}.
Algorithm 6 Prune(P: Assn, Isrc, Itgt; Command): Assn

1: \((P_{\text{src}}, P_{\text{tgt}}, M) := P\)
2: \(P'_{\text{src}} := \text{PruneU}(P_{\text{src}}, I_{\text{src}})\)
3: \(P'_{\text{tgt}} := \text{PruneU}(P_{\text{tgt}}, I_{\text{tgt}})\)
4: \(M' := M \cup \{\text{Def}(I_{\text{src}}), \text{Def}(I_{\text{tgt}})\}\)
5: \text{return} \((P'_{\text{src}}, P'_{\text{tgt}}, M')\)

CalcPostAssn Algorithm 5 is the post-assertion computation algorithm for commands \(I_{\text{src}}, I_{\text{tgt}}\) (see /coq/def/Postcond.v:1058 for definition and /coq/proof/SoundPostcondCmd.v:309, /coq/proof/SoundPostcondCall.v:254 for the soundness of (PostAssn) for commands). At line 1, it removes predicates that no longer hold after executing the commands (Prune). Then at line 2, it adds Uniq and Priv predicates when necessary (AddMemoryPreds). At lines 3-4, it adds lessdef predicates to unary assertions for both source and target (AddLessdefPreds), and at line 5, finally tries to remove registers from the maydiff set.

Prune Algorithm 6 is the assertion pruning algorithm for commands \(I_{\text{src}}, I_{\text{tgt}}\) (see /coq/def/Postcond.v:355, /coq/def/Postcond.v:386). At lines 2-3, it removes predicates from unary assertions for both source and target (PruneU). Then at line 4, the left-hand sides of \(I_{\text{src}}, I_{\text{tgt}}\) are added to the maydiff set.

PruneU PruneU(S, I) removes the following memory predicates from a unary assertion:
- If I defines a register r, remove all predicates on r (see /coq/def/Postcond.v:373).
- If I is a store instruction \(s p := v\), remove all lessdef equations on \(s q\) for which we cannot prove \(p \perp q\). We can prove \(p \perp q\) if we have it as a predicate, or \(p \neq q\), either \(p\) or \(q\) is unique, and the other is physical (holds (see /coq/def/Postcond.v:341).
- If I is a call instruction, remove all lessdef equations on \(s q\) unless Priv(q) holds (see /coq/def/Postcond.v:409).
- Remove Uniq(p) if p leaked, i.e., copied to another register, used as the instruction’s operand (see /coq/def/Postcond.v:347), or a function is called unless Priv(p) holds (see /coq/def/Postcond.v:411).

AddMemoryPreds AddMemoryPreds(Isrc, Itgt, P) adds the following memory predicates to P:
- If (Isrc, Itgt) is allocations (psrc := alloca(···), ptgt := alloca(···)), add Uniq(psrc) to the source assertion. If psrc = ptgt then remove psrc from the maydiff set (see /coq/def/Postcond.v:893).
- If Isrc is an allocation (psrc := alloca(···)) and Itgt is lnop, then add Uniq(psrc), Priv(psrc) to the source assertion (see /coq/def/Postcond.v:905).
- If (Isrc, Itgt) are call instructions (xsrc := call(···)), (xtgt := call(···)) and xsrc = xtgt, then remove xsrc from the maydiff set (see /coq/def/Postcond.v:927).

AddLessdefPreds AddLessdefPreds(I, S) adds the following lessdef predicates to S:
- If I is a side-effect-free operation \((x := op \, args)\), add \((x \equiv op \, args)\) and \((op \, args \equiv x)\) (see /coq/def/Postcond.v:936). Note that load is regarded as side-effect-free.
- If I is a store instruction \(s p := v\), add \(s p \equiv v\) (see /coq/def/Postcond.v:939).
- If I is an allocation instruction \(s p := alloca(···)\), add \(s p \equiv undef\) (see /coq/def/Postcond.v:942).

H.3 Post-Assertion Computation for Phinodes

Phinodes do not emit any events so that any pair of phinodes behaves equivalently. Thus, for phinodes, CheckEquivBeh() checks nothing. As described in §4, CalcPostAssn() works for phinodes as follows (see /coq/def/Postcond.v:689 for the definition and /coq/proof/SoundPostcondPhinodes.v:960 for the soundness of (PostAssn) for phinodes):
- Remove old registers, and copy predicates on physical registers into those about old ones (see /coq/def/Postcond.v:673).
- Remove predicates on those registers defined in the phinodes (see /coq/def/Postcond.v:674).
- Add predicates \(x \equiv y, y \equiv x\) for each assignment \(x := y\) that is performed in the phinodes (see /coq/def/Postcond.v:680).
- Tries to reduce the maydiff set (see /coq/def/Postcond.v:686).

Misc. CheckCFG is implemented in /coq/def/Validator.v:76 and /coq/def/Validator.v:129.
CheckInit is implemented in /coq/def/Validator.v:247, and its specification is proved in /coq/proof/SimulationValid.v:1314.
CheckIncl is implemented in /coq/def/Hints.v:187, and the soundness of (Incl) is proved in /coq/proof/SoundInfrules.v:279.

I Non-Arithmetic Inference Rules

In order to support mem2reg, gvn, and 1icm, we use 9 non-arithmetic inference rules. In the Coq development, we define the 9 rules in coq/def/Infrules.v:392, and formally verified their soundness in coq/proof/SoundInfrules.v:52.
(transitivity)  
\[ e_{src} \equiv e'_{src} \quad e'_{src} \equiv e''_{src} \]
\[ e_{src} \equiv e''_{src} \]

(transitivity_tgt)  
\[ e_{tgt} \equiv e'_{tgt} \quad e'_{tgt} \equiv e''_{tgt} \]
\[ e_{tgt} \equiv e''_{tgt} \]

(substitute)  
\[ v_{T_{src}} \equiv v_{T'_{src}} \]
\[ e_{src} \equiv e_{src}[v_{T_{src}} \mapsto v_{T'_{src}}] \]

(substitute_tgt)  
\[ v_{T_{tgt}} \equiv v_{T'_{tgt}} \]
\[ e_{tgt} \equiv e_{tgt}[v_{T_{tgt}} \mapsto v_{T'_{tgt}}] \]

(substitute_rev)  
\[ v_{T_{src}} \equiv v_{T'_{src}} \]
\[ e_{src}[v_{T'_{src}} \mapsto v_{T_{src}}] \equiv e_{src} \]

(intro_ghost)  
\[ e_{src} \sim e_{tgt} \quad e_{src} \equiv \hat{g}_{src}, \hat{g}_{tgt} \equiv e_{tgt} \quad \hat{g} \text{ not used} \]

(reduce_maydiff_non_physical)  
\[ r_{T_{src}} \sim r_{T'_{tgt}} \quad \text{rT is not physical (i.e. ghost or old) and not used} \]

(reduce_maydiff_lessdef)  
\[ r_{T_{src}} \equiv e_{src} \quad e_{src} \sim e'_{tgt} \quad e'_{src} \equiv r_{T_{src}} \]
\[ r_{T_{src}} \sim r_{T'_{tgt}} \]

Figure 10. Formally Verified Inference Rules

Each of the rules is based on one of the proof rules in Fig. 10. Here, \( r_{T_{src}} \sim r_{T'_{tgt}} \) in premises means \( r_{T_{src}} \sim P \quad r_{T'_{tgt}} \) for the current assertion \( P \). Also, \( e_{src} \sim e'_{tgt} \) means \( e \) and \( e' \) are of the same expression kind, e.g., they are both add, and for all matching operands \( r_{T}, r_{T'}, r_{T_{src}} \sim P \quad r_{T'_{tgt}} \) holds for the current assertion \( P \). \( r_{T_{src}} \sim r_{T_{tgt}} \) in conclusions means you can remove \( r_{T} \) from the maydiff set.