Abstract
Production compilers such as GCC and LLVM are large complex software systems, for which achieving a high level of reliability is hard. Although testing is an effective method for finding bugs, it alone cannot guarantee a high level of reliability. To provide a higher level of reliability, many approaches that examine compilers’ internal logics have been proposed. However, none of them have been successfully applied to major optimizations of production compilers.

This paper presents CRELVM: a verified credible compilation framework for LLVM, which can be used as a systematic way of providing a high level of reliability for major optimizations in LLVM. Specifically, we augment an LLVM optimizer to generate translation results together with their correctness proofs, which can then be checked by a proof checker formally verified in Coq. As case studies, we applied our approach to two major optimizations of LLVM: register promotion (mem2reg) and global value numbering (gvn), having found four new miscompilation bugs (two in each).

CCS Concepts • Theory of computation → Hoare logic; • Software and its engineering → Compilers; Formal software verification;

Keywords LLVM, Coq, credible compilation, translation validation, compiler verification, relational Hoare logic

1 Introduction
Production compilers such as GCC and LLVM are large complex software systems, for which achieving a high level of reliability is hard. Their complexity comes in two fold. First, to generate efficient target code, they perform various complex optimizations. Second, to consume less time and memory during compilation, they are usually written in C/C++ using sophisticated data structures. Due to such complexity, it is hard to make mainstream compilers very reliable.

Although testing is an effective method for finding bugs, that alone hardly guarantees a high level of reliability. Recent random testing tools such as CSmith [53] and EMI [24] have shown their effectiveness by finding hundreds of bugs in GCC and LLVM. However, they missed bugs in the gvn and mem2reg passes of LLVM, which we discovered later (see §1.2 for details), since they treat compilers as black boxes without examining their internal logics.

In order to provide a higher level of reliability, many approaches that examine compilers’ internal logics have been proposed, none of which, however, have been successfully applied to major optimizations of production compilers. For example, while compiler verification techniques have been applied to compilers such as CompCert [26] to guarantee their formal correctness, this approach is not readily applicable to production compilers since it requires compilers to be written in the language of a proof assistant such as Coq. As another example, Alive [30] is a domain-specific language (DSL) in which one can manually write a compiler’s optimization logic and automatically verify its correctness or else generate a counterexample. Though this approach has been successfully applied to LLVM, its application is...
limited to peephole optimizations because it is hard to faithfully translate the implementation of complex optimizations into Alive and, more importantly, Alive does not support cyclic control flows such as loop. As the last example, the credible compilation [16, 33, 34, 44] and verified translation validation [14, 19, 43, 50–52] approaches augment compilers to generate translation results together with their correctness proofs, which can then be checked by a (verified) proof checker. Since a correctness proof is generated and checked at each compilation time, it provides a formal correctness guarantee for the particular translation or else finds a bug (either in the compiler code or in the proof-generation code). However, there has been only a preliminary attempt to apply these approaches to production compilers so far. (See §9 for detailed comparison.)

This paper presents Crellvm: a verified credible-compilation framework for LLVM, which can be used as a systematic way of providing a high level of reliability for major optimizations in LLVM. Specifically:

1. We design and develop a logic and its proof checker for reasoning about LLVM optimizations, called Extensible Relational Hoare Logic (ERHL), in the proof assistant Coq. This logic’s novelty lies in its representation of relational predicates as mostly unary predicates (see §2.2 for details).
2. We fully verify a semantics-preservation result for our proof checker in the style of CompCert using the Coq formalization of LLVM IR (Intermediate Representation) from the Vellvm project [55].
3. As case studies, we wrote proof-generation codes (213 and 440 SLOC\(^1\) in C++) for two major optimizations: register promotion in the \texttt{mem2reg} pass and global value numbering (GVN) with partial-redundancy elimination (PRE) in the \texttt{gvn} pass. Then we performed validation of the two optimizations for standard benchmarks, five large open-source projects and test files randomly generated by CSmith.
4. As a result, we found four new miscompilation bugs (two in each optimization). It is notable that all the four bugs had been hidden for 7-8 years until we found them.

1.1 Overview of Crellvm

Framework The framework of Crellvm works as follows. First, as shown in Fig. 1, we separate the compilation and validation phases. For compilation, as depicted in the left side of Fig. 1, we use the original optimizer to translate the source IR code \texttt{src\.ll} to the target IR code \texttt{tgt\.ll}. After the compilation, we can conduct validation, as depicted in the right side of Fig. 1. For this, we first run the optimizer extended with a proof-generation code that produces the target \texttt{tgt'\.ll} together with the proof \texttt{Proof}. Then the proof checker validates \texttt{Proof} to see whether \texttt{src\.ll} is correctly translated to \texttt{tgt'\.ll}. If the validation fails, we can see a logical reason for the failure, with which we can find a bug either in the compiler or in the proof-generation code. If the validation succeeds, we finally compare \texttt{tgt\.ll} and \texttt{tgt'\.ll} using the LLVM IR comparison tool \texttt{llvm-diff}.

There are two points to note about the framework. First, \texttt{llvm-diff} essentially performs alpha-equivalence checking, which is necessary because while \texttt{tgt\.ll} may have unnamed IR registers, \texttt{tgt'\.ll} has explicit names for all registers for proof-generation purposes. Second, since we just add proof-generation code without modifying existing compiler code except for giving names to unnamed registers, the original and proof-generating compilers are expected to generate alpha-equivalent programs, which is always checked using \texttt{llvm-diff} as described above. Therefore, programmers can use the original compiler in regular usage and then run the proof-generating one on occasion to check correctness because the former is much faster than the latter. On the other hand, compiler developers can use the latter for testing on regular basis to find bugs.

ERHL and Proof Checker For validation in Crellvm, we develop ERHL, which is a variant of relational Hoare logic [16] specialized for LLVM IR. The logic and its proof checker is extensible because (i) the logic can be extended with any custom inference rules and (ii) the proof checker can be extended with any custom automation functions that try to fill in the gaps in incomplete proofs by automatically finding appropriate inference rules, like the auto tactic in Coq.

Verification of Proof Checker In the Crellvm framework, the TCB (Trusted Computing Base) includes only the proof checker, the equality checker (\texttt{llvm-diff}) and custom inference rules. In particular, the proof-generation code in the compiler is not a part of the TCB because any incorrect proof would be invalidated by the proof checker.

We further remove the proof checker and inference rules from the TCB by implementing and verifying them in Coq. Though we currently use the (unverified) standard \texttt{llvm-diff} tool for comparing IR programs, it would also be possible to implement and verify it in Coq.

Note that verification of the proof checker and inference rules matters in practice. First, we found various corner-case
bugs in our proof checker during its verification. Second, we also found one of our two mem2reg bugs [9] during the verification of inference rules. See the example below.

\[
p := \text{alloca(); } r := \ast p
\]

\[
\text{foo}(r) \quad \rightarrow \quad \text{foo}(1 / ((\text{int})G - (\text{int})G))
\]

\[
\ast p := 1 / ((\text{int})G - (\text{int})G)
\]

Here \( G \) is the constant address of a global variable.

To see why this translation is incorrect, suppose that the function \( \text{foo}(r) \) ignores \( r \) and repeatedly prints out 0 without returning to the caller. Then division-by-zero never happens in the source program, while it does in the target. The problem here is that the mem2reg pass assumes that constant expressions never raise any undefined behavior such as division-by-zero, which is not true since \( 1 / ((\text{int})G - (\text{int})G) \) forms a valid constant expression in LLVM. Following the logic of mem2reg, we also added such a custom inference rule, which we found unsound during the verification of the rule.

It is important to note that all the programs in this paper represent LLVM IR programs and we just use C syntax to help with understanding. For example, the source program in the above transformation is undefined as a C program but well-defined as an IR program. Thus, the transformation is only unsound as an IR-to-IR transformation. The LLVM community considers such an IR-to-IR miscompilation as a definite bug even when it does not cause any C-to-Assembly miscompilation since it can potentially cause an end-to-end miscompilation for other source languages such as Swift and Rust.

**Results** We wrote proof-generation codes for register promotion in the mem2reg pass and for GVN-PRE in the gvn pass; and also partly for loop-invariant code motion in the licm pass, and 139 micro-optimizations in the instcombine pass in order to demonstrate the generality of ERHL. We then conducted validation of the optimizations for the SPEC CINT2006 C Benchmarks [15], LLVM nightly test suite, and five open-source projects: sendmail, emacs, python, gimp, and ghostscript, in total 5.3 million LOC in C. As a result, we found four new miscompilation bugs.

We present the details of mem2reg validation in §3 and gvn validation in [1, §C].

### 1.2 Advantages of Crellvm over Testing

Crellvm checks whether optimizations are performed by correct reasoning, while testing simply checks results of the test programs. This can make a difference as follows.

First, an optimization performed by incorrect reasoning may still be correct for most programs including all the test programs. In this case, testing cannot uncover the bug, while Crellvm can because it checks the underlying reasoning. For example, we found our first mem2reg bug [5] in this situation.

Specifically, the following optimization shows the bug.

\[
p := \text{alloca()}
\]

\[
\text{loop (}
\]

\[
r := \ast p; \text{foo}(r); \ast p := 42 \quad \rightarrow \quad \text{foo(undef)}
\]

This translation is incorrect because only in the first iteration of the loop is \( r \) undefined; in the remaining iterations \( r \) is 42 according to the semantics of LLVM. The mem2reg pass performs this due to faulty reasoning.

However, this faulty reasoning is often not visible in the final compiled program. The reason is that, since the input to \( \text{foo} \) is sometimes undefined, \( \text{foo} \) to be well behaved it often ignores its input \( r \) (e.g., by using an operation like \( r \& 0x0 \)). Thus this transformation is actually correct in such a program since the value of \( r \) is never used in the program. Indeed, the SPEC benchmark that provoked this faulty reasoning behaved this way, and so the faulty reasoning never led to a faulty program, which is why the bug had been hidden for such a long time.

The fact that the faulty reasoning was inconsequential in this case does not mean the bug is unimportant. As we said before, the LLVM community cares about such an IR-to-IR miscompilation and immediately fixed the bug after we reported it. Moreover, visible miscompilations due to the bug could happen in a realistic situation (see [1, §B] for a concrete example).

Second, a potential flaw introduced by miscompilation may not be exploited by the current compiler and silently disappear during the compilation. Also in this case, Crellvm can detect the bug because it checks the underlying reasoning. For example, we found the two gvn bugs [6, 7] in this situation, which had not been found for 8 years. Note that the two bugs are caused by the same reason but we counted them as two because they appear in two separate places.

Specifically, the following optimization shows the bug.

q1 := \((p + 10)\) inbounds

\[
q1 := (p + 10) \quad \rightarrow \quad \text{q2 := (p + 10)}
\]

\[
\text{bar(q1, q2)} \quad \rightarrow \quad \text{bar(q1, q1)}
\]

In the source program, \((p + 10)\) inbounds\(^4\) is defined to be undef\(^5\) when the index 10 is out of the bounds of \( p \), while \((p + 10)\) is always defined to be the computed address. Thus replacing q2 with q1 introduces more undefinedness, which is incorrect because it can be potentially exploited by subsequent optimizations. However, so far the LLVM compiler has not exploited such undefinedness, thereby causing no observable misbehaviors. Indeed this miscompilation happened many times during validation of the standard benchmarks but testing has failed to detect it.

\(^2\)Since \( \ast p \) is uninitialized, it contains undef, which is a special value representing undefinedness

\(^3\)This denotes the GetElementPtr (GEP) operation.

\(^4\)Technically, it is defined to be poison but the difference does not matter here.
We now construct a proof for the assoc-add optimization of the instcombine pass as a motivating example.

### 2.1 Translation Example

We first give an example translation of the assoc-add optimization, which is shown in the shaded part of Fig. 2. Here \( y := \text{add} \ x \ 2 \) is replaced by \( y := \text{add} \ a \ 3 \) at line 20. This translation can be beneficial because after it, the register \( x \) may no longer be used and thus \( x := \text{add} \ a \ 1 \) at line 10 may be eliminated later. This translation is also sound because (i) the assertion \( x = \text{add} \ a \ 1 \) holds throughout lines 10-20, since the registers \( a \) and \( x \) are not redefined between line 10 and 20 thanks to the Static Single Assignment (SSA) property \(^5\); and (ii) from this, we can infer that add \( x \ 2 \) = add (add a 1) 2 = add a 3 holds at line 20.

### 2.2 Proof Validation

We now construct a proof for the assoc-add translation example and validate it in ERHL.

**ERHL Proof** A formal proof of the translation is given in the box of Fig. 2. Specifically, the proof consists of a set of assertions and a list of inference rules at each line. For example, at line 20, the set of assertions is \( \{ \{ \text{MD}(\emptyset) \} \) and the list of inference rules is \( \{ \text{assoc_add}(x_{src}, y_{src}, a_{src}, 1, 2) \} \text{reduce_maydiff}(y) \). This ERHL proof captures the assertion and the inference step of the intuitive reasoning above. First, the assertion \( \text{MD}(\emptyset) \) at every line states that every register contains the same value in the source and target program states. Second, the additional assertion \( x_{src} = \text{add} \ a \ 1 \) between line 10 and line 20 states that in the source state, the value of the register \( x \) is equal to the result of \( \text{add} \ a \ 1 \). Finally, the inference rules \( \text{assoc_add}(x_{src}, y_{src}, a_{src}, 1, 2) \) and \( \text{reduce_maydiff}(y) \) at line 20 are those that need to be applied for validation at line 20. The details of the rules will be given later when we discuss the validation process.

**ERHL Assertions** Before we proceed to the validation of the proof, we discuss ERHL assertions in more details. An ERHL assertion is a triple \((S, T, M)\), where \( S \) is a set of assertions that should hold for the source state; \( T \) is for the target state; and \( M \) is an assertion relating the source and target states.

First, the source and target assertions, \( S \) and \( T \), can contain various forms of predicates. For example, \( x_{src} = \text{add} \ a \ 1 \) is a source assertion and \( y_{tgt} = \text{add} \ a \ 3 \) is a target assertion. Here and henceforth, \( x_{src} \) and \( y_{tgt} \) represent the values of the register \( x \) in the source and target states, respectively. Though we only use the equality predicate for assoc-add, we will introduce various other predicates later. It is important to note that we do not allow arbitrary assertions relating the source and target states such as \( x_{src} = y_{tgt} + 1 \).

Second, the relational assertion \( M \) is a set of registers, called the maydiff set, that may contain different values in the source and target states. In other words, all the registers not in \( M \) should have the same value in the source and target states, which we denote by \( \text{MD}(M) \):

\[
\text{MD}(M) \iff \forall x \not\in M. \ x_{src} = x_{tgt}.
\]

Note that the maydiff set is the only form of relational assertion relating the source and target states.

Finally, every ERHL assertion implicitly requires the public parts of the source and target memories to be equivalent. More precisely, we use the CompCert-style memory-injection relation \(^{28}\). Later we introduce predicates that allow private memory allocations that do not belong to the public part of memory (see §3.2).

The main novelty of ERHL assertions is that we can use the standard algorithm of (unary) Hoare logic to compute post relational assertions, because ERHL assertions are mainly unary \( i.e., \) only for the source state, or for the target state, not relating them) except for the maydiff set. This unary nature greatly simplifies the ERHL proof checker and its correctness proof. Though mainly unary, ERHL assertions can indirectly encode general forms of relational assertions (see §3.2 for details).

---

\(^{5}\)The SSA property says that for every used register \( x \), there is statically \( i.e., \) syntactically exactly one instruction that defines \( x \) \( i.e., \) assigning a value to \( x \), which moreover comes before every use of \( x \).
Proof Validation  The gray text in Fig. 2 shows the validation process performed by the ERHL proof checker, which proceeds as follows.

First, the proof checker checks that the initial assertion holds for all possible initial states. It accepts the initial assertion \( \{ \text{MD}(\emptyset) \} \) in Fig. 2 since the source and target states are initially equivalent.

Second, the proof checker checks whether the Hoare triple \( \{ P \} \ I_{\text{src}} \leadsto I_{\text{tgt}} \{ Q \} \) at each line is valid. This means that the assertion \( Q \) after the line holds for all program states resulted by executing the source and target instructions \( I_{\text{src}} \) and \( I_{\text{tgt}} \) at the line under any program states satisfying the assertion \( P \) before the line. In Fig. 2, we only explain validations at lines 10 and 20 in detail because the others are trivial.

At line 10, the proof checker first computes a strong post-assertion, \( \{ x_{\text{src}} = \text{add} a_{\text{src}} 1, \ x_{\text{tgt}} = \text{add} a_{\text{tgt}} 1, \text{MD}(\emptyset) \} \), using our post-assertion computation algorithm. Here, the algorithm simply adds the equality predicates corresponding to the executed instructions. Then, the assertion after line 10, \( \{ x_{\text{src}} = \text{add} a_{\text{src}} 1, \text{MD}(\emptyset) \} \), follows from the computed strong post-assertion by a simple inclusion check.

At line 20, the checker also computes a strong post-assertion, \( \{ x_{\text{src}} = \text{add} a_{\text{src}} 1, y_{\text{src}} = \text{add} a_{\text{src}} 2, y_{\text{tgt}} = \text{add} a_{\text{tgt}} 3, \text{MD}(y) \} \). Here, the post-assertion computation adds the equality predicates corresponding to the executed instructions and also adds the register \( y \) to the maydiff set because the executed source and target instructions are not identical. Then, the proof checker applies the inference rules given by the proof. The rule \( \text{assoc}\_\text{add}(x_{\text{src}}, y_{\text{src}}, a_{\text{src}} 1, 2) \) derives \( x_{\text{src}} = \text{add} a_{\text{src}} 3 \) from \( x_{\text{src}} = \text{add} a_{\text{src}} 1 \) and \( y_{\text{src}} = \text{add} x_{\text{src}} 2 \) by associativity:

\[
\begin{align*}
(\text{assoc}\_\text{add}(x, y, a, C_1, C_2)) \\
x = \text{add} a C_1 & \quad y = \text{add} x C_2 & C = C_1 + C_2 \\
\text{add} (y = \text{add} a C) &
\end{align*}
\]

The rule \( \text{reduce}\_\text{maydiff}(y) \) removes the register \( y \) from the maydiff set because \( y_{\text{src}} = \text{add} a_{\text{src}} 3, y_{\text{tgt}} = \text{add} a_{\text{tgt}} 3 \) and \( a \) is not in the maydiff set:

\[
\begin{align*}
(\text{reduce}\_\text{maydiff}(y, e)) \\
y_{\text{src}} = e_{\text{src}} & \quad e_{\text{tgt}} = y_{\text{tgt}} \\
\text{no registers in } e & \text{ in the maydiff set} & \text{remove } y \text{ from the maydiff set}
\end{align*}
\]

Then, the assertion after line 20, \( \{ \text{MD}(\emptyset) \} \), easily follows by a simple inclusion check.

Finally, the proof checker checks whether the same observable events (i.e., the same sequence of system calls) are produced at each line. It is the case in Fig. 2 because at line 20, no observable events are produced; and at the other lines, the source and target instructions are identical and the maydiff sets are empty implying that the source and target states are equivalent. In particular, at line 21, the proof checker explicitly checks that the same value is passed to the function \( \text{foo} \) because the function may produce observable events.

### 2.3 Proof Generation

Now we explain how we generate proofs for \( \text{assoc}\_\text{add} \).

**Algorithm** Algorithm 1 shows the \( \text{assoc}\_\text{add} \) optimization algorithm implemented in LLVM’s instcombine pass, which is presented in a rather functional style for presentation purposes. Specifically, \( \text{AssocAdd}(F) \) optimizes each function definition \( F \) as follows (ignore the boxes for now, which are the proof-generation code).

- **[Line A1]** Find an instruction of the form \( l_2: y := \text{add} x C_2 \) with \( C_2 \) constant. In Fig. 2, \( 20: y := \text{add} x 2 \) can be picked.
- **[Line A2]** Check if \( x \) is defined by an instruction of the form \( l_1: x := \text{add} a C_1 \) with \( C_1 \) constant. Here, \( \text{FindDef}(F, x) \) finds the instruction that defines the register \( x \)\(^6\). In Fig. 2, \( 10: x := \text{add} a 1 \) is picked. **[Lines A3-A4]** If it is the case, compute the constant \( C = C_1 + C_2 \) and replace the instruction at \( l_2 \) with \( y := \text{add} a C \). In Fig. 2, the instruction at line 20 is replaced by \( y := \text{add} a 3 \).

**Proof Generation** Once we understand the \( \text{assoc}\_\text{add} \) optimization algorithm, it is quite straightforward to write the proof-generation code given in the boxes of Algorithm 1.

- **[Line A5]** Add the assertion \( x_{\text{src}} = \text{add} a_{\text{src}} C_1 \) at every line between \( l_1 \) and \( l_2 \). In Fig. 2, the assertion \( x_{\text{src}} = \text{add} a_{\text{src}} 1 \) is added at every line between 10 and 20. **[Line A6]** Add the inference rule \( \text{assoc}\_\text{add}(x_{\text{src}}, y_{\text{src}}, a_{\text{src}}, C_1, C_2) \) at line \( l_2 \). In Fig. 2, the rule \( \text{assoc}\_\text{add}(x_{\text{src}}, y_{\text{src}}, a_{\text{src}}, 1, 2) \) is added at line 20. **[Line A9]** Enable the custom automation function named \( \text{reduce}\_\text{maydiff} \), which tries to find and insert appropriate \( \text{reduce}\_\text{maydiff} \) rules when necessary. In Fig. 2, it figures out that \( \text{reduce}\_\text{maydiff}(y) \) is needed at line 20.

**Automation** An automation function works as follows. When it remains to prove \( Q \) implies \( Q' \), the designated automation function examines the assertions \( Q \) and \( Q' \) and tries to find a sequence of inference rules that derives \( Q' \) from \( Q \). For example, at line 20 in Fig. 2, after applying the \( \text{assoc}\_\text{add} \) rule it remains to prove \( Q = \{ x_{\text{src}} = \text{add} a_{\text{src}} 1, y_{\text{src}} = \text{add} x_{\text{src}} 2, y_{\text{src}} = \text{add} a_{\text{src}} 3, y_{\text{tgt}} = \text{add} a_{\text{tgt}} 3, \text{MD}(y) \} \) implies \( Q' = \{ \text{MD}(\emptyset) \} \), from which the automation function finds the inference rule \( \{ \text{reduce}\_\text{maydiff}(y) \} \).

---

\(^6\) The instruction that defines \( x \) is unique thanks to the SSA property.
Automation functions can greatly simplify proof generation in certain cases. A good example is transitivity reasoning because it is much harder at proof generation time than at validation time. For instance, given a goal $x = y$, to prove it by transitivity, we have to figure out intermediate equations (e.g., $x = a$, $a = b$, $b = y$). For this, at proof generation time, we have to write a code that (sometimes recursively) search through the compiler internal states, which is tightly coupled with the compiler code; while at validation time, since a concrete pre-assertion is given, we just need to search through the equations given in the pre-assertion, which is completely generic and can be easily automated.

It is important to note that automation functions do not need to be verified (i.e., not a part of TCB) because all they do is to insert inference rules, which is a part of proof construction, not that of proof checking.

3 Register Promotion

Register-promotion optimization, the `mem2reg` pass of LLVM, transforms memory accesses to locally allocated memory locations into register accesses, provided that the memory location is only used for loads and stores (i.e., never copied or escaped). This translation is important because register accesses are cheaper than memory accesses, and are subject to further optimizations.

The optimization also performs the SSA transformation so that the target program has the SSA property. This transformation is necessary because there can be statically multiple stores to a single location, and just transforming them to writes to a single register would break the SSA property.

In this section, we show how we generate and validate proofs for the `mem2reg` optimization.

3.1 Translation Example

The shaded part of Fig. 3 shows an example translation of the `mem2reg` optimization, where all memory accesses via $p$ is promoted to register accesses to $p1$ and uses of $42$ and $x$. Note that $c$, $x$, and $q$ are the function parameters.

More specifically, the allocation, load and store instructions to $p$ are removed (ignore `lnop` for now), and every use of the result of a load from $p$ is replaced by the value stored in $p$ at the time of the load. For example, in Fig. 3, the compiler figures out that $p$ contains $42$ at line $20$ (and so does the register $a$) due to the store of $42$ in $p$ at line $11$, and thus replaces the use of $a$ with $42$ at line $21$. This translation is sound because (i) the assertion $a_{src} = 42$ holds from line $11$ to $20$; and (ii) $a_{src} = 42$ holds from line $20$ to $21$. Note that we use the blue color for assertions about $p$ and the red color about the registers containing the value loaded from $p$.

In a case where the value stored in $p$ depends on the control flow, the compiler inserts a `phantom` node, which is a unique construct in the SSA form and assigns different values to a register depending on the control flow. For example, at line $40$, $p$ contains $42$ if the control comes from $B_{left}$, and $x$ if it comes from $B_{right}$. In this case, the compiler inserts a `phantom` $p1 := \phi(42, x)$ at the beginning of $B_{exit}$, which defines $p1$ to be $42$ when coming from $B_{left}$ and $x$ when coming from $B_{right}$. Then, the use of the register $b$ containing the loaded value from $p$ can be replaced by $p1$ at line $41$.

3.2 ERHL Proof

We show how to turn the intuition for soundness into a formal ERHL proof, which is given in the unshaded part of Fig. 3 including `lnop`. Here we omit the inference rules for simplicity, which will be shown later. We introduce interesting features of ERHL by explaining each part of the proof.

**Logical No-Ops for Instruction Alignment** Logical no-ops, denoted `lnop`, can be inserted as part of a proof in order to align source and target instructions when their alignment is broken by a translation. For example, in Fig. 3, `lnop` is inserted at lines $10, 11, 20, 30, 40$ because the instructions there are removed by `mem2reg`.

Note that `lnop` is logical because it is absent from the real IR code and used only for validation purposes. During validation, it is interpreted as doing nothing (i.e., no-op).

**Ghost Registers for Relational Assertions** For complex optimizations, we often need to state relational properties (i.e., relating the source and target states) in a proof. For example, in Fig. 3, we need to state $a_{src} = p1_{tgt}$ before line 42.

![Figure 3](image-url)
40, which relates a value in the source (*p_src) with that in the target (p1tgt).

Though not directly supported in ERHL, such relational properties can be encoded using ghost registers. Specifically, we can encode e_src = e’_tgt, using a fresh ghost register ̂

\{ e_src = ̂e_src, ̂e_tgt = e’_tgt, MD(M) \} with ̂\notin M

Since the ghost register ̂is not in the maydiff set, we have ̂e_src = ̂e_tgt, which, by transitivity, implies e_src = e’_tgt. For example, in Fig. 3, the assertion \{ *p_src = ̂p_src, ̂p_tgt = p1_tgt, MD(p, p1, a, b) \} before line 40 effectively states *p_src = p1_tgt.

Note that the ghost register ̂has nothing to do with the physical register p and we use (−) for ghost registers to distinguish them from the physical ones.

Ghost registers are logical ones that do not exist in physical program states. Instead, they are existentially quantified in the semantics of ERHL assertions. More specifically, a pair of source and target states (σ_src, σ_tgt) satisfies an ERHL assertion P, if there exists a pair of source and target ghost register files (rs_src, rs_tgt) such that the pair of σ_src extended with rs_src and σ_tgt extended with rs_tgt satisfies P.

Taking ghost registers into account, the proof in Fig. 3 has five relational assertions: *p_src = 42_tgt between line 11 and the end of B_left, σ_src = 42_tgt between line 20 and line 21, *p_src = x_tgt between line 30 and the end of B_right, *p_src = p1_tgt between the beginning of B_exit and line 41, and ̂p_tgt = p1_tgt between line 40 and line 41. It is easy to see that these assertions correctly capture the relational properties caused by executing different instructions in the source and target.

Uniqueness Predicate for Isolation We can use the predicate Uniq in order to state that an address is completely isolated. For example, in Fig. 3, we have Uniq(p_src) at every line. It means that in the source, if p contains an address ℓ, (i) ℓis not aliased with any address stored in the other registers or in memory (i.e., they point to disjoint memory blocks); and (ii) ℓis private (i.e., it is not in the public part of the memory injection) meaning that it has no corresponding equivalent address in the target. In other words, the address contained in p should point to a completely isolated block.

Note that ERHL also supports memory predicates weaker than Uniq(p): (i) the privateness predicate, Priv(p), which states that the address in p is private; and (ii) the noalias predicate, p ⊥ q, which states that the addresses in p and q point to disjoint memory blocks.

Maydiff Sets Finally, we have MD({ p, p1, a, b }) at every line because these registers are removed or introduced so that they have different values in the source and target.

3.3 Proof Validation

We show how our proof checker validates the ERHL proof.
in the post-assertion and 1nop in the target. In this case, it computes a post-assertion by (i) removing all assertions containing \( b_{rc} \) because \( b_{src} \) is updated, (ii) adding \( b_{src} = *p_{src} \) and then (iii) adding \( b \) to the maydiff set. Thus, we have \((\text{Uniq}(p_{src}), *p_{src} = \hat{p}_{src}, \hat{p}_{tgt} = p_{1tg}, b_{src} = *p_{src}, \text{MD}(p, p, 1, a, b))\).

At this point, the proof gives the rules \( \text{intro\_ghost}(b, \hat{p}) \), which adds \( \{ \hat{p}_{src} = \hat{b}_{src}, \hat{p}_{tgt} = \hat{p}_{tgt} \} \) because \( b \) is not in the maydiff set. Then the proof gives appropriate transitivity rules, which derives \( b_{src} = *p_{src} = \hat{p}_{src} = b_{src} \) and \( b_{tgt} = \hat{p}_{tgt} = p_{1tg} \), from which the assertion after line 40 trivially follows.

**Equivalence Checking**  At lines 21, 31 and 41, the proof checker checks that the behaviors of the source and target instructions are equivalent. Specifically, it checks that equivalent values are passed to the same function (at line 21) and stored at equivalent public locations (at lines 31, 41) because these can be observed by other functions. These checks succeed thanks to the relational assertions \( \{ a_{src} = \hat{a}_{src}, \hat{a}_{tgt} = 42 \} \) at line 21, \( \{ b_{src} = \hat{b}_{src}, \hat{b}_{tgt} = p_{1tg} \} \) at line 41.

**Alias Checking**  At lines 21, 31, and 41, the proof checker computes post-assertions using memory-alias information. In general, for a function call or store instruction, it updates the public part of the memory. The proof checker removes all assertions stored in the memory locations \( p \) (i.e., those including \( *p \)) unless (i) \( p \) is in the private part of the memory (i.e., \( \text{Priv}(p) \) or \( \text{Uniq}(p) \)), or (ii) \( p \) is not aliased with \( q \) (i.e., \( p \perp q \)) in case \( *q \) is updated by the store instruction. At lines 21, 31, and 41, thanks to \( \text{Uniq}(p_{src}) \), the assertions about \( *p_{src} \) are preserved.

Note that in the example of Fig. 3, it suffices to use \( \text{Priv}(p_{src}) \) instead of \( \text{Uniq}(p_{src}) \). However, in general when more than one location is promoted, we need to know that those promoted locations are not aliased with each other, which follows from \( \text{Uniq}(p_{src}) \) for each promoted location \( p \). Also for the sake of performance, we use \( \text{Uniq} \) instead of introducing \( \perp \) between each pair of promoted locations.

### 3.4 Proof Generation

LLVM’s \text{mem2reg} pass consists of three algorithms: the general register-promotion algorithm and two specialized ones optimized for efficiency: one for the case that the promotable location is stored at most once and the other for the case that the location is used only within a single block. In this section we explain the general algorithm and its proof-generation code. Note that we also validate the two specialized algorithms in the same way since they are just degenerate cases.

Algorithm 2 shows the general algorithm implemented in LLVM’s \text{mem2reg} pass and the proof-generation code, given in the box, that we inserted. Note that we do not modify the existing compiler code at all and only add the proof-generation code. In detail, the overall algorithm including proof generation works as follows.

### Algorithm 2 RegisterPromotion(F:Function)

\[
\text{A1: for } i : p = \text{alloca}(i) \in F \text{ if } p \text{'s uses are loads/stores only do }
\]

\[
\text{A2: InsertEmptyPhinodesFor(F, p)}
\]

\[
\text{// Add the } \phi \text{-nodes to the maydiff set globally}
\]

\[
\text{A3: Remove(lu), Nop(lu, tgt), } \text{Inf}(\text{intro\_ghost}(p, \text{undef}), lu)
\]

\[
\text{A4: } \text{Assn}(\text{Uniq}(p_{src}), \text{MD}(p)) \text{global}
\]

\[
\text{A5: } \text{Assn}(\text{MD}(p)) \text{global}
\]

\[
\text{A6: while } \text{NonEmpty}(WL) \text{ do }
\]

\[
\text{A7: for } (l, j : i) \in \text{Instr}(B) \text{ do }
\]

\[
\text{A8: Replace(F, l, j, x, v), } \text{Assn}((x_{src} = x_{src}, x_{tgt} = x_{tgt}), lu, lu)
\]

\[
\text{A9: end if}
\]

\[
\text{A10: end for}
\]

\[
\text{A11: for } B' \in \text{Successor}(B) \text{ do }
\]

\[
\text{A12: if } B' \text{ has a } \phi \text{-node (z := } \phi(x, \cdot) \text{) inserted at line A2 then }
\]

\[
\text{A13: z[B] := v, } \text{Assn}(\text{MD}(p_{src}), \text{MD}(p_{src}), \text{MD}(p_{src}), \text{MD}(p_{src}))
\]

\[
\text{A14: end if}
\]

\[
\text{A15: end for}
\]

\[
\text{A16: end while}
\]

\[
\text{A17: Auto(transitivity)}
\]

### Promotable Allocation  [Line A1] We find a promotable allocation \( p \) at line \( l_a \).  [Line A2] Then we insert empty \( \phi \)-nodes wherever needed\(^1\), and add them to the maydiff set globally (i.e., at every line).  [Line A3] We also remove the allocation, insert 1nop at that line, and add \( \text{Uniq}(p_{src}) \) and \( \text{MD}(p) \) globally.  [Line A4] In addition, we add the rule \( \text{intro\_ghost}(p, \text{undef}) \) because the initial value \( \text{undef} \) may be used by some load from \( *p \) (though it does not happen in Fig. 3). In that case, the code at line A13 would introduce \( \{ *p_{src} = \hat{p}_{src}, p_{tgt} = \text{undef} \} \) at line \( l_a \), which will be inferred with the help of \( \text{intro\_ghost}(p, \text{undef}) \).

For example, in Fig. 3, the empty \( \phi \)-node \( p_1 := \phi(-, -) \) is inserted in \( B_{exit} \) and \( p_1 \) is added to the maydiff set globally;

\(^1\) The optimization uses the “dominance frontier” algorithm [18] in order to list up the blocks that require a \( \phi \)-node. We omit the details for brevity.
then the allocation at line 10 is removed, 1nop is inserted, Uniq(\(p_{\text{src}}\)) is added and \(p\) is added to the maydiff set globally; and finally intro_ghost(\(\hat{p}, \text{undef}\)) is added at line 10.

**Block Traversal** [Lines A5-A7] We traverse the blocks in DFS order starting from the entry block using the worklist \(WL\). An element of \(WL\) consists of triple \((B, v, l)\), where \(B\) is the block to visit, \(v\) is the value in \(\ast p\) at the beginning of \(B\), and \(l\) is the line number where the value \(v\) is stored in \(\ast p\). [Line A5] Initially, we put \((\text{Entry}(F), \text{undef}, \text{line } l_0)\) in \(WL\) and mark the entry block \(\text{Entry}(F)\) as visited. [Lines A6-A7] Then we process the blocks in \(WL\) one by one. For example, in Fig. 3, \(B_{\text{entry}}, B_{\text{left}}, B_{\text{exit}}, \) and \(B_{\text{right}}\) are visited in order.

**Instruction Traversal** [Line A8] Given a work \((B, v, l)\), we traverse each instruction \((i_1 : i)\) in the block \(B\) as follows.

**Store Instructions** [Lines A9-A11] If \(i\) is a store instruction \(\ast p := w\) (line A9), then we remove the instruction (line A10) and update \(v\) with the stored value \(w\) (line A11). The proof-generation code inserts 1nop, adds intro_ghost(\(\hat{p}, w\)) (line A10) and updates \(l\) with the store location \(l_1\) (line A11).

For example, in Fig. 3, when \(i\) is 11: \(\ast p := 42\), the store \(i\) is replaced by 1nop; intro_ghost(\(\hat{p}, 42\)) is added at line 11; and \(v\) and \(l\) are updated to be 42 and line 11.

**Load Instructions** [Lines A12-A18] If \(i\) is a load instruction \(x := \ast p\) (line A12), then we replace all the uses of \(x\) with the current value \(v\) (lines A15-A17), and remove the load instruction (line A18). The proof-generation code adds the relational assertion \(\ast p_{\text{src}} = v_{\text{tgt}}\) from the store site \(i_{\text{src}}\) to the load site \(i_{\text{tgt}}\) (line A13) and the rule intro_ghost(\(x, \hat{p}\)) at \(l_i\) (line A14). Then it adds \(x_{\text{src}} = v_{\text{tgt}}\) from the load site \(i_{\text{src}}\) to every use site \(i_j\) (line A16). It also inserts 1nop at \(l_i\) in the target and adds \(x\) to the maydiff set globally (line A18).

For example, in Fig. 3, when \(i\) is 20: \(a := \ast p\), the load \(i\) is replaced by 1nop; the use of \(a\) is replaced by the current value 42 at line 21; \(\ast p_{\text{src}} = 42_{\text{tgt}}\) is added from 11 to 20; intro_ghost(\(\hat{a}, \hat{p}\)) is added at line 20; \(a_{\text{src}} = 42_{\text{tgt}}\) is added from 20 to 21; and \(a\) is added to the maydiff set globally.

**Successors** [Lines A21-A28] Now we handle the successor (i.e., outgoing) blocks of the current block \(B\). [Line A21] We traverse each successor block \(B'\) as follows.

- If \(B'\) has a \(\phi\)-node (\(z := \phi(\cdots)\)) that is inserted by the code at line A2 (line A22), then we update the \(\phi\)-node \(z\)’s component for the incoming block \(B\) with the value \(v\) of \(\ast p\) at the end of \(B\) (line A23). In addition, if \(B'\) has not been visited yet, we add \((B', z, \text{Begin}(B'))\) to the worklist \(WL\) (line A24). Since the value \(v\) is used at the \(\phi\)-node \(z\), we add \(\ast p_{\text{src}} = v_{\text{tgt}}\) from store location \(l\) to the end of \(B\) (line A23). For example, in Fig. 3, when \((B, B')\) is \((B_{\text{left}}, B_{\text{exit}})\), the \(\phi\)-node \(p_1 := \phi(\cdots)\) is updated to \(p_1 := \phi(42, \cdots)\) and \(\text{Begin}(B_{\text{exit}})\) is added to the worklist \(WL\). Also \(\ast p_{\text{src}} = 42_{\text{tgt}}\) is added from line 11 to the end of \(B_{\text{left}}\).

- If \(B'\) has no such \(\phi\)-node (line A25), then we simply add \((B', v, l)\) to the worklist \(WL\) if \(B'\) has not been visited yet (line A26). For example, when \((B, B')\) is \((B_{\text{entry}}, B_{\text{right}})\), the triple \((B_{\text{right}}, 42, \text{line } 11)\) is added to the worklist.

[Line A28] Finally the successor \(B'\) is marked as visited.

**Inference Rules** As shown in §3.3, the complete proof for mem2reg contains two inference rules, intro_ghost and transitivity. The intro_ghost rules are explicitly added by the proof-generation code shown in Algorithm 2, while the transitivity rules are automatically added by the automation function transitivity (line A32).

### 4 Reasoning about Cyclic Control Flows

In this section, using an example of fold-\(\phi\) optimization, we discuss a challenge in ERHL validation arising from cyclic control flows and show how to address it.

**Fold-\(\phi\) Optimization** Consider the translation below performed by the fold-\(\phi\) optimization of instcombine, and its ERHL proof. This translation basically replaces \(z := \phi(x, y)\) with \(z := \phi(a, z) + 1\) using the temporary variable \(t := \phi(a, z)\). This removes the dependence of \(z\) on \(x\) and \(y\), thereby allowing \(x\) and \(y\) to be eliminated away by a subsequent optimization unless they are used elsewhere. This translation is correct because we have \(z_{\text{src}} = \phi(x_{\text{src}}, y_{\text{src}}) = \phi(a_{\text{src}} + 1, z_{\text{src}} + 1) = \phi(a_{\text{tgt}} + 1, z_{\text{tgt}} + 1) = \phi(a_{\text{tgt}}, z_{\text{tgt}}) + 1 = t_{\text{tgt}} + 1 = z_{\text{tgt}}\).

Note that a set of \(\phi\)-nodes can appear at the beginning of a block and are executed simultaneously. For example, in the source program above, when control flows from \(B_2\) to itself, the \(\phi\)-nodes \(z\) and \(w\) are set to the old values of \(y\) and \(z\) just before executing the \(\phi\)-nodes, respectively. In particular, \(w\) is set to the old value of \(z\), not the new value stored in \(z\) at the first \(\phi\)-node, and thus \(w\) contains the same value in the source and target programs.

**Challenge** The challenge here is that we should be able to express and reason about both old and new values of \(z\). This is because \(z\) is used and defined at the same time in the \(\phi\)-nodes, which is only possible due to cyclic control flows in the SSA form. Specifically, the proof checker should derive
something like \( z_{src} = y_{src} \) and \( w_{src} = \text{old}(z_{src}) \) as part of the strong post-condition after the \( \phi \)-nodes when control flows from \( B_2 \).

We address this challenge by expressing the old value of register \( \text{old}(z_{src}) \) using a ghost variable. Specifically, we reserve a set of ghost registers, denoted \( \hat{r} \) and called old registers, for all registers \( r \) to represent the old value of \( r \). Note, however, that old registers are just normal ghost registers and technically have nothing to do with physical old values of the corresponding registers.

**Proof Validation** We show how the ERHL proof checker systematically uses the old registers by validating the above proof in the most interesting case: the \( \phi \)-nodes of \( B_2 \) when control comes from itself. First, it computes a post-assertion from the pre-assertion \( \{ y_{src} = z_{src} + 1, MD(t, \hat{t}) \} \) as follows.

1. It removes all assertions about old registers from the pre-assertion and copies all assertions about current registers into those about old ones.
   \[
   \{ y_{src} = z_{src} + 1, \hat{y}_{src} = \hat{z}_{src} + 1, MD(t, \hat{t}) \}.
   \]

2. It computes a post-condition from this new assertion as if the assignments \( z := y, w := z \) executed in the source and \( t := z, w := z \) in the target. Specifically, it (i) removes source assertions about \( z \), \( w \), and target ones about \( t \), \( w \), because those registers are updated; (ii) adds \( t, z \) to the maydiff set because they are updated differently in the source and target (note that \( w \) is updated equivalently since \( z \) is not in the maydiff set); and (iii) adds the equalities corresponding to the executed assignments. Thus we have
   \[
   \{
   \hat{y}_{src} = \hat{z}_{src} + 1, \hat{y}_{src} = \hat{z}_{src} + 1, \hat{w}_{src} = \hat{z}_{src},
   t_{tgt} = \hat{z}_{tgt}, w_{tgt} = \hat{z}_{tgt}, \text{MD}(t, \hat{z}, z)
   \}.
   \]

Then the proof gives the rule \( \text{intro}_{\text{ghost}}(z, z + 1) \), which adds \( \{ z_{src} + 1 = \hat{z}_{src}, \hat{z}_{tgt} = \hat{z}_{tgt} + 1 \} \) because \( z \) is not in the maydiff set. Then the automation function derives \( \{ z_{src} = \hat{z}_{src}, z_{tgt} = \hat{z}_{tgt} + 1 \} \) by transitivity: \( z_{src} = \hat{y}_{src} = \hat{z}_{src} + 1 = \hat{z}_{src} \) and \( \hat{z}_{tgt} = \hat{z}_{tgt} + 1 = t_{tgt} + 1 \). Then it eliminates \( t \) from the maydiff set after eliminating all assertions about \( t \), which is sound because \( \hat{t} \) is just a ghost variable that has nothing to do with a physical value of the register \( t \). Finally, the assertion after the \( \phi \)-nodes \( \{ z_{src} = \hat{z}_{src}, z_{tgt} = t_{tgt} + 1, \text{MD}(t, z) \} \) trivially follows by a simple inclusion check.

### 5 ERHL Proof Checker and Logic

In this section, we explain the proof checker in terms of the ERHL logic, and describe the soundness of the proof checker using the semantic interpretation of the logic. All our results are formally verified in Coq (see [1, §H] for details).

**Proof Rules** The checker is based on the proof rules presented in Fig. 4. The checker is given the source and target programs \( Prg_{src}, Prg_{tgt} \) and a translation proof \( \Psi \), and tries to deduce \( Prg_{src} \sim Prg_{tgt} \) using the (Sim) rule. Here, \( Entry(F) \) denotes the entry block of the function \( F \); \( Prg[F].\{B,i\} \) the \( i \)-th instruction of the block \( B \) in \( F \); and \( Prg[F].\{B,B'\} \) the assignment instructions of the \( \phi \)-nodes of \( B' \) when control comes from \( B \) (e.g., in the source program in §4, \( Prg[F].\{B_1,B_2\} = \{ z := x, w := 42 \} \)). Also, \( \Psi[F].a[B,i] \) denotes the assertion in the proof \( \Psi \) just before the \( i \)-th instruction of \( B \) in \( F \) (it denotes the last assertion when \( i = -1 \)).

The checker first checks if \( Prg_{src} \) and \( Prg_{tgt} \) have the same CFG (CheckCFG), the assertion in the entry is satisfied by the initial states for each function (CheckInit), and the Hoare triple \( \{ P \} \text{src} \sim I_{tgt} \{ Q \} \) is valid for all matching intra-block commands \( I_{src} \) and \( I_{tgt} \) and their pre- and post-assertions \( P \) and \( Q \) given by \( \Psi \). For example, in Fig. 2, it checks at line 20 if \( \{ x_{src} = a_{src} + 1, MD(0) \} \) \( y := x + 2 \Rightarrow y = a + 3 \) (MD(0)) is valid. It also checks for each inter-block edge from \( B \) to \( B' \) that \( \{ P \} Prg_{src}.\phi[B,B'] \sim Prg_{tgt}.\phi[B,B'] \{ Q \} \) is valid, where \( P \) is the last assertion in \( B \) and \( Q \) is the first assertion in \( B' \).

To validate a Hoare triple \( \{ P \} I_{src} \sim I_{tgt} \{ Q \} \), the checker first computes a post-assertion \( Q_0 \) with \( \{ P \} I_{src} \sim I_{tgt} \{ Q_0 \} \) using the rule PostAssn (see [1, §H] for the definition of CheckEquivBeh and CalcPostAssn). Then it suffices to validate \( Q_0 \Rightarrow Q \) by the rule Consequence.

For this, using the rules ApplyInf and Trans, the checker iteratively applies a sequence of inference rules \( rule_1, \ldots, rule_n \) (either given by \( \Psi \) or generated by an automation function) and deduces \( Q_0 \Rightarrow Q_n \), where \( Q_1 = \text{ApplyInf}(rule_1, Q_{i-1}) \).

Finally, the checker validates \( Q_n \Rightarrow Q \) using the rule Incl, where CheckIncl performs a simple inclusion check.

**Semantic Interpretation** For the soundness of the proof checker, we give the semantic interpretation of the top-level judgment as semantics preservation, or behavior refinement:

\[
\boxed{[Prg_{src} \sim Prg_{tgt}] \overset{\text{def}}{=} \text{Beh}(Prg_{src}) \supset \text{Beh}(Prg_{tgt})}.
\]

The soundness of (Sim) is proved using a local simulation in the style of [22], which is a simplification of parametric
bismulation [21]. First, we show that CheckInit(P) implies:
\[ \forall \sigma_{src}, \sigma_{tgt}, \alpha. \text{Init}(\sigma_{src}) \land \text{Init}(\sigma_{tgt}) \implies [P]_{\alpha}(\sigma_{src}, \sigma_{tgt}) \].

Here, \( \text{Init}(\sigma) \) means \( \sigma \) is a possible initial state of a function call. \([P]\) is the semantic interpretation of the assertion \( P \) (see [1, §G] for details), and \( \alpha \) is a CompCert-style memory injection [28], which basically maps a memory block in the source to an equivalent one in the target.

Second, we give the semantic interpretation of the Hoare triple for non-call instructions \( I_{src}, I_{tgt} \) as a simulation step:
\[ [[\{P\} I_{src} \sim I_{tgt} \{Q\}]] \overset{\text{def}}{=} \forall \sigma_{src}. \text{Instr}(\sigma_{src}) = I_{src} \implies \forall \sigma_{tgt}. \text{Instr}(\sigma_{tgt}) = I_{tgt} \implies \forall \alpha, \sigma'_{tgt}, \epsilon. [P]_{\alpha}(\sigma_{src}, \sigma_{tgt}) \land \sigma_{tgt} \xrightarrow{\epsilon} \sigma'_{tgt} \implies \exists \sigma'_{src}, \alpha'. [Q]_{\alpha'}(\sigma'_{src}, \sigma'_{tgt}) \land \sigma_{src} \xrightarrow{\epsilon} \sigma'_{src} \land \alpha \subseteq \alpha' \]

where, \( \text{Instr}(\sigma) \) is the next instruction to execute in the program state \( \sigma \), and \( \sigma \xrightarrow{\epsilon} \sigma' \) means the state \( \sigma \) steps to \( \sigma' \) emitting an observable event \( \epsilon \). Also, \( \subseteq \) is the extension relation of memory injection.

For call instructions \( I_{src}, I_{tgt}, [[\{P\} I_{src} \sim I_{tgt} \{Q\}]] \) basically states that \( Q \) is satisfied by all possible equivalent returns states when an arbitrary function is called from states satisfying \( P \) (see [1, §H] for details). We followed the basic approach of parametric bismulation [21].

The semantic interpretation of \( \implies \) is as follows:
\[ [[Q] \implies [Q']] \overset{\text{def}}{=} \forall \sigma_{src}, \sigma_{tgt}, \alpha. [Q]_{\alpha}(\sigma_{src}, \sigma_{tgt}) \implies \exists \sigma'_\alpha. [Q']_{\alpha'}(\sigma'_{src}, \sigma'_{tgt}) \land \alpha \subseteq \alpha' \]

For the soundness of (ApplyInf), every custom rule should satisfy that [[Q] \implies ApplyInf(rule, Q)] holds for all Q.

6 Implementation
We developed the Crellvm framework for LLVM 3.7.1.

Coverage We wrote proof-generation code for register promotion in the mem2reg pass and GVN-PRE in the gvn pass implemented in the following files respectively:
- lib/Transforms/Utils/PromoteMemoryToRegister.cpp
- lib/Transforms/Scalar/GVN.cpp

For mem2reg, we covered the entire file, and for gvn, we covered all functions except for the following functions: SimplifyInstruction, processLoad, splitCriticalEdges and MergeBlockIntoPredecessor. These functions are not part of the main GVN-PRE algorithm because they are not technically related to value numbering (i.e., neither using nor constructing value numbering). Other reasons why we omitted them are because SimplifyInstruction is a common function that just consists of many peephole optimizations and the others use features that are not currently supported by Crellvm: processLoad uses the alias analysis module and splitCriticalEdges and MergeBlockIntoPredecessor change control-flow graphs. Note that the reason why those functions are used by the gvn pass is because they transform programs in such a way that opportunities for GVN-PRE optimizations are increased.

To demonstrate the generality of ERHL logic and the proof checker, we also covered a part of the loop-invariant code motion (licm) pass that can be currently supported by Crellvm and 139 micro-optimizations of the instruction combining (instcombine) pass (see [1, §D] for details).

Proof-Generation Code We explicitly mark as “not supported” for translations using operations not supported by Vellvm, or relying on deep analyses such as division-by-zero and alias analyses.

Fig. 5 shows the SLOC in C++ of the compiler and proof-generation code for each pass. The SLOC ratio of the proof-generation code to that of the corresponding compiler code is 37.5% for mem2reg, 40.3% for gvn, 40.5% for licm, and 193.3% for instcombine. The Crellvm infrastructure for proof-generation consists of 1,708 lines for common library and 15,980 lines for JSON serialization library, of which 72.2% is automatically generated from 2,079 SLOC in a simple DSL.

Inference Rules In the proof checker we installed 221 custom inference rules, of which 202 are arithmetic rules like assoc_add. All 9 non-arithmetic rules used for mem2reg, gvn, and licm, including transitivity and intro_ghost, are formally verified in Coq (see [1, §I] for details).

Verification of Proof Checker In order to reduce TCB, we formally verified the soundness of the proof checker in Coq (see §5). It is worth noting that we achieved the same kind of guarantee as CompCert for the translations that are validated by the proof checker using only verified inference rules.

We used the formal semantics of LLVM IR from the Vellvm project [55], but significantly upgraded the semantics in various ways. In particular, Vellvm used the CompCert memory model [28] version 1.9 and we upgraded it to version 2.4 in order to use the notion of permission in the LLVM semantics; and added the switch instruction to the formalization of LLVM IR. Note that Vellvm has a simpler memory model than the LLVM’s informal official one (e.g., pointer-equality tests and pointer-integer casts are more undefined).

In total, our Coq development consists of 25,970 SLOC. The proof checker is 2,987 SLOC, and its verification is 18,934 SLOC. The 221 inference rules are 2,193 SLOC, and the verification of 9 rules took 1,856 SLOC. Note that the underlying semantics of Vellvm consists of 39,307 SLOC.

Experience Writing proof-generation code was an iterative process: we had to repeat bug-fix processes many times. When proof checking fails, it tells us a logical reason for
the failure so that we could easily identify the bug in proof-generation code (or else in the compiler). We believe the iteration could be shortened if we collaborated with LLVM developers.

Custom functions for automatically finding inference rules are greatly helpful for developing proof-generation algorithms. Using such automation, we could develop much simpler proof-generation algorithms for mem2reg and gvn, compared to our initial development, by making the code size less than half and speeding up more than twice.

CLANG is less cost-effective for peephole optimizations in instcombine. We had to write 1.9 lines of proof-generation code for each line of the corresponding compiler code, and we did not verify arithmetic inference rules. Even though CLANG achieves higher level of reliability, we think more automated approaches using an SMT solver such as Alive [30] would be more cost-effective for peephole optimizations.

7 Experiment

Benchmarks Using CRELLVM, we validated the compilation of the SPEC CINT2006 C Benchmarks [15], LLVM nightly test suite, and five open-source projects written in C (the biggest benchmarks used in [37]), totaling 5.3 million LOC in C. We omitted 3 files from the benchmarks because they contain instructions currently not supported by VELLM, including the indirectbr instruction.

Fig. 6 summarizes the validation results and the time spent on running the proof-generation codes and the proof checker for each optimization pass. In the experiment, we compiled each benchmark program with the -O2 flag, and validated the intermediate translations with the generated proofs. For more detailed results, see [1, §A].

We show the total number of translation steps (#V), the number of not-supported translations (#NS), and the number of translations failed at validation (#F). The rest of the translations (i.e., #V - #F - #NS) succeeded in validation. Also, all the successful translations were shown to be equivalent to the original translations using the llvm-diff tool. During the experiment, we also found and reported a bug in llvm-diff, which has been confirmed and fixed [8].

Out of 2,205K validations in total, 1632K (74.0%) are successfully validated. All 463 (0.01%) failures (#F) are due to compiler bugs: 10 are due to the mem2reg bug [5] we discussed in §1.2, 295 are due to the two gvn bugs [6, 7] we

---

We omitted Linux, since it is currently not compiled with LLVM (see [29]).

Kang, Kim, Song, Lee, Park, Shin, Kim, Cho, Choi, Hur, Yi

The other 572.2K (26.0%) translations (#NS) are currently not supported in our validator. Among them, 555.9K (97.1%) use instructions not supported by VELLM: vector operations 515.1K (90.0%), aggregate type operations 30.4K (5.31%), debug attributes 8.7K (1.52%), and atomic operations 1.7K (0.29%). 13.0K (2.27%) use the alias and division-by-zero analysis modules of LLVM: 2.3K (0.41%) alter type declarations; and 0.7K (0.12%) require deeper analysis on functions such as read-only function analysis.

We measured the time spent on performing each optimization in the original compiler (Orig); on performing each optimization and calculating validation proofs in the modified compiler (PCal); on writing and reading the source and target programs with the proofs via files (I/O); and on validating the proofs by the proof checker (PCheck). The table shows total times aggregated over the entire run.

In the experiment, we embarrassingly parallelized compilation and validation jobs and fully utilized the 96 hardware threads from four identical workstations with Intel Xeon E5-2630 CPU (2.6GHz, 12 cores, 2 hardware threads per core), 128GB RAM, and 1TB SSD (Samsung 850 PRO). The whole experiment took about three hours in wall clock.

Validating Randomly Generated Programs We randomly generated 1,000 C programs using CSmith [53], compiled them with -O2 flag, and validated the intermediate translations with the generated proofs. All 55,008 validations for gvn are successfully validated, except for one due to the gvn bug [6] we found. Out of 42,584 validations for mem2reg, 11,816 (27.7%) are currently not supported due to LLVM lifetime intrinsics, which is not supported by VELLM. The other 30,768 (72.3%) are all successfully validated.

Performance Proof checking takes much more time than regular compilation, but we believe it is still reasonable for compiler writers to use CRELLVM for stabilizing compilers. Also, as we have shown in the experiment, compiler writers can further reduce runtime by checking proofs in parallel. Furthermore, there is still a large room for performance improvement as we have not done any serious performance analysis and tuning for the proof checker. In particular, we believe we can significantly reduce I/O time, which is one of the current bottlenecks, by writing proofs in binary format rather than in plain-text JSON format and also by writing only the changes made between IR files rather than writing full IR files. In our benchmark, the CLANG frontend generated 4,885 IR files with average size of 187.63 KB, from which 2,205K validations with average proof size of 17.5 KB were generated.
Bug Reports  By November 2016 when we completed our initial implementation of Crelvm for LLVM 3.7.1, we reported three miscompilation bugs, one in mem2reg [5] and two in gvn [6, 7], which were immediately confirmed and subsequently fixed. Around July 2017 when we verified selected inference rules, we reported another miscompilation bug in mem2reg [9], which was immediately confirmed but has not been fixed yet (as of 14 April 2018) because it is unlikely to occur in practice (it did not occur in our benchmark either) and there is no consensus on how to fix it. Around March 2018, we additionally covered the function performScalarPREInsertion in gvn, which was omitted initially because it is loosely related to value numbering: deciding whether to perform the transformation, not the transformation itself, depends on value numbering. The reason for this coverage is because we were informed of a new bug [11] found in the function. As we have seen above, Crelvm successfully detected the bug by failing at 158 validations.

8 Discussion

8.1 Reliability

In order to see how effectively Crelvm improved reliability of LLVM, we investigated all bug reports about miscompilation in mem2reg and gvn since the release of LLVM 3.7.1. To the best of our knowledge, other than the five bugs [5–7, 9, 11] detected by Crelvm, there is no confirmed miscompilation bug that is (i) due to the code we covered in mem2reg and gvn and (ii) not related to any LLVM feature that is currently not supported by Crelvm (as of 14 April 2018).

Specifically, we conducted our investigation as follows. We checked all relevant bug reports in the LLVM bug tracker [4] and OSS-Fuzz bug tracker [3]. Moreover, we asked the llvm-dev mailing list about relevant bugs [2]. We also posted a draft of this paper on our website in February 2018 and received comments. One of the most important comments was about the gvn bug [11] in the code we newly covered (i.e., the function performScalarPREInsertion). The bug was discovered and fixed in October 2017 by Azul Systems via fuzz testing of the company’s LLVM-based Java JIT compiler, using JavaFuzzer [10] (private communication with Philip Reames, March 2018).

8.2 Maintainability

To evaluate maintenance cost, we ported our full development of Crelvm to LLVM 5.0.1 just omitting instcombine because it is not our main target. After the initial porting, which took two days, we found one validation fail in gvn due to insufficient proof generation. We fixed it by adding an automation function, which took 5 days by one person including analysis of the problem. After applying the gvn bug fix [11] in the main trunk to LLVM 5.0.1, our benchmark experiment produces no validation failures except for not-supported ones (see [1, §A] for details).

8.3 Limitations and Future Work

We discuss current limitations of Crelvm, which also indicate a direction of future research.

Semantics  Vellvm does not fully formalize the LLVM IR semantics. First, it does not support several features of LLVM IR, including atomic operations for concurrency, vector operations and attributes like nolalias, readonly and nsw.

Second, Vellvm does not properly formalize casts between integers and pointers, which itself is a challenging research topic. Applying the idea of Kang et al. [22] would be interesting future research.

Finally, Vellvm does not properly formalize the undef and poison values, which is another research problem. Recently, Lee et al. [25] proposed a possible solution to this problem using a new instruction, called freeze. Applying it to Vellvm would be interesting work.

Analyses  Our proof checker does not support various analysis passes such as division-by-zero analysis, alias analysis, read-only function analysis, and memory dependence analysis. We believe it would be possible to support them by adding appropriate predicates and inference rules in the underlying logic of proof checker.

CFG-Changing Optimizations  Crelvm relies on the condition that the source and target programs can be aligned line-by-line by inserting logical no-op instructions. While we think this condition holds for majority of LLVM optimizations, there are several important optimizations that break the condition by changing the control-flow graph. Examples include loop unrolling, loop unswitching and loop splitting. We believe it would be possible to support them by generalizing the proof checker following the ideas from existing translation validation works [36, 49–52, 57].

9 Related Work

A large number of prior work on improving reliability of compiler are roughly classified into the following categories.

Credible Compilation  Rinard et al. [44], who coined the term credible compilation, proposed the framework of credible compilation and presented a relational Hoare logic, in which one can reason about register allocation and instruction scheduling optimizations in the presence of pointer aliasing. Independently, Benton [16] proposed a relational Hoare logic for a functional language. However, their logics are designed for simple languages, and the framework has not been implemented and applied to compilers.

Namjoshi et al. [33, 34] presented a “proof of concept” implementation of credible compilation (or a witnessing compiler in their terminology) for LLVM optimizations such
as constant propagation, dead-code elimination, and LICM. However, the work can be seen as rather preliminary for the following reasons. First, their proof checker supports a small subset of LLVM IR, most notably ignoring memory operations. Second, it assumes that main functions of the compiler are correct. For example, it assumes that the constant-folding function of LLVM is correct.

Verified translation validation is similar to verified credible compilation but differs in that it develops a verified validator specialized for a particular optimization, rather than developing a proof checker for a general logic. Various verified translation validators have been developed for CompCert: instruction scheduling [30], lazy code motion [51], software pipelining [52], register allocation [43]; SSA transformation [14]; and GVN and sparse conditional constant propagation (SCCP) [19].

(Foundational) proof carrying code (PCC) [12, 35] is similar to (verified) credible compilation, but it employs a (verified) unary logic for validating safety properties of the generated target program.

**Translation Validation** This approach develops a general validator that checks correctness of any given translation between IR programs without requiring any proof. Compared to credible compilation, translation validation is more scalable (i.e., more easily applicable to different optimizations) because it requires much less manual effort due to no need for writing proof-generation code. On the other hand, though it can be used to guarantee correctness of certain compilations, it can hardly be used to find compiler bugs due to many false positives. The reason for false positives is that such a general validator is inherently incomplete since it is agnostic to the compiler’s internal logic.

Due to such incompleteness, a variety of translation validators with different heuristics and trade-offs were proposed [20, 36, 38, 39, 45–47, 49, 54, 57, 58]. In particular, Tristan et al. [49] and Stepp et al. [46] developed translation validators for LLVM optimization passes, including dead-code elimination, GVN-PRE, constant propagation, and LICM. However, they failed at about 20% of the validations, most of which are likely to be false positives.

**Compiler Verification** Verified compilers provide the highest level of reliability by proving the semantics-preservation property for all possible source programs in a proof assistant. CompCert [26, 27] is the most sophisticated formally verified optimizing C compiler, whose correctness is proved in Coq [13], and CakeML [23] is an optimizing ML compiler formally verified in the HOL4 theorem prover [40]. However, verifying a full-fledged compiler is highly costly and verified compilers are usually much less performant than production compilers.

Zhao et al. [55, 56] implemented and verified the vmem2reg pass for LLVM in Coq, but its algorithm is significantly simplified compared to that in LLVM. Their simplified algorithm is based on a rewriting logic in which each rewriting step preserves semantics and each intermediate program is type-checked. On the other hand, LLVM’s register-promotion algorithm temporarily breaks the semantics-preservation property and even the intermediate programs are not type-checked, because ill-formed empty \( \phi \)-nodes are inserted in the middle and their arguments are filled later. According to the authors, this renders the formal verification hard for the register-promotion implementation in LLVM.

**DSL for Optimizations** Lopes et al. [30–32] presented Alive, a DSL for writing peephole optimizations using the SMT solver Z3 [41]. With Alive, one can either prove the correctness of an optimization or find a counterexample. They ported 300 micro-optimizations of `instcombine` to Alive, and in doing so they found 8 bugs in `instcombine`. However, the Alive DSL is not expressive enough to describe complex algorithms such as `mem2reg` and `gvn`, and limited to supporting only peephole optimizations that do not involve reasoning about cyclic control flows. In addition, Alive makes simplifying assumptions on the LLVM semantics, and their encoding of an optimization into SMT queries is a part of the TCB. Furthermore, since there is a gap between an actual implementation in C++ and a corresponding algorithm description in Alive DSL, implementation bugs cannot be detected. Tatlock and Lerner [48] also presented a DSL for writing CompCert optimizations based on a rewriting logic, but it is not general enough to support register promotion and GVN-PRE.

**Compiler Testing** Random testing tools such as CSmith [17, 42, 53] and EMI [24] have been very successful. They have found hundreds of bugs in GCC and LLVM. However, most of them are found in the `instcombine` pass and none of them are miscompilation bugs in `mem2reg` and `gvn`.

**10 Conclusion**

We have demonstrated that the credible-compilation approach scales to the production compiler LLVM by developing our CRELLVM framework. We also empirically demonstrated that CRELLVM can be an effective tool for achieving high reliability of major optimizations by discovering four long-standing bugs in the `mem2reg` and `gvn` passes.

**Acknowledgments**

We thank Daniel Berlin, Davide Italiano, Yeonwoo Kim, Philip Reames, John Regehr, and anonymous reviewers for very helpful feedback, and Sung-bwan Lee for his contribution to early development of CRELLVM. This research was supported by Samsung Research Funding Center of Samsung Electronics under Project Number SRFC-IT1502-07. Jeehoon Kang, Yoonseung Kim, and Junyeong Lee have been supported by Korea Foundation for Advanced Studies Scholarships.
References

[57] Lenore Zuck, Amir Pnueli, Benjamin Goldberg, Clark Barrett, Yi Fang, and Ying Hu. 2002. Translation and Run-Time Validation of Loop Transformations (RV ’02).
the validation results and the time spent on the proof-generation codes and the proof checker for each benchmark program and optimization pass for Fig. 7 and Fig. 8 shows the validation results and the time spent on running the proof-generation codes and the proof checker for each benchmark program.

### A Experimental Results

Fig. 7 and Fig. 8 shows the validation results and the time spent on running the proof-generation codes and the proof checker for each benchmark program and optimization pass for LLVM 3.7.1. Fig. 9, Fig. 10, and Fig. 11 show the experimental results for LLVM 5.0.1 without gvn bug[7] Patch. Fig. 9 is a summary of the entire experimental results. Fig. 10 and Fig. 11 represents the validation results and the time spent on the proof-generation codes and the proof checker for each benchmark program.
<table>
<thead>
<tr>
<th>LOC</th>
<th>LOC</th>
<th>mem2reg</th>
<th>GVN</th>
<th>LICM</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>#V</td>
<td>#F</td>
<td>#NS</td>
<td></td>
</tr>
<tr>
<td>perlbench 400</td>
<td>168.16K</td>
<td>1.75K</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>bzip2 401</td>
<td>8.29K</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>gcc 403</td>
<td>517.52K</td>
<td>5.43K</td>
<td>0</td>
<td>7</td>
</tr>
<tr>
<td>mcf 429</td>
<td>2.69K</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>milc 433</td>
<td>15.04K</td>
<td>235</td>
<td>0</td>
<td>2</td>
</tr>
<tr>
<td>gobmk 445</td>
<td>196.24K</td>
<td>2.64K</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>hammer 456</td>
<td>35.99K</td>
<td>558</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>sjeng 458</td>
<td>13.85K</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>libquantum 462</td>
<td>4.36K</td>
<td>123</td>
<td>79</td>
<td>0</td>
</tr>
<tr>
<td>h264ref 464</td>
<td>51.58K</td>
<td>532</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>gobmk 456</td>
<td>197.65K</td>
<td>13.03K</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>gcc 470</td>
<td>56.52K</td>
<td>5.16K</td>
<td>0</td>
<td>12</td>
</tr>
<tr>
<td>sphinx3 482</td>
<td>25.09K</td>
<td>365</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>total</td>
<td>5289.18K</td>
<td>76.84K</td>
<td>0</td>
<td>10.63K</td>
</tr>
</tbody>
</table>

**Figure 10.** Validation Results for LLVM 5.0.1 before GVN patch.

<table>
<thead>
<tr>
<th>LOC</th>
<th>Orig PCal</th>
<th>I/O</th>
<th>PCheck</th>
<th>Orig PCal</th>
<th>I/O</th>
<th>PCheck</th>
</tr>
</thead>
<tbody>
<tr>
<td>perlbench 400</td>
<td>17.81</td>
<td>0.19K</td>
<td>0.74K</td>
<td>1.70</td>
<td>6.87</td>
<td>1.25K</td>
</tr>
<tr>
<td>bzip2 401</td>
<td>0.12</td>
<td>0</td>
<td>0.12</td>
<td>0.12</td>
<td>0.56</td>
<td>0.16K</td>
</tr>
<tr>
<td>gcc 403</td>
<td>19.44K</td>
<td>0</td>
<td>0.02K</td>
<td>0.10</td>
<td>0.46</td>
<td>0.07K</td>
</tr>
<tr>
<td>mcf 429</td>
<td>29.71K</td>
<td>0</td>
<td>0.02K</td>
<td>0.12</td>
<td>10</td>
<td>1.00K</td>
</tr>
<tr>
<td>milc 433</td>
<td>8.86K</td>
<td>0</td>
<td>0.01K</td>
<td>0.12</td>
<td>24</td>
<td>0.22K</td>
</tr>
<tr>
<td>gobmk 445</td>
<td>5.30</td>
<td>0</td>
<td>0.02K</td>
<td>0.12</td>
<td>1.60K</td>
<td>0.20K</td>
</tr>
<tr>
<td>hammer 456</td>
<td>0.82</td>
<td>0.01K</td>
<td>0.01K</td>
<td>0.16</td>
<td>0.53</td>
<td>0.05K</td>
</tr>
<tr>
<td>libquantum 462</td>
<td>247</td>
<td>0</td>
<td>0.02K</td>
<td>0.17</td>
<td>19.44K</td>
<td>0.01K</td>
</tr>
<tr>
<td>h264ref 464</td>
<td>104.20K</td>
<td>0</td>
<td>0.02K</td>
<td>0.17</td>
<td>30.14K</td>
<td>0.01K</td>
</tr>
<tr>
<td>sphinx3 482</td>
<td>797.65K</td>
<td>0</td>
<td>0.02K</td>
<td>0.17</td>
<td>50.11K</td>
<td>0.01K</td>
</tr>
<tr>
<td>total</td>
<td>3589.18K</td>
<td>76.84K</td>
<td>0</td>
<td>10.63K</td>
<td>285.82K</td>
<td>134</td>
</tr>
</tbody>
</table>

**Figure 11.** Time Spent on Running the Proof-Generation Codes and the Proof Checker for LLVM 5.0.1 before GVN patch.

and optimization pass. Fig. 12, Fig. 13, and Fig. 14 shows the corresponding result for Fig. 9, Fig. 10, and Fig. 11 for LLVM 5.0.1 with gvn bug[7] Patch. TODO: Add 5.0.1 summary data
### Results Time (sec.)

<table>
<thead>
<tr>
<th></th>
<th>#V</th>
<th>#F</th>
<th>#NS</th>
<th>Orig</th>
<th>PCal</th>
<th>I/O</th>
<th>PCheck</th>
</tr>
</thead>
<tbody>
<tr>
<td>mem2reg</td>
<td>76.84K</td>
<td>0</td>
<td>10.63K</td>
<td>13.20</td>
<td>385.15</td>
<td>13.73K</td>
<td>20.62K</td>
</tr>
<tr>
<td>gvn</td>
<td>285.64K</td>
<td>0</td>
<td>8.75K</td>
<td>49.93</td>
<td>214.90</td>
<td>37.62K</td>
<td>31.90K</td>
</tr>
<tr>
<td>licm</td>
<td>181.53K</td>
<td>0</td>
<td>30.70K</td>
<td>24.32</td>
<td>891.71</td>
<td>64.18K</td>
<td>13.26K</td>
</tr>
</tbody>
</table>

**Figure 12.** Experimental Results for LLVM 5.0.1 after GVN patch

<table>
<thead>
<tr>
<th>LOC</th>
<th>LOC</th>
<th>Result</th>
<th>mem2reg</th>
<th>GVN</th>
<th>LICM</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>#V</td>
<td>#F</td>
<td>#NS</td>
<td>#V</td>
<td>#F</td>
</tr>
<tr>
<td>400.perlbench</td>
<td>168.16K</td>
<td>1.75K</td>
<td>0</td>
<td>9.24K</td>
<td>0</td>
</tr>
<tr>
<td>401.bzip2</td>
<td>8.29K</td>
<td>90</td>
<td>0</td>
<td>1.60K</td>
<td>0</td>
</tr>
<tr>
<td>403.gcc</td>
<td>517.52K</td>
<td>5.43K</td>
<td>0</td>
<td>25.28K</td>
<td>0</td>
</tr>
<tr>
<td>429.mcf</td>
<td>2.69K</td>
<td>24</td>
<td>0</td>
<td>76</td>
<td>0</td>
</tr>
<tr>
<td>433.milc</td>
<td>15.04K</td>
<td>235</td>
<td>0</td>
<td>2</td>
<td>1.60K</td>
</tr>
<tr>
<td>445.gobmk</td>
<td>196.24K</td>
<td>2.64K</td>
<td>0</td>
<td>4.96K</td>
<td>0</td>
</tr>
<tr>
<td>456.hmmer</td>
<td>35.99K</td>
<td>558</td>
<td>0</td>
<td>2.85K</td>
<td>0</td>
</tr>
<tr>
<td>458.sjeng</td>
<td>13.85K</td>
<td>130</td>
<td>0</td>
<td>1.22K</td>
<td>0</td>
</tr>
<tr>
<td>462.libquantum</td>
<td>4.36K</td>
<td>123</td>
<td>0</td>
<td>247</td>
<td>0</td>
</tr>
<tr>
<td>464.h264ref</td>
<td>51.58K</td>
<td>532</td>
<td>0</td>
<td>12</td>
<td>12.15K</td>
</tr>
<tr>
<td>470.libm</td>
<td>1.16K</td>
<td>21</td>
<td>0</td>
<td>22</td>
<td>0</td>
</tr>
<tr>
<td>482.sphinx3</td>
<td>25.09K</td>
<td>365</td>
<td>0</td>
<td>1.49K</td>
<td>0</td>
</tr>
<tr>
<td>sendmail-8.15.2</td>
<td>138.68K</td>
<td>536</td>
<td>0</td>
<td>403</td>
<td>3.60K</td>
</tr>
<tr>
<td>emacs-25.1</td>
<td>463.54K</td>
<td>5.16K</td>
<td>0</td>
<td>19.12K</td>
<td>0</td>
</tr>
<tr>
<td>python-3.4.1</td>
<td>486.38K</td>
<td>8.79K</td>
<td>0</td>
<td>22.77K</td>
<td>0</td>
</tr>
<tr>
<td>gimp-2.8.18</td>
<td>1004.20K</td>
<td>19.44K</td>
<td>0</td>
<td>27.65K</td>
<td>0</td>
</tr>
<tr>
<td>ghostscript-9.14.0</td>
<td>797.65K</td>
<td>13.03K</td>
<td>0</td>
<td>9.20K</td>
<td>50.11K</td>
</tr>
<tr>
<td>LLVM nightly test</td>
<td>1358.76K</td>
<td>17.99K</td>
<td>0</td>
<td>297</td>
<td>247</td>
</tr>
<tr>
<td>Total</td>
<td>5289.18K</td>
<td>76.84K</td>
<td>0</td>
<td>10.63K</td>
<td>285.64K</td>
</tr>
</tbody>
</table>

**Figure 13.** Validation Results for LLVM 5.0.1 after GVN patch

### B Miscompilation of a Realistic Program due to a Mem2reg Bug

One of the bug we found in mem2reg miscompiles the following program:

```c
#include <stdio.h>

int sqr(int i, int prev, int cur) {
    return cur * cur;
}

int diffsqr(int i, int prev, int cur) {
    if (i==0) return 0; else return (cur-prev) * (cur-prev);
}

void foo(int arr[], int n) {
    int sqrs = 0, difffs = 0;
    int i, prev, cur;
    for (i = 0; i < n; ++i) {
        prev = cur; cur = arr[i];
        sqrs += sqr(i, prev, cur);
        difffs += diffsqr(i, prev, cur);
    }
    printf("square sum=%d, diff sq sum=%d \n", sqrs, difffs);
}
```

One of the bug we found in mem2reg miscompiles the following program:

```c
#include <stdio.h>

int sqr(int i, int prev, int cur) {
    return cur * cur;
}

int diffsqr(int i, int prev, int cur) {
    if (i==0) return 0; else return (cur-prev) * (cur-prev);
}

void foo(int arr[], int n) {
    int sqrs = 0, difffs = 0;
    int i, prev, cur;
    for (i = 0; i < n; ++i) {
        prev = cur; cur = arr[i];
        sqrs += sqr(i, prev, cur);
        difffs += diffsqr(i, prev, cur);
    }
    printf("square sum=%d, diff sq sum=%d \n", sqrs, difffs);
}
```
<table>
<thead>
<tr>
<th>Orig PCal</th>
<th>I/O</th>
<th>PCheck</th>
<th>Orig PCal</th>
<th>I/O</th>
<th>PCheck</th>
<th>Orig PCal</th>
<th>I/O</th>
<th>PCheck</th>
</tr>
</thead>
<tbody>
<tr>
<td>mem2reg</td>
<td>GVN</td>
<td>LICM</td>
<td>mem2reg</td>
<td>GVN</td>
<td>LICM</td>
<td>mem2reg</td>
<td>GVN</td>
<td>LICM</td>
</tr>
<tr>
<td>400.perlbench</td>
<td>0.38</td>
<td>18.16</td>
<td>0.19K</td>
<td>1.70</td>
<td>6.69</td>
<td>1.24K</td>
<td>3.10K</td>
<td>0.30</td>
</tr>
<tr>
<td>401.bzip2</td>
<td>0.02</td>
<td>1.71</td>
<td>0.01K</td>
<td>0.12</td>
<td>0.52</td>
<td>0.16K</td>
<td>0.23K</td>
<td>0.07</td>
</tr>
<tr>
<td>403.gcc</td>
<td>1.04</td>
<td>47.11</td>
<td>5.32K</td>
<td>6.47</td>
<td>27.12</td>
<td>3.16K</td>
<td>3.73K</td>
<td>1.72</td>
</tr>
<tr>
<td>429.mcf</td>
<td>&lt;0.01</td>
<td>0.10</td>
<td>&lt;0.01K</td>
<td>0.12</td>
<td>0.47</td>
<td>0.06K</td>
<td>0.03K</td>
<td>0.08</td>
</tr>
<tr>
<td>445.gobmk</td>
<td>0.29</td>
<td>9.44</td>
<td>1.88K</td>
<td>0.82</td>
<td>3.49</td>
<td>0.34K</td>
<td>0.26K</td>
<td>0.32</td>
</tr>
<tr>
<td>456.hmmer</td>
<td>0.11</td>
<td>3.33</td>
<td>0.03K</td>
<td>0.36</td>
<td>1.57</td>
<td>0.11K</td>
<td>0.07K</td>
<td>0.20</td>
</tr>
<tr>
<td>458.sjeng</td>
<td>0.03</td>
<td>0.92</td>
<td>0.01K</td>
<td>0.16</td>
<td>0.50</td>
<td>0.05K</td>
<td>0.02K</td>
<td>0.06</td>
</tr>
<tr>
<td>462.libquantum</td>
<td>0.01</td>
<td>0.30</td>
<td>&lt;0.01K</td>
<td>0.03</td>
<td>0.12</td>
<td>&lt;0.01K</td>
<td>0.01K</td>
<td>0.01</td>
</tr>
<tr>
<td>464.h264ref</td>
<td>0.16</td>
<td>8.14</td>
<td>0.06K</td>
<td>1.04</td>
<td>4.80</td>
<td>1.24K</td>
<td>0.96K</td>
<td>0.41</td>
</tr>
<tr>
<td>470.lbm</td>
<td>&lt;0.01</td>
<td>0.21</td>
<td>&lt;0.01K</td>
<td>0.01</td>
<td>0.03</td>
<td>&lt;0.01K</td>
<td>0.01K</td>
<td>0.01</td>
</tr>
<tr>
<td>482.sphinx3</td>
<td>0.06</td>
<td>1.97</td>
<td>0.02K</td>
<td>0.18</td>
<td>0.77</td>
<td>0.05K</td>
<td>0.02K</td>
<td>0.10</td>
</tr>
<tr>
<td>sendmail-8.15.2</td>
<td>0.22</td>
<td>4.44</td>
<td>0.03K</td>
<td>0.57</td>
<td>1.92</td>
<td>0.43K</td>
<td>0.32K</td>
<td>0.24</td>
</tr>
<tr>
<td>emacs-25.1</td>
<td>0.92</td>
<td>42.31</td>
<td>1.13K</td>
<td>3.56</td>
<td>16.12</td>
<td>3.85K</td>
<td>2.79K</td>
<td>1.05</td>
</tr>
<tr>
<td>python-3.4.1</td>
<td>3.27</td>
<td>37.51</td>
<td>0.33K</td>
<td>8.75</td>
<td>35.35</td>
<td>4.00K</td>
<td>2.89K</td>
<td>4.67</td>
</tr>
<tr>
<td>LLVM nightly test</td>
<td>3.02</td>
<td>110.94</td>
<td>2.38K</td>
<td>16.08</td>
<td>71.29</td>
<td>17.57K</td>
<td>15.04K</td>
<td>11.84</td>
</tr>
<tr>
<td>Total</td>
<td>13.20</td>
<td>385.15</td>
<td>13.73K</td>
<td>49.93</td>
<td>214.90</td>
<td>37.62K</td>
<td>31.90K</td>
<td>24.32</td>
</tr>
</tbody>
</table>

Figure 14. Time Spent on Running the Proof-Generation Codes and the Proof Checker for LLVM 5.0.1 after GVN patch

```c
int main () {
    int a[3] = {1, 2, 5};
    foo(a, 3);
    return 0;
}
```

The function `foo()` takes an array `a` and its size `n` and prints the sum of the squares of the numbers in `a`, and the sum of the squares of the differences of adjacent numbers in `a`, i.e.,

\[
\sum_{i=0}^{n-1} a[i]^2 \quad \text{and} \quad \sum_{i=1}^{n-1} (a[i] - a[i-1])^2.
\]

The function `foo()` calculates the two summations in a single loop. Note that the function `diffsqr(i, prev, cur)` returns \((cur - prev)^2\) if \(i > 0\), and zero otherwise.

Clang 3.7.1 with -O2 flag miscompiles this program. The compilation result prints out 30 and 0 instead of the correct answer 30 (= 1^2 + 2^2 + 5^2) and 10 (= (2-1)^2 + (5-2)^2). This is due to the `mem2reg` bug we found on the special case that all stores to a promotable location is in a single block. In essence, the loads from the local variable `prev` are `illegally` promoted to `undef`, the function call to `diffsqr()` is inlined, and exploiting the `undef` semantics, the result of the inlined call is optimized to 0.

Even though the program invokes undefined behavior according to the C standard, we believe it is still likely to be written in the real-world: a programmer may logically conclude that the `undef` value is never used, and the program is safe.

## C Global Value Numbering with Partial Redundancy Elimination

Global Value Numbering optimization (GVN), which is implemented in the `gvn` pass of LLVM, detects and removes redundant instructions. GVN algorithm first groups expressions and values into equivalent classes and assigns a unique "value number" to each class. Then, a leader value is chosen for each class, and all the non-leader, redundant, instructions are substituted with the leader value. The `gvn` pass also does Partial Redundancy Elimination optimization (PRE), which eliminates instructions that are partially redundant depending on the control flow.

In this section, we show how we generate and validate proofs for GVN-PRE optimization. It is worth noting that even though the GVN and PRE algorithms are separately written in the `gvn` pass, their validation logics are so similar that the resemblance of their validation logics enabled us to write a single proof generation code that uniformly works for both GVN and PRE.
We use a PRE translation example because it is more interesting than a GVN example. The shaded part of Fig. 15 shows an example translation of the PRE optimization, where the registers n and c1 are the function parameters.

First, by analyzing the source program, GVN assigns unique value numbers to the classes of equivalent values and expressions. For example, in Fig. 15, GVN constructs the following tables, VT, ET and LT.

\[
\begin{align*}
VT &= \{x_1, x_2 \mapsto 1; \quad y_1, y_2, y_3 \mapsto 2\} \\
ET &= \{n - 2 \mapsto 1; \quad 1 + 4 \mapsto 2\} \\
LT &= \ldots; \quad \text{B_{entry}} \mapsto \{\text{I} \mapsto 1; \; \text{II} \mapsto 2\}; \quad \text{B_{right}} \mapsto \{\text{I} \mapsto x_2; \; \text{II} \mapsto y_2\}; \ldots
\end{align*}
\]

Originally, the gvn pass also assigns value numbers to singleton classes, but in this example, we only consider those classes with more than one value for brevity.

Here, the value table VT assigns value number 1 to x1 and x2, and 2 to y1, y2 and y3; and the expression table ET assigns value number 1 to the expression n – 2, and 2 to 1 + 4. This means that, at any point of execution, there exist some values for 1 and 2 such that x1, x2 and n – 2 evaluate to the value 1 and y1, y2, y3 and 1 + 4 evaluate to the value 2 whenever they are well-defined. It is easy to see that indeed this property holds for the source program in Fig. 15. Finally, the leader table LT determines the leader value for each value number among the values in each block. Note that the leader values can be different for each block and not every block necessarily has a leader value for every value number. For example, in the block B_{entry}, the leader value for 2 does not exist because none of the values with value number 2 are defined in the block.

In the example, PRE detects partial redundancies of the instruction y3 := x1 + 1 in B_{exit} because y3 belongs to 2 and B_{exit}’s incoming blocks have a leader value for 2; 10 in B_{empty} and y2 in B_{right}. In other words, depending on the control flow, the expression y3 is equivalent to either 10 or y2. To eliminate the redundant instruction, PRE (i) inserts the pinode y4 := φ(10, y2) at the beginning of B_{exit}; (ii) replaces all uses of y3 with y4 at line 41; and (iii) eliminates y3 := x1 + 1 at line 40. This translation is beneficial because the inserted pinode is compiled down to a move instruction, which is more cost-effective than the eliminated addition instruction.

Figure 15. A PRE Example

C.1 Translation Example
C.2 ERHL Proof

We can turn our intuition for soundness into a formal proof. The unshaded part of Fig. 15 shows the proof, each assertion of which can be split into three parts.

**Expression Assertions** The red equalities of the form $e_{\text{src}} = \hat{v}_{\text{src}}$ or $\hat{v}_{\text{tgt}} = e_{\text{tgt}}$ relate expressions with their value numbers according to the expression table $ET$. For example, the assertion after line 21 states that there exist some values $v_1$ and $v_2$ (for $\circ$ and $\land$) such that $v_1 = n - 2$ and $v_2 = 1 + v_1$ hold for both the source and target states.

**Value Assertions** The green equalities of the form $x_{\text{src}} = \hat{v}_{\text{src}}$ or $\hat{v}_{\text{tgt}} = x_{\text{tgt}}$ relate values with their value numbers according to the value table $VT$. For example, the assertion after line 21 also states that $x_1$ has the value $v_1$ in the source and $y_1$ has the value $v_2$ in the target.

**Branching Assertions** The blue equalities show properties derived from branching conditions. For example, the equality $c_{2\mid \text{tgt}} = (y_{1\mid \text{tgt}} == 10)$ at line 21 is derived from the definition of the branching register $c_2$, and the equality $\hat{v}_{2\mid \text{tgt}} = 10$ in the block $B_{\text{empty}}$ is derived from the fact that $c_2$ must be true in $B_{\text{empty}}$ and thus $y_1$ is 10 and so is its associated value $\hat{v}_2$.

C.3 Proof Validation

The proof in Fig. 15 is validated as follows.

**Adding Value Assertions** At line 40 (and similarly at line 30), the proof checker computes a post-assertion by adding $\{ y_{3\mid \text{src}} = x_{1\mid \text{src}} + 1 \}$, and deduces, by applying the inference rules from the proof, that:

\[
\begin{align*}
y_{3\mid \text{src}} & = x_{1\mid \text{src}} + 1 & \text{(post-assertion)} \\
& = \hat{v}_{1\mid \text{src}} + 1 & \text{(substitution($x_{1\mid \text{src}} + 1, x_{1\mid \text{src}}, \hat{v}_{1\mid \text{src}}$))} \\
& = 1 + \hat{v}_{1\mid \text{src}} & \text{(commutativity_add($\hat{v}_{1\mid \text{src}}, 1$))} \\
& = \hat{v}_{2\mid \text{src}} & \text{(expression assertion)}
\end{align*}
\]

from which the next assertion trivially follows.

**Adding Expression Assertions** At line 31 (and similarly at lines 10 and 20), the proof checker computes a post-assertion by adding $\{ y_{2\mid \text{src}} = x_{2\mid \text{src}} + 1, x_{2\mid \text{tgt}} + 1 = y_{2\mid \text{tgt}} \}$. Then the proof gives the inference rule intro_ghost($1 + v_1, v_2$), which adds $\{ 1 + v_1 = v_2, v_2 = 1 + v_1 \}$. Then the proof checker deduces $v_{2\mid \text{tgt}} = y_{2\mid \text{tgt}}$ similarly as in line 40.

**Adding Phinode Assertions** At the phinode of $B_{\text{exit}}$, the assertion is separately validated for each incoming block. For the incoming block $B_{\text{left}}$ (and similarly for $B_{\text{right}}$), the proof checker computes a post-assertion by adding $10 = y_{4\mid \text{tgt}}$ and adding $y$ to the maydiff set, and derives $v_{2\mid \text{tgt}} = 10 = y_{4\mid \text{tgt}}$ by applying the transitivity rule from the proof.

**Adding Branching Assertions** At line 21, the proof checker computes a post-assertion by adding $\{ c_{2\mid \text{src}} = (y_{1\mid \text{tgt}} == 10), c_{2\mid \text{tgt}} = (y_{1\mid \text{tgt}} == 10) \}$, from which the next assertion trivially follows.

**Using Branching Assertions** At the beginning of $B_{\text{empty}}$, the proof checker computes a post-assertion by adding $\{ \text{true} = c_{2\mid \text{src}}, \text{true} = c_{2\mid \text{tgt}} \}$ from the branching condition, and deduces, by applying the inference rules from the proof, that $\text{true} = c_{2\mid \text{tgt}} = (y_{1\mid \text{tgt}} == 10)$ and

\[
\begin{align*}
v_{2\mid \text{tgt}} & = y_{1\mid \text{tgt}} \quad \text{(value assertion)} \\
& = 10 \quad \text{ (from true = (y_{1\mid \text{tgt}} == 10) by icmp_to_eq(true,y_{1\mid \text{tgt}},10))}
\end{align*}
\]

from which the next assertion trivially follows.

C.4 Proof Generation

Now we explain how we generate proofs for the GVN-PRE optimization.
We first add code to GVN and PRE algorithms that generate auxiliary data, \( RET \), \( RPT \) and \( BCT \). For example, we compute the following for Fig. 15:

\[
\begin{align*}
\text{RET} &= \{ x_1, x_2 \mapsto \{ n_{src} - 2 = \nu_1_{src}, \nu_1_{tgt} = n_{tgt} - 2 \}; \\
& \quad y_1, y_2, y_3, y_4 \mapsto \{ n_{src} - 2 = \nu_1_{src}, \nu_1_{tgt} = n_{tgt} - 2, 1 + \nu_1_{src} = \nu_2_{src}, \nu_2_{tgt} = 1 + \nu_1_{tgt} \} \}
\end{align*}
\]

\[
\begin{align*}
\text{RPT} &= \{ x_1, x_2 \mapsto (\_ \_); \\
& \quad y_1 \mapsto (\_ \_); \\
& \quad y_2, y_3 \mapsto (\_ \_ \_); \\
& \quad y_4 \mapsto (2, 2) \}
\end{align*}
\]

\[
\begin{align*}
\text{BCT} &= [(10, 2)] \mapsto \{(B_{\text{left}}, \text{true}, y_1)\}.
\end{align*}
\]

The register expression table \( RET \) contains sufficient expression assertions for reasoning about each register. For example, \( RET(x_1) \) contains \( \{ n_{src} - 2 = \nu_1_{src}, \nu_1_{tgt} = n_{tgt} - 2 \} \), which is sufficient for reasoning about \( x_1 \). The register parameter table \( RPT \) contains the value numbers of the instruction parameters for each register. For example, \( RPT[y_1] \) contains \( (\_ \_ \_), \) which means that the second parameter \( x_1 \) of the instruction 20: \( y_1 := 1 + x_1 \) has the value number \( \_ \_ \_ \). The branching condition table \( BCT \) maps pairs of constants/function parameters and their value numbers to their associated branching information. For example, \( BCT[(10, 2)] \) contains \( (B_{\text{left}}, \text{true}, y_1) \), which means that \( y_1 = 10 \) holds when control flows from \( B_{\text{left}} \) with the branching condition being \text{true} (i.e., \( c_2 = \text{true} \)), and \( (i) \) \( y_1 \)'s value number is \( 2 \). We can easily construct \( RET \), \( RPT \), and \( BCT \) following the construction of \( VT \) and \( ET \).

**Algorithm 3** presents the common proof-generation code that generate assertions for both GVN and PRE, which is given in a rather functional style for presentation purposes. Specifically, \( \text{ProofGen}(F, r, v, VT, RET, RPT, BCT) \) generates a proof for the replacement of \( r \) with \( v \) in function \( F \). In essence, the proof generation code adds assertions, starting from those for \( r_{\text{src}} \) and \( v_{\text{tgt}} \) at \( r \)'s definition point and recursively down to the arguments, using a worklist and auxiliary data \( RET \), \( RPT \), and \( BCT \).

The recusion stops when the target value is the only value of its value number. More concretely, the algorithm works for the example of Fig. 15 as follows.

**Initialize Worklist** The code finds where \( r \) is defined (line A1), and add a work for the source and another for the target to the worklist (line A2). A work \((l_0, x, \hat{n}, \text{side})\) means that \( (i) \) \( x \)'s value number is \( \hat{n} \), and \( (ii) \) line \( l_0 \) needs the assertion \( x_{\text{side}} = \hat{n}_{\text{side}} \) (where \( \text{side} \) is either \( \text{src} \) or \( \text{tgt} \)) and the expression assertions. For the translation in Fig. 15, \( x = y_3 \) is defined at line 40 in the source, so the initial works are \((40, y_3, \nu_2, \text{src}) \) and \((40, y_4, \nu_2, \text{tgt}) \). The code processes each work \((l_0, x, \hat{n}, \text{side})\) as follows (lines A3-A4).

**Processing Registers** If \( x \) is a register defined as \( e \) at \( l \) (line A6), the code adds the expression assertions \( RET[x] \) for \( x \) from \( l \) to \( l_0 \), and the value assertion \( x = \hat{n} \) at \( \text{side} \) (line A7). Consider the work \((40, y_4, \nu_2, \text{tgt}) \), for example. The register \( y_4 \) is defined by the phinode of \( B_{\text{exit}} \), so \( l \) is the phinode of \( B_{\text{exit}} \) and \( e = \phi(10, y_2) \). Hence \( RET[y_4] \) and \( \nu_2_{\text{tgt}} = y_4_{\text{tgt}} \) are inserted from the phinode of \( B_{\text{exit}} \) to line 40. For the work \((40, y_3, \nu_2, \text{src}) \), since \( y_3 \) is defined at line 40, no assertions are inserted.
In order to justify the inserted assertions at line \( l \), the code adds sub-works for the values in \( e \) (lines A8-A9). Note that MatchExpr(\( e \), RPT[\( x \)]) matches \( e \) against the register parameter table in order to know which value, say \( y \), should have which value number, say \( \hat{m} \), at line \( l' \) (line A8). For example, in order to deduce \( y_3 \approx_2 = y_2 \approx_2 \) after line 40, \( x_1 \approx_1 = y_1 \approx_1 \) should hold before the line, so the code adds (40, \( x_1 \), \( v_1 \), \( \approx_1 \)) to the worklist; for justifying \( v_2 \approx_2 = y_4 \approx_2 \) after the phinode of \( B_{\text{exit}} \), the code adds ([edge from \( B_{\text{empty}} \) to \( B_{\text{exit}} \)], 10, \( v_2 \), \( tgt \)) and ([edge from \( B_{\text{right}} \) to \( B_{\text{exit}} \)], \( y_2 \), \( v_2 \), \( tgt \)) to the worklist.

**Processing Constants and Parameters** If \( x \) is a constant or parameter \( C \) (line A11), the code looks for a branching condition in \( BCT[(C, \hat{n})] \), which justifies that \( C \) has the value number \( \hat{n} \) at \( l_0 \) (line A12). The result \((v, y, c, l)\) means \( y = C \) holds at \( l \) thanks to the branching condition \( c \) being equal to \( v \).

The code adds the branching assertion when necessary (line A14), and also adds an \( \text{icmp}_\text{to}_\text{eq} \) rule inference (line A15). Also, similarly to the case of registers, the code adds expression and branching assertions (line A16) to the proof, and adds sub-works for the value \( y \) (line A17).

For example, consider the work \((E_e, 10, \hat{v}_2, \hat{tgt})\), where \( E_e \) is the edge from \( B_{\text{empty}} \) to \( B_{\text{exit}} \). The code finds a branching condition in \( BCT[(10, \hat{v}_2)] \) which guarantees that \( 10 = \hat{v}_2 \) holds at \( E_e \) (line A12). Block \( B_{\text{left}} \) has such a branch when \( c2 \) equals to \([u, y, c, l] = (\text{true}, y_1, c_2, E_l)\) at line 13 where \( E_l \) is the edge \( B_{\text{left}} \) to \( B_{\text{empty}} \). Hence the code adds \( c2_{\hat{tgt}} = (y_1_{\hat{tgt}} = 10) \) from line 21 to \( E_l \) (line A14), and adds the inference rule \( \text{icmp}_\text{to}_\text{eq} \left[ \text{true}, y_1_{\hat{tgt}}, 10 \right] \) at \( E_l \) (line A15). Then the code adds the expression assertions \( \text{RET}[y] \) for \( y_1 \) and the branching assertion \( v_2 \approx_2 = 10 \) from \( E_l \) to \( E_e \) (line A16), and adds the work \((E_l, y_1, \hat{v}_2, \hat{tgt})\) to the worklist (line A16).

**Automation Function** Throughout the proof we use the GV\(N\)_\text{PRE} automation function, which adds intro\_ghost, commutativity, and substitution in a specific way for GV\(N\)_\text{PRE}, and adds transitivity and reduce\_maydiff in the same way as for associ\_add and mem\_2reg.

## D Validation Coverage

**TODO: LICM**

**Micro-Optimizations in instcombine** We validated the following 139 micro-optimizations in instcombine:
- \text{add-const-not}, \text{add-dist-sub}, \text{add-mask}, \text{add-onebit}, \text{and-or-and}, \text{add-select-zero}, \text{add-shift}, \text{add-signbit}, \text{add-xor-and}, \text{add-zext-bool}, \text{and-de-morgan}, \text{and-mone}, \text{and-or-const2}, \text{and-or-not1}, \text{and-or}, \text{and-undef}, \text{and-xor-const}, \text{and-zero}, \text{bitcast-bitcast}, \text{bitcast-fpext}, \text{bitcast-fptosi}, \text{bitcast-fptoui}, \text{bitcast-fptrunc}, \text{bitcast-ntoiptr}, \text{bitcast-ptrtoint}, \text{bitcast-sametype}, \text{bitcast-xext}, \text{bitcast-sitofp}, \text{bitcast-trunc}, \text{bitcast-uitofp}, \text{bitcast-zext}, \text{bop-associativity}, \text{dead-code-elim}, \text{dead-store-elim}, \text{fold-phi-bin-const}, \text{fold-phi-bin}, \text{fpext-bitcast}, \text{fpext-fpext}, \text{fptosi-bitcast}, \text{fptosi-fpext}, \text{fptoui-bitcast}, \text{fptoui-fpext}, \text{fptrunc-bitcast}, \text{fptrunc-fpext}, \text{icmp-eq-add-add}, \text{icmp-eq-srem}, \text{icmp-eq-sub-sub}, \text{icmp-eq-sub}, \text{icmp-eq-xor-not}, \text{icmp-eq-xor-xor}, \text{icmp-ne-add-add}, \text{icmp-ne-srem}, \text{icmp-ne-sub-sub}, \text{icmp-ne-sub}, \text{icmp-ne-xor-xor}, \text{icmp-ne-xor}, \text{icmp-sge-or-not}, \text{icmp-sgt-and-not}, \text{icmp-sle-or-not}, \text{icmp-slt-and-not}, \text{icmp-swap}, \text{icmp-uge-or-not}, \text{icmp-ugt-and-not}, \text{icmp-ule-or-not}, \text{icmp-ult-and-not}, \text{intoptr-bitcast}, \text{intoptr-ptrtoint}, \text{mul-bool}, \text{mul-mone}, \text{mul-neg}, \text{mul-shl}, \text{or-and-xor}, \text{or-and}, \text{or-mone}, \text{or-or2}, \text{or-or}, \text{or-or2}, \text{or-or2}, \text{or-or2}, \text{or-or}, \text{or-zero}, \text{ptrtoint-bitcast}, \text{ptrtoint-ptrtoint}, \text{sdv-mone}, \text{select-bop-fold}, \text{select-icmp-eq-xor1}, \text{select-icmp-eq-xor2}, \text{select-icmp-eq}, \text{select-icmp-ht-const}, \text{select-icmp-ht-const}, \text{select-icmp-ne-xor1}, \text{select-icmp-ne-xor2}, \text{select-icmp-ne}, \text{select-icmp-sgt-xor1}, \text{select-icmp-sgt-xor2}, \text{select-icmp-slt-xor2}, \text{sextr-bitcast}, \text{sextr-xext}, \text{sextr-trunc-ashr}, \text{sextr-xext}, \text{shift-undef1}, \text{shift-undef2}, \text{shift-zero1}, \text{shift-zero2}, \text{sitofp-bitcast}, \text{sitofp-xext}, \text{sitofp-zext}, \text{sub-add}, \text{sub-const-add}, \text{sub-const-not}, \text{sub-mone}, \text{sub-onebit}, \text{sub-or-xor}, \text{sub-remove}, \text{sub-shl}, \text{sub-shr}, \text{trunc-bitcast}, \text{trunc-onebit}, \text{trunc-xext}, \text{trunc-trunc}, \text{trunc-zext}, \text{uitofp-bitcast}, \text{uitofp-xext}, \text{xor-zero}, \text{xor-undef}, \text{xor-zero}, \text{zext-bitcast}, \text{zext-trunc-and-xor}, \text{zext-trunc-and}, \text{zext-xor}, \text{zext-xext}

Note that we gave these names and they are not officially used in LLVM.

## E Program Points between Two Lines

Assertion(\( P, l_1, l_2 \)) in the proof generation code means predicate \( P \) should be added to the assertions between \( l_1 \) and \( l_2 \). More specifically, the proposition \( P \) should be added at every program point appearing in a path from \( l_1 \) to \( l_2 \) that does not visit \( l \) but may visit \( l_2 \) in-between. Since \( l_1 \) is the source of the proposition \( P \) so that we can get \( P \) as a post-assertion every time we visit \( l_1 \), there is no need to add \( P \) along a path from \( l_1 \) to \( l_1 \). For example, consider the following program.
The marked area between $l_1$ and $l_2$ is where we should add the proposition $P$.

Thanks to the SSA property \[18\], we can efficiently calculate the program points between $l_1$ and $l_2$. First, we can assume that $l_1$ dominates $l_2$ (i.e., one should have visited $l_1$ to reach $l_2$ from the entry point), since the proposition $P$ created at $l_1$ should hold at $l_2$. Then a program point $l$ is on a path from $l_1$ to $l_2$ that does not visit $l_1$ in-between if and only if (i) $l_1$ dominates $l$ and (ii) $l_2$ is reachable from $l$ without visiting $l_1$. We efficiently check the first condition using the dominator tree \[18\] and the second condition by a backward BFS search from $l_2$.

In the above example, any program point in the marked area is such that $l_1$ dominates it and $l_2$ is reachable from it without visiting $l_1$. For example, $l_2$ is not reachable from the block $B_4$ without visiting $l_1$, and $l_1$ does not dominate the block $B_5$.

Note that this algorithm is not a part of TCB: validation may fail but cannot succeed incorrectly due to bugs in this algorithm.

### F Lessdef Predicates

LLVM and CompCert has the notion of `undef` value, which is designated as the result of erroneous operations. The compilers are allowed to replace `undef` by an arbitrary value. For example, LLVM’s InstCombine performs the following translation, where the register $y$ is replaced by 1, presumably because $y_{src} = a_{src} - x_{src} = a_{src} - (a_{src} - 1) = 1$ holds for any integer value of $a_{src}$:

\[
\begin{align*}
x & := a - 1; \quad x := a - 1; \\
y & := a - x; \quad y := a - x; \\
z & := y + 1; \quad z := 1 + 1;
\end{align*}
\]

However, if $a_{src}$ is `undef`, then we have $z_{src} = (a_{src} - (a_{src} - 1)) + 1 = \text{undef undef}$ is propagated in the arithmetic. Thus the equation $z_{src} = z_{tgt}$ no longer holds, breaking the equality relationship between the source and target values.

In order to reason such optimizations, we use the CompCert-style lessdef relation \[28\] throughout this work instead of the equality as assertion predicates. Concretely, $x$ is less defined than $y$, denoted $x \sqsubseteq y$, if $x$ is `undef` or it equals to $y$. For example, we have $y_{src} = a_{src} - x_{src} = a_{src} - (a_{src} - 1) \sqsupseteq 1$ and thus $z_{src} = y_{src} + 1 \sqsupseteq 1 + 1 = z_{tgt}$ regardless of whether $a = \text{undef}$ or not, which justifies the above translation.

So far we used the equality instead of lessdef relation for simplicity of presentation. However, all the assertions presented in this paper works even when equalities are replaced by lessdef relations. Note that the post-assertion generator should be adapted to lessdef relations so that they introduce both $x \sqsubseteq e$ and $e \sqsupseteq x$ after $x := e$.

### G Semantic Interpretation of Assertions

The syntax of assertions is as follows:
$\text{Reg} \ni r ::= \cdots$

$\text{Const} \ni c ::= \cdots$

$\text{Typ} \ni \text{typ} ::= \text{int} \mid * \text{typ} \mid \cdots$

$\text{SVal} \ni v ::= r \mid c$

$\text{Tag} \ni \text{tag} ::= \text{Phy} \mid \text{Ghost} \mid \text{Old}$

$\text{Reg} \times \text{Tag} \ni rT ::= (r,\text{tag})$

$\text{SVal} \times \text{Tag} \ni vT ::= (v,\text{tag})$

$\text{Expr} \ni e ::= \text{add} vT vT \mid \text{load} vT \text{typ} \text{align} \mid \cdots$

$\text{Pred} \ni \text{pred} ::= e \equiv e \mid \text{Uniq}(r) \mid \text{Priv}(rT) \mid vT \sqsubseteq vT$

$\text{MD} \ni M ::= \{rT,\cdots\}$

$\text{AssnU} \ni S,T ::= \{\text{pred},\cdots\}$

$\text{Assn} \ni P,Q ::= (S,T,M)$

$\text{Reg}$, $\text{Const}$, and $\text{Typ}$ are the types of LLVM registers, constants, and types. A (static) value is either a register or a constant. Registers and values may be tagged (see /coq/def/Exprs.v:106): the tag Phy means physical registers, Ghost means regular ghost registers (§3.2), and Old means old ghost registers (§4). We write $r$ for $(r,\text{Phy})$, $\hat{r}$ for $(r,\text{Ghost})$, and $\bar{r}$ for $(r,\text{Old})$.

An expression is basically the right-hand side of a side-effect-free LLVM instructions with operand values being tagged (see /coq/def/Exprs.v:366 in the Coq development). Notice that load is side-effect-free (except for undefined behavior) so there are load expressions, while store is side-effectful and there are no store expressions. Recall that $e_1 \equiv e_2$ means either $e_1 = e_2$ or $e_1 = \text{undef}$, and the predicate $vT_1 \sqsubseteq vT_2$ means the addresses in $vT_1$ and $vT_2$ point to disjoint memory blocks. A unary assertion is a set of predicates (see /coq/def/Hints.v:34), and a maydiff set is a set of tagged registers. An assertion consists of a unary assertion for source, another for target, and a maydiff set (see /coq/def/Hints.v:41).

We use the following semantic domains:

$\text{Val} \ni V ::= \cdots$

$\text{RF} \ni rs ::= \{r \mapsto V,\cdots\}$

$\text{State} \ni \sigma ::= \cdots$

$\text{StateT} \ni \sigma T ::= (\sigma,rs,rs)$

$\text{Meminj} \ni a ::= \cdots$

Let $\text{Val}$ be the set of (dynamic) values, $\text{RF}$ be the set of register files, $\text{State}$ be the set of states, $\text{StateT}$ be the set of extended states, which are tuples of a state, a register file for ghost registers, and another for old ghost registers. $\text{Meminj}$ is the set of CompCert-style memory injections, which basically maps a source memory block to the equivalent target block [28].

The semantics of assertions is as follows:

---

9We submitted the Coq proof scripts as supplementary material.
We give the semantic interpretation of the Hoare triple for call instructions \( v \sigma \), and in turn, we can rely on the fact that the callees return equivalent states. Using this semantics interpretation of \( \alpha \), we prove that the source and the target is about to call equivalent functions with equivalent arguments, and we can rely on the fact that the callees return equivalent states. Here, \( priv(a) \) is the set of those source blocks that are not mapped to a target block, and \( priv_{tgt}(a) \) is the set of those target blocks that are not mapped from a source block.

Registers, constants, static values have obvious semantics that maps a state to a dynamic value. Tagged registers and values map an extended state to a dynamic value. The semantics of \( pred \) is a predicate over a set of blocks, which represents the set of private blocks, and an extended state (see /coq/proof/InvState.v:346). In particular, \( \llbracket Uniq(r) \rrbracket (priv, \sigma T) \) means \( r \) is not aliased with any other values in \( \sigma T \) (see /coq/proof/InvState.v:314), and \( \llbracket Priv(rT) \rrbracket (priv, \sigma T) \) means if \( rT \) represents a pointer, it is not in \( priv \) (see /coq/proof/InvState.v:332). \( \llbracket vT_1 \cup vT_2 \rrbracket (priv, \sigma T) \) means if two values are pointers, they points to different memory block (\( b \) for block, \( o \) for offset, see /coq/proof/InvState.v:244). The semantics of a maydiff set is that all corresponding values are injected except for those in the maydiff set. The semantics of an assertion is that there exist ghost and old register files of the current memory injection \( J \). In particular, \( \llbracket M \rrbracket (a, \sigma_T_{src}, \sigma_T_{tgt}) \) means if two values are pointers, they points to different memory block (see /coq/proof/InvState.v:430). Here, \( priv_{src}(a) \) is the set of those source blocks that are not mapped to a target block, and \( priv_{tgt}(a) \) is the set of those target blocks that are not mapped from a source block.

## H Details of ERHL Proof Checker

### H.1 Semantic Interpretation of Hoare triples for call instructions

We give the semantic interpretation of the Hoare triple for call instructions \( I_{src} = (x_{src} := \text{call } f_{src} \ arg_{src}) \) and \( I_{tgt} = (x_{tgt} := \text{call } f_{tgt} \ arg_{tgt}) \) as follows:

\[
\llbracket \{ P \} I_{src} \leadsto I_{tgt} \{ Q \} \rrbracket \overset{def}{=} \forall \sigma_{src}, \sigma_{tgt}, \alpha. \llbracket P \rrbracket_\alpha (\sigma_{src}, \sigma_{tgt}) \land Instr(\sigma_{src}) = I_{src} \land Instr(\sigma_{tgt}) = I_{tgt} \implies \\
\forall \nu_{src}, \nu_{tgt}, \sigma'_{src}, \sigma'_{tgt}. \llbracket \{ P \} \rrbracket_\alpha (\sigma_{src}, \sigma_{tgt}) \land \nu_{src} \leadsto \sigma'_{src} \land \nu_{tgt} \leadsto \sigma'_{tgt} \implies \llbracket \{ Q \} \rrbracket_{\alpha'}(\sigma'_{src}, \sigma'_{tgt})
\]

where, \( \nu_{src} \leadsto \sigma'_{src} \) means \( \nu_{src} \) is injected into \( \nu_{tgt} \) via memory injection \( \alpha \), \( \llbracket T \rrbracket \) is the assertion with no predicates and the full maydiff set (i.e., the values of every register may differ), and \( \sigma \overset{\nu}{\leadsto} \sigma' \) means \( \sigma \) is about to call a function, and \( \sigma' \) is a possible return state after the function call for the case that the callee returns \( v \).

The semantic interpretation means that the source and the target is about to call equivalent functions with equivalent arguments, and for all future extension \( \alpha' \) of the current memory injection \( \alpha \), if the return values and return states are related by \( \alpha' \), then there exists a future extension \( \alpha'' \) of \( \alpha' \) for which \( \llbracket Q \rrbracket \) is satisfied for the return states.

In order to prove a Hoare triple, we need to guarantee that the source and target call equivalent functions with equivalent arguments, and in turn, we can rely on the fact that the callee returns equivalent states. Using this semantics interpretation of
Algorithm 4 CheckEquivBeh($P, I_{src}, I_{tgt}$: Command)

1: match $I_{src}, I_{tgt}$ with
2: // call
3: | ($x_{src} := \textnormal{call}_f x_{src} \text{args}_{src}$), ($x_{tgt} := \textnormal{call}_f x_{tgt} \text{args}_{tgt}$) ⇒ return ($f_{src} \sim_p f_{tgt}$) ∧ ($\text{args}_{src} \sim_p \text{args}_{tgt}$)
4: | (._ := call _, _), (_ := call _, _) ⇒ return false
5: // alloca
6: | ($p_{src} := \text{alloca} x_{src}$), ($p_{tgt} := \text{alloca} x_{tgt}$) ⇒ return $x_{src} \sim_p x_{tgt}$
7: | ($p_{src} := \text{alloca} x_{src}$), (lnop) ⇒ return true
8: | (_ := alloca _), (_ := alloca _) ⇒ return false
9: // store
10: | (store $p_{src} v_{src}$), (store $p_{tgt} v_{tgt}$) ⇒ return ($p_{src} \sim_p p_{tgt}$) ∧ ($v_{src} \sim_p v_{tgt}$)
11: | (store $p_{src} v_{src}$), (lnop) ⇒ return $\text{Priv}(p_{src}) \in P$
12: | (store _, _), (store _) ⇒ return false
13: // target is load
14: | ($v_{src} := \text{load} p_{src}$), ($v_{tgt} := \text{load} p_{tgt}$) ⇒ return $p_{src} \sim_p p_{tgt}$
15: | _, ($v_{tgt} := \text{load} p_{tgt}$) ⇒ return false
16: // target is div
17: | (_ := div _ $b_{src}$), (_ := div _ $b_{tgt}$) ⇒ return $b_{src} \sim_p b_{tgt}$
18: | _, (_ := div _ $b_{tgt}$) ⇒ return IsNonzero($P, b_{tgt}$)
19: // misc.
20: | _, _ ⇒ return true
21: end match

Algorithm 5 CalcPostAssnCmd($P$: Assn, $I_{src}, I_{tgt}$: Command): Assn

1: $P' := \text{Prune}(P, I_{src}, I_{tgt})$
2: ($P''_t, P''_s, M'') := \text{AddMemoryPreds}(I_{src}, I_{tgt}, P')$
3: $P'''_s := \text{AddLessdefPreds}(I_{src}, P''_s)$
4: $P'''_t := \text{AddLessdefPreds}(I_{tgt}, P''_t)$
5: return ReduceMaydiff($P'''_s, P'''_t, M''$)

calls, we proved semantics preservation of programs using the basic approach of parametric bisimulation [21]. See /coq/proof/SimulationLocal.v:164 and /coq/proof/AdequacyLocal.v:243 for more details.

H.2 Post-Assertion Computation for Commands

CheckEquivBeh Algorithm 4 is the CheckEquivBeh() algorithm for commands (see /coq/def/Postcond.v:769). Here, $x_{src} \sim_p y_{tgt}$ means one of the followings holds: (i) $x = y$ and $x$ is not in the maydiff set; (ii) ($x_{src} \equiv y_{src}$) $\in P_{src}$ and $y$ is not in the maydiff set; or (iii) $x$ is not in the maydiff set and ($x_{tgt} \equiv y_{tgt}$) $\in P_{tgt}$. Basically, this implies $x_{src} \sim_{\alpha} y_{tgt}$ holds for all $\alpha$ that is compatible with $P$. Also, $e_{src} \sim_p e'_{tgt}$ means $e$ and $e'$ are of the same expression kind, e.g., they are both add, and for all matching operands $x, x', x_{src} \sim_p x'_{tgt}$ holds. IsNonzero() performs an analysis for proving that the value is nonzero. For simplicity, we omit the details.

In essence, if the target instruction may emit an event or invoke undefined behavior, the source instruction should be similar to that. Note that we allow source load instruction with target lnop instruction, which indeed occurs in the validation of mem2reg.

CalcPostAssn Algorithm 5 is the post-assertion computation algorithm for commands $I_{src}, I_{tgt}$ (see /coq/def/Postcond.v:1058 for definition and /coq/proof/SoundPostcondCmd.v:309, /coq/proof/SoundPostcondCall.v:254 for the soundness of (PostAssn) for commands). At line 1, it removes predicates that no longer hold after executing the commands (Prune). Then at line 2, it adds Uniq and Priv predicates when necessary (AddMemoryPreds). At lines 3-4, it adds lessdef predicates to unary assertions for both source and target (AddLessdefPreds), and at line 5, finally tries to remove registers from the maydiff set.
Algorithm 6 Prune\(P:\) Assn, \(I_{src}, I_{tgt}:\) Command) Assn

1: \((P_{src}, P_{tgt}, M) := P\)
2: \(P'_{src} := \text{PruneU}(P_{src}, I_{src})\)
3: \(P'_{tgt} := \text{PruneU}(P_{tgt}, I_{tgt})\)
4: \(M' := M \cup \{\text{Def}(I_{src}), \text{Def}(I_{tgt})\}\)
5: \text{return} \((P'_{src}, P'_{tgt}, M')\)

**Prune** Algorithm 6 is the assertion pruning algorithm for commands \(I_{src}, I_{tgt}\) (see /coq/def/Postcond.v:355, /coq/def/Postcond.v:386). At lines 2-3, it removes predicates from unary assertions for both source and target (PruneU). Then at line 4, the left-hand sides of \(I_{src}, I_{tgt}\) are added to the maydiff set.

**PruneU** PruneU\((S, I)\) removes the following memory predicates from a unary assertion:
- If \(I\) defines a register \(r\), remove all predicates on \(r\) (see /coq/def/Postcond.v:373).
- If \(I\) is a store instruction \(*p := v, \text{remove all lessdef equations on } *q\) for which we cannot prove \(p \perp q\). We can prove \(p \perp q\) if we have it as a predicate, or \([p \neq q, \text{either } p \text{ or } q \text{ is unique, and the other is physical}]\) holds (see /coq/def/Postcond.v:341).
- If \(I\) is a call instruction, remove all lessdef equations on \(*q\) unless \(\text{Priv}(q)\) holds (see /coq/def/Postcond.v:409).
- Remove Uniq\((p)\) if \(p\) leaked, i.e., copied to another register, used as the instruction’s operand (see /coq/def/Postcond.v:347), or a function is called unless \(\text{Priv}(p)\) holds (see /coq/def/Postcond.v:411).

**AddMemoryPreds** AddMemoryPreds\((I_{src}, I_{tgt}, P)\) adds the following memory predicates to \(P\):
- If \((I_{src}, I_{tgt})\) are allocations \((p_{src} := \text{alloca}(\cdots)), (p_{tgt} := \text{alloca}(\cdots))\), add Uniq\((p_{src})\) to the source assertion. If \(p_{src} = p_{tgt}\), then remove \(p_{src}\) from the maydiff set (see /coq/def/Postcond.v:893).
- If \(I_{src}\) is an allocation \((p_{src} := \text{alloca}(\cdots))\) and \(I_{tgt}\) is 1nop, then add Uniq\((p_{src})\), Priv\((p_{src})\) to the source assertion (see /coq/def/Postcond.v:905).
- If \((I_{src}, I_{tgt})\) are call instructions \((x_{src} := \text{call}(\cdots)), (x_{tgt} := \text{call}(\cdots))\) and \(x_{src} = x_{tgt}\), then remove \(x_{src}\) from the maydiff set (see /coq/def/Postcond.v:927).

**AddLessdefPreds** AddLessdefPreds\((I, S)\) adds the following lessdef predicates to \(S\):
- If \(I\) is a side-effect-free operation \((x := \text{op args})\), add \((x \sqsubseteq \text{op args})\) and \((\text{op args} \sqsupseteq x)\) (see /coq/def/Postcond.v:936). Note that load is regarded as side-effect-free.
- If \(I\) is a store instruction \(*p := v, \text{add } *p \sqsubseteq v\) (see /coq/def/Postcond.v:939).
- If \(I\) is an allocation instruction \(*p := \text{alloca}(\cdots), \text{add } *p \sqsubseteq \text{undef}\) (see /coq/def/Postcond.v:942).

### H.3 Post-Assertion Computation for Phinodes

Phinodes do not emit any events so that any pair of phinodes behaves equivalently. Thus, for phinodes, CheckEquivBeh() checks nothing. As described in §4, CalcPostAssn() works for phinodes as follows (see /coq/def/Postcond.v:689 for the definition and /coq/proofof/SoundPostcondPhinodes.v:960 for the soundness of (PostAssn) for phinodes):
- Remove old registers, and copy predicates on physical registers into those about old ones (see /coq/def/Postcond.v:673).
- Remove predicates on those registers defined in the phinodes (see /coq/def/Postcond.v:674).
- Add predicates \(x \sqsubseteq y, y \sqsupseteq x\) for each assignment \(x := y\) that is performed in the phinodes (see /coq/def/Postcond.v:680).
- Tries to reduce the maydiff set (see /coq/def/Postcond.v:686).

**Misc.** CheckCFG is implemented in /coq/def/Validator.v:76 and /coq/def/Validator.v:129.

CheckInit is implemented in /coq/def/Validator.v:247, and its specification is proved in /coq/proofof/SimulationValid.v:1314.

CheckIncl is implemented in /coq/def/Hints.v:187, and the soundness of (Incl) is proved in /coq/proofof/SoundInclimplies.v:279.

### I Non-Arithmetic Inference Rules

In order to support defreg, gvn, and licm, we use 9 non-arithmetic inference rules. In the Coq development, we define the 9 rules in /coq/def/Infrules.v:392, and formally verified their soundness in /coq/proofof/SoundInfrules.v:52.

Each of the rules is based on one of the proof rules in Fig. 16. Here, \(rT_{src} \sim rT_{tgt}\) in premises means \(rT_{src} \sim_p rT_{tgt}\) for the current assertion \(P\). Also, \(e_{src} \sim e'_{tgt}\) in conclusions means \(e\) and \(e'\) are of the same expression kind, e.g., they are both add, and for all matching operands \(rT, rT'\), \(rT_{src} \sim_p rT_{tgt}\) holds for the current assertion \(P\). \(rT_{src} \sim rT_{tgt}\) in conclusions means you can remove \(rT\) from the maydiff set.
(transitivity)
\[
\frac{e_{src} \sqsupset e'_{src} \quad e'_{src} \sqsupset e''_{src}}{e_{src} \sqsupset e''_{src}}
\]

(transitivity_tgt)
\[
\frac{e_{tgt} \sqsupset e'_{tgt} \quad e'_{tgt} \sqsupset e''_{tgt}}{e_{tgt} \sqsupset e''_{tgt}}
\]

(substitute)
\[
\frac{vT_{src} \sqsupset vT'_{src}}{e_{src} \sqsupset e_{src}[vT_{src} \mapsto vT'_{src}]}
\]

(substitute_rev)
\[
\frac{vT_{src} \sqsupset vT'_{src}}{e_{src}[vT'_{src} \mapsto vT_{src}] \sqsupset e_{src}}
\]

(substitute_tgt)
\[
\frac{vT_{tgt} \sqsupset vT'_{tgt}}{e_{tgt} \sqsupset e_{tgt}[vT'_{tgt} \mapsto vT_{tgt}]}\]

(intro_ghost)
\[
\frac{e_{src} \sim e_{tgt}}{e_{src} \sqsupset \hat{g}_{src}, \hat{g}_{tgt} \sqsupset e_{tgt}} \quad \hat{g} \text{ not used}
\]

(intro_eq_tgt)
\[
\frac{e_{tgt} \sqsupset e_{tgt}}{e_{tgt} \sqsupset e_{tgt}}
\]

(reduce_maydiff_non_physical)
\[
\frac{rT_{src} \sim rT_{tgt}}{rT_{src} \sim rT_{tgt}} \quad rT \text{ is not physical (i.e. ghost or old) and not used}
\]

(reduce_maydiff_lessdef)
\[
\frac{rT_{src} \sqsupset e_{src} \quad e_{src} \sim e'_{tgt} \quad e'_{src} \sqsupset rT_{src}}{rT_{src} \sim rT_{tgt}}
\]

\[\]