Probabilistic programs use the familiar notation of programming languages to specify probabilistic models. Suppose we are interested in estimating the distribution of the return expression r of a probabilistic program P. We would like to slice P to obtain a simpler program SLI(P) that retains only those parts of P that are relevant to estimating r and elides those that are not. We desire that the SLI transformation be both correct and efficient. By correct, we mean that P and SLI(P) have identical estimates on r. By efficient, we mean that estimation over SLI(P) is as fast as possible.
We show that the usual notion of program slicing, which traverses control and data dependences backward from the return expression r, is unsatisfactory for probabilistic programs: it produces incorrect slices on some programs and sub-optimal ones on others. Our key insight is that, in addition to the control dependence and data dependence used to slice non-probabilistic programs, a new kind of dependence called observe dependence arises naturally from the observe statements in probabilistic programs.
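To make the incorrectness concrete, consider a small illustrative program (our own example, not one taken from the paper's evaluation): two fair coins x and y are flipped, the program observes that at least one is true, and it returns x. The observe statement neither writes x nor controls its assignment, so a slice built only from control and data dependences would drop it. The Python sketch below assumes exact enumeration stands in for inference; the helper name `posterior_of_x` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def posterior_of_x(include_observe):
    """Exact P(x = True) by enumerating both fair coin flips.

    Illustrative model (an assumption, not from the source):
        x ~ Bernoulli(0.5); y ~ Bernoulli(0.5)
        observe(x or y); return x
    """
    accepted = Fraction(0)  # total probability mass of runs that survive
    x_true = Fraction(0)    # mass of surviving runs in which x is True
    for x, y in product([False, True], repeat=2):
        weight = Fraction(1, 4)  # each (x, y) outcome is equally likely
        if include_observe and not (x or y):
            continue  # observe(x or y) rejects this run
        accepted += weight
        if x:
            x_true += weight
    return x_true / accepted  # normalize by the surviving mass


# Full program: the observe statement conditions the distribution of x.
full_program = posterior_of_x(include_observe=True)
# Slice from control and data dependences only: the observe is dropped.
naive_slice = posterior_of_x(include_observe=False)

print(full_program, naive_slice)  # prints "2/3 1/2"
```

The two estimates differ (2/3 versus 1/2), so dropping the observe statement changes the distribution of the return expression even though no data or control dependence links it to x.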
We propose a new definition of SLI(P) that is both correct and efficient for probabilistic programs, obtained by including observe dependence, in addition to control and data dependence, when computing slices. We prove correctness mathematically, and we demonstrate efficiency empirically. We show that applying the SLI transformation as a pre-pass improves the efficiency of probabilistic inference, not only in our own inference tool R2, but also in other inference systems such as Church and Infer.NET.