Ph.D. dissertation by Sung-Hwan Lee


Submitted to the Department of Computer Science and Engineering, Seoul National University, Korea

Thesis committee



Abstract


Defining a relaxed memory model has been a major challenge for a couple of decades. The challenge stems from a sharp conflict between the two desiderata for relaxed memory models, usability and efficiency. Usability requires a model to be sensible and understandable, allowing programmers to use the model for developing and reasoning about concurrent programs. Efficiency requires a model to be efficiently implementable, i.e., it validates common compiler transformations and can be efficiently mapped to hardware. While the two desiderata are the key principles in designing a relaxed memory model, a significant challenge exists in balancing between them.
This dissertation thoroughly understands the desiderata for relaxed memory models, identifies inherent conflicts between them, and resolves the conflicts at a minimal price. The first part of this dissertation presents PS 2.0, the first relaxed memory model that provably supports compiler transformations based on global analyses. PS 2.0 redesigns the key components of the promising semantics (PS) by Kang et al. and validates global value optimization and register promotion, while preserving the known results for PS, such as data-race-freedom guarantees. PS 2.0 also resolves the problem of the inefficiency in compiling read-modify-write (RMW) operations of PS to Armv8, which requires an unintended extra fence. The second part investigates the problem of developing an in-order memory model that executes instructions following their program order, placing emphasis on amenability to reasoning and verification. We demonstrate that an in-order model can validate all the compiler optimizations performed on single-threaded code by utilizing the distinction between an in-order source model and an out-of-order intermediate representation model. For atomics, we propose a pragmatic solution for mapping relaxed writes, for which reordering with previous reads should be prevented, with negligible performance overhead.