HTM: Hardware Transactional Memory

Keywords: transactional programming, thread-level speculation, security, hardware lock elision (HLE)

Over the last few years, hardware support for transactional memory (TM) has been integrated into several mainstream commercial processors used in a variety of computing platforms, ranging from commodity systems (Intel’s Haswell [33]) to servers (IBM’s POWER8 [20]) and supercomputers (IBM zEC12 [17]).

Processors equipped with hardware transactional memory (HTM) include assembly instructions for demarcating code blocks that are guaranteed to execute as atomic transactions. The HTM system is responsible for ensuring semantics equivalent to a sequential execution of those transactions. To maximize parallelism, however, transactions are executed speculatively: hardware facilities (typically extensions of the processor’s cache coherence protocol) detect conflicts at run time and abort and restart any transaction that would otherwise violate consistency.
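To make the speculate, detect, abort/restart cycle concrete, here is a minimal, software-only Python model. It is an illustration under stated assumptions, not how any processor implements HTM: `Memory`, `Transaction`, `Abort`, and `run_interleaved` are hypothetical names; writes are buffered privately until commit; and conflicts are detected by validating per-location version numbers at commit time, whereas real HTM tracks read/write sets in the cache and typically detects conflicts eagerly through the coherence protocol.

```python
"""Toy model of speculative transactions with run-time conflict detection.

Assumption: real HTM detects conflicts eagerly via cache coherence; this
sketch instead validates per-location version numbers at commit time.
"""


class Abort(Exception):
    """Raised when a transaction must be rolled back and retried."""


class Memory:
    def __init__(self):
        self.values = {}    # location -> committed value
        self.versions = {}  # location -> number of committed writes

    def version(self, loc):
        return self.versions.get(loc, 0)


class Transaction:
    def __init__(self, mem):
        self.mem = mem
        self.read_set = {}   # location -> version observed at first read
        self.write_buf = {}  # speculative writes, invisible until commit

    def read(self, loc):
        if loc in self.write_buf:  # read-your-own-writes
            return self.write_buf[loc]
        self.read_set.setdefault(loc, self.mem.version(loc))
        return self.mem.values.get(loc, 0)

    def write(self, loc, value):
        self.write_buf[loc] = value  # buffered, like a speculative cache line

    def commit(self):
        # Conflict: a location we read was overwritten by another commit.
        for loc, seen in self.read_set.items():
            if self.mem.version(loc) != seen:
                raise Abort(loc)
        # No conflict: publish all buffered writes at once.
        for loc, value in self.write_buf.items():
            self.mem.values[loc] = value
            self.mem.versions[loc] = self.mem.version(loc) + 1


def run_interleaved(mem):
    """Two transactions increment 'x' with interleaved bodies; returns abort count."""
    aborts = 0
    t1, t2 = Transaction(mem), Transaction(mem)
    t1.write("x", t1.read("x") + 1)
    t2.write("x", t2.read("x") + 1)
    t1.commit()                   # succeeds: x becomes 1
    try:
        t2.commit()               # t2 read x before t1 committed: conflict
    except Abort:
        aborts += 1
        retry = Transaction(mem)  # restart the aborted transaction from scratch
        retry.write("x", retry.read("x") + 1)
        retry.commit()            # x becomes 2
    return aborts
```

In this interleaving, `t2` read `x` before `t1` committed, so its commit fails validation and it is restarted; the final value of `x` is 2, exactly as if the two increments had run one after the other.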

The availability of HTM in commercial processors enables a range of novel applications, from transactional programming [1] to thread-level speculation [30] and security [12]. Hardware lock elision (HLE) is probably among the most exciting of these and, arguably, one of the main drivers behind the commercial adoption of HTM.

HLE enhances the concurrency of legacy lock-based code in a simple yet effective way: it executes critical sections as speculative hardware transactions. Several recent studies, e.g., [10, 20], have demonstrated the potential of HLE to boost the parallelism of complex legacy applications [8, 29]. However, these studies also highlighted relevant limitations of HLE approaches, which are inherently rooted in the restricted nature of current HTM implementations. In particular, existing HTM systems can guarantee the correctness (and hence allow the commit) only of transactions that perform a limited number of memory accesses. Transactions that exceed the hardware capacity must be aborted and re-executed using pessimistic, lock-based synchronization. This can severely hamper performance and limits the current scope of applicability of HLE techniques.
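The elision scheme and its capacity limitation can be sketched in the same toy style. This is an illustrative, single-threaded Python simulation under stated assumptions, not Intel's HLE instructions or any real elision library: `elided_critical_section`, `SpeculativeTx`, `DirectAccess`, `CapacityAbort`, `ConflictAbort`, the `capacity` bound (a stand-in for the number of cache lines the hardware can track), and the retry policy are all hypothetical. Checking that the lock is free before running the body mimics reading the lock word inside the transaction; in real hardware a later acquisition of the lock would also abort the transaction, which this model does not capture.

```python
import threading


class CapacityAbort(Exception):
    """Transaction touched more distinct locations than the (assumed) hardware limit."""


class ConflictAbort(Exception):
    """The lock is held by someone else, so speculation cannot proceed."""


class SpeculativeTx:
    def __init__(self, memory, capacity):
        self.memory = memory
        self.capacity = capacity
        self.footprint = set()  # distinct locations touched (stand-in for cache lines)
        self.write_buf = {}     # speculative writes, discarded on abort

    def _touch(self, loc):
        self.footprint.add(loc)
        if len(self.footprint) > self.capacity:
            raise CapacityAbort(len(self.footprint))

    def read(self, loc):
        self._touch(loc)
        return self.write_buf.get(loc, self.memory.get(loc, 0))

    def write(self, loc, value):
        self._touch(loc)
        self.write_buf[loc] = value

    def commit(self):
        self.memory.update(self.write_buf)


class DirectAccess:
    """Non-speculative accessor used on the lock-based fallback path."""

    def __init__(self, memory):
        self.memory = memory

    def read(self, loc):
        return self.memory.get(loc, 0)

    def write(self, loc, value):
        self.memory[loc] = value


def elided_critical_section(lock, memory, body, capacity=4, max_retries=3):
    """Run body(acc) speculatively; fall back to the real lock if that fails.

    Returns "speculative" or "fallback" to show which path completed.
    """
    for _ in range(max_retries):
        tx = SpeculativeTx(memory, capacity)
        try:
            if lock.locked():     # lock subscription: elide only a free lock
                raise ConflictAbort()
            body(tx)
            tx.commit()
            return "speculative"
        except CapacityAbort:
            break                 # retrying cannot help: the footprint is too large
        except ConflictAbort:
            continue              # possibly transient: try speculating again
    with lock:                    # pessimistic path: actually acquire the lock
        body(DirectAccess(memory))
    return "fallback"
```

A critical section that touches few locations commits speculatively, while one whose footprint exceeds `capacity` aborts and is re-executed under the lock. Retrying a capacity abort is pointless, since the footprint would overflow again, so the sketch falls back immediately in that case; this is the situation in which elision degenerates into plain lock-based execution.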