
Enhancing Server Efficiency in the Face of Killer Microseconds


Abstract:

We are entering an era of “killer microseconds” in data center applications. Killer microseconds refer to μs-scale “holes” in CPU schedules caused by stalls to access fast I/O devices or brief idle times between requests in high throughput microservices. Whereas modern computing platforms can efficiently hide ns-scale and ms-scale stalls through micro-architectural techniques and OS context switching, they lack efficient support to hide the latency of μs-scale stalls. Simultaneous Multithreading (SMT) is an efficient way to improve core utilization and increase server performance density. Unfortunately, scaling SMT to provision enough threads to hide frequent μs-scale stalls is prohibitive and SMT co-location can often drastically increase the tail latency of cloud microservices. In this paper, we propose Duplexity, a heterogeneous server architecture that employs aggressive multithreading to hide the latency of killer microseconds, without sacrificing the Quality-of-Service (QoS) of latency-sensitive microservices. Duplexity provisions dyads (pairs) of two kinds of cores: master-cores, which each primarily executes a single latency-critical master-thread, and lender-cores, which multiplex latency-insensitive throughput threads. When the master-thread stalls, the master-core borrows filler-threads from the lender-core, filling μs-scale utilization holes of the microservice. We propose critical mechanisms, including separate memory paths for the master-thread and filler-threads, to enable master-cores to borrow filler-threads while protecting master-threads' state from disruption. Duplexity facilitates fast master-thread restart when stalls resolve and minimizes the microservice's QoS violation. Our evaluation demonstrates that Duplexity is able to achieve 1.9× higher core utilization and 2.7× lower iso-throughput 99th-percentile tail latency over an SMT-based server design, on average.
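
The following is a minimal, purely illustrative sketch of the dyad idea summarized above, written as a toy Python model rather than the paper's hardware mechanism: a master-core accumulates useful work during a request's service time and, whenever the master-thread stalls for a few microseconds, lends the core to a borrowed filler-thread so the stall time still produces throughput. The class names, timing constants, and stall distribution are hypothetical.

# Toy model of a Duplexity dyad (illustrative only, not the paper's design):
# a master-core runs a latency-critical master-thread and, while that thread
# stalls on a microsecond-scale event, borrows a latency-insensitive
# filler-thread from the paired lender-core instead of sitting idle.
import random


class FillerThread:
    def __init__(self, name):
        self.name = name
        self.progress_us = 0.0  # useful work done while borrowed


class Dyad:
    def __init__(self, filler_threads):
        self.fillers = list(filler_threads)  # lender-core's thread pool
        self.busy_us = 0.0                   # time the master-core spent doing useful work
        self.total_us = 0.0                  # total simulated time

    def serve_request(self, service_us, stall_us):
        # Master-thread computes, then stalls (e.g., on fast I/O or remote memory).
        self.busy_us += service_us
        self.total_us += service_us + stall_us
        if self.fillers:
            # Borrow a filler-thread for the duration of the stall; the
            # master-thread's state is assumed preserved for a fast restart.
            random.choice(self.fillers).progress_us += stall_us
            self.busy_us += stall_us


def utilization(with_fillers):
    random.seed(0)
    fillers = [FillerThread(f"batch-{i}") for i in range(4)] if with_fillers else []
    dyad = Dyad(fillers)
    for _ in range(10_000):
        dyad.serve_request(service_us=2.0, stall_us=random.uniform(1.0, 10.0))
    return dyad.busy_us / dyad.total_us


if __name__ == "__main__":
    print(f"utilization without filler-threads: {utilization(False):.2f}")
    print(f"utilization with filler-threads:    {utilization(True):.2f}")

Running the sketch contrasts the utilization of an isolated master-core with one that can borrow filler-threads during stalls, which is the effect Duplexity targets in hardware.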
Date of Conference: 16-20 February 2019
Date Added to IEEE Xplore: 28 March 2019

Conference Location: Washington, DC, USA

I. Introduction

We are entering the “killer microsecond” era in data center applications [1]. Due to advances in processor, memory, storage, and networking technologies, events that stall execution increasingly fall into a microsecond-scale latency range. Accesses to emerging storage-class memories [2]–[9], rack-scale memory disaggregation [10]–[14], 100+ gigabit network communication [15], and accelerator/GPU micro-offloads [16]–[18] are examples of program activities that incur microsecond-scale delays.
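
To see why such delays are hard to hide, a rough back-of-envelope comparison helps (the clock rate, context-switch cost, and out-of-order window size below are assumed round numbers for illustration, not measurements from this paper): a one-microsecond stall spans thousands of cycles, far more than an out-of-order core can overlap, yet it is short enough that a conventional OS context switch would consume a large fraction of, or more than, the stall itself. A small Python sketch of the arithmetic:

# Back-of-envelope arithmetic (assumed, round figures) for why microsecond-scale
# stalls fall into a gap: too long for out-of-order hardware to hide, too short
# for OS context switching to pay off.

CLOCK_GHZ = 3.0            # assumed core frequency
CTX_SWITCH_US = 5.0        # assumed round-trip OS context-switch cost
OOO_WINDOW_CYCLES = 300    # assumed out-of-order instruction window (order of a ROB)

for label, stall_us in [("100 ns", 0.1), ("1 us", 1.0), ("10 us", 10.0), ("1 ms", 1000.0)]:
    stall_cycles = stall_us * CLOCK_GHZ * 1000         # cycles spent waiting
    hidden_by_ooo = stall_cycles <= OOO_WINDOW_CYCLES  # can the core overlap it?
    switch_cost = CTX_SWITCH_US / stall_us             # context-switch cost relative to the stall
    print(f"{label:>7}: ~{stall_cycles:>9,.0f} cycles | "
          f"hidden by out-of-order: {str(hidden_by_ooo):5} | "
          f"context switch = {switch_cost:.1%} of the stall")

Under these assumptions, nanosecond-scale stalls fit within the out-of-order window, millisecond-scale stalls dwarf the context-switch cost, and microsecond-scale stalls are served poorly by both mechanisms.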

References
[1] L. Barroso, M. Marty, D. Patterson and P. Ranganathan, "Attack of the killer microseconds", Communications of the ACM, vol. 60, no. 4, pp. 48-54, 2017.
[2] N. Agarwal and T. F. Wenisch, "Thermostat: Application-transparent page management for two-tiered main memory", Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 631-644, 2017.
[3] J. Condit, E. B. Nightingale, C. Frost, E. Ipek, B. Lee, D. Burger, et al., "Better I/O through byte-addressable persistent memory", ACM SIGOPS 22nd Symposium on Operating Systems Principles, pp. 133-146, 2009.
[4] S. R. Dulloor, S. Kumar, A. Keshavamurthy, P. Lantz, D. Reddy, R. Sankaran, et al., "System software for persistent memory", European Conference on Computer Systems, 2014.
[5] A. Mirhosseini, A. Agrawal and J. Torrellas, "Survive: Pointer-based in-DRAM incremental checkpointing for low-cost data persistence and rollback-recovery", IEEE Computer Architecture Letters, vol. 16, no. 2, pp. 153-157, 2017.
[6] A. Tavakkol, A. Kolli, S. Novakovic, K. Razavi, J. Gomez-Luna, H. Hassan, C. Barthels, Y. Wang, M. Sadrosadati, S. Ghose, et al., "Enabling efficient RDMA-based synchronous mirroring of persistent memory transactions", arXiv preprint, 2018.
[7] S. Pelley, P. M. Chen and T. F. Wenisch, "Memory persistency", ACM SIGARCH Computer Architecture News, 2014.
[8] A. Kolli, V. Gogte, A. Saidi, S. Diestelhorst, P. M. Chen, S. Narayanasamy, et al., "Language-level persistency", ACM/IEEE International Symposium on Computer Architecture, 2017.
[9] V. Gogte, S. Diestelhorst, W. Wang, S. Narayanasamy, P. M. Chen and T. F. Wenisch, "Persistency for synchronization-free regions", ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018.
[10] K. Lim, J. Chang, T. Mudge, P. Ranganathan, S. K. Reinhardt and T. F. Wenisch, "Disaggregated memory for expansion and sharing in blade servers", ACM SIGARCH Computer Architecture News, 2009.
[11] A. Dragojević, D. Narayanan, O. Hodson and M. Castro, "FaRM: Fast remote memory", USENIX Conference on Networked Systems Design and Implementation, 2014.
[12] S. Novakovic, A. Daglis, E. Bugnion, B. Falsafi and B. Grot, "Scale-out NUMA", ACM SIGPLAN Notices, vol. 49, pp. 3-18, 2014.
[13] J. Gu, Y. Lee, Y. Zhang, M. Chowdhury and K. G. Shin, "Efficient memory disaggregation with Infiniswap", NSDI, 2017.
[14] M. K. Aguilera, N. Amit, I. Calciu, X. Deguillard, J. Gandhi, P. Subrahmanyam, et al., "Remote memory in the age of fast networks", Symposium on Cloud Computing, 2017.
[15] C. Binnig, A. Crotty, A. Galakatos, T. Kraska and E. Zamanian, "The end of slow networks: It's time for a redesign", Proceedings of the VLDB Endowment, vol. 9, no. 7, pp. 528-539, 2016.
[16] D. Lustig and M. Martonosi, "Reducing GPU offload latency via fine-grained CPU-GPU synchronization", IEEE International Symposium on High Performance Computer Architecture, 2013.
[17] A. Caulfield, E. Chung, A. Putnam, et al., "A cloud-scale acceleration architecture", IEEE/ACM International Symposium on Microarchitecture, 2016.
[18] A. Mirhosseini, M. Sadrosadati, B. Soltani, H. Sarbazi-Azad and T. F. Wenisch, "BiNoCHS: Bimodal network-on-chip for CPU-GPU heterogeneous systems", IEEE/ACM International Symposium on Networks-on-Chip (NOCS), 2017.
[19] Y. Gan and C. Delimitrou, "The architectural implications of cloud microservices", IEEE Computer Architecture Letters (CAL), vol. 17, no. 2, Jul.-Dec. 2018.
[20] Staci D. Kramer, "The biggest thing Amazon got right: The platform."
[21] Tony Mauro, "Adopting microservices at Netflix: Lessons for architectural design."
[22] Yoni Goldberg, "Scaling Gilt: From monolithic Ruby application to distributed Scala micro-services architecture."
[23] Steven Ihde and Karan Parikh, "From a monolith to microservices + REST: The evolution of LinkedIn's service architecture."
[24] Phil Calcado, "Building products at SoundCloud, part I: Dealing with the monolith."
[25] B. Fitzpatrick, "Distributed caching with memcached", Linux Journal, 2004.
[26] B. Fan, D. G. Andersen and M. Kaminsky, "MemC3: Compact and concurrent MemCache with dumber caching and smarter hashing", NSDI, 2013.
[27] A. Likhtarov, R. Nishtala, R. McElroy, H. Fugal, A. Grynenko and V. Venkataramani, "Introducing mcrouter: A memcached protocol router for scaling memcached deployments", 2014.
[28] R. Nishtala, H. Fugal, S. Grimm, M. Kwiatkowski, H. Lee, H. C. Li, R. McElroy, M. Paleczny, D. Peek, P. Saab, et al., "Scaling Memcache at Facebook", NSDI, pp. 385-398, 2013.
[29] A. Kalia, M. Kaminsky and D. G. Andersen, "Using RDMA efficiently for key-value services", ACM SIGCOMM Computer Communication Review, 2015.
[30] C. Mitchell, Y. Geng and J. Li, "Using one-sided RDMA reads to build a fast, CPU-efficient key-value store", USENIX Annual Technical Conference, pp. 103-114, 2013.
