Enhancing Server Efficiency in the Face of Killer Microseconds

Abstract:

We are entering an era of “killer microseconds” in data center applications. Killer microseconds refer to μs-scale “holes” in CPU schedules caused by stalls to access fast I/O devices or brief idle times between requests in high throughput microservices. Whereas modern computing platforms can efficiently hide ns-scale and ms-scale stalls through micro-architectural techniques and OS context switching, they lack efficient support to hide the latency of μs-scale stalls. Simultaneous Multithreading (SMT) is an efficient way to improve core utilization and increase server performance density. Unfortunately, scaling SMT to provision enough threads to hide frequent μs-scale stalls is prohibitive, and SMT co-location can often drastically increase the tail latency of cloud microservices. In this paper, we propose Duplexity, a heterogeneous server architecture that employs aggressive multithreading to hide the latency of killer microseconds, without sacrificing the Quality-of-Service (QoS) of latency-sensitive microservices. Duplexity provisions dyads (pairs) of two kinds of cores: master-cores, each of which primarily executes a single latency-critical master-thread, and lender-cores, which multiplex latency-insensitive throughput threads. When the master-thread stalls, the master-core borrows filler-threads from the lender-core, filling μs-scale utilization holes of the microservice. We propose critical mechanisms, including separate memory paths for the master-thread and filler-threads, to enable master-cores to borrow filler-threads while protecting master-threads' state from disruption. Duplexity facilitates fast master-thread restart when stalls resolve and minimizes the microservice's QoS violation. Our evaluation demonstrates that Duplexity achieves 1.9× higher core utilization and 2.7× lower iso-throughput 99th-percentile tail latency over an SMT-based server design, on average.

Published in: 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)

Date of Conference: 16-20 February 2019

Date Added to IEEE Xplore: 28 March 2019

DOI: 10.1109/HPCA.2019.00037

Conference Location: Washington, DC, USA

Amirhossein Mirhosseini

University of Michigan

Akshitha Sriraman

University of Michigan

Thomas F. Wenisch

University of Michigan

I. Introduction

We are entering the “killer microsecond” era in data center applications [1]. Due to advances in processor, memory, storage, and networking technologies, events that stall execution increasingly fall in a microsecond-scale latency range. Accesses to emerging storage-class memories [2]–[9], rack-scale memory disaggregation [10]–[14], 100+ gigabit network communication [15], and accelerator/GPU micro-offloads [16]–[18] are example program activities that incur microsecond delays.
