Overview

RDMA over Converged Ethernet (RoCE) promises low latency and low CPU utilization over commodity networks, and is attractive for cloud infrastructure services. Current implementations require Priority Flow Control (PFC) that uses backpressure-based congestion control to provide lossless networking to RDMA. Unfortunately, PFC compromises network stability. As a result, RoCE’s adoption has been slow and requires complex network management. Recent efforts, such as DCQCN, reduce the risk to the network, but do not completely solve the problem.

We describe RoGUE, a new congestion control and recovery mechanism for RDMA over Ethernet that does not rely on PFC. RoGUE is implemented in software to support backward compatibility and accommodate network evolution, yet allows the use of RDMA for high performance, supporting both the RC and UC RDMA transports. Our experiments show that RoGUE achieves performance and CPU utilization matching or outperforming native RDMA protocols but gracefully tolerates congested networks.

Paper
RoGUE: RDMA over Generic Unconverged Ethernet, SoCC 2018
Talks

Slides presented at SoCC 2018