Overview

Computer networks must satisfy policies such as reachability, isolation, and waypointing. Satisfying them requires configuring many network features, including routing protocols such as OSPF and BGP, route filters, and access control lists. Collectively, these features form a network’s control plane. Configuring all of them is complex, which makes control plane configuration error-prone. In ARC, we show how to identify these errors. CPR repairs the control plane to fix the identified errors. Repair can be extremely challenging because (a) a repair may involve changes to multiple routers, (b) a repair that fixes one policy violation may trigger another violation for the same or a different traffic class, and (c) not all valid repairs are equally desirable (e.g., repairs that add fewer configuration lines are preferred). CPR first creates a digraph-based representation of a control plane’s semantics and then casts configuration repair as a MaxSMT problem. We are currently working on a new control plane repair framework that supports more policies and favors different kinds of repairs.
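To make the repair problem concrete, here is a minimal sketch of the idea on a toy digraph. It is not CPR's implementation: an exhaustive search stands in for a MaxSMT solver (policies act as hard constraints, "leave each edge unchanged" acts as a soft constraint), a single shared graph is used for all traffic classes, and the node names, policy tuples, and function names are all hypothetical.

```python
from itertools import combinations

def reachable(edges, src, dst, avoid=None):
    """Depth-first search over directed edges, never entering node `avoid`."""
    stack, seen = [src], {src}
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for u, v in edges:
            if u == node and v not in seen and v != avoid:
                seen.add(v)
                stack.append(v)
    return False

def holds(policy, edges):
    """Check one policy tuple (hypothetical encoding) against the digraph."""
    kind, src, dst, *rest = policy
    if kind == "reach":       # src must be able to reach dst
        return reachable(edges, src, dst)
    if kind == "block":       # src must be isolated from dst
        return not reachable(edges, src, dst)
    if kind == "waypoint":    # every src-to-dst path must cross rest[0]
        return not reachable(edges, src, dst, avoid=rest[0])
    raise ValueError(f"unknown policy kind: {kind}")

def apply_changes(edges, changes):
    """Toggle each changed edge: present edges are removed, absent ones added."""
    return set(edges) ^ set(changes)

def smallest_repair(edges, candidates, policies):
    """Try change sets in order of increasing size; return the first that
    satisfies every policy, or None if no subset of candidates works."""
    for size in range(len(candidates) + 1):
        for changes in combinations(candidates, size):
            repaired = apply_changes(edges, changes)
            if all(holds(p, repaired) for p in policies):
                return changes
    return None

# Toy control plane: traffic from A currently bypasses the firewall FW.
EDGES = {("A", "B"), ("B", "D"), ("C", "B")}
# Edges the repair may toggle: every existing edge plus a few possible additions.
CANDIDATES = sorted(EDGES) + [("A", "FW"), ("B", "FW"), ("C", "D"), ("FW", "D")]
POLICIES = [
    ("reach", "A", "D"),
    ("waypoint", "A", "D", "FW"),
    ("reach", "C", "D"),
]

repair = smallest_repair(EDGES, CANDIDATES, POLICIES)
print("edges to toggle:", repair)
```

In this toy network, simply removing the edge B -> D closes the path that bypasses FW, but it also cuts off C from D, which illustrates challenge (b). The search therefore has to consider all policies together, and it prefers the smallest change set, which mirrors challenge (c). Exhaustive search is exponential in the number of candidate changes and is shown only for illustration; that cost is the reason a solver-based formulation is attractive.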

CPR

Paper
Automatically Repairing Network Control Planes Using an Abstract Representation, SOSP 2017
Paper Abstract

The forwarding behavior of computer networks is governed by the configuration of distributed routing protocols and access filters, collectively known as the network control plane. Unfortunately, control plane configurations are often buggy, causing networks to violate important policies: e.g., specific traffic classes (defined in terms of source and destination endpoints) should always be able to reach their destination, or always traverse a waypoint. Manually repairing these configurations is daunting because of their intertwined nature across routers, traffic classes, and policies. Inspired by recent work in automatic program repair, we introduce CPR, a system that automatically computes correct, minimal repairs for network control planes. CPR casts configuration repair as a MaxSMT problem whose constraints are based on a digraph-based representation of a control plane’s semantics. Crucially, this representation must capture the dependencies between traffic classes arising from the cross-traffic-class nature of control plane constructs. The MaxSMT formulation must account for these dependencies whilst also accounting for all policies and preferring repairs that minimize the size (e.g., number of lines) of the configuration changes. Using configurations from 96 data center networks, we show that CPR produces repairs in less than a minute for 98% of the networks, and these repairs require changing the same or fewer lines of configuration than hand-written repairs in 79% of cases.

Talks

Slides presented at SOSP 2017.

Code

http://bitbucket.org/uw-madison-networking-research/arc