Video publishers (e.g., Netflix, YouTube) continually seek to improve their systems through the deployment of new ABR algorithms and through changes to their systems (e.g., reducing the amount of client buffer, adding a new video quality). Given data collected from real video streaming sessions, they must answer "what-if" questions that involve predicting what the performance would be if a change were made, a task also known as causal reasoning. While one approach to causal reasoning involves Randomized Control Trials (RCTs), RCTs must be deployed conservatively given that they impact the performance of real users. In this paper, we present Veritas, the first framework that tackles causal reasoning for video streaming without requiring data collected through RCTs. Causal reasoning is challenging owing to the intrinsic network bandwidth acting as a latent confounder, and owing to the cascaded effects that past ABR decisions have on the future. Integral to Veritas is an easy-to-interpret, domain-specific ML model that allows the necessary latent variables to be inferred from actual observations (chunk download times), while exploiting knowledge of TCP states (e.g., the congestion window) to facilitate inference. Validations with data from real video sessions show that Veritas accurately tackles a wide range of what-if questions, often coming close to an ideal oracle, while not requiring RCT training data.
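To make the inference concrete, the following is a minimal, illustrative sketch in Python (not the Veritas implementation; all function names and numbers are hypothetical): the observed throughput of a chunk pins down the latent bandwidth only when the transfer saturated the link, and otherwise (e.g., during TCP slow start) merely lower-bounds it, so what-if answers must respect a range rather than a point estimate.

def bandwidth_bounds(chunk_bytes, download_time_s, link_saturated):
    # Observed throughput is a tight estimate of the latent bandwidth only if the
    # transfer saturated the link; otherwise (e.g., TCP slow start, a small chunk)
    # it is merely a lower bound.
    observed_bps = chunk_bytes * 8 / download_time_s
    if link_saturated:
        return observed_bps, observed_bps
    return observed_bps, float("inf")

def whatif_download_time(new_chunk_bytes, lower_bps, upper_bps):
    # Range of plausible download times for a counterfactual chunk size.
    best = 0.0 if upper_bps == float("inf") else new_chunk_bytes * 8 / upper_bps
    worst = new_chunk_bytes * 8 / lower_bps
    return best, worst

# Example: a 1 MB chunk downloaded in 0.8 s without saturating the link; ask what a
# hypothetical 2 MB chunk (a higher quality level) could have taken.
lo, hi = bandwidth_bounds(1_000_000, 0.8, link_saturated=False)
print(whatif_download_time(2_000_000, lo, hi))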
The tremendous success of Internet video has led to a growing interest in new forms of video content such as 360° video. Encoding and streaming the entire 360° view to the client is prohibitively expensive, and can consume 5-6X the bandwidth needed to stream only the relevant portions of the video to the user. While streaming only the user's viewport can help, it is challenging to accurately predict the viewport, and mispredictions may hurt interactive experience. In this paper, we present Dragonfly, a new 360° system that preserves interactive experience by avoiding playback stalls while maintaining high perceptual quality. A central contribution of Dragonfly is proactive skipping, where some viewport tiles can be prudently and deliberately skipped to optimize overall experience. Using a user study with 26 users and emulation-based experiments, we show that Dragonfly achieves higher quality and lower overheads than state-of-the-art 360° streaming approaches.
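The proactive-skipping idea can be illustrated with a toy scheduler (a hypothetical sketch, not Dragonfly's actual algorithm): given an estimated bandwidth budget before a frame's deadline, fetch the tiles with the highest expected benefit per byte and deliberately skip the rest rather than risk a stall.

def plan_tiles(tiles, bandwidth_bps, deadline_s):
    # tiles: list of (view_probability, size_bytes, quality_gain) for one frame.
    budget_bytes = bandwidth_bps * deadline_s / 8            # bytes fetchable before the deadline
    # Rank tiles by expected benefit per byte, then fetch greedily within the budget.
    ranked = sorted(tiles, key=lambda t: t[0] * t[2] / t[1], reverse=True)
    fetched, skipped, used = [], [], 0
    for tile in ranked:
        if used + tile[1] <= budget_bytes:
            fetched.append(tile)
            used += tile[1]
        else:
            skipped.append(tile)                             # proactively skipped, no stall
    return fetched, skipped

# Three equally sized tiles with different viewport probabilities, 4 Mbps, 1 s deadline.
fetched, skipped = plan_tiles(
    [(0.9, 400_000, 3.0), (0.5, 400_000, 3.0), (0.1, 400_000, 3.0)],
    bandwidth_bps=4_000_000, deadline_s=1.0)
print(len(fetched), "fetched,", len(skipped), "skipped")     # 1 fetched, 2 skipped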
Ensuring high-quality Internet video delivery requires ABR algorithms whose performance depends on network-sensitive parameters. This paper presents Oboe, a system that auto-tunes ABR algorithm parameters to variable and heterogeneous network conditions. Oboe significantly improves state-of-the-art approaches including MPC (from Carnegie Mellon) and BOLA (an algorithm adopted by dash.js, a popular open-source video streaming software). Further, Oboe outperforms Pensieve, a recent reinforcement learning-based ABR algorithm from MIT, by 24% on a composite video delivery metric, by better specializing ABR behavior across network states. The paper was released with bandwidth traces of video streaming sessions, and has already been cited over 280 times since publication.
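The auto-tuning idea can be sketched as follows (illustrative only; the state buckets and parameter values below are invented, whereas Oboe builds its mapping offline by replaying traces and detects network state changes online).

import statistics

# Hypothetical offline-built table: (bandwidth level, variability) -> best ABR
# parameter value (e.g., a discount applied to throughput predictions).
BEST_PARAM = {
    ("low", "stable"): 0.95,
    ("low", "variable"): 0.60,
    ("high", "stable"): 1.00,
    ("high", "variable"): 0.80,
}

def network_state(throughput_samples_mbps):
    # Classify the current network state by mean throughput and its variability.
    mean = statistics.mean(throughput_samples_mbps)
    std = statistics.pstdev(throughput_samples_mbps)
    return ("high" if mean > 3.0 else "low",
            "variable" if std > 1.0 else "stable")

def tuned_parameter(throughput_samples_mbps):
    # Online step: detect the current state and look up the precomputed best parameter.
    return BEST_PARAM[network_state(throughput_samples_mbps)]

print(tuned_parameter([4.5, 1.5, 5.5, 2.0]))   # high mean but highly variable -> 0.8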
This paper and my earlier paper (Parcel; CoNext14) developed systems that reduce web latency over cellular networks by performing execution redundantly at a cloud-based proxy to identify which objects must be pushed to the client. Such redundant execution is in contrast to prior works that completely eliminate client execution, which reduces responsiveness to user interactions. Our system, Nutshell, scales cloud-based Javascript (JS) execution approaches by reducing unnecessary proxy execution, exploiting two unique opportunities. First, only the code needed to fetch the objects to be pushed to the client needs to be executed. Second, even if code is aggressively eliminated, only the redundant proxy execution is affected. Built around these observations, Nutshell sustained 27% higher user requests per second compared to fully redundant execution for a range of web page popularity models, while preserving the latency benefits. Nutshell achieves speedups in median page load times of 1.5X compared to HTTP/2 (an industry-driven protocol standard) on a live LTE network. A joint patent with AT&T has been awarded.
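The first opportunity can be illustrated with a hypothetical sketch (not Nutshell's implementation; the call graph and function names are invented): only functions whose execution can transitively reach a network fetch need to run at the proxy, since only fetched objects are pushed to the client, and any pruning mistake costs only the redundant proxy copy.

# Hypothetical call graph and fetch annotations for a page's JS functions.
CALLS = {"onload": ["render", "loadImages"], "render": ["layout"],
         "loadImages": ["fetchImage"], "layout": [], "fetchImage": []}
FETCHES = {"fetchImage"}                      # functions that issue network requests

def must_execute(fn, calls=CALLS, fetches=FETCHES):
    # A function needs to run at the proxy only if it can (transitively) trigger a fetch.
    if fn in fetches:
        return True
    return any(must_execute(callee, calls, fetches) for callee in calls[fn])

print([fn for fn in CALLS if must_execute(fn)])   # ['onload', 'loadImages', 'fetchImage']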
The conventional wisdom throughout the 1990's was that router-level support (IP Multicast) was necessary to efficiently support multicast functionality. Despite hundreds of publications through the 1990's, extensive Internet standardization efforts, and implementation by router vendors such as Cisco, IP Multicast saw limited success. The above paper made the argument that an end-system overlay approach with no router support could efficiently support multicast functionality. The paper opened a new area of research in peer-to-peer multicast, won the ACM Sigmetrics Test of Time Award in 2011, and has over 3600 citations. In subsequent work, Sanjay and his colleagues developed a prototype overlay multicast system which was used to broadcast several tens of events, including ACM Sigcomm and ACM SOSP (the experience is described in a Usenix ATC 2004 paper).
When managing wide-area networks, network architects must decide how to balance multiple conflicting metrics and ensure fair allocations to competing traffic while prioritizing critical traffic. The state of practice poses challenges since architects must precisely encode their intent into formal optimization models, using abstract notions such as utility functions and ad-hoc, manually tuned knobs. In this paper, we present the first effort to synthesize optimal network designs with indeterminate objectives using an interactive program-synthesis-based approach. The paper presents Comparative Synthesis, an interactive synthesis framework which produces near-optimal programs (network designs) without an objective being explicitly given. We implemented Net10Q, a system based on our approach, and demonstrate its effectiveness on real-world network case studies, as well as through a pilot user study with network researchers and practitioners. Both theoretical and experimental results show the promise of our approach.
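The interaction model can be sketched with a toy example (not Net10Q's actual algorithm): rather than asking the architect to write down a utility function, present pairs of candidate designs that trade total throughput against fairness, and use each preference to narrow the implied weight between the two metrics.

def learn_tradeoff_weight(prefers, rounds=10):
    # prefers(a, b) -> True if the architect prefers design a over design b, where a
    # design is summarized as a (throughput_score, fairness_score) pair.
    lo, hi = 0.0, 1.0                        # plausible range for the throughput weight
    for _ in range(rounds):
        mid = (lo + hi) / 2
        # Two designs that are equally good exactly when the throughput weight is `mid`.
        throughput_heavy = (1.0 - mid, 0.0)
        fairness_heavy = (0.0, mid)
        if prefers(throughput_heavy, fairness_heavy):
            lo = mid                         # architect values throughput more than `mid`
        else:
            hi = mid
    return (lo + hi) / 2

# Simulated architect whose hidden (never written down) objective weights throughput at 0.7.
def simulated_architect(a, b, hidden_w=0.7):
    score = lambda d: hidden_w * d[0] + (1 - hidden_w) * d[1]
    return score(a) >= score(b)

print(round(learn_tradeoff_weight(simulated_architect), 2))  # converges near 0.7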
Existing traffic engineering (TE) schemes either (i) optimally reroute traffic on each failure through centralized mechanisms, thereby achieving high efficiency but with poor resilience since network links may be congested during failure recovery; or (ii) proactively guarantee that the network remains congestion-free during failures by allocating bandwidth conservatively and using fast local repair mechanisms. Unfortunately, these approaches carry far less traffic than the network is capable of carrying. In this paper, we developed PCF, a novel set of congestion-free routing mechanisms that bridge this gap. PCF achieves these goals by better modeling network structure, and by carefully enhancing the flexibility of network response while ensuring that the performance under failures can be tractably modeled. All of PCF’s schemes involve relatively light-weight operations on failures. Our empirical results over 21 Internet topologies show that PCF sustains higher throughput than the state-of-the-art congestion-free mechanism (from Microsoft Research) by a factor of 1.11X to 1.5X on average across the topologies, while providing a benefit of 2.6X in some cases.
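As a toy illustration of the congestion-free property these mechanisms target (a simplified sketch, not PCF's formulation): each flow splits its demand over a set of tunnels, a link failure causes local repair to re-spread demand over the flow's surviving tunnels, and the design is congestion-free only if every link stays within capacity under every single-link failure.

def link_loads(flows, failed_link=None):
    loads = {}
    for demand, tunnels in flows:
        alive = [t for t in tunnels if failed_link not in t]
        if not alive:
            continue                          # flow disconnected in this toy model
        share = demand / len(alive)           # equal re-spreading across surviving tunnels
        for tunnel in alive:
            for link in tunnel:
                loads[link] = loads.get(link, 0) + share
    return loads

def congestion_free(flows, capacity):
    # True if no link exceeds its capacity, both with no failure and under any
    # single-link failure.
    all_links = {link for _, tunnels in flows for t in tunnels for link in t}
    for failed in [None, *all_links]:
        loads = link_loads(flows, failed)
        if any(load > capacity[link] for link, load in loads.items()):
            return False
    return True

# One flow of 10 units split over two disjoint tunnels; every link has capacity 10,
# so the design remains congestion-free even when either tunnel loses a link.
flows = [(10, [["a-b", "b-d"], ["a-c", "c-d"]])]
capacity = {link: 10 for link in ["a-b", "b-d", "a-c", "c-d"]}
print(congestion_free(flows, capacity))       # True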
This paper developed a novel approach to formally certify and synthesize network designs that are provably congestion-free over the combinatorially many scenarios in which a network may operate (e.g., traffic demands, failures), while modeling flexible real-world network response strategies. The paper does so by establishing surprising connections between optimization-theoretic techniques and the mechanisms that networks use to respond to failures. This work significantly advances existing practice, which uses ad-hoc simulation-based testing, and prior theory (e.g., robust optimization, oblivious routing), which only applies to highly restrictive network responses and only to the worst case. My subsequent work in Sigmetrics 2020 was the first to show how a network could be designed to be provably congestion-free over scenarios that occur a desired percentage of the time, rather than only considering the worst case, which is conservative.
This paper presented a formal approach for verifying that low-level router configurations correctly implement Class of Service (CoS) policies for differential treatment of traffic (e.g., prioritizing video over other traffic). A tool implemented using Binary Decision Diagrams was successfully used to discover anomalies in 150 real-world production networks. This work was done in collaboration with AT&T Research, and was awarded a joint patent. The paper was an early and influential work in the area of network verification, and the first to consider Class of Service policies, an important domain in networking.