Systems

Arboretum: A Planner for Large-Scale Federated Analytics with Differential Privacy

Authors

Elizabeth Margolin (University of Pennsylvania), Karan Newatia (University of Pennsylvania), Tao Luo (University of Pennsylvania), Edo Roth (University of Pennsylvania), Andreas Haeberlen (University of Pennsylvania / Roblox)

Venue

SOSP 2023

Abstract

Federated analytics is a way to answer queries over sensitive data that is spread across multiple parties, without sharing the data or collecting it in a single place. Prior work has developed solutions that can scale to large deployments with millions of devices but, due to the distributed nature of federated analytics, these solutions can support only a limited class of queries - typically various forms of numerical queries, which can be answered with lightweight cryptographic primitives. Supporting richer queries, such as categorical queries, requires heavier cryptography, whose cost can quickly exceed even the resources of a powerful data center. In this paper, we present Arboretum, a new federated analytics system that can efficiently answer a broader range of queries, including categorical queries, in deployments with millions or even billions of participants. Arboretum achieves this by 1) automatically optimizing query plans to find highly efficient ways to answer each query, and by 2) including the participant devices in the computation. Our evaluation shows that Arboretum can match the cost of earlier systems that have been hand-optimized for particular kinds of queries, and that it can additionally support a range of new queries for which no efficient solution exists today.