CRT 27: Fast, Distributed Bayes for Everyone | CANSSI | Canadian Statistical Sciences Institute

CRT 27 lead investigators: Alexandre Bouchard-Côté, Trevor Campbell, Philippe Gagnon, Liangliang Wang

Collaborative Research Team Projects – Project 27

Fast, Distributed Bayes for Everyone

Bayesian modelling is widely used in all fields of science and engineering to model reality while quantifying uncertainty. Yet the computational cost of Bayesian methods still prevents the development of more ambitious scientific models and limits the accuracy of their predictions. This project will develop foundational methods in large-scale distributed Bayesian inference and will release them in an open-source package, Pigeons (a prototype is already available), that will enable statisticians to harness distributed and cloud computing to perform Bayesian computation at scale.

Region: National
Date: 2025-2028

Why Are New Foundational Methods in Large-scale Distributed Bayesian Inference Needed?

Given the optimality properties of Bayesian methods, why are they merely “widely used” rather than used for all prediction problems? The key reason is simple: their computational cost.

For difficult problems, despite recent advances in computational Bayesian methods, there is still significant room for improvement, especially with regard to two main areas: (1) leveraging modern, massively parallel distributed and cloud computing platforms, and (2) making Bayesian inference using such platforms available to end-users without any expertise in distributed computing.

We predict that the ability to massively parallelize Bayesian computation without requiring specialized user knowledge will be transformational: tasks that would previously take several hours can be reduced to tens of seconds, enabling rapid model development for a considerably expanded class of problems. Initial testing of our novel platform on a cancer genomic phylogenetic tree reconstruction problem has already demonstrated promising results: by utilizing distributed computing, we achieved a speed-up of more than 700 times compared to non-distributed inference.

Research Objectives and Projects

The ultimate goal of our project is to enable users to leverage the power of large-scale distributed computing platforms for Bayesian inference involving complicated posterior distributions, without needing any expertise in distributed computing.

At a high level, our proposal will consist of four methodology projects addressing key challenges to distributed Bayesian inference, along with two cross-projects through which all trainees and team leaders will interact:

Cross-project 1: Applying Distributed Bayesian Computing to Real-world Scientific Problems. The goal of this cross-project is to refine Method Projects 1–4 to ensure their alignment with the actual needs of practitioners.

Cross-project 2: Integrating New Methodologies into an Open-source Library. Starting in year 1, new PhD students will join the development team of our open-source software library for distributed Bayesian computing (Pigeons). The goal is that by the end of their PhDs, the students will have contributed their methodological work to this software library, which already has a healthy user community.

Method Project 1: Enabling the Full Generality of Bayesian Inference in a Distributed, Strongly Scaling Framework. We are developing software that eliminates the limitations of many popular existing packages and building software bridges that enable the user to express their problem in the modelling language of their choice.

Method Project 2: Automated Tuning in a Distributed Environment. Bayesian inference algorithms usually include parameters that can be set (tuned) to adapt the method to a particular problem. This project will adapt automated tuning methods designed for the single-machine, local computation setting to the distributed environment.

Method Project 3: Advanced Distributed Swapping Methods. We will investigate alternative inter-chain communication schemes, develop distributed versions of these advanced swap schemes, incorporate them into our open-source software, and analyze their scalability.

Method Project 4: Distributed Particle Methods. We will adapt the work we have done in distributed non-reversible parallel tempering (NRPT) to the annealed Sequential Monte Carlo (SMC) domain. Parallel tempering (PT) and SMC methods can also be combined to obtain different performance trade-offs. We will work with applied collaborators to identify regimes where options are currently lacking.
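
To make the inter-chain swaps behind parallel tempering concrete, here is a minimal single-process sketch in Python. It is not the Pigeons API and not the team's implementation. The toy bimodal target, the Gaussian reference, the evenly spaced annealing schedule, and every function name are assumptions chosen for illustration. Each chain targets an annealed density proportional to reference^(1 - beta) * target^beta, and neighbouring chains propose swaps on an alternating even/odd pattern, a deterministic schedule commonly used in non-reversible parallel tempering.

```python
import math
import random


def log_ref(x):
    # Assumed reference distribution: N(0, 3^2), unnormalized log density.
    return -0.5 * (x / 3.0) ** 2


def log_target(x):
    # Assumed toy target: equal mixture of N(-4, 0.5^2) and N(4, 0.5^2),
    # unnormalized, computed with a log-sum-exp for numerical stability.
    a = -0.5 * ((x + 4.0) / 0.5) ** 2
    b = -0.5 * ((x - 4.0) / 0.5) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_annealed(x, beta):
    # Annealed path: beta = 0 is the reference, beta = 1 is the target.
    return (1.0 - beta) * log_ref(x) + beta * log_target(x)


def swap_log_ratio(beta_lo, beta_hi, x_lo, x_hi):
    # Log Metropolis ratio for exchanging the states of two neighbouring chains.
    def ell(x):
        return log_target(x) - log_ref(x)

    return (beta_hi - beta_lo) * (ell(x_lo) - ell(x_hi))


def local_move(x, beta, rng, step=1.0):
    # One random-walk Metropolis step within a single chain.
    proposal = x + rng.gauss(0.0, step)
    log_accept = log_annealed(proposal, beta) - log_annealed(x, beta)
    # 1 - random() lies in (0, 1], so the logarithm is always defined.
    if math.log(1.0 - rng.random()) < log_accept:
        return proposal
    return x


def nrpt(n_chains=8, n_iters=2000, seed=1):
    rng = random.Random(seed)
    betas = [i / (n_chains - 1) for i in range(n_chains)]
    states = [rng.gauss(0.0, 3.0) for _ in betas]
    samples = []
    for t in range(n_iters):
        # Local exploration: each chain moves independently of the others,
        # which is the part a distributed implementation can run in parallel.
        states = [local_move(x, b, rng) for x, b in zip(states, betas)]
        # Communication: even iterations try pairs (0,1), (2,3), ...;
        # odd iterations try pairs (1,2), (3,4), ...
        for i in range(t % 2, n_chains - 1, 2):
            log_r = swap_log_ratio(betas[i], betas[i + 1], states[i], states[i + 1])
            if math.log(1.0 - rng.random()) < log_r:
                states[i], states[i + 1] = states[i + 1], states[i]
        samples.append(states[-1])  # state of the beta = 1 (target) chain
    return samples


if __name__ == "__main__":
    draws = nrpt()
    print(len(draws), min(draws), max(draws))
```

In this sketch all chains live in one process. In a distributed setting, each chain would typically run on its own worker, and only the small amount of information needed to decide a swap would cross the network, which is why the choice of swap scheme matters for scalability.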

People Behind the Project

Project Team

Alexandre Bouchard-Côté | University of British Columbia
Trevor Campbell | University of British Columbia
Philippe Gagnon | Université de Montréal
Liangliang Wang | Simon Fraser University

Collaborators

Abigail Azari | University of British Columbia and Incoming Faculty, University of Alberta
Cindy Feng | Dalhousie University
Saifuddin Syed | University of Oxford
William Thompson | National Research Council of Canada
Paul Tiede | Harvard University and Smithsonian Astrophysical Observatory
