TL;DR: We introduce a novel diffusion model-based framework, DYffusion, for large-scale probabilistic forecasting. We propose to couple the diffusion steps with the physical timesteps of the data, leading to temporal forward and reverse processes that we represent through a stochastic interpolator and a deterministic forecaster network, respectively. These design choices effectively address the challenges of generating stable, accurate and probabilistic rollout forecasts.

![DYffusion Diagram](https://media.giphy.com/media/v1.Y2lkPTc5MGI3NjExOXpvdHB5bGY1aWltbTdoYTdxNW03bmdxaG9tMDN6dGY1ZTZ2OWU5ZCZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/h7yQszDENzsSiIUOpJ/giphy.gif) *DYffusion forecasts a sequence of* $h$ *snapshots* $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_h$ *given the initial conditions* $\mathbf{x}_0$ *similarly to how standard diffusion models are used to sample from a distribution.*

## Motivation for our work

Obtaining accurate and reliable probabilistic forecasts is an important component of policy formulation, risk management, resource optimization, and strategic planning, with applications ranging from climate simulations and fluid dynamics to financial markets and epidemiology. Accurate long-range probabilistic forecasts are often particularly challenging to obtain. Where they exist, physics-based methods typically hinge on computationally expensive numerical simulations. In contrast, data-driven methods are much more efficient and have started to have real-world impact in fields such as global weather forecasting.

Generative models, and especially diffusion models, have shown great success in other fields such as natural image generation and video synthesis. Diffusion models iteratively transform data back and forth between an initial distribution and the target distribution over multiple diffusion steps. The standard approach (e.g. see this excellent blog post) is to corrupt the data with increasing levels of Gaussian noise in the forward process and to train a neural network to denoise the data in the reverse process. Because data must be generated from noise over several sequential steps, diffusion models are expensive to train and, especially, to sample from. Recent works such as Cold Diffusion, which especially inspired our work, have proposed alternative data corruption processes such as blurring.
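
To make the standard setup concrete, here is a minimal sketch of the closed-form Gaussian forward (noising) process used by DDPM-style models. The linear schedule and the names `make_alpha_bar` and `q_sample` are illustrative assumptions, not taken from any particular codebase:

```python
import torch

def make_alpha_bar(num_steps: int = 1000) -> torch.Tensor:
    """Cumulative product of (1 - beta_t) for an illustrative linear noise schedule."""
    betas = torch.linspace(1e-4, 0.02, num_steps)
    return torch.cumprod(1.0 - betas, dim=0)

def q_sample(x0: torch.Tensor, t: torch.Tensor, alpha_bar: torch.Tensor) -> torch.Tensor:
    """Forward process: corrupt clean data x0 to noise level t in closed form,
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)."""
    eps = torch.randn_like(x0)
    ab = alpha_bar[t].view(-1, *([1] * (x0.dim() - 1)))  # broadcast over the batch
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps
```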

Problem: Common approaches for large-scale spatiotemporal problems tend to be deterministic and autoregressive (see the sketch below). As a result, they often fail to capture the inherent uncertainty in the data, produce unphysical predictions, and accumulate errors over long-range forecasts. It is natural to ask how we can efficiently leverage diffusion models for large-scale spatiotemporal problems and incorporate the temporality of the data into the diffusion model.
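
For reference, the deterministic autoregressive pattern we contrast against looks roughly like this, where `model` stands for any hypothetical one-step forecaster; because each prediction is fed back in as the next input, early errors are reused and compound over the horizon:

```python
import torch

@torch.no_grad()
def autoregressive_rollout(model, x0: torch.Tensor, horizon: int) -> list[torch.Tensor]:
    """Roll out a one-step forecaster for `horizon` steps by feeding each
    prediction back in as the next input. Errors made at early steps become
    inputs to later steps, which is how they accumulate over long horizons."""
    preds, x = [], x0
    for _ in range(horizon):
        x = model(x)  # predict x_{t+1} from x_t
        preds.append(x)
    return preds
```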

DYffusion offers a natural solution to both issues. It is a temporal diffusion model, so training naturally amounts to forecasting multiple timesteps, and it is embedded in the “generalized diffusion model” framework, so we can take inspiration from existing diffusion models to build a strong probabilistic forecaster.
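
To illustrate, below is a minimal sketch of DYffusion's reverse (sampling) process. Here `forecaster` and `interpolator` stand in for the trained networks (the exact interfaces in our codebase differ), and the interpolator's stochasticity, e.g. via dropout kept enabled at inference time, is what makes the forecast probabilistic:

```python
import torch

@torch.no_grad()
def dyffusion_sample(forecaster, interpolator, x0: torch.Tensor,
                     timesteps: list[float]) -> list[torch.Tensor]:
    """Sketch of DYffusion's reverse process, where the diffusion steps run
    forward in physical time. At each step, the forecaster predicts the
    horizon snapshot x_h, and the stochastic interpolator maps it back to
    the next intermediate time. `timesteps` holds the intermediate times
    0 < i_1 < ... < i_N < h."""
    x_n, trajectory = x0, []
    for i_next in timesteps:
        x_h = forecaster(x_n)                # predict the final snapshot x_h
        x_n = interpolator(x0, x_h, i_next)  # (stochastically) interpolate to time i_next
        trajectory.append(x_n)
    trajectory.append(forecaster(x_n))       # final forecast at time h
    return trajectory
```

Note that, unlike Gaussian diffusion, every sampling step here corresponds to a physical timestep, so each intermediate state is itself a meaningful forecast rather than a partially denoised image.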

*A comparison of standard Gaussian diffusion (left) with DYffusion (right).*

DYffusion is the first diffusion model that relies on task-informed forward and reverse processes. All other existing diffusion models, while more general, rely on data-corruption-based processes. As a result, our work provides a new perspective on designing capable diffusion models and may lead to a whole family of task-informed diffusion models.

For more details, please check out our NeurIPS 2023 paper and our code on GitHub.