publications
(*) denotes equal contribution
2024
- NeurIPS: Salva Rühling Cachay, Brian Henn, Oliver Watt-Meyer, Christopher S. Bretherton, and Rose Yu. Probabilistic Emulation of a Global Climate Model with Spherical DYffusion. In Advances in Neural Information Processing Systems, 2024. Spotlight Presentation; Best Paper Award at the ICML 2024 ML4ESM Workshop.
Data-driven deep learning models are transforming global weather forecasting. It is an open question whether this success can extend to climate modeling, where the complexity of the data and long inference rollouts pose significant challenges. Here, we present the first conditional generative model that produces accurate and physically consistent global climate ensemble simulations by emulating a coarse version of the United States’ primary operational global forecast model, FV3GFS. Our model integrates the dynamics-informed diffusion framework (DYffusion) with the Spherical Fourier Neural Operator (SFNO) architecture, enabling stable 100-year simulations at 6-hourly timesteps while maintaining low computational overhead compared to single-step deterministic baselines. The model achieves near gold-standard performance for climate model emulation, outperforming existing approaches and demonstrating promising ensemble skill. This work represents a significant advance towards efficient, data-driven climate simulations that can enhance our understanding of the climate system and inform adaptation strategies.
@inproceedings{cachay2024probemulation,
  title     = {Probabilistic Emulation of a Global Climate Model with {Spherical DYffusion}},
  author    = {R{\"u}hling Cachay, Salva and Henn, Brian and Watt-Meyer, Oliver and Bretherton, Christopher S. and Yu, Rose},
  booktitle = {Advances in Neural Information Processing Systems},
  media     = {https://today.ucsd.edu/story/accelerating-climate-modeling-with-generative-ai},
  year      = {2024},
}
2023
- NeurIPS: Salva Rühling Cachay, Bo Zhao, Hailey Joren, and Rose Yu. DYffusion: A Dynamics-informed Diffusion Model for Spatiotemporal Forecasting. In Advances in Neural Information Processing Systems, 2023.
While diffusion models can successfully generate data and make predictions, they are predominantly designed for static images. We propose an approach for efficiently training diffusion models for probabilistic spatiotemporal forecasting, where generating stable and accurate rollout forecasts remains challenging. Our method, DYffusion, leverages the temporal dynamics in the data, directly coupling them with the diffusion steps in the model. We train a stochastic, time-conditioned interpolator and a forecaster network that mimic the forward and reverse processes of standard diffusion models, respectively. DYffusion naturally facilitates multi-step and long-range forecasting, allowing for highly flexible, continuous-time sampling trajectories and the ability to trade off performance with accelerated sampling at inference time. In addition, the dynamics-informed diffusion process in DYffusion imposes a strong inductive bias and significantly improves computational efficiency compared to traditional Gaussian noise-based diffusion models. Our approach performs competitively on probabilistic forecasting of complex dynamics in sea surface temperatures, Navier-Stokes flows, and spring mesh systems.
@inproceedings{cachay2023dyffusion,
  title     = {{DYffusion:} A Dynamics-informed Diffusion Model for Spatiotemporal Forecasting},
  author    = {R{\"u}hling Cachay, Salva and Zhao, Bo and Joren, Hailey and Yu, Rose},
  booktitle = {Advances in Neural Information Processing Systems},
  url       = {https://openreview.net/forum?id=WRGldGm5Hz},
  year      = {2023},
}
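As a rough illustration of the sampling scheme the abstract describes, here is a minimal sketch in which an interpolator network stands in for the forward process and a forecaster for the reverse process. The `forecaster` and `interpolator` callables and their signatures are assumptions for illustration, not the paper's actual API.

```python
# Illustrative sketch of a DYffusion-style sampling loop (assumed interfaces,
# not the authors' implementation).
import torch

def sample_forecast(forecaster, interpolator, x0, horizon: int):
    """Roll out one probabilistic forecast from the initial state x0.

    Mimics reverse diffusion: alternately forecast the horizon state and
    interpolate back to the next intermediate time step.
    """
    x_i = x0
    for i in range(horizon):
        x_h = forecaster(x_i, torch.tensor(i))  # estimate of the state at t0 + horizon
        if i + 1 < horizon:
            # Stochastic interpolation between the initial and (estimated) final
            # state, playing the role of the forward process in standard diffusion.
            x_i = interpolator(x0, x_h, torch.tensor(i + 1))
        else:
            x_i = x_h
    return x_i
```

Because the interpolator is stochastic, repeated calls yield an ensemble of forecasts from the same initial condition.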
2021
- NeurIPS: Salva Rühling Cachay*, Venkatesh Ramesh*, Jason N. S. Cole, Howard Barker, and David Rolnick. ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models. In Advances in Neural Information Processing Systems Datasets and Benchmarks Track, 2021.
Numerical simulations of Earth’s weather and climate require substantial amounts of computation. This has led to a growing interest in replacing subroutines that explicitly compute physical processes with approximate machine learning (ML) methods that are fast at inference time. Within weather and climate models, atmospheric radiative transfer (RT) calculations are especially expensive. This has made them a popular target for neural network-based emulators. However, prior work is hard to compare due to the lack of a comprehensive dataset and standardized best practices for ML benchmarking. To fill this gap, we build a large dataset, ClimART, with more than 10 million samples from present, pre-industrial, and future climate conditions, based on the Canadian Earth System Model. ClimART poses several methodological challenges for the ML community, such as multiple out-of-distribution test sets, underlying domain physics, and a trade-off between accuracy and inference speed. We also present several novel baselines that indicate shortcomings of datasets and network architectures used in prior work.
@inproceedings{cachay2021climart,
  title     = {{ClimART}: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models},
  author    = {R{\"u}hling Cachay*, Salva and Ramesh*, Venkatesh and Cole, Jason N. S. and Barker, Howard and Rolnick, David},
  booktitle = {Advances in Neural Information Processing Systems Datasets and Benchmarks Track},
  url       = {https://openreview.net/forum?id=FZBtIpEAb5J},
  year      = {2021},
}
- NeurIPS: Salva Rühling Cachay, Benedikt Boecking, and Artur Dubrawski. End-to-End Weak Supervision. In Advances in Neural Information Processing Systems, 2021.
Aggregating multiple sources of weak supervision (WS) can ease the data-labeling bottleneck prevalent in many machine learning applications by replacing the tedious manual collection of ground-truth labels. Current state-of-the-art approaches that do not use any labeled training data, however, require two separate modeling steps: learning a probabilistic latent variable model based on the WS sources (making assumptions that rarely hold in practice), followed by downstream model training. Importantly, the first modeling step does not consider the performance of the downstream model. To address these caveats, we propose an end-to-end approach for directly learning the downstream model by maximizing its agreement with probabilistic labels generated by reparameterizing previous probabilistic posteriors with a neural network. Our results show improved performance over prior work in terms of end-model performance on downstream test sets, as well as improved robustness to dependencies among weak supervision sources.
@inproceedings{cachay2021endtoend,
  title     = {End-to-End Weak Supervision},
  author    = {R{\"u}hling Cachay, Salva and Boecking, Benedikt and Dubrawski, Artur},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2021},
}
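As a rough illustration of the agreement objective described in the abstract, the sketch below scores a downstream classifier against soft labels produced by a neural label model applied to the weak-source votes, so both networks can be trained jointly on one loss. All names and tensor shapes (`end_model`, `label_model`, `lf_votes`) are hypothetical stand-ins, not the paper's code.

```python
# Toy sketch of an end-to-end weak supervision objective (assumed interfaces,
# not the paper's implementation).
import torch

def agreement_loss(end_model, label_model, x, lf_votes):
    """Cross-entropy between the end model's predictions on inputs x and the
    probabilistic labels that the label model derives from the weak-source
    votes lf_votes (shape: batch x num_sources); minimizing this maximizes
    their agreement without any ground-truth labels."""
    soft_labels = label_model(lf_votes).softmax(dim=-1)  # probabilistic labels
    log_preds = end_model(x).log_softmax(dim=-1)
    return -(soft_labels * log_preds).sum(dim=-1).mean()
```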
- arXiv: Salva Rühling Cachay, Emma Erickson, Arthur Fender C. Bucker, Ernest Pokropek, Willa Potosnak, Suyash Bire, Salomey Osei, and Björn Lütjens. The World as a Graph: Improving El Niño Forecasts with Graph Neural Networks. arXiv:2104.05089, 2021.
Deep learning-based models have recently outperformed state-of-the-art seasonal forecasting models, such as for predicting El Niño-Southern Oscillation (ENSO). However, current deep learning models are based on convolutional neural networks, which are difficult to interpret and can fail to model large-scale atmospheric patterns. In comparison, graph neural networks (GNNs) are capable of modeling large-scale spatial dependencies and are more interpretable due to the explicit modeling of information flow through edge connections. We propose the first application of graph neural networks to seasonal forecasting. We design a novel graph connectivity learning module that enables our GNN model to learn large-scale spatial interactions jointly with the actual ENSO forecasting task. Our model, Graphino, outperforms state-of-the-art deep learning-based models for forecasts up to six months ahead. Additionally, we show that our model is more interpretable as it learns sensible connectivity structures that correlate with the ENSO anomaly pattern.
@article{cachay2021world,
  title   = {The World as a Graph: Improving El Ni{\~n}o Forecasts with Graph Neural Networks},
  author  = {R{\"u}hling Cachay, Salva and Erickson, Emma and Fender C. Bucker, Arthur and Pokropek, Ernest and Potosnak, Willa and Bire, Suyash and Osei, Salomey and L{\"u}tjens, Bj{\"o}rn},
  journal = {arXiv:2104.05089},
  year    = {2021},
}
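To make the "graph connectivity learning" idea above concrete, here is a toy, self-contained sketch in which the adjacency matrix is a free parameter optimized jointly with the GNN weights by the forecasting loss. The class name, shapes, and single message-passing step are illustrative assumptions, not the Graphino architecture.

```python
# Toy sketch of jointly learning graph structure and a forecast
# (illustrative only; not the Graphino code).
import torch
import torch.nn as nn

class LearnedGraphForecaster(nn.Module):
    def __init__(self, num_nodes: int, in_dim: int, hidden_dim: int):
        super().__init__()
        # Edge weights are learnable, so connectivity is discovered from data.
        self.adj_logits = nn.Parameter(torch.zeros(num_nodes, num_nodes))
        self.encode = nn.Linear(in_dim, hidden_dim)
        self.readout = nn.Linear(num_nodes * hidden_dim, 1)  # e.g., an ENSO index

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_nodes, in_dim), one node per ocean grid cell.
        adj = torch.sigmoid(self.adj_logits)      # soft connectivity in [0, 1]
        h = torch.relu(self.encode(x))
        h = torch.einsum("ij,bjd->bid", adj, h)   # one message-passing step
        return self.readout(h.flatten(start_dim=1))
```

After training, thresholding `adj` gives the learned connectivity structure, which is what makes this kind of model more interpretable than a CNN.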
- ICLR Workshop: Salva Rühling Cachay, Benedikt Boecking, and Artur Dubrawski. Dependency Structure Misspecification in Multi-Source Weak Supervision Models. In ICLR Weakly Supervised Learning Workshop and NeurIPS 2020 LatinX in AI Workshop, 2021. Oral Presentation.
Data programming (DP) has proven to be an attractive alternative to costly hand-labeling of data. In DP, users encode domain knowledge into labeling functions (LFs), heuristics that label a subset of the data noisily and may have complex dependencies. A label model is then fit to the LFs to produce an estimate of the unknown class label. The effects of label model misspecification on test set performance of a downstream classifier are understudied. This presents a serious awareness gap to practitioners, in particular since the dependency structure among LFs is frequently ignored in field applications of DP. We analyze modeling errors due to structure over-specification. We derive novel theoretical bounds on the modeling error and empirically show that this error can be substantial, even when modeling a seemingly sensible structure.
@inproceedings{cachay2021dependency,
  title     = {Dependency Structure Misspecification in Multi-Source Weak Supervision Models},
  author    = {R{\"u}hling Cachay, Salva and Boecking, Benedikt and Dubrawski, Artur},
  booktitle = {ICLR Weakly Supervised Learning Workshop and NeurIPS 2020 LatinX in AI Workshop},
  year      = {2021},
}