Neural Stochastic Processes for Satellite Precipitation Refinement

Abstract

Accurate precipitation estimation is critical for flood forecasting, water resource management, and disaster preparedness. Satellite products provide global hourly coverage but contain systematic biases; ground-based gauges are accurate at point locations but too sparse for direct gridded correction. Existing methods fuse these sources by interpolating gauge observations onto the satellite grid, but treat each time step independently and therefore discard temporal structure in precipitation fields.

We propose the Neural Stochastic Process (NSP), a model that pairs a Neural Process encoder, which conditions on arbitrary sets of gauge observations, with a latent Neural SDE defined on a 2D spatial representation. NSP is trained under a single variational objective whose cost is simulation-free. We also introduce QPEBench, a public benchmark of 43,756 hourly samples over the Contiguous United States (2021–2025) with four aligned data sources and six evaluation metrics. On QPEBench, NSP outperforms thirteen baselines across all six metrics and surpasses GSMaP GC, JAXA’s operational gauge-calibrated product.

Overview

Sparse-context encoder

For each hour, the encoder ingests the satellite estimate, elevation, and an arbitrary, time-varying set of gauge observations encoded as a sparse map and a binary mask. It outputs a spatially structured Gaussian posterior over a 2D latent field, enabling a single model to accommodate the irregular gauge network without architectural changes.
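
The sparse map and binary mask described above can be illustrated with a minimal sketch. The grid shape and 0.1° spacing come from the benchmark description; the corner coordinates (`LAT_MIN`, `LON_MIN`), the south-to-north row order, and the choice to average gauges that share a cell are assumptions made for illustration, not details confirmed by the source.

```python
import math
from typing import Iterable, List, Tuple

N_ROWS, N_COLS = 260, 590          # grid shape stated for QPEBench
RES_DEG = 0.1                      # grid spacing stated for QPEBench
LAT_MIN, LON_MIN = 24.0, -125.0    # ASSUMPTION: south-west corner, not given in the source

Grid = List[List[float]]


def rasterize_gauges(gauges: Iterable[Tuple[float, float, float]]) -> Tuple[Grid, Grid]:
    """Turn (lat, lon, mm) gauge reports into a value map and a binary mask.

    Cells without a gauge hold 0.0 in both maps; cells with a gauge hold the
    mean reading (ASSUMPTION: averaging) and 1.0 in the mask.
    """
    sums = [[0.0] * N_COLS for _ in range(N_ROWS)]
    counts = [[0] * N_COLS for _ in range(N_ROWS)]
    for lat, lon, mm in gauges:
        # floor() keeps points just outside the domain from landing in row/col 0
        row = math.floor((lat - LAT_MIN) / RES_DEG)
        col = math.floor((lon - LON_MIN) / RES_DEG)
        if 0 <= row < N_ROWS and 0 <= col < N_COLS:
            sums[row][col] += mm
            counts[row][col] += 1
    values = [[s / c if c else 0.0 for s, c in zip(s_row, c_row)]
              for s_row, c_row in zip(sums, counts)]
    mask = [[1.0 if c else 0.0 for c in c_row] for c_row in counts]
    return values, mask


# Two gauges share a cell, one reports a dry hour, one lies outside the grid.
values, mask = rasterize_gauges([
    (40.05, -100.05, 2.0),
    (40.06, -100.04, 4.0),
    (35.55, -90.55, 0.0),
    (10.00, -100.00, 5.0),
])
print(values[160][249], mask[160][249])  # averaged cell
print(values[115][344], mask[115][344])  # dry gauge: value 0.0 but mask 1.0
```

The dry gauge shows why the mask is needed: a value of 0.0 alone cannot distinguish "observed no rain" from "no gauge here". Because the gauge set is only ever seen through these two fixed-size maps, the number of reporting stations can change from hour to hour without changing the input shape.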

Latent Neural SDE prior

A convolutional Neural SDE defines the latent prior transition. Training uses a closed-form transition KL derived from Girsanov’s theorem, so no SDE solver calls are required. The SDE acts as a temporal regularizer at training time and is bypassed at inference for efficiency.
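
For orientation, the general identity that Girsanov's theorem yields for two SDEs sharing a diffusion term is sketched below. The symbols f_θ (prior drift), h_φ (posterior drift), and g (shared diffusion) are notation assumed here, and this section does not state the specific closed form that NSP uses to evaluate the transition term without a solver.

```latex
% Assumed notation: prior P and posterior Q are latent SDEs on [t, t+1]
% with the same initial distribution at time t and a shared diffusion g.
\begin{aligned}
P:&\quad dz_s = f_\theta(z_s, s)\,ds + g(z_s, s)\,dW_s,\\
Q:&\quad dz_s = h_\phi(z_s, s)\,ds + g(z_s, s)\,dW_s,\\
\mathrm{KL}(Q \,\|\, P) &= \mathbb{E}_{Q}\!\left[\frac{1}{2}\int_{t}^{t+1}
  \left\lVert g(z_s, s)^{-1}\bigl(h_\phi(z_s, s) - f_\theta(z_s, s)\bigr)\right\rVert_2^2\,ds\right].
\end{aligned}
```

The integrand penalizes the mismatch between posterior and prior drift, scaled by the diffusion, which is consistent with the SDE acting as a temporal regularizer during training.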

Residual decoder

The decoder predicts a residual correction in log-precipitation space and a heteroscedastic predictive variance, conditioned on the latent field, the satellite estimate, and elevation. The refined precipitation field is then obtained by applying the residual on top of the satellite prior.
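
A minimal per-pixel sketch of a log-space residual correction with a heteroscedastic Gaussian likelihood follows. The `log1p`/`expm1` transform, the clipping at zero, and the function names are assumptions for illustration; the source states only that the residual lives in log-precipitation space and that a predictive variance is output.

```python
import math


def refine(sat_mm: float, residual: float) -> float:
    """Apply a log-space residual to a satellite estimate (mm/h).

    ASSUMPTION: log(1 + x) is used as the log-precipitation transform so that
    dry pixels (0 mm/h) map to 0 instead of minus infinity.
    """
    log_sat = math.log1p(max(sat_mm, 0.0))
    refined_log = log_sat + residual
    # Invert the transform and clip, since rainfall cannot be negative.
    return max(math.expm1(refined_log), 0.0)


def gaussian_nll(target_log: float, mean_log: float, log_var: float) -> float:
    """Negative log-likelihood of one target under N(mean_log, exp(log_var)).

    Predicting log_var per pixel is what makes the variance heteroscedastic.
    """
    var = math.exp(log_var)
    return 0.5 * (math.log(2.0 * math.pi) + log_var + (target_log - mean_log) ** 2 / var)


# A zero residual leaves the satellite estimate unchanged.
print(refine(2.0, 0.0))
# A negative residual on a dry pixel is clipped to zero.
print(refine(0.0, -1.0))
```

One motivation for predicting a residual rather than the full field, under these assumptions, is that a decoder outputting zeros everywhere simply reproduces the satellite estimate.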

Architecture of NSP. (a) During training, gauge observations are split into context and target sets; the shared encoder maps consecutive inputs to z_t and z_{t+1}, and a transition loss aligns the Neural SDE prior with the encoder posterior. (b) At inference, all available gauges serve as context, and a single encoder–decoder pass predicts the refined field without SDE integration.

Benchmark

QPEBench combines four aligned data sources on a common 260×590 grid at 0.1° resolution (~11 km), spanning five years (January 2021 – December 2025). The benchmark covers the contiguous United States with hourly samples and an independent radar reference that is strictly excluded from training.
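
The grid figures above can be checked with a short calculation. The spherical-Earth radius is a standard approximation chosen here, and the helper names are hypothetical.

```python
import math

N_ROWS, N_COLS = 260, 590      # grid shape stated above
RES_DEG = 0.1                  # grid spacing stated above
EARTH_RADIUS_KM = 6371.0       # mean Earth radius, spherical approximation


def grid_extent_deg() -> tuple:
    """Latitude and longitude span covered by the grid, in degrees."""
    return N_ROWS * RES_DEG, N_COLS * RES_DEG


def cell_size_km(lat_deg: float) -> tuple:
    """North-south and east-west size of one cell at a given latitude."""
    km_per_deg = math.pi * EARTH_RADIUS_KM / 180.0
    height = RES_DEG * km_per_deg
    width = height * math.cos(math.radians(lat_deg))  # meridians converge poleward
    return height, width


lat_span, lon_span = grid_extent_deg()
print(f"extent: {lat_span:.1f} deg x {lon_span:.1f} deg, {N_ROWS * N_COLS} cells")
for lat in (25.0, 37.0, 49.0):
    h, w = cell_size_km(lat)
    print(f"lat {lat:.0f}: cell is {h:.1f} km north-south, {w:.1f} km east-west")
```

The quoted ~11 km matches the north-south cell size; the east-west size shrinks with the cosine of latitude, so cells are narrower toward the northern edge of the domain.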

Satellite (GSMaP MVK)

JAXA’s GSMaP MVK product provides the gauge-uncorrected hourly satellite precipitation estimate. We chose GSMaP over IMERG because the MVK and gauge-calibrated GC variants share the same retrieval algorithm, enabling a controlled comparison of gauge fusion methods under the same satellite baseline.

Elevation (ETOPO 2022)

Elevation is taken from ETOPO 2022 and resampled to the common 0.1° grid. It is concatenated with the satellite and gauge inputs at every time step.

Gauge observations (Synoptic Data API)

Hourly reports from 11,879 unique stations are obtained through the Synoptic Data API, totalling more than 423 million records. After quality filtering, an average of about 7,300 gauges per hour are retained.

Radar reference (NOAA MRMS)

NOAA’s Multi-Radar Multi-Sensor (MRMS) Radar-Only QPE provides spatially dense, gauge-independent precipitation estimates and is strictly excluded from training. Models receive only satellite, elevation, and gauge inputs; the radar field serves only as the evaluation reference.

Results

Table 1 compares NSP against thirteen baselines on QPEBench. Mean ± standard deviation are reported over three-fold time-series cross-validation; the best score in each column is in bold. NSP achieves the best performance on all six metrics, including a 4.2% RMSE_r reduction over GSMaP GC and a 39.1% RMSE_g reduction over the second-best method, while preserving spatial structure as measured by FSS_R.
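
The two quoted reductions can be reproduced from the Table 1 means. The sketch below also gives the standard definitions of RMSE and MAE; reading the subscripts r and g as radar-referenced and gauge-referenced scores is an assumption, since this section does not define them.

```python
import math
from typing import Sequence


def rmse(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


def mae(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Mean absolute error between two equally long sequences."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def relative_reduction(baseline: float, ours: float) -> float:
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline


# RMSE_r: GSMaP GC (2.942) versus NSP (2.818), means from Table 1.
print(round(relative_reduction(2.942, 2.818), 1))  # 4.2
# RMSE_g: IDW, the second-best method (0.645), versus NSP (0.393).
print(round(relative_reduction(0.645, 0.393), 1))  # 39.1
```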

Table 1. Quantitative comparison on the CONUS test folds.

Method	RMSE_r ↓	MAE_r ↓	RMSE_g ↓	MAE_g ↓	r_r,coll ↑	FSS_R ↑
Quantile mapping	4.073 ± 0.184	2.012 ± 0.063	0.885 ± 0.027	0.173 ± 0.006	0.288 ± 0.014	0.479 ± 0.010
EMOS	3.885 ± 0.127	1.844 ± 0.078	1.115 ± 0.096	0.179 ± 0.019	0.305 ± 0.013	0.478 ± 0.029
XGBoost	3.741 ± 0.129	1.827 ± 0.077	1.062 ± 0.088	0.179 ± 0.020	0.284 ± 0.016	0.487 ± 0.008
GWR	3.739 ± 0.230	1.627 ± 0.081	0.662 ± 0.021	0.155 ± 0.004	0.376 ± 0.016	0.229 ± 0.017
Cokriging	3.706 ± 0.121	1.789 ± 0.069	1.065 ± 0.081	0.220 ± 0.015	0.279 ± 0.016	0.461 ± 0.009
GSMaP	3.638 ± 0.126	1.826 ± 0.076	1.017 ± 0.086	0.179 ± 0.020	0.288 ± 0.014	0.483 ± 0.014
Kriging	3.204 ± 0.076	1.645 ± 0.027	0.774 ± 0.022	0.177 ± 0.005	0.006 ± 0.003	0.007 ± 0.002
ConvCNP	3.135 ± 0.098	1.596 ± 0.046	0.684 ± 0.018	0.100 ± 0.003	0.475 ± 0.005	0.053 ± 0.011
U-Net	3.123 ± 0.057	1.563 ± 0.038	0.666 ± 0.028	0.100 ± 0.004	0.461 ± 0.021	0.064 ± 0.052
CNP	3.115 ± 0.077	1.575 ± 0.056	0.699 ± 0.033	0.116 ± 0.006	0.340 ± 0.018	0.014 ± 0.011
ViT	3.059 ± 0.108	1.530 ± 0.079	0.687 ± 0.022	0.116 ± 0.016	0.411 ± 0.003	0.118 ± 0.019
IDW	2.987 ± 0.070	1.474 ± 0.020	0.645 ± 0.018	0.145 ± 0.004	0.352 ± 0.010	0.157 ± 0.002
Linear regression	2.967 ± 0.064	1.457 ± 0.020	0.709 ± 0.019	0.170 ± 0.005	0.299 ± 0.017	0.189 ± 0.012
GSMaP GC	2.942 ± 0.085	1.473 ± 0.039	0.737 ± 0.039	0.147 ± 0.007	0.375 ± 0.022	0.490 ± 0.023
NSP (Ours)	2.818 ± 0.062	1.444 ± 0.026	0.393 ± 0.047	0.076 ± 0.013	0.478 ± 0.021	0.527 ± 0.022
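
The FSS_R column rewards spatial agreement rather than pixel-wise error. As a rough illustration, the sketch below implements the commonly used Fractions Skill Score on small nested lists. The threshold, window size, and zero-padding at the domain edge are assumptions, and the exact FSS_R configuration used in Table 1 is not specified in this section.

```python
from typing import List

Grid = List[List[float]]


def neighbourhood_fractions(field: Grid, threshold: float, window: int) -> Grid:
    """Fraction of cells >= threshold in a window x window box around each cell.

    `window` should be odd; cells outside the domain count as below threshold.
    """
    rows, cols = len(field), len(field[0])
    half = window // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            hits = 0
            for di in range(-half, half + 1):
                for dj in range(-half, half + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < rows and 0 <= jj < cols and field[ii][jj] >= threshold:
                        hits += 1
            out[i][j] = hits / (window * window)
    return out


def fss(pred: Grid, ref: Grid, threshold: float, window: int) -> float:
    """Fractions Skill Score: 1 is a perfect match, 0 is no overlap at this scale."""
    pf = neighbourhood_fractions(pred, threshold, window)
    rf = neighbourhood_fractions(ref, threshold, window)
    num = sum((a - b) ** 2 for pr, rr in zip(pf, rf) for a, b in zip(pr, rr))
    den = sum(a * a for pr in pf for a in pr) + sum(b * b for rr in rf for b in rr)
    return 1.0 - num / den if den > 0 else float("nan")


# The same rain cell, displaced by one pixel diagonally.
pred = [[1.0, 0.0], [0.0, 0.0]]
ref = [[0.0, 0.0], [0.0, 1.0]]
print(fss(pred, ref, threshold=0.5, window=1))  # 0.0: no pixel-wise overlap
print(fss(pred, ref, threshold=0.5, window=3))  # 1.0: matches at the 3x3 scale
```

In this toy case the displaced cell scores zero pixel-wise but is fully credited once the neighbourhood is wide enough to contain both positions, which is the sense in which FSS measures spatial structure.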

Qualitative Results

Qualitative comparisons for two timesteps from the test fold. (a, b) On 2025-03-24 16:00 UTC, NSP recovers the narrow precipitation band over the southeastern United States more faithfully than IDW, linear regression, and GSMaP GC. (c) On 2024-10-31 10:00 UTC, NSP captures the elongated rain band along the central United States while linear regression collapses the field.

Regional zoom over the southeastern United States. NSP recovers a narrow, well-localized precipitation band that better matches the radar reference than the satellite baselines.

Interactive Demo

The static demo lets reviewers browse the March 2025 CONUS test month at six-hour cadence. Each snapshot shows the real model inputs and outputs used in the supplementary material: GSMaP MVK, ETOPO 2022 elevation, the NSP refined field, and the MRMS radar reference. The gauge channel that the model also consumes is neither shown nor shipped, because the Synoptic Data API terms of service prohibit redistribution of the underlying station readings.


BibTeX

@inproceedings{anon2026nsp,
  title  = {Neural Stochastic Processes for Satellite Precipitation Refinement},
  author = {Anonymous},
  note   = {Under review},
  year   = {2026}
}

A final citation entry will be released once review is complete.
