Neural Stochastic Processes
for Satellite Precipitation Refinement

Under Review

The paper is currently under review, and we will add links once it is published. In the meantime, our code and a static demo are provided as anonymous supplementary material.

An overview of the proposed Neural Stochastic Process (NSP). Satellite estimates and elevation are fused with sparse gauge observations through an encoder, evolved as a latent stochastic process, and decoded into a calibrated precipitation field. The radar reference is shown for evaluation only.

Abstract

Accurate precipitation estimation is critical for flood forecasting, water resource management, and disaster preparedness. Satellite products provide global hourly coverage but contain systematic biases; ground-based gauges are accurate at point locations but too sparse for direct gridded correction. Existing methods fuse these sources by interpolating gauge observations onto the satellite grid, but treat each time step independently and therefore discard temporal structure in precipitation fields.

We propose Neural Stochastic Process (NSP), a model that pairs a Neural Process encoder, which conditions on arbitrary sets of gauge observations, with a latent Neural SDE defined on a 2D spatial representation. NSP is trained with a single variational objective and requires no SDE simulation during training. We also introduce QPEBench, a public benchmark of 43,756 hourly samples over the contiguous United States (2021–2025) with four aligned data sources and six evaluation metrics. On QPEBench, NSP outperforms thirteen baselines across all six metrics and surpasses JAXA’s operational gauge-calibrated product, GSMaP GC.

Overview

  1. Sparse-context encoder

    For each hour, the encoder ingests the satellite estimate, elevation, and an arbitrary, time-varying set of gauge observations encoded as a sparse map and a binary mask. It outputs a spatially structured Gaussian posterior over a 2D latent field, enabling a single model to accommodate the irregular gauge network without architectural changes.

  2. Latent Neural SDE prior

    A convolutional Neural SDE defines the latent prior transition. Training uses a closed-form transition KL derived from Girsanov’s theorem, so no SDE solver calls are required. The SDE acts as a temporal regulariser at training time and is bypassed at inference for efficiency.

  3. Residual decoder

    The decoder predicts a residual correction in log-precipitation space and a heteroscedastic predictive variance, conditioned on the latent field, the satellite estimate, and elevation. The refined precipitation field is then obtained by applying the residual on top of the satellite prior.
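The data flow around the encoder and decoder can be illustrated with a minimal NumPy sketch; the neural networks themselves are omitted. It assumes gauges are already mapped to (row, col) grid cells, that several gauges falling in one cell are averaged, and that "log-precipitation space" means a log1p transform. The function names `rasterize_gauges`, `encoder_input`, `apply_residual`, and `gaussian_nll` are hypothetical and not taken from the released code.

```python
import numpy as np

H, W = 260, 590  # QPEBench grid: rows x columns at 0.1 degrees


def rasterize_gauges(rows, cols, values, shape=(H, W)):
    """Scatter point gauge readings into a sparse map plus a binary mask.

    Assumption: when several gauges share a cell, their readings are averaged.
    """
    total = np.zeros(shape)
    count = np.zeros(shape)
    np.add.at(total, (rows, cols), values)  # unbuffered: repeated cells accumulate
    np.add.at(count, (rows, cols), 1.0)
    sparse = np.divide(total, count, out=np.zeros(shape), where=count > 0)
    mask = (count > 0).astype(np.float32)
    return sparse.astype(np.float32), mask


def encoder_input(satellite, elevation, rows, cols, values):
    """Stack the per-hour encoder channels into a (4, H, W) array."""
    sparse, mask = rasterize_gauges(rows, cols, values, satellite.shape)
    return np.stack([satellite, elevation, sparse, mask]).astype(np.float32)


def apply_residual(satellite, residual):
    """Add a log-space residual to the satellite prior and map back (log1p assumed)."""
    return np.clip(np.expm1(np.log1p(satellite) + residual), 0.0, None)


def gaussian_nll(target, mean, log_var):
    """Mean heteroscedastic Gaussian negative log-likelihood."""
    return float(np.mean(0.5 * (np.log(2.0 * np.pi) + log_var
                                + (target - mean) ** 2 / np.exp(log_var))))


# Usage with synthetic data (no real QPEBench values).
rng = np.random.default_rng(0)
sat = rng.gamma(0.5, 1.0, size=(H, W)).astype(np.float32)
elev = rng.normal(500.0, 200.0, size=(H, W)).astype(np.float32)
rows, cols = np.array([10, 10, 200]), np.array([20, 20, 300])
vals = np.array([1.0, 3.0, 0.5])

x = encoder_input(sat, elev, rows, cols, vals)
residual = np.zeros_like(sat)  # stand-in for the decoder's predicted residual
refined = apply_residual(sat, residual)
# Likelihood scored only at (target) gauge cells, in log1p space
log_var = np.zeros(len(rows))
nll = gaussian_nll(np.log1p(vals), np.log1p(sat[rows, cols]) + residual[rows, cols], log_var)
```

Because the gauge set enters only through the value map and mask, a different number of stations per hour changes the data, not the input shape, which is what lets one model handle the irregular network.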

Architecture of NSP. (a) During training, gauge observations are split into context and target sets; the shared encoder maps consecutive inputs to z_t and z_{t+1}, and a transition loss aligns the Neural SDE prior with the encoder posterior. (b) At inference, all available gauges serve as context and a single encoder–decoder pass predicts the refined field without SDE integration.
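The transition loss rests on a standard result, sketched below under stated assumptions. The symbols f_θ (prior drift), h_φ (posterior drift), a shared scalar diffusion σ, and the step length Δ are notation introduced here; how NSP reduces the path-space term to its own closed form is not reproduced. The second identity, a Gaussian KL under one Euler–Maruyama step, is shown only as one illustration of a solver-free transition KL, not as NSP's stated formula.

```latex
% Girsanov: prior dz_s = f_\theta(z_s,s)\,ds + \sigma\,dW_s and a posterior process
% with drift h_\phi and the same diffusion, both on [t, t+\Delta] from the same start
\mathrm{KL}\left(\mathbb{Q}\,\|\,\mathbb{P}\right)
  = \frac{1}{2}\,\mathbb{E}_{\mathbb{Q}}\!\left[\int_{t}^{t+\Delta}
    \frac{\lVert h_\phi(z_s,s) - f_\theta(z_s,s)\rVert^2}{\sigma^2}\,ds\right]

% One Euler--Maruyama step: p_\theta(z_{t+1}\mid z_t) = \mathcal{N}(m,\ \sigma^2\Delta I),
% m = z_t + f_\theta(z_t,t)\,\Delta, against an encoder posterior
% q(z_{t+1}) = \mathcal{N}(\mu, \operatorname{diag}(s^2)) over D latent dimensions
\mathrm{KL}\left(q\,\|\,p_\theta\right)
  = \frac{1}{2}\sum_{i=1}^{D}\left[\frac{s_i^2 + (\mu_i - m_i)^2}{\sigma^2\Delta}
    - 1 + \ln\frac{\sigma^2\Delta}{s_i^2}\right]
```

The second expression needs one evaluation of f_θ and no numerical integration, which is the property the training scheme relies on.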

Benchmark

QPEBench combines four aligned data sources on a common 260×590 grid at 0.1° resolution (~11 km), spanning five years (January 2021 – December 2025). The benchmark covers the contiguous United States with hourly samples and an independent radar reference that is strictly excluded from training.

  1. Satellite (GSMaP MVK)

    JAXA’s GSMaP MVK product provides the gauge-uncorrected hourly satellite precipitation estimate. We chose GSMaP over IMERG because the MVK and gauge-calibrated GC variants share the same retrieval algorithm, enabling a controlled comparison of gauge fusion methods under the same satellite baseline.

  2. Elevation (ETOPO 2022)

    Elevation is taken from ETOPO 2022 and resampled to the common 0.1° grid. It is concatenated with the satellite and gauge inputs at every time step.

  3. Gauge observations (Synoptic Data API)

    Hourly reports from 11,879 unique stations are obtained through the Synoptic Data API, totalling more than 423 million records. After quality filtering, an average of about 7,300 gauges per hour are retained.

  4. Radar reference (NOAA MRMS)

    NOAA’s Multi-Radar Multi-Sensor (MRMS) Radar-Only QPE provides spatially dense, gauge-independent precipitation estimates and is strictly excluded from training. Models receive only satellite, elevation, and gauge inputs; the radar field serves only as the evaluation reference.
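The grid and time figures above can be cross-checked, and the alignment step sketched, in a few lines. The north-west corner (50°N, 125°W) is a hypothetical choice that makes the 260×590 grid span 24–50°N and 125–66°W; the actual QPEBench bounds, station-to-cell rule, and elevation resampling method are not stated here, so `latlon_to_cell` and `block_mean` (a simple block average) are illustrative stand-ins.

```python
from datetime import datetime

import numpy as np

RES = 0.1                        # grid spacing in degrees
N_ROWS, N_COLS = 260, 590        # QPEBench grid shape
LAT_MAX, LON_MIN = 50.0, -125.0  # HYPOTHETICAL north-west corner of the grid


def latlon_to_cell(lat, lon):
    """Map coordinates to (row, col) indices; row 0 is the northern edge."""
    row = np.floor((LAT_MAX - np.asarray(lat, dtype=float)) / RES).astype(int)
    col = np.floor((np.asarray(lon, dtype=float) - LON_MIN) / RES).astype(int)
    inside = (row >= 0) & (row < N_ROWS) & (col >= 0) & (col < N_COLS)
    return row, col, inside


def block_mean(fine, factor):
    """Average non-overlapping factor x factor blocks (shape must divide evenly)."""
    h, w = fine.shape
    return fine.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


# Span of the grid in degrees and approximate cell size (~111 km per degree of latitude).
extent_deg = (N_ROWS * RES, N_COLS * RES)
cell_km = RES * 111.2
# Hourly slots from 2021-01-01 00:00 to 2025-12-31 23:00 inclusive.
hours = int((datetime(2026, 1, 1) - datetime(2021, 1, 1)).total_seconds() // 3600)
gap = hours - 43_756  # possible hourly slots minus QPEBench's sample count
```

The period contains 43,824 possible hourly slots, so the 43,756 samples cover all but 68 of them. For elevation, a source grid that is an integer factor finer (for example a 1-arc-minute grid, factor 6) can be reduced with `block_mean` before being stacked with the other inputs.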

Results

Table 1 compares NSP against thirteen baselines on QPEBench, reporting mean ± standard deviation over three-fold time-series cross-validation. NSP achieves the best score in every column, including a 4.2% RMSEr reduction over GSMaP GC and a 39.1% RMSEg reduction over the second-best method (IDW), while also preserving spatial structure, with the highest FSSR.

Table 1. Quantitative comparison on the CONUS test folds.

Method             RMSEr          MAEr           RMSEg          MAEg           rr,coll        FSSR
Quantile mapping   4.073 ± 0.184  2.012 ± 0.063  0.885 ± 0.027  0.173 ± 0.006  0.288 ± 0.014  0.479 ± 0.010
EMOS               3.885 ± 0.127  1.844 ± 0.078  1.115 ± 0.096  0.179 ± 0.019  0.305 ± 0.013  0.478 ± 0.029
XGBoost            3.741 ± 0.129  1.827 ± 0.077  1.062 ± 0.088  0.179 ± 0.020  0.284 ± 0.016  0.487 ± 0.008
GWR                3.739 ± 0.230  1.627 ± 0.081  0.662 ± 0.021  0.155 ± 0.004  0.376 ± 0.016  0.229 ± 0.017
Cokriging          3.706 ± 0.121  1.789 ± 0.069  1.065 ± 0.081  0.220 ± 0.015  0.279 ± 0.016  0.461 ± 0.009
GSMaP              3.638 ± 0.126  1.826 ± 0.076  1.017 ± 0.086  0.179 ± 0.020  0.288 ± 0.014  0.483 ± 0.014
Kriging            3.204 ± 0.076  1.645 ± 0.027  0.774 ± 0.022  0.177 ± 0.005  0.006 ± 0.003  0.007 ± 0.002
ConvCNP            3.135 ± 0.098  1.596 ± 0.046  0.684 ± 0.018  0.100 ± 0.003  0.475 ± 0.005  0.053 ± 0.011
U-Net              3.123 ± 0.057  1.563 ± 0.038  0.666 ± 0.028  0.100 ± 0.004  0.461 ± 0.021  0.064 ± 0.052
CNP                3.115 ± 0.077  1.575 ± 0.056  0.699 ± 0.033  0.116 ± 0.006  0.340 ± 0.018  0.014 ± 0.011
ViT                3.059 ± 0.108  1.530 ± 0.079  0.687 ± 0.022  0.116 ± 0.016  0.411 ± 0.003  0.118 ± 0.019
IDW                2.987 ± 0.070  1.474 ± 0.020  0.645 ± 0.018  0.145 ± 0.004  0.352 ± 0.010  0.157 ± 0.002
Linear regression  2.967 ± 0.064  1.457 ± 0.020  0.709 ± 0.019  0.170 ± 0.005  0.299 ± 0.017  0.189 ± 0.012
GSMaP GC           2.942 ± 0.085  1.473 ± 0.039  0.737 ± 0.039  0.147 ± 0.007  0.375 ± 0.022  0.490 ± 0.023
NSP (Ours)         2.818 ± 0.062  1.444 ± 0.026  0.393 ± 0.047  0.076 ± 0.013  0.478 ± 0.021  0.527 ± 0.022
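The headline percentages can be reproduced from Table 1, and the metric families can be sketched in NumPy. These helpers are illustrative: `rmse` and `mae` are the textbook definitions, and `fss` implements the standard Fractions Skill Score, assumed here to be what FSSR refers to. The exceedance threshold and window size used by QPEBench are not given on this page, so the values in the usage lines are hypothetical.

```python
import numpy as np


def rmse(pred, ref):
    return float(np.sqrt(np.mean((pred - ref) ** 2)))


def mae(pred, ref):
    return float(np.mean(np.abs(pred - ref)))


def relative_reduction(ours, baseline):
    """Fractional error reduction of `ours` relative to `baseline`."""
    return (baseline - ours) / baseline


def neighbourhood_fraction(binary, n):
    """Fraction of exceedances in each n x n window (n odd), zero-padded at edges."""
    pad = n // 2
    padded = np.pad(binary.astype(float), pad)
    # Summed-area table with a leading row and column of zeros
    table = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    h, w = binary.shape
    window = (table[n:n + h, n:n + w] - table[:h, n:n + w]
              - table[n:n + h, :w] + table[:h, :w])
    return window / (n * n)


def fss(pred, ref, threshold, n):
    """Fractions Skill Score: 1 for a perfect match, 0 for no overlap."""
    pf = neighbourhood_fraction(pred >= threshold, n)
    po = neighbourhood_fraction(ref >= threshold, n)
    denom = np.mean(pf ** 2) + np.mean(po ** 2)
    return float("nan") if denom == 0 else float(1.0 - np.mean((pf - po) ** 2) / denom)


# Headline numbers from Table 1 (mean values).
rmse_r_gain = relative_reduction(2.818, 2.942)  # NSP vs GSMaP GC on RMSEr
rmse_g_gain = relative_reduction(0.393, 0.645)  # NSP vs IDW (second best) on RMSEg

# Hypothetical threshold (1.0) and window (5 cells) on a synthetic field
rng = np.random.default_rng(0)
radar = rng.gamma(0.5, 2.0, size=(64, 64))
score_same = fss(radar, radar, threshold=1.0, n=5)
```

Because FSS compares neighbourhood fractions rather than individual cell values, a slightly displaced rain band is penalised less than under RMSE, which makes it a complement to the pointwise error columns.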

Qualitative Results

Qualitative comparisons for two timesteps from the test fold. (a, b) On 2025-03-24 16:00 UTC, NSP recovers the narrow precipitation band over the southeastern United States more faithfully than IDW, linear regression, and GSMaP GC. (c) On 2024-10-31 10:00 UTC, NSP captures the elongated rain band along the central United States while linear regression collapses the field.
Regional zoom over the southeastern United States. NSP recovers a narrow, well-localised precipitation band that better matches the radar reference than the satellite baselines.

Interactive Demo

The static demo lets reviewers browse the March 2025 CONUS test month at a six-hour cadence. Each snapshot shows real model inputs and outputs from the supplementary material: GSMaP MVK, ETOPO 2022 elevation, the NSP refined field, and the MRMS radar reference. The gauge channel that the model also consumes is neither shown nor distributed, because the Synoptic Data API terms of service prohibit redistribution of the underlying station readings.


BibTeX

@inproceedings{anon2026nsp,
  title  = {Neural Stochastic Processes for Satellite Precipitation Refinement},
  author = {Anonymous},
  note   = {Under review},
  year   = {2026}
}

A final citation entry will be released once review is complete.