Improved observation of the global water cycle with satellite remote sensing and neural network modeling

A thesis presented for the degree of Doctor

Sorbonne Université
École Doctorale des Sciences de l’Environnement d’Île de France (Nº 129)

by

Matthew G. Heberger

Laboratoire d’Etudes du Rayonnement et de la Matière en Astrophysique et Atmosphères, UMR 8112
Observatoire de Paris
Presented and publicly defended on 12 January 2024
before a jury composed of:

Hélène CHEPFER (Sorbonne Université), Jury President
Aaron BOONE (Météo France, Toulouse), Reviewer
Frédéric FRAPPART (INRAE, Villenave d’Ornon), Reviewer
Hélène BROGNIEZ (Université Paris-Saclay), Examiner
Ming PAN (Univ. of California at San Diego), Examiner
Fabrice PAPA (IRD, Brasilia, Brazil), Examiner
Filipe AIRES (LERMA/CNRS, Paris), Thesis Supervisor


Acknowledgements

I had one of the best advisors a doctoral candidate could hope for. Thank you, Filipe Aires, for having confidence in a “non-traditional student” and for your constant support and guidance. I also owe a big thanks to my “lab partner” Victor Pellet for many helpful conversations, coding advice, and incredibly valuable feedback on my writing.

I offer my sincere gratitude to the members of my jury (thesis committee) for your time, energy, and expertise: Hélène Chepfer, Aaron Boone, Frédéric Frappart, Hélène Brogniez, Ming Pan, and Fabrice Papa. To Drs. Pan and Papa, thank you for accompanying me since year one as part of my Comité de Suivi, and for your constructive input and encouragement.

I want to express gratitude to the Government of France and to the administration of President Emmanuel Macron for supporting climate research and for welcoming international researchers to France when they felt less than welcome in their home countries. It has been an extraordinary honor and privilege to be a student at Sorbonne University and to do research at the Paris Observatory, both renowned institutions.

I humbly thank the many people who make these institutions run and who are too frequently unacknowledged – technicians, administrators, janitors, groundskeepers, guardians, cooks and dishwashers in the canteen, and many more. We could not do science without you! I stand in solidarity with you in your ongoing struggle for recognition, rights, and fair pay. Together, let us acknowledge the intrinsic value of each individual’s contribution, and work to create a society where the fruits of labor are equitably shared, and the principles of égalité and fraternité prevail.

Finally, my dearest thanks go to my family, Michelle and Gabriel, for your unconditional love, support, and patience.

Declaration

This research was carried out under the direction of Dr. Filipe Aires from 2021 to 2023 at the Paris Observatory, within the Laboratoire d’Etudes du Rayonnement et de la Matière en Astrophysique et Atmosphères, or LERMA.

The thesis was supervised by the Doctoral School of Environmental Sciences of Île-de-France (No. 129) at Sorbonne University.

Portions of the research described herein were funded by:

© Copyright 2023, Matthew Heberger.

This work is licensed under Creative Commons License CC BY-NC 4.0: Attribution, Non Commercial, 4.0.

Summary (English translation of the French résumé)

Title: Improved observation of the water cycle at the global scale using satellite remote sensing and neural network modeling

Summary: Satellite remote sensing is commonly used to monitor the water cycle from river basins to the planetary scale. Yet it is difficult to obtain a balanced water budget using these remote sensing data, which highlights the errors and uncertainties in Earth observation data. This thesis aims to improve estimates of precipitation, evapotranspiration, river discharge, and total water storage change at the planetary scale using a combination of analytical methods (optimal interpolation, OI) and statistical modeling methods, in particular neural networks (NN). These models were trained on a set of 1,358 river basins, validated on an independent set of 340 basins, and evaluated against in situ measurements of precipitation, evapotranspiration, and river discharge. The models are then used to make pixel-scale predictions at 0.5° resolution with near-global coverage. The corrected datasets improve the water budget for the validation basins: the mean and standard deviation of the residual are 11 ± 44 mm/month for the uncorrected data and 0.03 ± 24 mm/month after calibration by the NN models. In addition, this approach allows us to make more accurate estimates of missing water cycle components, for example to estimate evapotranspiration in uninstrumented areas or to predict river discharge in ungauged basins. The results can also show data producers where their products appear inconsistent with other products and where further calibration could bring improvements. Finally, this research shows the strong potential of neural networks and machine learning for integrating satellite data and studying the water cycle.

Keywords: Earth observation, remote sensing, water cycle, large-sample hydrology, optimization, calibration, machine learning, regression and classification, neural networks, precipitation, evaporation, runoff, river discharge.

Abstract

Satellite remote sensing is commonly used to observe the hydrologic cycle at spatial scales ranging from river basins to the globe. Yet it remains difficult to obtain a balanced water budget using remote sensing data, which highlights the errors and uncertainties in earth observation (EO) data. This research aimed to improve estimates of precipitation, evapotranspiration, runoff, and total water storage change at the global scale using a combination of analytical methods (optimal interpolation, OI) and statistical modeling methods including neural networks (NN). Models were trained on a set of 1,358 river basins, validated on an independent set of 340 basins, and evaluated against in-situ observations of precipitation, evapotranspiration, and river discharge. The models were then extended to make pixel-scale predictions in 0.5° grid cells for near-global coverage. Calibrated datasets result in lower water budget residuals in validation basins: the mean and standard deviation of the imbalance are 11 ± 44 mm/mo when calculated with uncorrected EO data, and 0.03 ± 24 mm/mo after calibration by the NN models. The results allow us to make more accurate estimates of missing water cycle components, for example to estimate evapotranspiration in un-instrumented areas, or to predict discharge in ungaged basins. The results can also indicate to data producers where their products seem inconsistent with other datasets and where enhanced calibration could lead to improvements. Finally, this research demonstrates the use of neural networks and machine learning for the integration of satellite data and for the study of the water cycle.

Keywords: earth observation, remote sensing, water cycle, large-sample hydrology, optimization, calibration, machine learning, regression and classification, neural networks, precipitation, evaporation, runoff, river discharge.
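
The water budget imbalance quoted in the abstract can be illustrated with a minimal sketch. It assumes the conventional form of the residual, precipitation minus evapotranspiration minus runoff minus storage change, which the abstract does not spell out; the function name and all input values below are hypothetical and are not data from this research.

```python
from statistics import mean, stdev


def water_budget_residual(p, et, q, ds):
    """Monthly imbalance P - ET - Q - dS (assumed sign convention), in mm/month."""
    return [pi - eti - qi - dsi for pi, eti, qi, dsi in zip(p, et, q, ds)]


# Made-up illustrative monthly values in mm/month (not from the thesis)
precip = [100.0, 80.0, 60.0, 120.0]
evap = [50.0, 45.0, 40.0, 55.0]
runoff = [30.0, 25.0, 20.0, 35.0]
storage_change = [10.0, 5.0, -5.0, 20.0]

residual = water_budget_residual(precip, evap, runoff, storage_change)

# Summarize the imbalance as mean and sample standard deviation
print(f"imbalance: {mean(residual):.2f} ± {stdev(residual):.2f} mm/month")
```

A perfectly closed budget would give a residual of zero in every month, so both the mean (bias) and the standard deviation (scatter) of this series are useful summaries of how well a set of datasets agrees.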

Acronyms

AI Artificial intelligence
AI Aridity Index
AGU American Geophysical Union
Aqua Satellite launched by NASA in 2002 to study the water cycle
AVHRR Advanced Very High Resolution Radiometer
CAMELS Catchment Attributes and Meteorology for Large-sample Studies
CARAVAN A dataset of catchment attributes and meteorology, combines several CAMELS datasets
CI Confidence Interval
CI Cyclostationarity Index
CCI Climate Change Initiative, short name for the ESA Programme on Global Monitoring of Essential Climate Variables
CDF Cumulative distribution function
CDR Climate Data Record
CGIAR (formerly) Consultative Group for International Agricultural Research
CMORPH CPC Morphing Technique, a global precipitation dataset
CPC Climate Prediction Center, an office of the US National Weather Service
CMG Climate Modeling Grid
CSIRO Australia’s Commonwealth Scientific and Industrial Research Organisation
CSR Center for Space Research at the University of Texas at Austin
ECMWF European Centre for Medium-Range Weather Forecasts
ED 129 L’École Doctorale des Sciences de l’Environnement d’Île de France
EGU European Geosciences Union
ENSO El Niño Southern Oscillation
EO Earth Observation
ERA5 Fifth generation ECMWF atmospheric reanalysis of the global climate
ESA European Space Agency
ET Evapotranspiration
EVI Enhanced vegetation index
FAPAR Fraction of absorbed photosynthetically active radiation
GBM Gradient boosting machine
GDAL Geospatial Data Abstraction Library
GHCN Global Historical Climatology Network
GIEMS Global Inundation Extent from Multi-satellites
GIS Geographic Information System (software)
GLDAS Global Land Data Assimilation System
GLEAM Global Land Evaporation Amsterdam Model (Miralles et al. 2011)
GPCP Global Precipitation Climatology Project (Adler et al. 2018)
GPM-IMERG Global Precipitation Measurement, Integrated Multi-satellitE Retrievals (Huffman et al. 2020)
GRACE Gravity Recovery and Climate Experiment
GRDC Global Runoff Data Centre
GRUN Global gridded runoff dataset (Ghiggi et al. 2019)
GSFC Goddard Space Flight Center
GSIM Global Streamflow Indices and Metadata Archive (Do et al. 2018)
HBV Hydrologiska Byråns Vattenbalansavdelning (a hydrologic simulation model from Sweden)
HDF Hierarchical Data Format
HydroSHEDS Hydrological Data and Maps Based on Shuttle Elevation Derivatives at Multiple Scales (Lehner, Verdin, and Jarvis 2008)
IAHS International Association of Hydrological Sciences
IDW Inverse distance weighted
IQR Interquartile range
IVW Inverse-variance weighting
JPL Jet Propulsion Laboratory
KGE Kling-Gupta Efficiency
LAI Leaf area index
LERMA Laboratoire d’Etudes du Rayonnement et de la Matière en Astrophysique et Atmosphères (Laboratory for the Study of Radiation and Matter in Astrophysics and Atmospheres)
LISFLOOD Hydrologic model created by the European Commission Joint Research Centre
LSH Large sample hydrology
LSTM Long short-term memory
LWE Liquid water equivalent
MERIT Multi-Error Removed Improved-Terrain (Yamazaki et al. 2017)
ML Machine learning
MLP Multi-Layer Perceptron
MSE Mean squared error
MODIS Moderate Resolution Imaging Spectroradiometer
NaN Not a number
NASA (US) National Aeronautics and Space Administration
NDVI Normalized Difference Vegetation Index
NetCDF Network Common Data Form (format for environmental data)
NN Neural network
NOAA US National Oceanic and Atmospheric Administration
NSE Nash-Sutcliffe Model Efficiency
OI Optimal interpolation
OLS Ordinary least squares
PDF Probability density function
PERSIANN Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks
RMSD Root mean square difference
RMSE Root mean square error
RTO Regression through the origin
SLR Simple linear regression
SMAP Soil Moisture Active Passive (a NASA satellite mission)
SM2RAIN An algorithm for estimating rainfall from soil moisture data (Massari 2020)
SMOS Soil Moisture and Ocean Salinity (an ESA satellite mission)
SW Simple weighted mean
SSE Sum of squared errors
SWOT Surface Water and Ocean Topography mission
Terra A multi-national earth observation satellite launched in December 1999
TMPA Tropical Rainfall Measuring Mission Multi-satellite Precipitation Analysis
TWSC Total Water Storage Change
TWS Total Water Storage
UMR Unité Mixte de Recherche (Joint research unit)
USGS United States Geological Survey
WC Water cycle
WMO World Meteorological Organization