【问题标题】:Merging xarray datasets with same extent but different spatial resolution合并具有相同范围但不同空间分辨率的 xarray 数据集
【发布时间】:2020-02-18 00:09:28
【问题描述】:

我有一个基于卫星的太阳诱导荧光 (SIF) 数据集和一个模拟降水数据集。我想在我的研究区域中按像素将降水量与 SIF 进行比较。我的两个数据集属于同一区域,但空间分辨率略有不同。当我取整个区域的平均值时,我可以成功地跨时间绘制这些值并相互比较,但我正在努力在每个像素的基础上创建一个散点图。

老实说,在寻找 precip 对 SIF 的影响时,我不确定这是否是比较这两个值的最佳方法,因此我对不同方法的想法持开放态度。至于合并数据,我目前正在使用xr.combine_by_coords,但它给了我一个我在下面描述的错误。我也可以通过将 netcdfs 转换为 geotiffs 然后使用 rasterio 来扭曲它们来做到这一点,但这似乎是一种低效的比较方式。到目前为止,这是我所拥有的:

import netCDF4
import numpy as np
import dask
import xarray as xr

rainy_bbox = np.array([
    [-69.29519955115512,-13.861261028444734],
    [-69.29519955115512,-12.384786628185896],
    [-71.19583431678012,-12.384786628185896],
    [-71.19583431678012,-13.861261028444734]])
max_lon_lat = np.max(rainy_bbox, axis=0)
min_lon_lat = np.min(rainy_bbox, axis=0)

# this dataset is available here: ftp://fluo.gps.caltech.edu/data/tropomi/gridded/
sif = xr.open_dataset('../data/TROPO_SIF_03-2018.nc')

# the dataset is global so subset to my study area in the Amazon
rainy_sif_xds = sif.sel(lon=slice(min_lon_lat[0], max_lon_lat[0]), lat=slice(min_lon_lat[1], max_lon_lat[1]))  

# this data can all be downloaded from NASA Goddard here either manually or with wget but you'll need an account on https://disc.gsfc.nasa.gov/: https://pastebin.com/viZckVdn
imerg_xds = xr.open_mfdataset('../data/3B-DAY.MS.MRG.3IMERG.201803*.nc4')

# spatial subset
rainy_imerg_xds = imerg_xds.sel(lon=slice(min_lon_lat[0], max_lon_lat[0]), lat=slice(min_lon_lat[1], max_lon_lat[1]))  

# I'm not sure the best way to combine these datasets but am trying this
combo_xds = xr.combine_by_coords([rainy_imerg_xds, rainy_xds])

目前我在最后一行收到了一个看似无用的RecursionError: maximum recursion depth exceeded in comparison。当我添加参数join='left' 时,来自rainy_imerg_xds 数据集的数据在combo_xds 中,当我添加join='right' 时,rainy_xds 数据存在,如果我添加join='inner',则不存在数据。我以为这个函数有一些内部插值,但似乎没有。

【问题讨论】:

    标签: python geospatial netcdf python-xarray


    【解决方案1】:

    这个documentation from xarray 非常简单地概述了这个问题的解决方案。 xarray 允许您在多个维度中进行插值,并指定另一个 Dataset 的 x 和 y 维度作为输出维度。所以在这种情况下,它完成了

    # interpolation based on http://xarray.pydata.org/en/stable/interpolation.html
    # interpolation can't be done across the chunked dimension so we have to load it all into memory
    rainy_sif_xds.load()
    
    #interpolate into the higher resolution grid from IMERG
    interp_rainy_sif_xds = rainy_sif_xds.interp(lat=rainy_imerg_xds["lat"], lon=rainy_imerg_xds["lon"])
    
    # visualize the output
    rainy_sif_xds.dcSIF.mean(dim='time').hvplot.quadmesh('lon', 'lat', cmap='jet', geo=True, rasterize=True, dynamic=False, width=450).relabel('Initial') +\
    interp_rainy_sif_xds.dcSIF.mean(dim='time').hvplot.quadmesh('lon', 'lat', cmap='jet', geo=True, rasterize=True, dynamic=False, width=450).relabel('Interpolated')
    

    # now that our coordinates match, in order to actually merge we need to convert the default CFTimeIndex to datetime to merge dataset with SIF data because the IMERG rainfall dataset was CFTime and the SIF was datetime
    rainy_imerg_xds['time'] = rainy_imerg_xds.indexes['time'].to_datetimeindex()
    
    # now the merge can easily be done with
    merged_xds = xr.combine_by_coords([rainy_imerg_xds, interp_rainy_sif_xds], coords=['lat', 'lon', 'time'], join="inner")
    
    # now visualize the two datasets together // multiply SIF by 30 because values are so ow
    merged_xds.HQprecipitation.rolling(time=7, center=True).sum().mean(dim=('lat', 'lon')).hvplot().relabel('Precip') * \
    (merged_xds.dcSIF.mean(dim=('lat', 'lon'))*30).hvplot().relabel('SIF')
    

    【讨论】:

      猜你喜欢
      • 2016-11-03
      • 2016-05-05
      • 2014-02-16
      • 1970-01-01
      • 1970-01-01
      • 2020-08-25
      • 2011-01-02
      • 2021-04-18
      • 2013-11-22
      相关资源
      最近更新 更多