【Question title】: R - Find common elements by group [duplicate]
【Posted】: 2020-12-24 06:10:19
【Question】:

I'm working with the following dataset:

library(data.table)
dat <- fread("https://www.dropbox.com/s/kj66h9shv6zge91/mydat.csv?dl=1")

which looks like this:

            source_id experiment_id variable_id
   1: CESM2-WACCM-FV2    historical          pr
   2: CESM2-WACCM-FV2    historical          pr
   3: CESM2-WACCM-FV2    historical         tas
   4: CESM2-WACCM-FV2    historical         tas
   5:     FGOALS-f3-L    historical          pr
  ---                                          
5657:      MRI-ESM2-0        ssp585          pr
5658:     CESM2-WACCM        ssp585          pr
5659:     CESM2-WACCM        ssp585         tas
5660:     CESM2-WACCM        ssp585         tas
5661:     CESM2-WACCM        ssp585      tasmax

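For each variable_id, I'm trying to find the list of source_id values that are present in all of the experiment_id groups (i.e. "historical", "ssp126", "ssp245", "ssp370", "ssp585").

Any ideas on how to get there? It seems like a simple problem, but I couldn't find an adequate answer on SO that works with character values rather than numeric ones.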

【Question discussion】:

    Tags: r data.table


    【Solution 1】:

    Maybe this helps:

    by(dat, dat$variable_id, function(x) 
            Reduce(intersect, split(x$source_id, x$experiment_id)))
    
    #dat$variable_id: pr
    # [1] "BCC-CSM2-MR" "MRI-ESM2-0"  "CESM2-WACCM" "INM-CM5-0"  "INM-CM4-8"    
    # [6] "MPI-ESM1-2-HR" "CMCC-CM2-SR5"  "NorESM2-MM"  "EC-Earth3"  "EC-Earth3-Veg"
    #[11] "GFDL-ESM4"    
    #-------------------------------------------------------------------------- 
    #dat$variable_id: tas
    # [1] "BCC-CSM2-MR"   "MRI-ESM2-0"    "CESM2-WACCM"   "AWI-CM-1-1-MR" "INM-CM4-8"
    # [6] "INM-CM5-0"     "MPI-ESM1-2-HR" "CMCC-CM2-SR5"  "NorESM2-MM"    "EC-Earth3"
    #[11] "EC-Earth3-Veg" "GFDL-ESM4"    
    #-------------------------------------------------------------------------- 
    #dat$variable_id: tasmax
    # [1] "BCC-CSM2-MR"   "MRI-ESM2-0"    "AWI-CM-1-1-MR" "INM-CM4-8"     "INM-CM5-0"
    # [6] "MPI-ESM1-2-HR" "NorESM2-MM"    "EC-Earth3"     "EC-Earth3-Veg" "GFDL-ESM4"
    #-------------------------------------------------------------------------- 
    #dat$variable_id: tasmin
    # [1] "BCC-CSM2-MR"   "MRI-ESM2-0"    "AWI-CM-1-1-MR" "INM-CM4-8"     "INM-CM5-0"
    # [6] "MPI-ESM1-2-HR" "NorESM2-MM"    "EC-Earth3"     "EC-Earth3-Veg" "GFDL-ESM4"
    

    For each variable_id, this returns the source_id values that are common to all experiment_id groups.
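To see what the one-liner does, here is a minimal base-R sketch on a small hypothetical toy table (`toy`, with made-up source names "A", "B", "C"; it is not the real dataset). `split()` turns one variable's `source_id` column into a list with one character vector per `experiment_id`, and `Reduce(intersect, ...)` folds `intersect()` over that list, so `Reduce(intersect, list(a, b, c))` equals `intersect(intersect(a, b), c)`. `intersect()` works on character vectors just as it does on numbers. `lapply()` is used here instead of `by()` only to get a plain named list back.

```r
# Hypothetical toy data standing in for `dat` (not the real table)
toy <- data.frame(
  source_id     = c("A", "B", "C", "A", "B", "A", "A", "C"),
  experiment_id = c("historical", "historical", "historical", "ssp126", "ssp126",
                    "historical", "ssp126", "historical"),
  variable_id   = c("pr", "pr", "pr", "pr", "pr", "tas", "tas", "tas"),
  stringsAsFactors = FALSE
)

# For one variable: one character vector of source_id per experiment_id,
# then intersect them from left to right.
common_sources <- function(x) {
  Reduce(intersect, split(x$source_id, x$experiment_id))
}

# Apply it to each variable_id separately; the result is a named list.
res <- lapply(split(toy, toy$variable_id), common_sources)
print(res)
# res$pr is c("A", "B"); res$tas is "A" (only "A" has both historical and ssp126)
```

Unlike `by()`, this returns an ordinary list, which is easier to pass on to `stack()` or `unlist()` if a flat table is needed afterwards.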


    If you instead want the source_id values that are common to every combination of variable_id and experiment_id:

    Reduce(intersect, split(dat$source_id, list(dat$variable_id, dat$experiment_id)))
    
    #[1] "BCC-CSM2-MR"   "MRI-ESM2-0"    "INM-CM5-0"     "INM-CM4-8"    
    #[5] "MPI-ESM1-2-HR" "NorESM2-MM"    "EC-Earth3"     "EC-Earth3-Veg"
    #[9] "GFDL-ESM4"    
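One thing worth checking with the two-factor `split()`: by default (`drop = FALSE`) it creates a group for every `variable_id` × `experiment_id` combination, including combinations that have no rows. An empty group contributes `character(0)`, which makes the whole intersection empty. The sketch below, again on hypothetical toy data, shows the difference; whether `drop = TRUE` is appropriate depends on whether a missing combination should disqualify every source.

```r
# Hypothetical toy data: "tas" has no ssp126 rows at all
toy <- data.frame(
  source_id     = c("A", "B", "A", "B", "A"),
  experiment_id = c("historical", "historical", "ssp126", "ssp126", "historical"),
  variable_id   = c("pr", "pr", "pr", "pr", "tas"),
  stringsAsFactors = FALSE
)

# Default: every level combination becomes a group, even empty ones
groups_all  <- split(toy$source_id, list(toy$variable_id, toy$experiment_id))
# drop = TRUE: only combinations that actually occur in the data
groups_kept <- split(toy$source_id, list(toy$variable_id, toy$experiment_id),
                     drop = TRUE)

print(names(groups_all))   # includes the empty "tas.ssp126" group
res_all  <- Reduce(intersect, groups_all)   # empty: one group is character(0)
res_kept <- Reduce(intersect, groups_kept)  # intersects only non-empty groups
print(res_all)
print(res_kept)
```

In the output above the result is non-empty, which suggests every combination is present in that dataset, but on other data the `drop` setting can change the answer.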
    

    【Discussion】:

    • Awesome, works as expected!