【Question title】: merging two dataframes in R - values are replaced by NA
【Posted】: 2020-10-26 13:41:52
【Question】:

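I have a data.table, summaryClaims, that lists the clientID and the month in which a payment occurred, together with the amount. It only contains the months in which a payment occurred.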

> summaryClaims
   clientID monthOfClaim      amt
1:        A            5  5292.19
2:        A            6   287.89
3:        B            2  9469.10
4:        C            6 16607.33

Then I have a second data.table, claimsCombo, which lists every clientID for each month of the year.

> claimsCombo
    clientID monthOfClaim
 1:        A            1
 2:        A            2
 3:        A            3
 4:        A            4
 5:        A            5
 6:        A            6
 7:        A            7
 8:        A            8
 9:        A            9
10:        A           10
11:        A           11
12:        A           12
13:        B            1
14:        B            2
15:        B            3
16:        B            4
17:        B            5
18:        B            6
19:        B            7
20:        B            8
21:        B            9
22:        B           10
23:        B           11
24:        B           12
25:        C            1
26:        C            2
27:        C            3
28:        C            4
29:        C            5
30:        C            6
31:        C            7
32:        C            8
33:        C            9
34:        C           10
35:        C           11
36:        C           12

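I want a table with the cumulative payments over the previous 4 months. However, something strange is happening: after the merge I get NA for the months in which payments occurred. Why?

> claimsMonthly <- merge(claimsCombo, summaryClaims, by = c("clientID", "monthOfClaim"), all.x = TRUE)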
> claimsMonthly
    clientID monthOfClaim amt
 1:        A            1  NA
 2:        A            2  NA
 3:        A            3  NA
 4:        A            4  NA
 5:        A            5  NA
 6:        A            6  NA
 7:        A            7  NA
 8:        A            8  NA
 9:        A            9  NA
10:        A           10  NA
11:        A           11  NA
12:        A           12  NA
13:        B            1  NA
14:        B            2  NA
15:        B            3  NA
16:        B            4  NA
17:        B            5  NA
18:        B            6  NA
19:        B            7  NA
20:        B            8  NA
21:        B            9  NA
22:        B           10  NA
23:        B           11  NA
24:        B           12  NA
25:        C            1  NA
26:        C            2  NA
27:        C            3  NA
28:        C            4  NA
29:        C            5  NA
30:        C            6  NA
31:        C            7  NA
32:        C            8  NA
33:        C            9  NA
34:        C           10  NA
35:        C           11  NA
36:        C           12  NA

Data:

claimsCombo <- structure(list(clientID = c("A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "B", 
"B", "B", "B", "B", "C", "C", "C", "C", "C", "C", "C", "C", "C", 
"C", "C", "C"), monthOfClaim = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 
11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 
6, 7, 8, 9, 10, 11, 12)), row.names = c(NA, -36L), class = c("data.table", 
"data.frame"))

summaryClaims <- structure(list(clientID = c("A", "A", "B", "C"), monthOfClaim = c(4.99999999999909, 
6.00000000000091, 1.99999999999909, 6.00000000000091), amt = c(5292.19, 
287.89, 9469.1, 16607.33)), row.names = c(NA, -4L), class = c("data.table", 
"data.frame"))

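【Comments】:

  • Look at the months in summaryClaims. They are not integers.
  • Wouldn't it be useful if there were a warning message when trying to merge data.tables on variables of different classes?
  • @Edward If you post this as an answer, I'll accept yours, since you were the first to provide the correct solution
  • Never mind. Pick one of those already posted. ☺

Tags: r merge data.table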


【Solution 1】:

This happens because claimsCombo$monthOfClaim and summaryClaims$monthOfClaim are not exactly equal. Use round() to convert summaryClaims$monthOfClaim to whole numbers.

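library(dplyr)
# Round the noisy month values to whole numbers so they match claimsCombo exactly
summaryClaims$monthOfClaim <- round(summaryClaims$monthOfClaim)
# Left join keeps every row of claimsCombo; name the key columns explicitly
claimsMonthly <- left_join(claimsCombo, summaryClaims, by = c("clientID", "monthOfClaim"))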

> claimsMonthly
   clientID monthOfClaim      amt
1         A            1       NA
2         A            2       NA
3         A            3       NA
4         A            4       NA
5         A            5  5292.19
6         A            6   287.89
7         A            7       NA
8         A            8       NA
9         A            9       NA
10        A           10       NA
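Since the question works with data.table rather than dplyr, the same fix can be sketched without leaving data.table. This is only an illustration: the tables are rebuilt from the values in the question's dput output, and building claimsCombo with CJ() is an assumption about how such a grid might be created, not something the asker showed.

```r
library(data.table)

# Rebuild the question's tables (values copied from the dput output)
claimsCombo <- CJ(clientID = c("A", "B", "C"), monthOfClaim = as.numeric(1:12))
summaryClaims <- data.table(
  clientID     = c("A", "A", "B", "C"),
  monthOfClaim = c(4.99999999999909, 6.00000000000091,
                   1.99999999999909, 6.00000000000091),
  amt          = c(5292.19, 287.89, 9469.10, 16607.33)
)

# Round the noisy key first, then store both keys as integer so they compare exactly
summaryClaims[, monthOfClaim := as.integer(round(monthOfClaim))]
claimsCombo[, monthOfClaim := as.integer(monthOfClaim)]

# Left join: keep every client/month combination, fill amt where a payment exists
claimsMonthly <- merge(claimsCombo, summaryClaims,
                       by = c("clientID", "monthOfClaim"), all.x = TRUE)

# Show only the rows that picked up a payment
claimsMonthly[!is.na(amt)]
```

Rounding before converting to integer matters here, because as.integer() on its own drops the fractional part instead of rounding.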

You can see the difference using as_tibble().

> as_tibble(claimsCombo)
# A tibble: 36 x 2
   clientID monthOfClaim
   <chr>           <dbl>
 1 A                   1
 2 A                   2
 3 A                   3
 4 A                   4
 5 A                   5
 6 A                   6
 7 A                   7
 8 A                   8
 9 A                   9
10 A                  10

> as_tibble(summaryClaims)
# A tibble: 4 x 3
  clientID monthOfClaim    amt
  <chr>           <dbl>  <dbl>
1 A                5.00  5292.
2 A                6.     288.
3 B                2.00  9469.
4 C                6.   16607.
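The tibble print only hints at the problem (5.00 versus 5). A minimal base R sketch of how one might confirm it, using the month values from the question's dput output; the variable names are made up for illustration:

```r
# Month values as stored in summaryClaims (copied from the dput output)
month <- c(4.99999999999909, 6.00000000000091,
           1.99999999999909, 6.00000000000091)

print(month)               # default printing uses 7 significant digits and hides the noise
print(month, digits = 15)  # more digits reveal the stored values

# Exact comparison, which is what a join on this column relies on
is_whole <- month == round(month)
print(is_whole)

# Comparison within a numeric tolerance
nearly_whole <- isTRUE(all.equal(month, round(month)))
print(nearly_whole)
```

The exact comparison fails for every value while the tolerance-based one succeeds, which is why the columns look identical on screen and still do not match in the merge.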

【Comments】:

    【Solution 2】:

    You have to format your key variables:

    #Format variables
    claimsCombo$monthOfClaim <- as.integer(claimsCombo$monthOfClaim)
    summaryClaims$monthOfClaim <- as.integer(summaryClaims$monthOfClaim)
    #Merge
    claimsMonthly <- merge(claimsCombo, summaryClaims, by = c("clientID", "monthOfClaim"), all.x = TRUE)
    #Output
    head(claimsMonthly)
    
      clientID monthOfClaim     amt
    1        A            1      NA
    2        A            2      NA
    3        A            3      NA
    4        A            4 5292.19
    5        A            5      NA
    6        A            6  287.89
    

    【Comments】:

    • Check as.integer(4.99999999999909)
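To spell out what the comment above points at: as.integer() truncates toward zero, so a value just below 5 becomes 4, which is why the output of Solution 2 shows the 5292.19 payment in month 4. A short sketch, with variable names chosen only for illustration:

```r
x <- 4.99999999999909

truncated <- as.integer(x)         # drops the fractional part
rounded   <- as.integer(round(x))  # rounds to the nearest whole number first

print(truncated)
print(rounded)
```

Calling round() before as.integer() gives month 5, consistent with Solution 1.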