【Question title】:pivot_wider() function in tidyr
【Posted】:2021-07-24 00:54:10
【Question】:

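I am trying to understand how tidyr's pivot_wider function works. I have bookings data and properties data, and I am trying to determine whether properties appeal to both business travelers and tourists.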

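The steps I want to carry out are:

  • First, convert the for_business column to a factor with the levels "business" and "tourist".
  • Calculate the average review score for each property, separately for business travelers and tourists.
  • Then calculate the average difference in review scores between business travelers and tourists.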

Code:

bookings %>%
  mutate(for_business = factor(for_business, labels = c("business", "tourist"))) %>%
  select(property_id, for_business) %>%
  mutate(avg_review_score = mean(review_score, na.rm = TRUE)) %>%
  ungroup() %>%
  pivot_wider(names_from = for_business, values_from = avg_review_score) %>%
  mutate(diff = business - tourist) %>%
  summarise(avg_diff = mean(diff, na.rm = TRUE))

This gives me the following error:

Error: Problem with `mutate()` input `avg_review_score`.
x object 'review_score' not found
i Input `avg_review_score` is `mean(review_score, na.rm = TRUE)`.

> dput(head(bookings))
structure(list(booker_id = c("215934017ba98c09f30dedd29237b43dad5c7b5f", 
"7f590fd6d318248a48665f7f7db529aca40c84f5", "10f0f138e8bb1015d3928f2b7d828cbb50cd0804", 
"7b55021a4160dde65e31963fa55a096535bcad17", "6694a79d158c7818cd63831b71bac91286db5aff", 
"d0358740d5f15e85523f94ab8219f25d8c017347"), property_id = c(2668, 
4656, 4563, 4088, 2188, 4171), room_nights = c(4, 5, 6, 7, 4, 
2), price_per_night = c(91.4669561442773, 106.504997616816, 86.9913739625713, 
92.3656155139053, 104.838941902747, 109.981876495045), checkin_day = c("mon", 
"tue", "wed", "fri", "tue", "fri"), for_business = c(FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE), status = c("cancelled", "cancelled", 
"stayed", "stayed", "stayed", "cancelled"), review_score = c(NA, 
NA, 6.25812265672399, 5.953597754693, 6.43474489539585, NA)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))
> dput(head(properties))
structure(list(property_id = c(2668, 4656, 4563, 4088, 2188, 
4171), destination = c("Brisbane", "Brisbane", "Brisbane", "Brisbane", 
"Brisbane", "Brisbane"), property_type = c("Hotel", "Hotel", 
"Apartment", "Apartment", "Apartment", "Apartment"), nr_rooms = c(32, 
39, 9, 9, 4, 5), facilities = c("airport shuttle,free wifi,garden,breakfast,pool,on-site restaurant", 
"on-site restaurant,pool,airport shuttle,breakfast,bbq,free wifi,spa", 
"laundry", "kitchen,laundry,free wifi", "parking,kitchen,bbq,free wifi,game console", 
"kitchen,pool,laundry,parking,free wifi,garden")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

【Comments】:

    Tags: r tidyr


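    【Solution 1】:

    The error comes from the select step: it keeps only two columns, while the next mutate step needs a column (review_score) that no longer exists in the selected data. Instead, include that column in select as well: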

    bookings %>%
      mutate(for_business = factor(for_business, levels = c(FALSE, TRUE), 
          labels = c("business", "tourist"))) %>%
      select(property_id, for_business, review_score) %>%
      mutate(avg_review_score = mean(review_score, na.rm = TRUE)) %>%
      ungroup() %>%
      pivot_wider(names_from = for_business, values_from = avg_review_score) %>%
      mutate(diff = business - tourist) %>%
      summarise(avg_diff = mean(diff, na.rm = TRUE))
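
Two further points may be worth checking once the select step is fixed. Without a group_by(), mean() is taken over all rows rather than per property and traveler type, and factor() pairs levels with labels in order, so levels = c(FALSE, TRUE) with labels = c("business", "tourist") labels FALSE as "business". The following is only a sketch of how the three steps in the question could be written; bookings_toy and its values are hypothetical stand-ins for the real bookings data, and it assumes that for_business = TRUE means a business booking.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for `bookings` (values are made up).
bookings_toy <- tibble(
  property_id  = c(1, 1, 1, 2, 2, 2),
  for_business = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE),
  review_score = c(8, 6, 5, 7, 8, NA)
)

avg_by_type <- bookings_toy %>%
  # Assumption: FALSE means "tourist" and TRUE means "business".
  mutate(for_business = factor(for_business, levels = c(FALSE, TRUE),
                               labels = c("tourist", "business"))) %>%
  # One average per property and traveler type.
  group_by(property_id, for_business) %>%
  summarise(avg_review_score = mean(review_score, na.rm = TRUE),
            .groups = "drop") %>%
  # One row per property, one column per traveler type.
  pivot_wider(names_from = for_business, values_from = avg_review_score) %>%
  mutate(diff = business - tourist)

# Average difference across properties.
result <- avg_by_type %>%
  summarise(avg_diff = mean(diff, na.rm = TRUE))
```

Because summarise() already returns one row per property and traveler type, pivot_wider() has a single value for each cell and does not need to fall back to list-columns.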
    

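    【Discussion】:

    • I still get the same error after following your suggestion.
    • @RanjiRaj I don't get the error `Error: Problem with mutate() input avg_review_score. x object 'review_score' not found` with your data.
    • @RanjiRaj I get a different error in pivot_wider, because the tourist value is not present in the data, since you only showed the head.
    • I have updated my question with the dput() of the other data frame. The issue is that I performed a series of other operations beforehand. Not sure whether that conflicts with this data.
    • @RanjiRaj The code is the same with the properties data. Only the column names differ.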
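
The "different error" mentioned in the discussion can be reproduced on a small scale: pivot_wider() only creates columns for values that actually occur in names_from, so a later mutate() that refers to a missing column fails. A minimal sketch, assuming tidyr 1.2.0 or later (where the names_expand argument is available) and using a hypothetical only_tourists table with made-up values:

```r
library(dplyr)
library(tidyr)

# Hypothetical data in which only one traveler type occurs,
# similar to what a head() of the real data can look like.
only_tourists <- tibble(
  property_id      = c(1, 2),
  for_business     = factor(c("tourist", "tourist"),
                            levels = c("tourist", "business")),
  avg_review_score = c(6.3, 5.9)
)

# Default behavior: only values present in the data become columns,
# so there is no `business` column here.
wide_default <- only_tourists %>%
  pivot_wider(names_from = for_business, values_from = avg_review_score)

# names_expand = TRUE creates a column for every factor level,
# filling the unobserved one with NA, so the difference can be computed.
wide_expanded <- only_tourists %>%
  pivot_wider(names_from = for_business, values_from = avg_review_score,
              names_expand = TRUE) %>%
  mutate(diff = business - tourist)
```

With the expanded version, diff is NA for properties that lack one of the two groups, and a later mean(diff, na.rm = TRUE) skips them.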