【Question title】: R - rbind logically [duplicate]
【Posted】: 2019-06-26 13:27:28
【Question】:

I have this data frame:

source_data <- data.frame(
    "date" = c("2018-01-01", "2018-01-01", "2018-02-01", "2018-02-01"), 
    "nr" = c(0, 1, 0, 1),
    "marketing_fees" = c(500, 600, 800, 900),
    "services_paid" = c(40, 50, 10, 30),
    stringsAsFactors = FALSE)

The result should look like this:

result <- data.frame(
  "date" = c("2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-02-01", "2018-02-01", "2018-02-01", "2018-02-01"), 
  "nr" = c(0, 0, 1, 1, 0, 0, 1, 1),
  "income" = c(500, 40, 600, 50, 800, 10, 900, 30),
  "source" = c("marketing", "services", "marketing", "services", "marketing", "services", "marketing", "services"),
  stringsAsFactors = FALSE)

The only way I have managed to do it is this:

library(dplyr)

result <- rbind(
  source_data %>% 
    filter(date == "2018-01-01") %>% 
    select(date, nr, income = marketing_fees) %>% 
    mutate(source = "marketing"),

  source_data %>% 
    filter(date == "2018-01-01") %>% 
    select(date, nr, income = services_paid) %>% 
    mutate(source = "services"),

  source_data %>% 
    filter(date == "2018-02-01") %>% 
    select(date, nr, income = marketing_fees) %>% 
    mutate(source = "marketing"),

  source_data %>% 
    filter(date == "2018-02-01") %>% 
    select(date, nr, income = services_paid) %>% 
    mutate(source = "services")
)

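Apart from being ugly and highly repetitive, the code above is not usable in practice, because my real data frame has about 50 columns and a lot of data. How can I get the result data frame without all this duplicated code?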

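【Comments】:

  • As far as I can tell, this is just reshaping plus some basic text processing. I'll post an answer to demonstrate.
  • Note that this is similar to the reopen logic mentioned here; I can't see any difference. The rules should apply equally to everyone, not selectively to some.

Tags: r reshape rbind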


【Solution 1】:

We can use gather to reshape from 'wide' to 'long', then separate the column names so that only the prefix (the part before the "_") is kept:

library(tidyverse)
source_data %>% 
    gather(source, income, marketing_fees:services_paid) %>% 
    separate(source, into = c('source', 'extra')) %>%
    select(-extra) %>% 
    arrange(date, nr)
#        date nr    source income
#1 2018-01-01  0 marketing    500
#2 2018-01-01  0  services     40
#3 2018-01-01  1 marketing    600
#4 2018-01-01  1  services     50
#5 2018-02-01  0 marketing    800
#6 2018-02-01  0  services     10
#7 2018-02-01  1 marketing    900
#8 2018-02-01  1  services     30
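
A note for newer tidyr versions: since tidyr 1.0.0, gather() is superseded by pivot_longer(). The sketch below is a possible equivalent, not part of the original answer. It selects every column except date and nr, so it should scale to the ~50 columns mentioned in the question without listing them, and it uses names_pattern to keep only the prefix before the first "_" instead of a separate() step. It assumes tidyr >= 1.0.0 and dplyr are installed, and that every value column is named prefix_suffix.

```r
# Assumes tidyr >= 1.0.0 (for pivot_longer) and dplyr are installed.
library(dplyr)
library(tidyr)

source_data <- data.frame(
  date = c("2018-01-01", "2018-01-01", "2018-02-01", "2018-02-01"),
  nr = c(0, 1, 0, 1),
  marketing_fees = c(500, 600, 800, 900),
  services_paid = c(40, 50, 10, 30),
  stringsAsFactors = FALSE
)

result <- source_data %>%
  pivot_longer(
    cols = -c(date, nr),             # every column except the id columns
    names_to = "source",
    names_pattern = "^([^_]+)_.*",   # capture only the prefix before the first "_"
    values_to = "income"
  ) %>%
  select(date, nr, income, source) %>%  # same column order as the expected result
  arrange(date, nr) %>%
  as.data.frame()                       # plain data.frame instead of a tibble

print(result)
```

The final as.data.frame() is optional; it only converts the tibble returned by pivot_longer() back to a plain data.frame so it can be compared directly with the result data frame from the question.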

【Solution 2】:

library(data.table)
result2 <- melt(
  setDT(source_data),          # converts source_data to a data.table in place
  id.vars = c("date", "nr"),   # every other column is melted into rows
  value.name = "income",
  variable.name = "source"
)[, source := sub("_.*", "", source)][order(date, nr)]  # keep the prefix before "_", then sort
result2
#         date nr    source income
#1: 2018-01-01  0 marketing    500
#2: 2018-01-01  0  services     40
#3: 2018-01-01  1 marketing    600
#4: 2018-01-01  1  services     50
#5: 2018-02-01  0 marketing    800
#6: 2018-02-01  0  services     10
#7: 2018-02-01  1 marketing    900
#8: 2018-02-01  1  services     30
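
If adding a package dependency is not an option, the asker's own rbind idea can also be generalized in base R: instead of writing one filter/select/mutate block per date and per column, build one small data frame per value column and stack them with a single rbind call. This is a hedged sketch, not part of either answer; id_cols, value_cols and long are illustrative names, and it assumes every column other than date and nr is a value column named prefix_suffix. Filtering by date is unnecessary, because the final sort on date and nr restores the expected row order (order() leaves ties in their original order, so "marketing" stays ahead of "services").

```r
# Base R only: no packages required.
source_data <- data.frame(
  date = c("2018-01-01", "2018-01-01", "2018-02-01", "2018-02-01"),
  nr = c(0, 1, 0, 1),
  marketing_fees = c(500, 600, 800, 900),
  services_paid = c(40, 50, 10, 30),
  stringsAsFactors = FALSE
)

id_cols <- c("date", "nr")                          # hypothetical name: columns kept as-is
value_cols <- setdiff(names(source_data), id_cols)  # all remaining columns get stacked

# One data frame per value column, then a single rbind over the whole list
long <- do.call(rbind, lapply(value_cols, function(col) {
  data.frame(
    source_data[id_cols],
    income = source_data[[col]],
    source = sub("_.*", "", col),   # "marketing_fees" -> "marketing"
    stringsAsFactors = FALSE
  )
}))

# Sort by date, then nr; ties keep the stacking order (value_cols order)
long <- long[order(long$date, long$nr), ]
rownames(long) <- NULL

print(long)
```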
    
