【Question title】: Run logical test or calculation vs first of kind in separate index column
【Posted】: 2023-12-14 18:31:01
【Question】:

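I have a large data frame with an index column whose value repeats for every row of activity that belongs to it. I want to run calculations that reference this index column and add two new columns: one with the number of days between each row's date and the first date recorded for that index value, and one with a logical test of whether the value in another column matches the first value in that column for the same index value. I have been using dplyr and have the following script: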

test <- InsiderList3 %>%
  group_by(`Insider CIK`) %>%
  mutate(rf.diff = first(`Transaction Date`) - `Transaction Date`) %>%
  mutate(IssuerCheck = first(`Issuer`) == Issuer)

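The column labelled "Insider CIK" is the index. The information in all other columns belongs to that index value until the next index value appears, and then the pattern repeats. There is a separate date column and a column identifying the company.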

Sample input (first 75 rows, three columns):

   dput(head(InsiderList3[c('Insider CIK', 'Transaction Date', 'Issuer')], 75))
structure(list(`Insider CIK` = c("0001008134", "0001008134", 
"0001008134", "0001008134", "0001008134", "0001008134", "0001008134", 
"0001008134", "0001008134", "0001008134", "0001008134", "0001008134", 
"0001008134", "0001008134", "0001008134", "0001008134", "0001008134", 
"0001008134", "0001008134", "0001008134", "0001008134", "0001008134", 
"0001008134", "0001008134", "0001008134", "0001008134", "0001008134", 
"0001008134", "0001008134", "0001008134", "0001008134", "0001008134", 
"0001008134", "0001008134", "0001008134", "0001008134", "0001008134", 
"0001008134", "0001008134", "0001008134", "0001008134", "0001008134", 
"0001008134", "0001008134", "0001008134", "0001008134", "0001008134", 
"0001008134", "0001008134", "0001008134", "0001008134", "0001008134", 
"0001008134", "0001008134", "0001008134", "0001008134", "0001008134", 
"0001008134", "0001008134", "0001008134", "0001008134", "0001008134", 
"0001008134", "0001008134", "0001008134", "0001008134", "0001008134", 
"0001008134", "0001008134", "0001008134", "0001009891", "0001012859", 
"0001012859", "0001012859", "0001012859"), `Transaction Date` = structure(c(18358, 
18358, 18101, 18065, 18065, 18039, 17729, 17700, 17674, 17674, 
17345, 17345, 17326, 17014, 17014, 17014, 17014, 17014, 17014, 
17001, 16964, 16964, 16598, 16590, 16582, 16582, 16409, 16288, 
16288, 16245, 16245, 16217, 16161, 16072, 16052, 15967, 15880, 
15869, 15771, 15710, 15710, 15687, 15603, 15523, 15354, 15354, 
15030, 14979, 14840, 14049, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 18358, 18358, 
18358, 18261), class = "Date"), Issuer = c("TRANSATLANTIC PETROLEUM LTD.", 
"TRANSATLANTIC PETROLEUM LTD.", "TRANSATLANTIC PETROLEUM LTD.", 
"TRANSATLANTIC PETROLEUM LTD.", "TRANSATLANTIC PETROLEUM LTD.", 
"SANDRIDGE ENERGY INC", "SANDRIDGE ENERGY INC", "TRANSATLANTIC PETROLEUM LTD.", 
"TRANSATLANTIC PETROLEUM LTD.", "TRANSATLANTIC PETROLEUM LTD.", 
"TRANSATLANTIC PETROLEUM LTD.", "TRANSATLANTIC PETROLEUM LTD.", 
"TRANSATLANTIC PETROLEUM LTD.", "Seventy Seven Energy Inc.", 
"Seventy Seven Energy Inc.", "Seventy Seven Energy Inc.", "Seventy Seven Energy Inc.", 
"Seventy Seven Energy Inc.", "Seventy Seven Energy Inc.", "TRANSATLANTIC PETROLEUM LTD.", 
"TRANSATLANTIC PETROLEUM LTD.", "TRANSATLANTIC PETROLEUM LTD.", 
"TRANSATLANTIC PETROLEUM LTD.", "Seventy Seven Energy Inc.", 
"TRANSATLANTIC PETROLEUM LTD.", "TRANSATLANTIC PETROLEUM LTD.", 
"Seventy Seven Energy Inc.", "Seventy Seven Energy Inc.", "Seventy Seven Energy Inc.", 
"TRANSATLANTIC PETROLEUM LTD.", "TRANSATLANTIC PETROLEUM LTD.", 
"TRANSATLANTIC PETROLEUM LTD.", "CHESAPEAKE ENERGY CORP", "CHESAPEAKE ENERGY CORP", 
"CHESAPEAKE ENERGY CORP", "CHESAPEAKE ENERGY CORP", "TRANSATLANTIC PETROLEUM LTD.", 
"CHESAPEAKE ENERGY CORP", "CHESAPEAKE ENERGY CORP", "TRANSATLANTIC PETROLEUM LTD.", 
"TRANSATLANTIC PETROLEUM LTD.", "CHESAPEAKE ENERGY CORP", "CHESAPEAKE ENERGY CORP", 
"CHESAPEAKE ENERGY CORP", "TRANSATLANTIC PETROLEUM LTD.", "TRANSATLANTIC PETROLEUM LTD.", 
"TRANSATLANTIC PETROLEUM LTD.", "TRANSATLANTIC PETROLEUM LTD.", 
"TRANSATLANTIC PETROLEUM LTD.", "QUEST RESOURCE CORP", "QUEST RESOURCE CORP", 
"CHESAPEAKE ENERGY CORP", "CHESAPEAKE ENERGY CORP", "CHESAPEAKE ENERGY CORP", 
"CHESAPEAKE ENERGY CORP", "CHESAPEAKE ENERGY CORP", "TRANSATLANTIC PETROLEUM LTD.", 
"CHESAPEAKE ENERGY CORP", "Seventy Seven Energy Inc.", "CHESAPEAKE OILFIELD OPERATING LLC", 
"TRANSATLANTIC PETROLEUM LTD.", "QUEST RESOURCE CORP", "CHESAPEAKE ENERGY CORP", 
"CHESAPEAKE ENERGY CORP", "CVR ENERGY INC", "CHESAPEAKE ENERGY CORP", 
"SANDRIDGE ENERGY INC", "TRANSATLANTIC PETROLEUM LTD.", "Seventy Seven Energy Inc.", 
"CHESAPEAKE ENERGY CORP", NA, "NATIONAL HEALTHCARE CORP", "NATIONAL HEALTHCARE CORP", 
"NATIONAL HEALTHCARE CORP", "NATIONAL HEALTHCARE CORP")), row.names = c(NA, 
75L), class = "data.frame")

Thank you for your help.

【Comments】:

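  • first(Issuer) = Issuer needs to use ==
  • Could you post head(InsiderList3[c('Insider CIK', 'Transaction Date', 'Issuer')], 20)? That is only 3 columns and 20 rows.
  • I have made the change, but unfortunately it still does not work. The date difference, which should be computed relative to the first date within each index value rather than the first date in the whole table, comes out wrong (I get some negative values). The logical test is also compared against the first value in the table rather than the first value for each new index value.
  • Thank you for your help.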
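
One possible cause of the symptom described in the comments (results computed against the first row of the whole table) is that the grouping is not being honoured, for example when plyr is attached after dplyr and its mutate() masks dplyr's. That is only a guess about the asker's session. The sketch below uses a small made-up data frame (group labels and issuers are hypothetical, the date values are borrowed from the sample) and calls dplyr::mutate() explicitly so that first() is evaluated per Insider CIK.

```r
library(dplyr)

# Toy data: hypothetical CIK labels and issuers, same column names as the question
toy <- data.frame(
  `Insider CIK` = c("A", "A", "A", "B", "B"),
  `Transaction Date` = as.Date(c(18358, 18101, 18065, 18358, 18261),
                               origin = "1970-01-01"),
  Issuer = c("X", "Y", "X", "Z", "Z"),
  check.names = FALSE  # keep the spaces in the column names
)

res <- toy %>%
  group_by(`Insider CIK`) %>%
  # the dplyr:: prefix guards against another package masking mutate()
  dplyr::mutate(
    rf.diff = as.numeric(first(`Transaction Date`) - `Transaction Date`),
    IssuerCheck = first(Issuer) == Issuer
  ) %>%
  ungroup()

print(res)
```

If the grouping is applied, rf.diff restarts at 0 on the first row of each CIK and IssuerCheck is TRUE wherever the issuer equals that CIK's first issuer.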

Tags: r dplyr tidyr


【Solution 1】:

Maybe I am missing something, but isn't this just a matter of sorting by 'Transaction Date'?

InsiderList3 %>%
  group_by(`Insider CIK`) %>%
  arrange(`Transaction Date`) %>%
  mutate(rf.diff =  first(`Transaction Date`) - `Transaction Date`,
         IssuerCheck =  first(`Issuer`) == Issuer)
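
Two details of the sort are worth checking on a small example. arrange() ignores the grouping unless .by_group = TRUE is passed, and after an ascending sort first() is the earliest date in each group, so first minus date is zero or negative. If the goal is a non-negative count of days back from the most recent transaction, sorting in descending order within each group is one option. A sketch on made-up rows (group labels are hypothetical, date values are taken from the question's sample):

```r
library(dplyr)

toy <- data.frame(
  `Insider CIK` = c("A", "B", "A", "B", "A"),
  `Transaction Date` = as.Date(c(18101, 18261, 18358, 18358, 18065),
                               origin = "1970-01-01"),
  check.names = FALSE
)

sorted <- toy %>%
  group_by(`Insider CIK`) %>%
  # newest transaction first inside each CIK; .by_group sorts by CIK before date
  arrange(desc(`Transaction Date`), .by_group = TRUE) %>%
  mutate(rf.diff = as.numeric(first(`Transaction Date`) - `Transaction Date`)) %>%
  ungroup()

print(sorted)
```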

【Comments】:

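  • Unfortunately it is not, because you want them in chronological order. I will add a larger section to the dput.
  • @js80 Perhaps min instead of first
  • Unfortunately, using min drops the transaction log for the first CIK number and produces NA for the mutate variables. Because the date difference should be based on the first transaction date for a given CIK number, it should never be negative. 0 should correspond to the most recent reported transaction date associated with that CIK number.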
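
The NA results mentioned in the last comment are consistent with how min() and max() behave in R: if any value in a group is NA, the result is NA unless na.rm = TRUE is given, and the sample data does contain missing transaction dates. A minimal sketch, assuming the reference date should be the most recent non-missing date per Insider CIK (one reading of the last comment, not something the thread confirms). The helper latest_date is hypothetical and returns NA for a group that has no dates at all.

```r
library(dplyr)

# Hypothetical helper: most recent non-missing date, or NA if the group has none
latest_date <- function(d) {
  if (all(is.na(d))) as.Date(NA) else max(d, na.rm = TRUE)
}

# Toy data: hypothetical CIK labels, date values borrowed from the sample
toy <- data.frame(
  `Insider CIK` = c("A", "A", "A", "B", "C", "C"),
  `Transaction Date` = as.Date(c(18358, 18101, NA, NA, 18358, 18261),
                               origin = "1970-01-01"),
  check.names = FALSE
)

res <- toy %>%
  group_by(`Insider CIK`) %>%
  # days back from the latest known date; rows without a date stay NA
  mutate(rf.diff = as.numeric(latest_date(`Transaction Date`) - `Transaction Date`)) %>%
  ungroup()

print(res)
```

This keeps the row order untouched, so it does not depend on how the table is sorted.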