【Question Title】: Filter Group on Conditional Only One Value in Different Field
【Posted】: 2020-08-11 15:23:24
【Question】:

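I am trying to apply a final filter to a data.frame to exclude groups. Here the groups are "Insider CIK" numbers, and I want to exclude a group when the "Transaction Type" values associated with it are only one of three options: "P-Purchase", "S-Sale", "M-Exempt". I want to keep groups that have a combination of these values and drop groups where the values are all the same. The second part of the filter, which handles groups with only one entry, works as needed.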

Here is my script and input:

test12 <- test12 %>% group_by(`Insider CIK`) %>% filter(all(c("P-Purchase", "S-Sale", "M-Exempt") %in% `Transaction Type`) | n()>1)


structure(list(`Insider CIK` = c("0001418814", "0001418814", 
"0001418814"), `Insider Full Name and CIK` = c("ValueAct Holdings, L.P. (0001418814)", 
"ValueAct Holdings, L.P. (0001418814)", "ValueAct Holdings, L.P. (0001418814)"
), `Acquistion or Disposition` = structure(c(1L, 1L, 1L), .Label = c("A", 
"D", "-"), class = "factor"), `Transaction Date` = structure(c(18334, 
18333, 18332), class = "Date"), `Deemed Execution Date` = structure(c(1L, 
1L, 1L), .Label = "Â", class = "factor"), Issuer = c("HAWAIIAN ELECTRIC INDUSTRIES INC", 
"HAWAIIAN ELECTRIC INDUSTRIES INC", "HAWAIIAN ELECTRIC INDUSTRIES INC"
), Form = structure(c(1L, 1L, 1L), .Label = c("4", "3"), class = "factor"), 
    `Transaction Type` = c("P-Purchase", "P-Purchase", "P-Purchase"
    ), `Direct or Indirect Ownership` = structure(c(2L, 2L, 2L
    ), .Label = c("--D", "--I"), class = "factor"), `Number of Securities Transacted` = c(542252, 
    400060, 755600), `Issuer CIK` = structure(c(4L, 4L, 4L), .Label = c("0000750574", 
    "0000007431", "0000100726", "0000354707", "0000885590", "0001101215", 
    "0001137789", "0001655075", "0001739445", "0001512499", "0000874761", 
    "0001140536", "0001308161", "0001099800", "0001280776", "0001314102", 
    "0001389072", "0001642545"), class = "factor"), `Security Name` = structure(c(3L, 
    3L, 3L), .Label = c("common stock", "Common Shares, no par value", 
    "Common Stock", "Forward Purchase Contract", "Ordinary Shares", 
    "Physically Settled Forwards", "Series A Non-Voting Convertible Preferred Stock", 
    "Class A Common Stock", "Class B Common Stock", "Deferred Stock Units", 
    "Forward purchase contract", "Ordinary Shares, nominal value $0.000304635", 
    "Ordinary Shares, nominal value $0.000304635 per share", 
    "Units", "Employee Stock Option (Right to Acquire)", "Performance Rights", 
    "Common stock", "Employee Stock Option (right to buy)", "Restricted Stock Unit", 
    "Restricted Stock Units", "Senior Convertible Preferred Stock", 
    "Stock Option (right to buy)", "Stock Option (Right to Buy)", 
    "Stock Options (Right to Buy)"), class = "factor"), `Days Since Most Recent Filing` = structure(c(0, 
    1, 2), class = "difftime", units = "days"), firstTransactionDate = structure(c(18334, 
    18334, 18334), class = "Date")), row.names = c(NA, -3L), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), groups = structure(list(`Insider CIK` = "0001418814", 
    .rows = list(1:3)), row.names = c(NA, -1L), class = c("tbl_df", 
"tbl", "data.frame")))

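Thanks for your help.

【Comments】:

  • Can you show the expected output? The example only shows 3 rows.
  • The expected output in this case would be empty, because the transaction type is only "P-Purchase" for the group ID provided.
  • You need test12 %>% group_by(`Insider CIK`) %>% filter(!(all(c("P-Purchase", "S-Sale", "M-Exempt") %in% `Transaction Type`) | n() > 1))

Tags: r dplyr tidyr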


【Solution 1】:

We don't need the n() == 1 condition, because a single row cannot have more than one Transaction Type.

library(dplyr)
test12 %>% 
      group_by(`Insider CIK`) %>%
      filter(all(c("P-Purchase", "S-Sale", "M-Exempt") %in% `Transaction Type`))

Or, if the OP does want n() == 1, combine the two conditions with |

test12 %>% 
      group_by(`Insider CIK`) %>%
      filter(all(c("P-Purchase", "S-Sale", "M-Exempt") %in% `Transaction Type`) |n() == 1)
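
To make the difference between the two versions concrete, here is a minimal sketch on made-up data. The `toy` data frame and its group IDs "A" to "D" are hypothetical and are not taken from the OP's input. Note that `all(types %in% ...)` is TRUE only when all three types occur in the group, so a group with just two of them is dropped.

```r
library(dplyr)

# Hypothetical toy data (not the OP's): four insiders with different mixes
toy <- data.frame(
  `Insider CIK` = c("A", "A", "A", "B", "B", "B", "C", "D", "D"),
  `Transaction Type` = c("P-Purchase", "S-Sale", "M-Exempt",     # A: all three
                         "P-Purchase", "P-Purchase", "P-Purchase", # B: one type, 3 rows
                         "S-Sale",                                 # C: single row
                         "P-Purchase", "S-Sale"),                  # D: two types
  check.names = FALSE,
  stringsAsFactors = FALSE
)

types <- c("P-Purchase", "S-Sale", "M-Exempt")

# First version: keep only groups in which all three types occur
all_three <- toy %>%
  group_by(`Insider CIK`) %>%
  filter(all(types %in% `Transaction Type`)) %>%
  ungroup()

# Second version: additionally keep groups that have exactly one row
all_three_or_single <- toy %>%
  group_by(`Insider CIK`) %>%
  filter(all(types %in% `Transaction Type`) | n() == 1) %>%
  ungroup()

unique(all_three$`Insider CIK`)            # "A"
unique(all_three_or_single$`Insider CIK`)  # "A" "C"
```

Neither version keeps group "D", which mixes two types but lacks "M-Exempt".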

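【Comments】:

  • I want to keep the existing condition, i.e. for group IDs with only one entry, I want those entries removed as well.
  • @js80 You need filter(all(c("P-Purchase", "S-Sale", "M-Exempt") %in% `Transaction Type`) | n_distinct(`Transaction Type`) == 1)
  • I don't think the n_distinct adjustment on Transaction Type is needed. It would work, but I want to tie this condition to the number of observations for a CIK number being 1. That is, a single purchase or sale on its own should not qualify. Thanks for your help.
  • @js80 I didn't follow that logic correctly. My understanding is that a group is kept if all three elements "P-Purchase", "S-Sale", "M-Exempt" are present in it. The second part, n() > 1, is not clear.
  • @js80 Let's say your group has only 'P-Purchase' as its transaction type and has 3 rows, i.e. duplicates. It would still fail the first condition.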
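
One possible reading of this thread, which the OP never confirms, is that a group should be kept only if it has more than one row and more than one distinct Transaction Type. A sketch under that assumption, again on hypothetical toy data rather than the OP's input:

```r
library(dplyr)

# Hypothetical toy data (not the OP's)
toy <- data.frame(
  `Insider CIK` = c("A", "A", "A", "B", "B", "B", "C", "D", "D"),
  `Transaction Type` = c("P-Purchase", "S-Sale", "M-Exempt",
                         "P-Purchase", "P-Purchase", "P-Purchase",
                         "S-Sale",
                         "P-Purchase", "S-Sale"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Assumed rule: drop single-row groups and groups whose types are all the same.
# n() > 1 is implied by n_distinct(...) > 1; it is spelled out only to mirror
# the row-count condition the OP wanted to keep.
mixed <- toy %>%
  group_by(`Insider CIK`) %>%
  filter(n() > 1, n_distinct(`Transaction Type`) > 1) %>%
  ungroup()

unique(mixed$`Insider CIK`)  # "A" "D"
```

Applied to the sample input in the question (one insider with three "P-Purchase" rows), this rule would return no rows, which matches the empty result the OP said they expected.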