Partitioning the data with respect to values in a column in R: answers

【Question title】: Partitioning the data with respect to values in a column in R
【Posted】: 2017-01-27 11:22:10
【Question】:

I have a csv file in the following format:

rec | year | ing
----|------|-----
 1  | 2002 | a
 1  | 2002 | b
 1  | 2002 | c
 2  | 2002 | e
 .  |   .  | . 
 .  |   .  | . 
 4  | 2017 | a

Now I want to partition this data in R in 2-year increments. I tried the split function, but I am not sure how to define the 2-year increments.

The expected output should look like this:

$0
rec | year | ing
----|------|-----
 1  | 2002 | a
 1  | 2002 | b
 1  | 2002 | c
 2  | 2002 | e
 .  |   .  | . 
 .  |   .  | . 
 3  | 2003 | a 

$1
rec | year | ing
----|------|-----
 5  | 2004 | a
 5  | 2004 | b
 4  | 2004 | c
 4  | 2005 | e
 .  |   .  | . 
 .  |   .  | . 
 6  | 2005 | a

Basically, the data should be split into 2-year groups.

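【Comments】:

Could you show the expected output along with a few rows of the input data? The question is not clear as it stands.
Added the expected output.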

Tags: r split partition

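【Solution 1】:

Divide each year by 2 and take the floor, so that every pair of consecutive years (2002 and 2003, 2004 and 2005, ...) gets the same key: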

df <- read.table(header=TRUE,sep="|",text="
rec|year|ing
1|2002|a
1|2002|b
1|2002|c
2|2002|e
3|2003|a
4|2004|c
4|2004|e
5|2004|a
5|2004|b
6|2005|a
4|2017|a
4|2003|a
")

split(df,floor(df$year/2))

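If you care about the names of the subsets (0, 1, 2, ... rather than 1001, 1002, ...), subtract the key of the earliest year:

split(df,floor(df$year/2)-floor(min(df$year)/2))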
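To see which rows land in which group, it can help to inspect the key before splitting. The following is a minimal sketch; the data frame is an assumed subset of the sample above, and `key`, `key0`, and `parts` are names introduced here for illustration.

```r
# Assumed sample: a subset of the data frame used above
df <- data.frame(
  rec  = c(1, 2, 3, 5, 6, 4),
  year = c(2002, 2002, 2003, 2004, 2005, 2017),
  ing  = c("a", "e", "a", "a", "a", "a")
)

# Bin key: 2002 and 2003 -> 1001, 2004 and 2005 -> 1002, 2017 -> 1008
key <- floor(df$year / 2)

# Shift the key so that the first bin is labelled 0
key0 <- key - min(key)

parts <- split(df, key0)
names(parts)   # "0" "1" "7"
```

With this key the bins always start on an even year, so if the earliest year in the data is odd, the first group holds only that one year.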

【Comments】:

I guess another option is split(df, df$year %/% 2)
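The integer-division operator `%/%` yields the same key as `floor(year / 2)`. As a rough sketch of how this could be generalized, the helper below takes the bin width as a parameter and counts bins from the earliest year in the data instead of from an even year. `split_by_years`, `width`, and `year_col` are hypothetical names, and the data frame is made up for the example.

```r
# Hypothetical helper: split a data frame into bins of `width` years,
# counting bins from the earliest year present in the data
split_by_years <- function(data, width = 2, year_col = "year") {
  years <- data[[year_col]]
  key <- (years - min(years)) %/% width   # 0 for the first bin, 1 for the next, ...
  split(data, key)
}

# Assumed sample data whose earliest year is odd
df <- data.frame(
  rec  = c(1, 3, 5, 6, 4),
  year = c(2003, 2004, 2005, 2006, 2017),
  ing  = c("a", "a", "b", "a", "a")
)

parts <- split_by_years(df, width = 2)
names(parts)        # "0" "1" "7"
parts[["0"]]$year   # 2003 2004
```

Anchoring at the minimum year means the first group always spans a full `width` years, at the cost of bin boundaries that depend on the data.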

【Solution 2】:

You can try combining split with cut.

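This splits your data frame along a sequence of 2-year breaks that starts at the minimum year and runs just past the maximum year, so that the last year is not left outside the final interval. Setting right=FALSE makes each interval include its lower bound, so 2002 and 2003 fall into the same group:

split(df, cut(df$year, seq(min(df$year), max(df$year) + 2, by = 2), right = FALSE))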

This assumes your data frame is named df.
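With cut, two details decide where a year ends up: whether the breaks reach past the largest year, and which side of each interval is closed. The sketch below compares two sets of breaks on an assumed vector of years; `narrow`, `wide`, and `parts` are names introduced for illustration.

```r
# Assumed sample years
years <- c(2002, 2003, 2004, 2005, 2017)

# seq(2002, 2017, by = 2) ends at 2016, so 2017 lies outside every interval
# and cut() returns NA for it; split() would then drop that row
narrow <- cut(years, seq(min(years), max(years), by = 2), include.lowest = TRUE)
is.na(narrow)      # FALSE FALSE FALSE FALSE TRUE

# Extending the breaks past the maximum and closing intervals on the left
# gives the bins [2002, 2004), [2004, 2006), ..., [2016, 2018)
wide <- cut(years, seq(min(years), max(years) + 2, by = 2), right = FALSE)
as.integer(wide)   # 1 1 2 2 8

# cut() keeps a level for every interval, including empty ones;
# drop = TRUE tells split() to discard the empty groups
df <- data.frame(year = years, ing = letters[1:5])
parts <- split(df, wide, drop = TRUE)
length(parts)      # 3
```

Without `drop = TRUE`, the result would also contain an empty data frame for each 2-year interval that has no rows.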

【Comments】: