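【Posted】: 2020-07-12 23:51:38
【Question】:
I am trying to solve a problem that is (for me) quite complex. I will do my best to explain it.
I am working with a list that contains 150 other lists. Each of these sublists contains 3 data frames. Here is the str() of the list that holds the 150 sublists of data frames: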
str(listSM)
$ SE1 :List of 3
..$ d20:'data.frame': 96408 obs. of 2 variables:
.. ..$ Date: Date[1:96408], format: "2009-01-01" "2009-01-01" "2009-01-01" ...
.. ..$ SWC : num [1:96408] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
..$ d50:'data.frame': 96408 obs. of 2 variables:
.. ..$ Date: Date[1:96408], format: "2009-01-01" "2009-01-01" "2009-01-01" ...
.. ..$ SWC : num [1:96408] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
..$ d5 :'data.frame': 96408 obs. of 2 variables:
.. ..$ Date: Date[1:96408], format: "2009-01-01" "2009-01-01" "2009-01-01" ...
.. ..$ SWC : num [1:96408] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
$ SE10 :List of 3
..$ d20:'data.frame': 96408 obs. of 2 variables:
.. ..$ Date: Date[1:96408], format: "2009-01-01" "2009-01-01" "2009-01-01" ...
.. ..$ SWC : num [1:96408] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
..$ d50:'data.frame': 96408 obs. of 2 variables:
.. ..$ Date: Date[1:96408], format: "2009-01-01" "2009-01-01" "2009-01-01" ...
.. ..$ SWC : num [1:96408] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
..$ d5 :'data.frame': 96408 obs. of 2 variables:
.. ..$ Date: Date[1:96408], format: "2009-01-01" "2009-01-01" "2009-01-01" ...
.. ..$ SWC : num [1:96408] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
$ SE100:List of 3
..$ d20:'data.frame': 96408 obs. of 2 variables:
.. ..$ Date: Date[1:96408], format: "2009-01-01" "2009-01-01" "2009-01-01" ...
.. ..$ SWC : num [1:96408] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
..$ d50:'data.frame': 96408 obs. of 2 variables:
.. ..$ Date: Date[1:96408], format: "2009-01-01" "2009-01-01" "2009-01-01" ...
.. ..$ SWC : num [1:96408] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
..$ d5 :'data.frame': 96408 obs. of 2 variables:
.. ..$ Date: Date[1:96408], format: "2009-01-01" "2009-01-01" "2009-01-01" ...
.. ..$ SWC : num [1:96408] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
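The output above shows only 3 of the sublists. They are named SE1, SE2, SE3, SE4, ... all the way up to SE150. Each sublist always contains 3 data frames with the same names: d20, d50 and d5.
Here is what I want to do:
I want to create several date-based subsets of every data frame (d20, d50, d5) in every sublist and store them in new variables with unique names. Each data frame contains data from 2009 to 2019.
I wrote this as an example: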
SE1_d20_2009 <- subset(listSM$SE1$d20, Date <= "2009-12-31 23:00:00")
SE1_d20_2010 <- subset(listSM$SE1$d20, Date > "2009-12-31 23:00:00" & Date <= "2010-12-31 23:00:00")
SE1_d20_2011 <- subset(listSM$SE1$d20, Date > "2010-12-31 23:00:00" & Date <= "2011-12-31 23:00:00")
SE1_d20_2012 <- subset(listSM$SE1$d20, Date > "2011-12-31 23:00:00" & Date <= "2012-12-31 23:00:00")
SE1_d20_2013 <- subset(listSM$SE1$d20, Date > "2012-12-31 23:00:00" & Date <= "2013-12-31 23:00:00")
SE1_d20_2014 <- subset(listSM$SE1$d20, Date > "2013-12-31 23:00:00" & Date <= "2014-12-31 23:00:00")
SE1_d20_2015 <- subset(listSM$SE1$d20, Date > "2014-12-31 23:00:00" & Date <= "2015-12-31 23:00:00")
SE1_d20_2016 <- subset(listSM$SE1$d20, Date > "2015-12-31 23:00:00" & Date <= "2016-12-31 23:00:00")
SE1_d20_2017 <- subset(listSM$SE1$d20, Date > "2016-12-31 23:00:00" & Date <= "2017-12-31 23:00:00")
SE1_d20_2018 <- subset(listSM$SE1$d20, Date > "2017-12-31 23:00:00" & Date <= "2018-12-31 23:00:00")
SE1_d20_2019 <- subset(listSM$SE1$d20, Date > "2018-12-31 23:00:00" & Date <= "2019-12-31 23:00:00")
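As you can see, I want to make yearly subsets. The variable names also depend on the SE number and the d number. This is soil moisture measurement data, so SE stands for the sensor and d stands for the depth of the sensor. The code above shows the variable names for SE1 and d20, so the names for SE2 should be SE2_d20_2009, SE2_d20_2010 and so on. Of course I do not want to do this only for d20 but also for d5 and d50, so the variable names for those depths would be SE2_d5_2009, SE2_d5_2010 // SE2_d50_2009, SE2_d50_2010 and so on.
Obviously I could do this by hand for every data frame in the list above, but there are 450 data frames in total, which would take far too long. So I would like to know whether this can be automated and, if so, how. I am a complete beginner in R and this is beyond my abilities, so I really hope someone can help me. If anything is unclear, please feel free to ask; I have tried my best to explain.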
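For a single data frame, the eleven subset() calls could probably be collapsed into one split() on the calendar year. The following is only a minimal sketch under stated assumptions: the data frame `d20` is a made-up stand-in for listSM$SE1$d20 (one row per day, random SWC values), and `by_year` is a hypothetical name, not something from the original data.

```r
# Toy stand-in for listSM$SE1$d20 (hypothetical values, one row per day)
set.seed(1)
dates <- seq(as.Date("2009-01-01"), as.Date("2011-12-31"), by = "day")
d20 <- data.frame(Date = dates, SWC = runif(length(dates)))

# One data frame per calendar year, keyed by the year as a string.
# format(Date, "%Y") extracts the year, so no manual date bounds are needed.
by_year <- split(d20, format(d20$Date, "%Y"))

names(by_year)           # "2009" "2010" "2011"
nrow(by_year[["2010"]])  # 365
```

Each element, for example by_year[["2010"]], plays the role of a variable such as SE1_d20_2010, without a separate object being created for every year.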
Edit:
dput(droplevels(listSM$SE1$d20[1:50, ]))
structure(list(Date = structure(c(14245, 14245, 14245, 14245,
14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245,
14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245,
14245, 14245, 14246, 14246, 14246, 14246, 14246, 14246, 14246,
14246, 14246, 14246, 14246, 14246, 14246, 14246, 14246, 14246,
14246, 14246, 14246, 14246, 14246, 14246, 14246, 14246, 14247,
14247), class = "Date"), SWC = c(NaN, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN)), row.names = c(NA, 50L), class = "data.frame")
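【Comments】:
-
Storing each subset as its own variable is not a good idea (it clutters the global environment). Perhaps subset and keep the subsets in a list instead? Could you provide a small sample list so that we can test some possible solutions?
-
@NelsonGon Yes, you are probably right, it would clutter up all the variables. I have added a dput(droplevels(...)) to the post so you can see what one data frame looks like. All the other data frames look exactly the same but with different values (they are usually not NaN at the start of the year).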
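The suggestion in the comments (keep the subsets in a list rather than in separate variables) might look roughly like the sketch below. It is an illustration under assumptions, not a tested solution for the real data: `listSM` here is a small made-up stand-in with 2 sensors and 2 years, and `make_df`, `split_by_year`, `yearly` and `flat` are hypothetical names introduced for the example.

```r
# Toy stand-in for listSM: 2 sensors x 3 depths (hypothetical values)
make_df <- function() {
  dates <- seq(as.Date("2009-01-01"), as.Date("2010-12-31"), by = "day")
  data.frame(Date = dates, SWC = seq_along(dates))
}
listSM <- list(
  SE1 = list(d20 = make_df(), d50 = make_df(), d5 = make_df()),
  SE2 = list(d20 = make_df(), d50 = make_df(), d5 = make_df())
)

# Split one data frame into a list of yearly data frames
split_by_year <- function(df) split(df, format(df$Date, "%Y"))

# Nested result: sensor -> depth -> year (lapply keeps the list names)
yearly <- lapply(listSM, function(sensor) lapply(sensor, split_by_year))

head(yearly$SE1$d20[["2009"]])

# Optional flat list whose names follow the SE1_d20_2009 pattern
flat <- list()
for (se in names(yearly)) {
  for (d in names(yearly[[se]])) {
    for (yr in names(yearly[[se]][[d]])) {
      flat[[paste(se, d, yr, sep = "_")]] <- yearly[[se]][[d]][[yr]]
    }
  }
}
names(flat)[1:2]  # "SE1_d20_2009" "SE1_d20_2010"
```

With the nested form, one subset is reached as yearly$SE2$d5[["2010"]]; with the flat form, as flat[["SE2_d5_2010"]]. Either way the global environment holds one object instead of several thousand.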