[Posted]: 2021-09-30 04:58:12
[Question]:
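I have this data frame: there are 35 letters in total, A through X and a through k. Each letter represents a time slot, so the first slot of the day is A and the last slot of the day is k. Because this is financial data, not every letter is always present: sometimes C may be missing, sometimes I may only have A and B, and so on.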
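set.seed(42)
# 10 days, each repeated 35 times (one row per time-slot letter)
day_month = rep(seq.Date(as.Date("2006-04-17"), as.Date("2006-04-26"), "day"),35)
day_month = day_month[order(day_month)]
# 35 slot labels per day: A..X followed by a..k
let = rep(c(LETTERS[1:24],letters[1:11]),10)
High = rnorm(350, 14000, 250)
Low = rnorm(350, 13000, 300)
df <- data.frame(day_month, let, High, Low)
# drop the first and the last row to simulate missing slots
df <- df[-1,]
df <- df[-349,]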
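As a quick check of the example data, the sketch below rebuilds the same `df` and lists which letters are absent on each day. The names `all_letters` and `missing_by_day` are only illustrative and not part of the original post.

```r
# Rebuild the example data so this block runs on its own
set.seed(42)
day_month <- rep(seq.Date(as.Date("2006-04-17"), as.Date("2006-04-26"), "day"), 35)
day_month <- day_month[order(day_month)]
let <- rep(c(LETTERS[1:24], letters[1:11]), 10)
High <- rnorm(350, 14000, 250)
Low <- rnorm(350, 13000, 300)
df <- data.frame(day_month, let, High, Low)
df <- df[-1, ]
df <- df[-349, ]

# Full set of expected slot labels (illustrative name)
all_letters <- c(LETTERS[1:24], letters[1:11])

# For each day, the labels that do not appear in `let`
missing_by_day <- lapply(split(df$let, df$day_month),
                         function(x) setdiff(all_letters, x))

# Keep only the days that are missing at least one label
missing_by_day[lengths(missing_by_day) > 0]
# 2006-04-17 is missing "A" and 2006-04-26 is missing "k"
```

Because row 1 and the final row were dropped, only the first and last day are incomplete here; real data would presumably have gaps in arbitrary positions.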
I need to create variables holding values such as high_a, low_a, high_A, low_A, and so on.
The first approach I used was:
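# dplyr provides %>%, group_by(), summarise(), first() and nth()
library(dplyr)
# stored as df_1 so the original df stays available for the second attempt
df_1 <- df %>% group_by(day_month) %>% summarise(day_month = first(day_month),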
high_A = nth(High, 2),
low_A = nth(Low, 2),
high_B = nth(High, 4),
low_B = nth(Low, 4),
high_D = nth(High, 7),
low_D = nth(Low, 7),
high_E = nth(High, 9),
low_E = nth(Low, 9),
high_F = nth(High, 11),
low_F = nth(Low, 11),
high_G = nth(High, 13),
low_G = nth(Low, 13),
high_H = nth(High, 15),
low_H = nth(Low, 15),
high_I = nth(High, 17),
low_I = nth(Low, 17),
high_J = nth(High, 19),
low_J = nth(Low, 19),
high_K = nth(High, 21),
low_K = nth(Low, 21),
high_L = nth(High, 22),
low_L = nth(Low, 22),
high_M = nth(High, 23),
low_M = nth(Low, 23),
high_N = nth(High, 24),
low_N = nth(Low, 24),
high_O = nth(High, 25),
low_O = nth(Low, 25),
high_P = nth(High, 26),
low_P = nth(Low, 26),
high_Q = nth(High, 27),
low_Q = nth(Low, 27),
high_R = nth(High, 28),
low_R = nth(Low, 28),
high_S = nth(High, 29),
low_S = nth(Low, 29),
high_T = nth(High, 30),
low_T = nth(Low, 30),
high_U = nth(High, 31),
low_U = nth(Low, 31),
high_V = nth(High, 32),
low_V = nth(Low, 32),
high_W = nth(High, 33),
low_W = nth(Low, 33),
high_X = nth(High, 34),
low_X = nth(Low, 34),
high_a = nth(High, 1),
low_a = nth(Low, 1),
high_b = nth(High, 3),
low_b = nth(Low, 3),
high_c = nth(High, 5),
low_c = nth(Low, 5),
high_d = nth(High, 6),
low_d = nth(Low, 6),
high_e = nth(High, 8),
low_e = nth(Low, 8),
high_f = nth(High, 9),
low_f = nth(Low, 9),
high_g = nth(High, 12),
low_g = nth(Low, 12),
high_h = nth(High, 14),
low_h = nth(Low, 14),
high_i = nth(High, 16),
low_i = nth(Low, 16),
high_j = nth(High, 18),
low_j = nth(Low, 18),
high_k = nth(High, 20),
low_k = nth(Low, 20))
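This code works, but because some days do not have every observation, the resulting data can be inconsistent. I would like to find a function where I can specify the letter to use as the condition, rather than the row number.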
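One way to express "pick by letter, not by position" is base R's `match()`, which returns `NA` when the letter is absent, so indexing with it yields `NA` instead of a value shifted from another slot. The sketch below uses made-up numbers, and `value_at()` is a hypothetical helper name, not a dplyr function.

```r
# Hypothetical helper: return the value whose label equals `label`,
# or NA when that label does not occur in `labels`
value_at <- function(values, labels, label) {
  values[match(label, labels)]
}

# Made-up data for one day in which slot "C" is missing
High <- c(14100, 13950, 14200)
let <- c("A", "B", "D")

value_at(High, let, "B")  # 13950
value_at(High, let, "C")  # NA
```

Inside `summarise()` this could be called as `high_A = value_at(High, let, "A")`. Since it always returns exactly one value, each day would be expected to keep its row, with `NA` marking the missing slots.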
I tried the same code with subset instead of nth, in the form high_A = subset(High, let == "A"), but this produces a data frame that omits the dates with missing observations.
df_2 <- df %>% group_by(day_month) %>% summarise(day_month = first(day_month),
high_A = subset(High, let == "A"),
low_A = subset(Low, let == "A"),
high_B = subset(High, let == "B"),
low_B = subset(Low, let == "B"),
high_C = subset(High, let == "C"),
low_C = subset(Low, let == "C"),
high_D = subset(High, let == "D"),
low_D = subset(Low, let == "D"),
high_E = subset(High, let == "E"),
low_E = subset(Low, let == "E"),
high_F = subset(High, let == "F"),
low_F = subset(Low, let == "F"),
high_G = subset(High, let == "G"),
low_G = subset(Low, let == "G"),
high_H = subset(High, let == "H"),
low_H = subset(Low, let == "H"),
high_I = subset(High, let == "I"),
low_I = subset(Low, let == "I"),
high_J = subset(High, let == "J"),
low_J = subset(Low, let == "J"),
high_K = subset(High, let == "K"),
low_K = subset(Low, let == "K"),
high_L = subset(High, let == "L"),
low_L = subset(Low, let == "L"),
high_M = subset(High, let == "M"),
low_M = subset(Low, let == "M"),
high_N = subset(High, let == "N"),
low_N = subset(Low, let == "N"),
high_O = subset(High, let == "O"),
low_O = subset(Low, let == "O"),
high_P = subset(High, let == "P"),
low_P = subset(Low, let == "P"),
high_Q = subset(High, let == "Q"),
low_Q = subset(Low, let == "Q"),
high_R = subset(High, let == "R"),
low_R = subset(Low, let == "R"),
high_S = subset(High, let == "S"),
low_S = subset(Low, let == "S"),
high_T = subset(High, let == "T"),
low_T = subset(Low, let == "T"),
high_U = subset(High, let == "U"),
low_U = subset(Low, let == "U"),
high_V = subset(High, let == "V"),
low_V = subset(Low, let == "V"),
high_W = subset(High, let == "W"),
low_W = subset(Low, let == "W"),
high_X = subset(High, let == "X"),
low_X = subset(Low, let == "X"),
high_a = subset(High, let == "a"),
low_a = subset(Low, let == "a"),
high_b = subset(High, let == "b"),
low_b = subset(Low, let == "b"),
high_c = subset(High, let == "c"),
low_c = subset(Low, let == "c"),
high_d = subset(High, let == "d"),
low_d = subset(Low, let == "d"),
high_e = subset(High, let == "e"),
low_e = subset(Low, let == "e"),
high_f = subset(High, let == "f"),
low_f = subset(Low, let == "f"),
high_g = subset(High, let == "g"),
low_g = subset(Low, let == "g"),
high_h = subset(High, let == "h"),
low_h = subset(Low, let == "h"),
high_i = subset(High, let == "i"),
low_i = subset(Low, let == "i"),
high_j = subset(High, let == "j"),
low_j = subset(Low, let == "j"),
high_k = subset(High, let == "k"),
low_k = subset(Low, let == "k"))
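A likely reason the dates disappear is that `subset()` returns a zero-length vector when the letter is not present for that day, and `summarise()` then appears to emit no row for that group. The base R sketch below only demonstrates the zero-length part, using made-up values; `day` and `n_A` are illustrative names.

```r
# Made-up data: day "d1" has no slot "A", day "d2" does
High <- c(14100, 13950, 14200, 14050)
let <- c("B", "C", "A", "B")
day <- c("d1", "d1", "d2", "d2")

# Number of values subset() returns per day when asking for letter "A"
n_A <- tapply(seq_along(High), day,
              function(i) length(subset(High[i], let[i] == "A")))
n_A
# d1 gives 0 (nothing to put in the row), d2 gives 1
```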
Is there a way to get the variables I need by picking the values of High and Low based on the let column?
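For reference, one common way to get one column per letter is to reshape from long to wide. The sketch below assumes the tidyr package is available (it is not used in the original post) and works on a small made-up data frame called `toy`. `High` and `Low` are renamed first so the resulting columns follow the `high_A` / `low_A` pattern.

```r
library(dplyr)
library(tidyr)

# Made-up data: the first day has no slot "A"
toy <- data.frame(
  day_month = as.Date(c("2006-04-17", "2006-04-17",
                        "2006-04-18", "2006-04-18", "2006-04-18")),
  let = c("B", "a", "A", "B", "a"),
  High = c(14010, 14120, 13990, 14200, 14050),
  Low = c(12900, 13100, 13050, 12800, 13000)
)

wide <- toy %>%
  rename(high = High, low = Low) %>%       # lower-case prefixes for column names
  pivot_wider(id_cols = day_month,         # one row per day
              names_from = let,            # one column per letter ...
              values_from = c(high, low))  # ... for each of high and low

wide
# One row per day; high_A and low_A are NA on 2006-04-17
```

With this approach the letters do not have to be listed by hand, and a slot that is missing on a given day shows up as `NA` rather than removing the day.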
[Discussion]:
-
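To make this easier to understand and solve, rather than sharing sample data with 350 rows and a whole set of letter columns, could you make a minimal example with 20 rows and 3 letter columns? You can still state that the solution needs to scale easily to more columns, but a minimal example makes it easier to view, understand, and test solutions.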