Percent change difference for time series data

【Question title】: Percent change difference for time series data
【Posted】: 2018-03-18 19:28:35
【Question description】:

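I would like to calculate the percent change of the variables 'wt' and 'wc' relative to time = 1, at time 1, time 2, and time 3. At time 1 it would be 0. At time 2, the percent change in 'wt' would be (t2 - t1) / t1 * 100, and at time 3 it would be (t3 - t1) / t1 * 100. I would then like to add these as new variables to my existing Excel data sheet. I have looked for other examples, but none of them matched my data format. Thank you!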

df <- structure(list(code = c(100, 100, 100, 101, 101, 101, 102, 102, 102),
    treatment = c(1, 1, 1, 2, 2, 2, 1, 1, 1),
    time = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
    wt = c(80, 78, 76, 75, 74, 74, 78, 74, 72),
    wc = c(90, 89, 87, 92, 91, 90, 89, 86, 84)),
    .Names = c("code", "treatment", "time", "wt", "wc"),
    row.names = c(NA, -9L),
    class = c("tbl_df", "tbl", "data.frame"))

I tried following the suggestion below, but I get an error:

> data <- read.csv("All Data with BMI and other tweaks.csv", header = TRUE, na.strings = ".", stringsAsFactors = FALSE)
> names(data)
 [1] "code"           "treatment"      "age"            "sex"           
 [5] "time"           "bicep"          "tricep"         "subscapular"   
 [9] "suprailiac"     "weight"         "pwc"            "wc"            
[13] "bia"            "height"         "bmi"            "wthr"          
[17] "density"        "X.fat"          "fm"             "ffm"           
[21] "dietary.recall" "reportingdate"  "NumFoods"       "NumCodes"      
[25] "kcal"           "prot"           "tfat"           "carb"          
[29] "mois"           "alc"            "caff"           "theo"          
[33] "sugr"           "fibe"           "calc"           "iron"          
[37] "magn"           "phos"           "pota"           "sodi"          
[41] "zinc"           "copp"           "sele"           "vc"            
[45] "vb1"            "vb2"            "niac"           "vb6"           
[49] "fola"           "fa"             "ff"             "fdfe"          
[53] "vb12"           "vara"           "ret"            "bcar"          
[57] "acar"           "cryp"           "lyco"           "lz"            
[61] "atoc"           "vk"             "vitd"           "choln"         
[65] "chole"          "sfat"           "s040"           "s060"          
[69] "s080"           "s100"           "s120"           "s140"          
[73] "s160"           "s180"           "mfat"           "m161"          
[77] "m181"           "m201"           "m221"           "pfat"          
[81] "p182"           "p183"           "p184"           "p204"          
[85] "p205"           "p225"           "p226"           "vite_add"      
[89] "b12_add"        "datacomp"      
> library(dplyr)
> data <- data %>%
+ group_by(code) %>%
+ mutate(wt.pch = (data$weight - data$weight[1]) / data$weight * 100, wc.pch = (data$wc - data$wc[1]) / data$wc[1] * 100)
Error in mutate_impl(.data, dots) : 
  Column `wt.pch` must be length 3 (the group size) or one, not 114

【Question comments】:

By t2 - t1 / 100 * 100, do you mean (t2 - t1 / 100) * 100? That looks odd. I guess it should be (t2 - t1) / t1 * 100.
@Julius You are right. My mistake!

Tags: r time-series

【Solution 1】:

Here is one approach:

library(dplyr)
df %>% group_by(code) %>% mutate(wt.pch = (wt - wt[1]) / wt[1] * 100, 
                                 wc.pch = (wc - wc[1]) / wc[1] * 100)
# A tibble: 9 x 7
# Groups:   code [3]
#    code treatment  time    wt    wc wt.pch wc.pch
#   <dbl>     <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>
# 1   100      1.00  1.00  80.0  90.0   0      0   
# 2   100      1.00  2.00  78.0  89.0  -2.50  -1.11
# 3   100      1.00  3.00  76.0  87.0  -5.00  -3.33
# 4   101      2.00  1.00  75.0  92.0   0      0   
# 5   101      2.00  2.00  74.0  91.0  -1.33  -1.09
# 6   101      2.00  3.00  74.0  90.0  -1.33  -2.17
# 7   102      1.00  1.00  78.0  89.0   0      0   
# 8   102      1.00  2.00  74.0  86.0  -5.13  -3.37
# 9   102      1.00  3.00  72.0  84.0  -7.69  -5.62
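
The answer above takes the baseline as wt[1], the first row of each group, which matches time 1 only when the rows are sorted by time within each code. A minimal sketch of a variant that selects the baseline by time == 1 instead, assuming every code has exactly one row with time == 1 (the name res is only illustrative):

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  code      = c(100, 100, 100, 101, 101, 101, 102, 102, 102),
  treatment = c(1, 1, 1, 2, 2, 2, 1, 1, 1),
  time      = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  wt        = c(80, 78, 76, 75, 74, 74, 78, 74, 72),
  wc        = c(90, 89, 87, 92, 91, 90, 89, 86, 84)
)

res <- df %>%
  group_by(code) %>%
  # Assumption: exactly one row per code has time == 1, so the
  # subset below is a single baseline value for the group
  mutate(
    wt.pch = (wt - wt[time == 1]) / wt[time == 1] * 100,
    wc.pch = (wc - wc[time == 1]) / wc[time == 1] * 100
  ) %>%
  ungroup()

print(res)
```

On the sample data this should give the same two columns as the answer, since the rows are already ordered by time.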

【Comments】:

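Thank you! This is what I wanted to do. How can I add these new variables to the Excel file?
@DiscoR, if res holds the result, you can export it with, e.g., write.csv(res, "file.csv").
Sorry if I was unclear. I already have a file containing this data, called "All Data". I want to add two new percent-change variables to that file, based on the data in the same file.
@DiscoR, then how about loading the file, running this code to generate the two extra columns, and exporting everything back? There is no standard way to paste extra values into an existing Excel sheet. If you really want something non-standard, I suggest posting a new question. Or perhaps I have misunderstood the question.
I followed your suggestion but ran into an error. Please see the main post for details.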
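
The error quoted in the question most likely comes from writing data$weight inside mutate(): that expression returns the whole 114-row column and ignores the grouping, while each group has only 3 rows. The attempt also divides by data$weight rather than by the baseline weight[1]. A hedged sketch of the load, compute, and export round trip suggested in the comments, using bare column names; the temporary file and its contents are stand-ins for the asker's real CSV, which is not available here:

```r
library(dplyr)

# Stand-in for the asker's CSV: a temporary file holding the question's
# sample rows, with the weight column named `weight` as in the real file
path <- tempfile(fileext = ".csv")
write.csv(data.frame(
  code   = rep(c(100, 101, 102), each = 3),
  time   = rep(1:3, times = 3),
  weight = c(80, 78, 76, 75, 74, 74, 78, 74, 72),
  wc     = c(90, 89, 87, 92, 91, 90, 89, 86, 84)
), path, row.names = FALSE)

data <- read.csv(path, header = TRUE, na.strings = ".", stringsAsFactors = FALSE)

data <- data %>%
  group_by(code) %>%
  # Bare column names are evaluated within each group;
  # data$weight would pull the full, ungrouped column
  mutate(
    wt.pch = (weight - weight[1]) / weight[1] * 100,
    wc.pch = (wc - wc[1]) / wc[1] * 100
  ) %>%
  ungroup()

# Write the table, now including the two new columns, back to disk
write.csv(data, path, row.names = FALSE)
```

With the real file, path would point at the existing CSV (or at a new file name, to keep the original untouched).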

【Solution 2】:

You can try:

library(dplyr)

# Assign the result back to df; otherwise print(df) shows the unchanged data
df <- df %>% 
    group_by(code) %>% 
    mutate(pct_change_wt = ((wt - lag(wt))/ lag(wt)) * 100,
           pct_change_wc = ((wc - lag(wc))/ lag(wc)) * 100)

print(df)

   code treatment  time    wt    wc pct_change_wt pct_change_wc
  <dbl>     <dbl> <dbl> <dbl> <dbl>         <dbl>         <dbl>
1   100      1.00  1.00  80.0  90.0         NA            NA   
2   100      1.00  2.00  78.0  89.0        - 2.50        - 1.11
3   100      1.00  3.00  76.0  87.0        - 2.56        - 2.25
4   101      2.00  1.00  75.0  92.0         NA            NA   
5   101      2.00  2.00  74.0  91.0        - 1.33        - 1.09
6   101      2.00  3.00  74.0  90.0          0           - 1.10
7   102      1.00  1.00  78.0  89.0         NA            NA   
8   102      1.00  2.00  74.0  86.0        - 5.13        - 3.37
9   102      1.00  3.00  72.0  84.0        - 2.70        - 2.33

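Explanation:

1. group_by on code ensures that the percent change is computed within each group.
2. The lag function takes the previous value within each group.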
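
Because lag() takes the previous row, this solution measures the change from one time point to the next, whereas the question asks for the change relative to time 1. The two agree at time 2 and differ from time 3 onward. A small sketch comparing them on the wt values for code 100 (the column names vs_previous and vs_baseline are made up for this illustration):

```r
library(dplyr)

# wt values for code 100 from the question
one <- data.frame(code = 100, time = 1:3, wt = c(80, 78, 76))

cmp <- one %>%
  group_by(code) %>%
  mutate(
    vs_previous = (wt - lag(wt)) / lag(wt) * 100,     # change from the prior row
    vs_baseline = (wt - first(wt)) / first(wt) * 100  # change from the first row
  ) %>%
  ungroup()

print(cmp)
```

At time 3, vs_previous is (76 - 78) / 78 * 100, about -2.56, while vs_baseline is (76 - 80) / 80 * 100 = -5.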

【Comments】: