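【Posted】: 2021-06-09 14:47:44
【Question】:
A simple workflow is as follows:
- For each entity, take the first 3 non-null values of the 'PROD_OIL' column.
- Compute the mean of the corresponding values in the 'FORECAST_PROD_OIL' column, ignoring NAs (if any).
Input: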
structure(list(entity= c("A", "A", "A", "A", "A", "A", "A",
"A"), REPORT_DATE = structure(c(1623110400, 1623024000, 1622937600,
1622851200, 1622764800, 1622678400, 1622592000, 1622505600), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), PROD_OIL = c("NA", "NA", "265.85000000000002",
"NA", "272.45999999999998", "NA", "262.32", "NA"), PROD_GAS = c("NA",
"NA", "2940.78", "NA", "2947.35", "NA", "3237.78", "NA"), FORECAST_PROD_OIL = c(283.71353,
284.29868, 284.88622, 285.47615, 286.06849, 286.66326, 287.26047,
287.86013), FORECAST_PROD_GAS = c(3038.99083, 3042.47991, 3045.97701,
3049.48216, 3052.99539, 3056.51672, 3060.04619, 3063.58382)), row.names = c(NA,
-8L), class = c("tbl_df", "tbl", "data.frame"))
I wrote this simple dplyr command, but I am not getting the correct mean.
AvgLast3WT <- dt %>%
  dplyr::arrange(entity, desc(REPORT_DATE)) %>%
  dplyr::group_by(entity) %>%
  dplyr::select(entity, REPORT_DATE, PROD_OIL, PROD_GAS, FORECAST_PROD_OIL, FORECAST_PROD_GAS) %>%
  dplyr::summarise(GetMean = mean(na.omit(with(dt, FORECAST_PROD_OIL[!is.na(PROD_OIL)])[1:3]))) %>%
  ungroup()
The answer should be 286.07 (the mean of the cells highlighted in red below), but I get 285.4!
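Two things in the posted data and command may be worth checking; this is a sketch under stated assumptions, not a confirmed diagnosis, and it does not try to reproduce the 285.4 figure. First, in the dput output PROD_OIL is a character column whose missing entries are the text "NA", so is.na() returns FALSE for every row. Second, with(dt, ...) inside summarise() evaluates against the whole of dt rather than the current group. The version below assumes the text "NA" is meant as a missing value, rebuilds only the columns it needs from the input above, and uses slice_head() (dplyr 1.0.0 or later) to keep the first 3 qualifying rows per entity.

```r
library(dplyr)

# Same values as the input above, trimmed to the columns this sketch uses.
dt <- data.frame(
  entity = rep("A", 8),
  REPORT_DATE = as.POSIXct(
    c(1623110400, 1623024000, 1622937600, 1622851200,
      1622764800, 1622678400, 1622592000, 1622505600),
    origin = "1970-01-01", tz = "UTC"
  ),
  PROD_OIL = c("NA", "NA", "265.85000000000002", "NA",
               "272.45999999999998", "NA", "262.32", "NA"),
  FORECAST_PROD_OIL = c(283.71353, 284.29868, 284.88622, 285.47615,
                        286.06849, 286.66326, 287.26047, 287.86013),
  stringsAsFactors = FALSE
)

AvgLast3WT <- dt %>%
  # Assumption: the text "NA" stands for a missing value.
  mutate(PROD_OIL = as.numeric(na_if(PROD_OIL, "NA"))) %>%
  arrange(entity, desc(REPORT_DATE)) %>%
  group_by(entity) %>%
  # Drop rows without a real PROD_OIL, then keep the first 3 per entity.
  filter(!is.na(PROD_OIL)) %>%
  slice_head(n = 3) %>%
  # Average the forecast values on those rows, ignoring any NA.
  summarise(GetMean = mean(FORECAST_PROD_OIL, na.rm = TRUE), .groups = "drop")

print(AvgLast3WT)
```

On this input the three rows kept are those with FORECAST_PROD_OIL equal to 284.88622, 286.06849 and 287.26047, whose mean rounds to the expected 286.07.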
【Comments】: