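[Posted]: 2013-07-11 22:08:17
[Question]:
I am trying to reshape and "expand" a data.frame based on values it contains. Below is the structure of the data frame I am starting with:
Starting structure: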
'data.frame': 9 obs. of 5 variables:
$ Delivery.Location : chr "Henry" "Henry" "Henry" "Henry" ...
$ Price : num 2.97 2.96 2.91 2.85 2.89 ...
$ Trade.Date : Date, format: "2012-01-03" "2012-01-04" "2012-01-05" "2012-01-06" ...
$ Delivery.Start.Date : Date, format: "2012-01-04" "2012-01-05" "2012-01-06" "2012-01-07" ...
$ Delivery.End.Date : Date, format: "2012-01-04" "2012-01-05" "2012-01-06" "2012-01-09" ...
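The market this price data comes from is called the "next-day market", because physical delivery of the natural gas usually takes place the day after the gas is traded (i.e. the day after the Trade.Date above). I stress usually, because weekends and holidays create exceptions, in which case the delivery period can span multiple days (i.e. 2-3 days). However, the data structure provides variables that explicitly state the Delivery.Start.Date and Delivery.End.Date.
I am trying to restructure the data.frame in the following way so that I can produce some time-series charts and do additional analysis:
Desired structure: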
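To make the multi-day case concrete, here is a minimal sketch that counts the delivery days per trade. The `trades` data frame and the `Delivery.Days` column are hypothetical names used only for illustration; the three rows are copied from the sample data further down, and the count assumes both the start and end dates are included in the delivery window.

```r
# Three rows copied from the sample data (hypothetical object name `trades`)
trades <- data.frame(
  Trade.Date          = as.Date(c("2012-01-05", "2012-01-06", "2012-01-13")),
  Delivery.Start.Date = as.Date(c("2012-01-06", "2012-01-07", "2012-01-14")),
  Delivery.End.Date   = as.Date(c("2012-01-06", "2012-01-09", "2012-01-17"))
)

# Subtracting two Date vectors gives a difftime in days;
# add 1 so that both the start and the end date are counted (assumption)
trades$Delivery.Days <-
  as.integer(trades$Delivery.End.Date - trades$Delivery.Start.Date) + 1L

print(trades)
```

Rows where `Delivery.Days` is greater than 1 are the ones that would need extra records.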
$ Delivery.Location
$ Trade.Date
$ Delivery.Date <<<-- How do I create this variable?
$ Price
How can I create the Delivery.Date variable from the existing Delivery.Start.Date and Delivery.End.Date variables?
In other words, the data for the 2012-01-06 Trade.Date looks like this:
Delivery.Location    Price Trade.Date Delivery.Start.Date Delivery.End.Date
Henry             2.851322 2012-01-06          2012-01-07        2012-01-09
I would like to somehow "fill in" the Delivery.Location and Price for 2012-01-08 to get something like this:
Delivery.Location    Price Trade.Date Delivery.Date
Henry             2.851322 2012-01-06    2012-01-07
Henry             2.851322 2012-01-06    2012-01-08  <--new record "filled in"
Henry             2.851322 2012-01-06    2012-01-09
Below is a sample subset of my data.frame:
##--------------------------------------------------------------------------------------------
## sample data
##--------------------------------------------------------------------------------------------
df <- structure(list(
  Delivery.Location = c("Henry", "Henry", "Henry", "Henry", "Henry", "Henry", "Henry", "Henry", "Henry"),
  Price = c(2.96539814293754, 2.95907652120467, 2.9064360152398, 2.85132233314846, 2.89036418816388, 2.9655845029802, 2.80773394495413, 2.70207160426346, 2.67173237617745),
  Trade.Date = structure(c(15342, 15343, 15344, 15345, 15348, 15349, 15350, 15351, 15352), class = "Date"),
  Delivery.Start.Date = structure(c(15343, 15344, 15345, 15346, 15349, 15350, 15351, 15352, 15353), class = "Date"),
  Delivery.End.Date = structure(c(15343, 15344, 15345, 15348, 15349, 15350, 15351, 15352, 15356), class = "Date")),
  .Names = c("Delivery.Location", "Price", "Trade.Date", "Delivery.Start.Date", "Delivery.End.Date"),
  row.names = c(35L, 150L, 263L, 377L, 493L, 607L, 724L, 838L, 955L),
  class = "data.frame")
str(df)
##--------------------------------------------------------------------------------------------
## create sequence of Delivery.Dates to potentially use
##--------------------------------------------------------------------------------------------
rng <- range(c(df$Delivery.Start.Date, df$Delivery.End.Date))
Delivery.Date <- seq(rng[1], rng[2], by = "day")
Any help or general guidance would be greatly appreciated.
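[Comments]:
-
Can you be specific about what you want?
-
@Metrics: I have edited my question to hopefully explain the problem more clearly. I apologize for not being more specific from the start.
-
NP; do you want the difference between the delivery start date and end date?
-
@metrix: No, I don't want the difference; I just want the same price for every date between the start date and the end date.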
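
For what it's worth, one possible way to get the structure described in the question is to apply `seq()` per row instead of over the whole date range. The following is only a sketch in base R, not a definitive solution: the object names `trades`, `pieces`, and `expanded` are hypothetical, the three input rows are copied from the question's sample data, and it assumes every day from Delivery.Start.Date through Delivery.End.Date (inclusive) should get its own record with the same Price.

```r
# Three rows copied from the question's sample data (hypothetical name `trades`)
trades <- data.frame(
  Delivery.Location   = "Henry",
  Price               = c(2.9064360152398, 2.85132233314846, 2.67173237617745),
  Trade.Date          = as.Date(c("2012-01-05", "2012-01-06", "2012-01-13")),
  Delivery.Start.Date = as.Date(c("2012-01-06", "2012-01-07", "2012-01-14")),
  Delivery.End.Date   = as.Date(c("2012-01-06", "2012-01-09", "2012-01-17")),
  stringsAsFactors    = FALSE
)

# For each trade, build one small data frame with a row per delivery day.
# The scalar columns (location, trade date, price) are recycled to the
# length of the daily sequence.
pieces <- lapply(seq_len(nrow(trades)), function(i) {
  data.frame(
    Delivery.Location = trades$Delivery.Location[i],
    Trade.Date        = trades$Trade.Date[i],
    Delivery.Date     = seq(trades$Delivery.Start.Date[i],
                            trades$Delivery.End.Date[i],
                            by = "day"),
    Price             = trades$Price[i],
    stringsAsFactors  = FALSE
  )
})

# Stack the per-trade pieces into one long data frame
expanded <- do.call(rbind, pieces)
rownames(expanded) <- NULL

print(expanded)
```

The same loop should work on the full `df` from the question by substituting `df` for `trades`. Row-wise `rbind` is easy to read but can be slow on large data, so a vectorised approach may be preferable there.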