【Question Title】: R data frame reshape, restructure, and/or merge
【Posted】: 2013-07-11 22:08:17
【Question】:

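I am trying to reshape and "expand" a data.frame based on values contained in the data.frame itself. Below is the structure of the data frame I am starting with:

Starting structure: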

'data.frame':   9 obs. of  5 variables:
 $ Delivery.Location    : chr  "Henry" "Henry" "Henry" "Henry" ...
 $ Price                : num  2.97 2.96 2.91 2.85 2.89 ...
 $ Trade.Date           : Date, format: "2012-01-03" "2012-01-04" "2012-01-05" "2012-01-06" ...
 $ Delivery.Start.Date  : Date, format: "2012-01-04" "2012-01-05" "2012-01-06" "2012-01-07" ...
 $ Delivery.End.Date    : Date, format: "2012-01-04" "2012-01-05" "2012-01-06" "2012-01-09" ...

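The market this price data comes from is called a "next-day market" because physical delivery of the natural gas usually takes place the day after the gas is traded (i.e., the day after Trade.Date above). I stress usually because weekends and holidays create exceptions, in which case the delivery period may span several days (i.e., 2-3 days). However, the data structure provides variables that explicitly state Delivery.Start.Date and Delivery.End.Date.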
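To make the single-day versus multi-day distinction concrete, the length of each delivery period can be computed directly from the two date columns. This is only a minimal sketch: the `trades` data frame and the `Delivery.Days` column are hypothetical names introduced for illustration, with values chosen to mirror the columns described above.

```r
# Hypothetical two-trade example mirroring the question's date columns
trades <- data.frame(
  Trade.Date          = as.Date(c("2012-01-05", "2012-01-06")),
  Delivery.Start.Date = as.Date(c("2012-01-06", "2012-01-07")),
  Delivery.End.Date   = as.Date(c("2012-01-06", "2012-01-09"))
)

# Inclusive number of calendar days covered by each delivery period
trades$Delivery.Days <- as.integer(
  trades$Delivery.End.Date - trades$Delivery.Start.Date
) + 1L

print(trades)
```

Rows where `Delivery.Days` is greater than 1 are the weekend or holiday cases that need extra records.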

I am trying to restructure the data.frame in the following way to produce some time series charts and do additional analysis:

Desired structure:

$ Delivery.Location
$ Trade.Date
$ Delivery.Date    <<<-- How do I create this variable? 
$ Price

How do I create the Delivery.Date variable from the existing Delivery.Start.Date and Delivery.End.Date variables?

In other words, the data for the 2012-01-06 Trade.Date looks like this:

Delivery Location   Price      Trade.Date      Delivery.Start.Date     Delivery.End.Date     
Henry               2.851322    2012-01-06     2012-01-07              2012-01-09  

I would like to somehow "fill in" Delivery.Location and Price for 2012-01-08 to get something like this:

Delivery Location     Price      Trade.Date      Delivery.Date
Henry                 2.851322    2012-01-06     2012-01-07   
Henry                 2.851322    2012-01-06     2012-01-08   <--new record "filled in"
Henry                 2.851322    2012-01-06     2012-01-09   

Here is a sample subset of my data.frame:

##--------------------------------------------------------------------------------------------
## sample data
##--------------------------------------------------------------------------------------------
df <- structure(list(Delivery.Location = c("Henry", "Henry", "Henry", "Henry", "Henry", "Henry", "Henry", "Henry", "Henry"), Price = c(2.96539814293754, 2.95907652120467, 2.9064360152398, 2.85132233314846, 2.89036418816388,2.9655845029802, 2.80773394495413, 2.70207160426346, 2.67173237617745),  Trade.Date = structure(c(15342, 15343, 15344, 15345, 15348, 15349, 15350, 15351, 15352), class = "Date"), Delivery.Start.Date = structure(c(15343, 15344, 15345, 15346, 15349, 15350, 15351, 15352, 15353), class = "Date"),  Delivery.End.Date = structure(c(15343, 15344, 15345, 15348, 15349, 15350, 15351, 15352, 15356), class = "Date")), .Names = c("Delivery.Location", "Price", "Trade.Date", "Delivery.Start.Date", "Delivery.End.Date"), row.names = c(35L, 150L, 263L, 377L, 493L, 607L, 724L, 838L, 955L), class = "data.frame")

str(df)

##--------------------------------------------------------------------------------------------   
## create sequence of Delivery.Dates to potentially use
##--------------------------------------------------------------------------------------------
rng <- range(c(range(df$Delivery.Start.Date), range(df$Delivery.End.Date)))
Delivery.Date <- seq(rng[1], rng[2], by=1)

Any help or general guidance would be greatly appreciated.

【Comments】:

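  • Can you be specific about what you want?
  • @Metrics: I have edited my question to hopefully make the problem clearer. My apologies for not being more specific from the start.
  • NP; do you want the difference between the delivery start date and end date?
  • @Metrics: No, I don't want the difference. I just want the same price for every date between the start date and the end date.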

Tags: r dataframe plyr reshape


【Solution 1】:

You can use ddply from the plyr package:

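library(plyr)

# Split by trade (one row per group), then emit one output row per delivery day
ddply(
  df,
  c("Delivery.Location", "Trade.Date"),
  function(trade) {
    data.frame(
      trade,
      # Every calendar day from the start to the end of the delivery period
      Delivery.Date = seq(
        from = trade$Delivery.Start.Date,
        to   = trade$Delivery.End.Date,
        by   = "day"
      )
    )
  }
)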

Of course, you would still need to implement any logic concerning weekends, holidays, and so on.
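As one possible starting point for the weekend part of that logic, delivery dates can be flagged by day of week in base R. This is a sketch rather than part of the original answer: `delivery_dates` and `is_weekend` are hypothetical names, and holidays would still need a calendar supplied separately.

```r
# Hypothetical delivery dates: 7 to 9 January 2012 (Saturday through Monday)
delivery_dates <- seq(as.Date("2012-01-07"), as.Date("2012-01-09"), by = "day")

# as.POSIXlt()$wday runs from 0 (Sunday) to 6 (Saturday), independent of locale
is_weekend <- as.POSIXlt(delivery_dates)$wday %in% c(0L, 6L)

print(data.frame(Delivery.Date = delivery_dates, Is.Weekend = is_weekend))
```

Using `wday` instead of `weekdays()` avoids comparing against day names that change with the session locale.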

I also assumed that Delivery.Location and Trade.Date are sufficient to identify a single trade.
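That assumption can be checked before grouping, and if it does not hold, a row-wise expansion that needs no key is one alternative. The sketch below uses base R only and is not the answerer's method: `df_small`, `key_is_unique`, `n_days`, `idx`, and `expanded` are hypothetical names, and the sample values are rounded for illustration.

```r
# Hypothetical three-trade example mirroring the question's columns
df_small <- data.frame(
  Delivery.Location   = c("Henry", "Henry", "Henry"),
  Price               = c(2.91, 2.85, 2.89),
  Trade.Date          = as.Date(c("2012-01-05", "2012-01-06", "2012-01-09")),
  Delivery.Start.Date = as.Date(c("2012-01-06", "2012-01-07", "2012-01-10")),
  Delivery.End.Date   = as.Date(c("2012-01-06", "2012-01-09", "2012-01-10")),
  stringsAsFactors    = FALSE
)

# anyDuplicated() returns 0 when no (location, trade date) pair repeats
key_is_unique <- anyDuplicated(df_small[c("Delivery.Location", "Trade.Date")]) == 0

# Number of delivery days per trade, counting both endpoints
n_days <- as.integer(df_small$Delivery.End.Date - df_small$Delivery.Start.Date) + 1L

# Repeat each row once per delivery day, then offset the start date by 0, 1, 2, ...
idx <- rep(seq_len(nrow(df_small)), times = n_days)
expanded <- df_small[idx, c("Delivery.Location", "Trade.Date", "Price")]
expanded$Delivery.Date <- df_small$Delivery.Start.Date[idx] + (sequence(n_days) - 1L)
rownames(expanded) <- NULL

print(expanded)
```

Because each source row is expanded by its own index, this version gives one record per delivery day even when two trades share a location and trade date.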

【Comments】:

    【Solution 2】:

    Does this work for you?

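    library(plyr)

    # Price by trade date, used as the lookup table
    lookuptable <- df[, 2:3]

    # Delivery start and end dates, each stored in a column named Trade.Date
    # so that join() can match them against the lookup table
    Trade.Date <- df[, 4]
    filluptable1 <- as.data.frame(Trade.Date)
    Trade.Date <- df[, 5]
    filluptable2 <- as.data.frame(Trade.Date)

    # Price traded on each delivery start date (NA when there was no trade that day)
    myfillstart <- join(filluptable1, lookuptable, by = "Trade.Date")
    myfillstart <- rename(myfillstart, c(Trade.Date = "Delivery.Start.Date"))
    myfillstart <- rename(myfillstart, c(Price = "Price.Start.Date"))

    # Price traded on each delivery end date
    myfillend <- join(filluptable2, lookuptable, by = "Trade.Date")
    myfillend <- rename(myfillend, c(Trade.Date = "Delivery.End.Date"))
    myfillend <- rename(myfillend, c(Price = "Price.End.Date"))

    finaldf <- cbind(df[, 1:3], myfillstart, myfillend)

    finaldf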
        Delivery.Location    Price Trade.Date Delivery.Start.Date Price.Start.Date Delivery.End.Date Price.End.Date
    35              Henry 2.965398 2012-01-03          2012-01-04         2.959077        2012-01-04       2.959077
    150             Henry 2.959077 2012-01-04          2012-01-05         2.906436        2012-01-05       2.906436
    263             Henry 2.906436 2012-01-05          2012-01-06         2.851322        2012-01-06       2.851322
    377             Henry 2.851322 2012-01-06          2012-01-07               NA        2012-01-09       2.890364
    493             Henry 2.890364 2012-01-09          2012-01-10         2.965585        2012-01-10       2.965585
    607             Henry 2.965585 2012-01-10          2012-01-11         2.807734        2012-01-11       2.807734
    724             Henry 2.807734 2012-01-11          2012-01-12         2.702072        2012-01-12       2.702072
    838             Henry 2.702072 2012-01-12          2012-01-13         2.671732        2012-01-13       2.671732
    955             Henry 2.671732 2012-01-13          2012-01-14               NA        2012-01-17             NA
    

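    Note: since the location is the same in every row, I did not look up by location. However, you could do that as well. The code is a bit messy. Here is an alternative you could also look at.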
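If the data did contain several locations, the lookup could match on location and date together. The following is a minimal sketch using base R's `merge()` rather than the plyr `join()` above: the `prices` and `starts` tables, the location "Other", and all values are hypothetical and introduced only for illustration.

```r
# Hypothetical price lookup keyed by location and trade date
prices <- data.frame(
  Delivery.Location = c("Henry", "Henry", "Other"),
  Trade.Date        = as.Date(c("2012-01-04", "2012-01-05", "2012-01-04")),
  Price             = c(2.96, 2.91, 3.10),
  stringsAsFactors  = FALSE
)

# Delivery start dates whose traded price we want to look up
starts <- data.frame(
  Delivery.Location   = c("Henry", "Other", "Henry"),
  Delivery.Start.Date = as.Date(c("2012-01-04", "2012-01-04", "2012-01-07")),
  stringsAsFactors    = FALSE
)

# Left join on both keys; dates with no matching trade get NA
looked_up <- merge(
  starts, prices,
  by.x  = c("Delivery.Location", "Delivery.Start.Date"),
  by.y  = c("Delivery.Location", "Trade.Date"),
  all.x = TRUE
)
names(looked_up)[names(looked_up) == "Price"] <- "Price.Start.Date"

print(looked_up)
```

Note that `merge()` sorts the result by the key columns, so the row order can differ from the input.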

    【Comments】:
