【Question title】: R - Tricky CSV Printout
【Posted】: 2014-01-01 21:51:00
【Question】:

I want to produce a tricky printout format. Here is my current data frame, which is built up with a for loop and rbind.

bets<- data.frame(status=character(), f_name=character(), d_name=character(), type_bet=character(), sec=character(),
                  spread=character(), total=character(), deriv=character(), book=character(), edge=character(), 
                  my_f_price=character(), book_f_price=character(), my_d_price=character(), book_d_price=character())

Sample printout:

status  f_name  d_name  type_bet    sec spread  total   deriv   book    edge    my_f_price  book_f_price    my_d_price  book_d_price

9:00 PM ET  San Diego State Colorado State  total   h1  3.5 138.5   65  pin 12  120 -108    -120    -108
9:00 PM ET  San Diego State Colorado State  total   h1  3.5 138.5   65  5d  10  120 -110    -120    -110
6:00 PM ET  Cincinnati  SMU total   h1  8   125.5   59  pin 9   122 -103    -122    -113
8:00 PM ET  Temple  Rutgers total   h1  1.5 150 70.5    pin 8   116 -108    -116    -108
8:00 PM ET  Temple  Rutgers total   h1  1.5 150 70.5    5d  6   116 -110    -116    -110
8:05 PM ET  Drake   Evansville  ml  h1  7   136 0   5d  4   -214    -210    214 175
8:00 PM ET  Northern Iowa   Bradley total   h1  12  133 62  5d  3   113 -110    -113    -110
6:00 PM ET  Cincinnati  SMU ml  h1  8   125.5   0   5d  2   -242    -240    242 200
6:00 PM ET  Cincinnati  SMU total   h1  8   125.5   58.5    5d  2   112 -110    -112    -110

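It's a bit ugly, but the rows are sorted by the edge column: 12, 10, 9, 8, 6, 4, 3, 2, 2. What I want to do is group some of the entries together. When f_name, d_name, type_bet and sec are all the same, and the only column that differs is book, the rows should be treated as one group. So ideally, the printout would look like this: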

status  f_name  d_name  type_bet    sec spread  total   deriv   book    edge    my_f_price  book_f_price    my_d_price  book_d_price

9:00 PM ET  San Diego State Colorado State  total   h1  3.5 138.5   65  pin 12  120 -108    -120    -108
9:00 PM ET  San Diego State Colorado State  total   h1  3.5 138.5   65  5d  10  120 -110    -120    -110

6:00 PM ET  Cincinnati  SMU total   h1  8   125.5   59  pin 9   122 -103    -122    -113
6:00 PM ET  Cincinnati  SMU total   h1  8   125.5   58.5    5d  2   112 -110    -112    -110

8:00 PM ET  Temple  Rutgers total   h1  1.5 150 70.5    pin 8   116 -108    -116    -108
8:00 PM ET  Temple  Rutgers total   h1  1.5 150 70.5    5d  6   116 -110    -116    -110

8:05 PM ET  Drake   Evansville  ml  h1  7   136 0   5d  4   -214    -210    214 175

8:00 PM ET  Northern Iowa   Bradley total   h1  12  133 62  5d  3   113 -110    -113    -110

6:00 PM ET  Cincinnati  SMU ml  h1  8   125.5   0   5d  2   -242    -240    242 200

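Right now the only way I can think of is to print line by line to a txt file: loop over the data frame (sorted by the edge column), and for each entry search the rest of the data frame for other entries with the same f_name, d_name, type_bet and sec, print them, and then delete them from the data frame. But I assume there is a better way?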

【Discussion】:

    Tags: r csv dataframe subset


    【Solution 1】:

    Your example data (you can use dput(yourData) to generate this, which makes it easier to help)

    df <- structure(list(status = c("9:00 PM ET", "9:00 PM ET", "6:00 PM ET", 
     "8:00 PM ET", "8:00 PM ET", "8:05 PM ET", "8:00 PM ET", "6:00 PM ET", 
    "6:00 PM ET"), f_name = c("San Diego State", "San Diego State", 
    "Cincinnati", "Temple", "Temple", "Drake", "Northern Iowa", "Cincinnati", 
    "Cincinnati"), 
     d_name = c("Colorado State", "Colorado State", "SMU", "Rutgers", "Rutgers", 
      "Evansville", "Bradley", "SMU", "SMU"), type_bet = c("total", "total", "total",      
    "total", "total", 
     "ml", "total", "ml", "total"), sec = c("h1", "h1", "h1", "h1", 
     "h1", "h1", "h1", "h1", "h1"), spread = c(3.5, 3.5, 8, 1.5, 1.5, 
    7, 12, 8, 8), total = c(138.5, 138.5, 125.5, 150, 150, 136, 133, 
    125.5, 125.5), deriv = c(65, 65, 59, 70.5, 70.5, 0, 62, 0, 58.5), 
     book = c("pin", "5d", "pin", "pin", "5d", "5d", "5d", "5d", 
     "5d"), edge = c(12L, 10L, 9L, 8L, 6L, 4L, 3L, 2L, 2L), my_f_price = c(120L, 
     120L, 122L, 116L, 116L, -214L, 113L, -242L, 112L), book_f_price = c(-108L, 
    -110L, -103L, -108L, -110L, -210L, -110L, -240L, -110L), my_d_price = c(-120L, 
    -120L, -122L, -116L, -116L, 214L, -113L, 242L, -112L), book_d_price = c(-108L, 
     -110L, -113L, -108L, -110L, 175L, -110L, 200L, -110L)), .Names = c("status", 
    "f_name", "d_name", "type_bet", "sec", "spread", "total", "deriv", 
    "book", "edge", "my_f_price", "book_f_price", "my_d_price", "book_d_price" ),
                  class = "data.frame", row.names = c(NA, -9L))
    
    #You can sort your data on the required columns - but doesn't produce exactly the output you want
    df2 <- df[order(df$f_name, df$d_name, df$type_bet, df$sec) , ]
    

    I'm not sure what output structure you want (i.e. what are the blank rows between groups?), but you can get close to it with a list.

    #Split data by required groups (and remove empty dataframes produced by interaction)
    df.grp <- split(df , list(df$f_name, df$d_name, df$type_bet, df$sec))
    df.grp <- df.grp[sapply(df.grp, function(z) nrow(z)>0)]
    
    #Get in the order of decreasing edge
    max.edge <- unlist(lapply(df.grp , function(x) max(x[,'edge'])))
    list.names <- names(sort(max.edge, decreasing=TRUE))
    #Index by the sorted names; match(names(df.grp), list.names) would give the inverse permutation
    (out <- df.grp[list.names])
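
If the blank rows between groups are meant literally, a list of per-group data frames can be written to a text file one group at a time. The following is only a sketch under some assumptions: it uses a trimmed-down stand-in for the question's data (fewer columns and rows), and `write_groups` is a hypothetical helper name, not something from the original post.

```r
# Trimmed-down stand-in for the question's data frame (assumption: fewer columns).
df <- data.frame(
  f_name   = c("San Diego State", "San Diego State", "Cincinnati", "Temple", "Cincinnati"),
  d_name   = c("Colorado State", "Colorado State", "SMU", "Rutgers", "SMU"),
  type_bet = c("total", "total", "total", "total", "total"),
  sec      = "h1",
  book     = c("pin", "5d", "pin", "pin", "5d"),
  edge     = c(12L, 10L, 9L, 8L, 2L),
  stringsAsFactors = FALSE
)

# Split into groups; drop = TRUE discards empty combinations of the keys.
grp <- split(df, list(df$f_name, df$d_name, df$type_bet, df$sec), drop = TRUE)

# Put the groups in order of their largest edge, highest first.
grp <- grp[order(sapply(grp, function(g) max(g$edge)), decreasing = TRUE)]

# Hypothetical helper: header once, then each group followed by a blank line.
write_groups <- function(groups, con, sep = "\t") {
  writeLines(paste(names(groups[[1]]), collapse = sep), con)
  writeLines("", con)
  for (g in groups) {
    g <- g[order(g$edge, decreasing = TRUE), ]   # highest edge first within a group
    write.table(g, con, sep = sep, quote = FALSE,
                row.names = FALSE, col.names = FALSE)
    writeLines("", con)
  }
}

out_file <- tempfile(fileext = ".txt")
con <- file(out_file, open = "wt")
write_groups(grp, con)
close(con)

lines_out <- readLines(out_file)
```

Writing through a single open connection keeps the header, the rows and the blank separators in one pass, so no rows need to be searched for or deleted from the data frame along the way.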
    

    【Comments】:

    • Sorry, I'll try to make the printout nicer and more reproducible
    【Solution 2】:

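    I use my own data frame because that is less work than parsing the text strings above.

    Suppose the variables that should form a group are called formGroupVarX (in your example, "f_name", "d_name", "type_bet", "sec") and the remaining variables are called FreeVarX (all the other variables). Then you can display the data as follows: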

    formGroupVars = c("formGroupVar1","formGroupVar2","formGroupVar3")
    freeVars = c("FreeVar1")
    frameToShow <- data.frame(cbind(sample(LETTERS[1:3],20,replace=TRUE),sample(LETTERS[4:6],20,replace=TRUE),
                                    sample(LETTERS[7:9],20,replace=TRUE),sample(letters,20,replace=TRUE)        ))
    colnames(frameToShow) = c(formGroupVars,freeVars)
    frameToShow[order(apply(frameToShow,1,function(X) {  paste(X[formGroupVars],collapse="")   } )),]
    

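    Basically, you create a temporary factor level that is a function of all the variables you want to form a group, and sort the display on that temporary factor. In both your example and mine, a simple concatenation of the values does the trick, but in theory the function could be a mathematical function or anything else.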
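
As an illustration of using something other than plain concatenation, the sort key can be combined with a per-group statistic. The sketch below orders groups by their largest edge while keeping each group's rows together. It assumes a trimmed-down version of the question's data; the names `group_vars`, `key`, `grp_max` and `sorted` are made up for this example.

```r
# Trimmed-down stand-in for the question's data frame (assumption: fewer columns).
df <- data.frame(
  f_name   = c("San Diego State", "San Diego State", "Cincinnati", "Temple", "Cincinnati"),
  d_name   = c("Colorado State", "Colorado State", "SMU", "Rutgers", "SMU"),
  type_bet = c("total", "total", "total", "total", "total"),
  sec      = "h1",
  book     = c("pin", "5d", "pin", "pin", "5d"),
  edge     = c(12L, 10L, 9L, 8L, 2L),
  stringsAsFactors = FALSE
)

group_vars <- c("f_name", "d_name", "type_bet", "sec")

# One key string per row, built from the grouping columns.
# A separator avoids accidental collisions between concatenated values.
key <- do.call(paste, c(df[group_vars], sep = "|"))

# Largest edge within each group, repeated on every row of that group.
grp_max <- ave(df$edge, key, FUN = max)

# Groups by descending best edge, rows of a group adjacent,
# and descending edge within each group.
sorted <- df[order(-grp_max, key, -df$edge), ]
```

With this sample, the Cincinnati row with edge 2 moves up next to the Cincinnati row with edge 9, ahead of Temple, which matches the grouping the question asks for.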
