Create a Data Frame of Unequal Lengths: Answers

【Question title】: Create a Data Frame of Unequal Lengths
【Posted】: 2023-04-07 07:36:01
【Question description】:

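Although data frame columns must all have the same number of rows, is there any way to create a data frame whose columns have unequal lengths? I'm not interested in saving them as separate elements of a list, because I often have to email this information to people as a CSV file, and that is easiest with a data frame.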

x = c(rep("one",2))
y = c(rep("two",10))
z = c(rep("three",5))
cbind(x,y,z)

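In the code above, cbind() simply recycles the shorter columns so that every column has 10 elements. How can I change it so that the lengths are 2, 10, and 5?

I have done this in the past as follows, but it is inefficient.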

df = data.frame(one = c(rep("one", 2), rep("", 8)),
                two = c(rep("two", 10)),
                three = c(rep("three", 5), rep("", 5)))

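【Comments】:

This question has arisen before. The latter may not be an exact duplicate, but the former is very close.
Yes. In particular, my answer is nearly identical to two of the answers given to the former. @Owen's "subversive" answer is novel and clever (if dangerous).
This question is like asking how to store an integer that represents 2/3.
You can also use dput to store the data in an ASCII (R-only) format.

Tags: r dataframe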

【Solution 1】:

To amplify @goodside's answer, you could do something like this:

L <- list(x,y,z)
cfun <- function(L) {
  pad.na <- function(x,len) {
   c(x,rep(NA,len-length(x)))
  }
  maxlen <- max(sapply(L,length))
  do.call(data.frame,lapply(L,pad.na,len=maxlen))
}
cfun(L)
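
Since the question is ultimately about emailing a CSV, here is a minimal sketch of the same padding idea carried through to a file. The names pad_na, padded, and out_file are made up for illustration, and the temporary file path is only a placeholder. Naming the list elements gives the columns readable names, and na = "" makes write.csv() emit the padded cells as empty fields.

```r
# Example vectors from the question
x <- rep("one", 2)
y <- rep("two", 10)
z <- rep("three", 5)

# Pad a vector with NA up to length len (hypothetical helper name)
pad_na <- function(v, len) c(v, rep(NA, len - length(v)))

L <- list(x = x, y = y, z = z)   # element names become column names
maxlen <- max(lengths(L))
padded <- data.frame(lapply(L, pad_na, len = maxlen),
                     stringsAsFactors = FALSE)

# na = "" writes NA cells as empty fields instead of the string "NA"
out_file <- tempfile(fileext = ".csv")   # placeholder path
write.csv(padded, out_file, row.names = FALSE, na = "")
```

Keeping NA inside R and only turning it into an empty field at write time means the data frame can still be used for analysis without stray "" values.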

【Discussion】:

【Solution 2】:

Sorry, this isn't exactly what you asked for, but I think there may be another way to get what you want.

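First, if the vectors have different lengths, the data isn't really tabular, is it? How about saving it to separate CSV files? You could also try an ASCII format that allows storing multiple objects (JSON, XML).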
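
A rough sketch of the "separate files" idea using only base R. The output directory, the file names, and the column name value are assumptions made for illustration. The dput()/dget() pair is the R-only text format mentioned in the comments on the question; JSON or XML would need an additional package and is not shown.

```r
x <- 1:5
y <- 1:12
vectors <- list(x = x, y = y)

# Placeholder output directory under the session temp dir
out_dir <- tempfile("unequal_")
dir.create(out_dir)

# One CSV per vector, each with its own natural length
for (nm in names(vectors)) {
  write.csv(data.frame(value = vectors[[nm]]),
            file.path(out_dir, paste0(nm, ".csv")),
            row.names = FALSE)
}

# Alternatively, keep everything in one R-readable text file
dput(vectors, file = file.path(out_dir, "vectors.txt"))
restored <- dget(file.path(out_dir, "vectors.txt"))
```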

If you feel the data really is tabular, you can pad with NAs:

> x = 1:5
> y = 1:12
> max.len = max(length(x), length(y))
> x = c(x, rep(NA, max.len - length(x)))
> y = c(y, rep(NA, max.len - length(y)))
> x
 [1]  1  2  3  4  5 NA NA NA NA NA NA NA
> y
 [1]  1  2  3  4  5  6  7  8  9 10 11 12
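
As a side note, the same padding can be sketched with the replacement form of length(), which extends a vector with NA when the new length is larger than the old one. The variable names here are only for illustration.

```r
x <- 1:5
y <- 1:12
max_len <- max(length(x), length(y))

# Assigning a larger length pads the vector with NA
length(x) <- max_len
length(y) <- max_len   # already 12, so nothing changes

df <- data.frame(x = x, y = y)
```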

If you absolutely must create a data.frame with unequal columns, you can subvert the checks, at your own risk:

> x = 1:5
> y = 1:12
> df = list(x=x, y=y)
> attributes(df) = list(names = names(df),
    row.names=1:max(length(x), length(y)), class='data.frame')
> df
      x  y
1     1  1
2     2  2
3     3  3
4     4  4
5     5  5
6  <NA>  6
7  <NA>  7
 [ reached getOption("max.print") -- omitted 5 rows ]]
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs

【Discussion】:

attributes(df) = list( names = names(df), row.names=1:max.len, class = 'data.frame')
The "subvert the checks" option does not work in RStudio 1.0.136 with R 3.3.3. It crashes R.

【Solution 3】:

Another way to do the padding:

na.pad <- function(x,len){
    x[1:len]
}

makePaddedDataFrame <- function(l,...){
    maxlen <- max(sapply(l,length))
    data.frame(lapply(l,na.pad,len=maxlen),...)
}

x = c(rep("one",2))
y = c(rep("two",10))
z = c(rep("three",5))

makePaddedDataFrame(list(x=x,y=y,z=z))

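The na.pad() function exploits the fact that R returns NA for any position you index beyond the end of a vector, so indexing 1:len pads the vector automatically.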

makePaddedDataFrame() simply finds the longest vector and pads the rest to the matching length.
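
To see the indexing behaviour in isolation, here is a tiny sketch (the variable names are illustrative). One thing to keep in mind is that the same indexing truncates when len is shorter than the vector, which is harmless here because len is always the maximum length.

```r
x <- c("one", "one")

# Positions 3 to 5 do not exist, so they come back as NA
padded <- x[seq_len(5)]

# Indexing with a shorter range truncates instead of padding
truncated <- x[seq_len(1)]
```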

【Discussion】:

【Solution 4】:

A similar problem:

coin <- c("Head", "Tail")
toss <- sample(coin, 50, replace=TRUE)

categorize <- function(x,len){
  count_heads <- 0
  count_tails <- 0
  tails <- as.character()
  heads <- as.character()
  for(i in 1:len){
    if(x[i] == "Head"){
      heads <- c(heads,x[i])
      count_heads <- count_heads + 1
    }else {
      tails <- c(tails,x[i])
      count_tails <- count_tails + 1
    }
  }
  if(count_heads > count_tails){
    head <- heads
    tail <- c(tails, rep(NA, (count_heads-count_tails)))
  } else {
    head <- c(heads, rep(NA,(count_tails-count_heads)))
    tail <- tails
  }
  data.frame(cbind("Heads"=head, "Tails"=tail))
}

categorize(toss, 50)

Output: in this run the coin tosses give 31 Heads and 19 Tails. The rest of the Tails column is then filled with NA in order to make a data frame.

【Discussion】:

Growing things in a loop is a bad idea in R; the usual reference is www.burns-stat.com/documents/books/the-r-inferno/ You could just do heads = sum(x == "Head"), right? Really, I think rbinom makes more sense than sample in any case.
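
Along the lines of that comment, a loop-free version might look like the sketch below. It is not the answer author's code: the seed is an arbitrary choice to make the run repeatable, and the name result is made up.

```r
set.seed(1)   # arbitrary seed, only so the sketch is repeatable
coin <- c("Head", "Tail")
toss <- sample(coin, 50, replace = TRUE)

# Logical subsetting replaces the element-by-element loop
heads <- toss[toss == "Head"]
tails <- toss[toss == "Tail"]

# Pad the shorter vector with NA up to the longer one
n <- max(length(heads), length(tails))
length(heads) <- n
length(tails) <- n

result <- data.frame(Heads = heads, Tails = tails)
```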

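【Solution 5】:

We can create a data frame with columns of unequal lengths by padding the shorter columns with the empty string "". The following code can be used to create such a data frame.

The code first finds the maximum column length of the list object l, then pads each column with "". This leaves every element of the list with the same number of entries. The list is then converted to a data frame.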

# l is assumed to be a named list of vectors, e.g. list(x = x, y = y, z = z)
# The list column names
cols <- names(l)

# The maximum column length
max_len <- 0
for (col in cols){
    if (length(l[[col]]) > max_len)
        max_len <- length(l[[col]])
}

# Each column is padded
for (col in cols){
    l[[col]] <- c(l[[col]], rep("", max_len - length(l[[col]])))
}

# The list is converted to data frame
df <- as.data.frame(l)
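
For comparison, the two loops can be condensed with lengths() and lapply(). This is only a sketch: the input list l below is a made-up example built from the question's vectors. Note that padding with "" turns numeric columns into character columns, so this suits data that is going straight to a CSV rather than further analysis.

```r
# Hypothetical input: a named list of vectors with unequal lengths
l <- list(one = rep("one", 2),
          two = rep("two", 10),
          three = rep("three", 5))

# The maximum column length
max_len <- max(lengths(l))

# Pad every element with "" up to max_len
padded <- lapply(l, function(col) c(col, rep("", max_len - length(col))))

# The padded list is converted to a data frame
df <- as.data.frame(padded, stringsAsFactors = FALSE)
```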

【Discussion】: