R - How to preserve data types and titles when converting list of lists to data frame

【Question title】: R - How to preserve data types and titles when converting list of lists to data frame
【Posted】: 2020-03-12 15:03:11
【Question description】:

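I am working with a list of lists called receipts. Each entry in receipts is itself a list representing one receipt. The structure of the receipts is consistent and looks like this.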

> str(receipts[[1]])
List of 6
 $ receipt_type  : chr "SALESPERSON_ACTIVITY"
 $ timestamp     : POSIXct[1:1], format: "2020-01-01 09:29:00"
 $ receipt_number: int 1195
 $ POS           : int 1
 $ KNo           : int 12
 $ shift_number  : int 9

receipt_number may also contain NA values.

I want to convert this list into a data frame with the corresponding columns (receipt_type, timestamp, receipt_number, etc.). Currently I am using this:

receipts_as_df <- as.data.frame(matrix(unlist(receipts), byrow=TRUE, ncol=length(receipts[[1]])))

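This gets the data into a data frame. Sadly, unlist drops all information about the data types (I think everything is coerced to character). The column names are lost as well. So I end up with a data frame that contains all the data, but without the types and column names.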
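A small check, not part of the original question, appears to confirm the suspicion about coercion. It assumes a hypothetical three-field receipt; because unlist returns an atomic vector, mixing a string with numbers and a POSIXct value leaves everything as character, and the timestamp is reduced to its underlying number of seconds.

```r
# Hypothetical single receipt with mixed types (shortened for illustration)
receipt <- list(
  receipt_type   = "SALESPERSON_ACTIVITY",
  timestamp      = as.POSIXct("2020-01-01 09:29:00", tz = "UTC"),
  receipt_number = 1195L
)

# unlist() must return one atomic vector, so all values share a single type
flat <- unlist(receipt)

class(flat)            # the common type chosen for the mixed values
flat[["timestamp"]]    # the POSIXct class is gone; only seconds since 1970 remain
```

The names survive unlist itself; they are lost in the subsequent matrix() call, which is why the columns end up as V1 to V6.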

I know I can rename the columns and reset the data types manually, but I wonder whether there is a more convenient way to handle this.

Example: currently the data frame looks like this

> head(receipts_as_df)
                        V1         V2   V3 V4 V5 V6
1     SALESPERSON_ACTIVITY 1577867340 1195  1 12  9
2 CASH_REGISTER_MONITORING 1577867340 <NA>  1 12  9
3      PAYOUT_NOTIFICATION 1577867340 1196  1 12  9
4             TSE_ACTIVITY 1577869080 <NA>  1 12  9
5   BUSINESS_MODE_ACTIVITY 1577869080 <NA>  1 12  9
6             ZERO_RECEIPT 1577869140 1197  1 12  9

【Comments】:

do.call(rbind.data.frame, receipts)? There is also dplyr::bind_rows(receipts) or data.table::rbindlist(receipts).

Tags: r list dataframe matrix coercion

【Solution 1】:

Base R, which takes a little more work:

receipts <- replicate(3, list(
  receipt_type   = "SALESPERSON_ACTIVITY",
  timestamp      = as.POSIXct("2020-01-01 09:29:00", tz = "UTC"),
  receipt_number = 1195,
  POS            = 1,
  KNo            = 12,
  shift_number   = 9
), simplify = FALSE)

out <- do.call(rbind.data.frame, c(receipts, list(stringsAsFactors = FALSE)))
out
#            receipt_type  timestamp receipt_number POS KNo shift_number
# 2  SALESPERSON_ACTIVITY 1577870940           1195   1  12            9
# 21 SALESPERSON_ACTIVITY 1577870940           1195   1  12            9
# 3  SALESPERSON_ACTIVITY 1577870940           1195   1  12            9
str(out)
# 'data.frame': 3 obs. of  6 variables:
#  $ receipt_type  : chr  "SALESPERSON_ACTIVITY" "SALESPERSON_ACTIVITY" "SALESPERSON_ACTIVITY"
#  $ timestamp     : num  1.58e+09 1.58e+09 1.58e+09
#  $ receipt_number: num  1195 1195 1195
#  $ POS           : num  1 1 1
#  $ KNo           : num  12 12 12
#  $ shift_number  : num  9 9 9
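# rbind.data.frame drops the POSIXct class; restore it, passing tz = "UTC" so
# the values print in the same time zone the sample data was created in
out$timestamp <- as.POSIXct(out$timestamp, origin = "1970-01-01", tz = "UTC")
out
#            receipt_type           timestamp receipt_number POS KNo shift_number
# 2  SALESPERSON_ACTIVITY 2020-01-01 09:29:00           1195   1  12            9
# 21 SALESPERSON_ACTIVITY 2020-01-01 09:29:00           1195   1  12            9
# 3  SALESPERSON_ACTIVITY 2020-01-01 09:29:00           1195   1  12            9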
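If repairing the timestamp by hand is unwanted, one base R variation (a sketch, not part of the original answer) is to turn each receipt into a one-row data frame first and then stack those. The name sample_receipts and its values are hypothetical and only mirror the structure from the question, including one NA receipt_number.

```r
# Hypothetical sample data shaped like the receipts in the question
sample_receipts <- list(
  list(receipt_type = "SALESPERSON_ACTIVITY",
       timestamp = as.POSIXct("2020-01-01 09:29:00", tz = "UTC"),
       receipt_number = 1195L, POS = 1L, KNo = 12L, shift_number = 9L),
  list(receipt_type = "CASH_REGISTER_MONITORING",
       timestamp = as.POSIXct("2020-01-01 09:29:00", tz = "UTC"),
       receipt_number = NA_integer_, POS = 1L, KNo = 12L, shift_number = 9L)
)

# Step 1: each receipt becomes a one-row data frame with typed, named columns
rows <- lapply(sample_receipts, as.data.frame, stringsAsFactors = FALSE)

# Step 2: stack the one-row data frames; column classes carry over
out2 <- do.call(rbind, rows)
str(out2)
```

Because rbind receives data frames rather than bare lists here, the timestamp column should stay POSIXct and the integer columns stay integer. Building one small data frame per receipt can be slow for very long lists, which is where bind_rows or rbindlist tend to be the more practical choice.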

dplyr and data.table need no extra work:

dplyr::bind_rows(receipts)
# # A tibble: 3 x 6
#   receipt_type         timestamp           receipt_number   POS   KNo shift_number
#   <chr>                <dttm>                       <dbl> <dbl> <dbl>        <dbl>
# 1 SALESPERSON_ACTIVITY 2020-01-01 09:29:00           1195     1    12            9
# 2 SALESPERSON_ACTIVITY 2020-01-01 09:29:00           1195     1    12            9
# 3 SALESPERSON_ACTIVITY 2020-01-01 09:29:00           1195     1    12            9
data.table::rbindlist(receipts)
#            receipt_type           timestamp receipt_number POS KNo shift_number
# 1: SALESPERSON_ACTIVITY 2020-01-01 09:29:00           1195   1  12            9
# 2: SALESPERSON_ACTIVITY 2020-01-01 09:29:00           1195   1  12            9
# 3: SALESPERSON_ACTIVITY 2020-01-01 09:29:00           1195   1  12            9

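【Discussion】:

Thanks! I would like to use base R, but I am worried about the index 21 in do.call(rbind.data.frame, c(receipts, list(stringsAsFactors = FALSE))). What is happening there?
Those are the row names, and they can generally be ignored; they are an artifact of the rbind process, which makes no effort to set or repair the row names it infers. If it bothers you, you can "reset" the row names with rownames(out) <- NULL (this resets them to the default sequence 1, 2, 3, ...).
That works. Thank you. I know it makes no difference, but it looks nicer...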
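The row-name reset from the discussion can be tried on a tiny stand-in; the data frame df below is hypothetical and only reproduces the odd row names seen in the answer's output.

```r
# Hypothetical data frame carrying the leftover row names "2", "21", "3"
df <- data.frame(x = c(10, 20, 30), row.names = c("2", "21", "3"))
rownames(df)

# Assigning NULL discards the custom row names and restores the default sequence
rownames(df) <- NULL
rownames(df)
```

Only the row labels change; the data in the columns is untouched.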