R data.frame to SQL - preserving ordered factors

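【Question title】: R data.frame to SQL - preserving ordered factors
【Posted】: 2015-02-04 15:52:16
【Question description】:

I have just started using MySQL to handle data that currently lives in R data frame objects. I was hoping for a simple round trip through SQL that recreates an R data frame exactly: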

library("compare",pos=2)
library("RMySQL",pos=2)
conR <- dbConnect(MySQL(),
             user = '...',
             password = '...',
             host = '...',
             dbname='r2014')
a3 <- data.frame(x=5:1,y=letters[1:5],z=ordered(c("NEVER","ALWAYS","NEVER","SOMETIMES","NEVER"),levels=c("NEVER","SOMETIMES","ALWAYS")))
a3
dbWriteTable(conn = conR, name = 'a3', value = a3)
a4 <- dbReadTable(conn = conR, name = 'a3')
compare(a3,a4)$detailedResult
a3$z
a4$z

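The results show that factors come back as character strings (columns y and z), and that the ordering information of the ordered factor is lost (column z):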

> a3
  x y         z
1 5 a     NEVER
2 4 b    ALWAYS
3 3 c     NEVER
4 2 d SOMETIMES
5 1 e     NEVER
> compare(a3,a4)$detailedResult
    x     y     z 
  TRUE FALSE FALSE 
> a3$z
[1] NEVER     ALWAYS    NEVER     SOMETIMES NEVER    
Levels: NEVER < SOMETIMES < ALWAYS
> a4$z
[1] "NEVER"     "ALWAYS"    "NEVER"     "SOMETIMES" "NEVER" 
> a3$y
[1] a b c d e
Levels: a b c d e
> a4$y
[1] "a" "b" "c" "d" "e"

Is there a way to specify the information carried by an ordered factor when the table a3 is created in the database?

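【Comments】:

In a relational database, tables represent unordered sets. Unless you query with an explicit ORDER BY in the outermost SELECT, you cannot rely on the order of the rows a query returns.
@GordonLinoff The issue is not the query result. An R data.frame object stores the level information of a factor, and other R functions understand that information. I would like that information to be carried into the table created by dbWriteTable and then preserved when dbReadTable reads the data.frame back into R.
I don't think a relational database can easily represent the ordering within the same table. I suggest creating another table to represent the ordering. It would have two columns: the factor label and the label's rank in the ordering.
There is currently no way to seamlessly round-trip factors (ordered or otherwise) through a database.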
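
The separate-table suggestion in the comments above can be sketched in base R. This is only an illustration under stated assumptions: the helper names factor_levels_table and restore_factors, the metadata column names, and the table name a3_levels are all hypothetical and are not part of DBI or RMySQL. The database round trip is simulated by converting factors to character, which is what the question observed after dbReadTable.

```r
# Hypothetical helper: describe every factor column of df as rows of
# (column, label, rank, is_ordered), suitable for storing in its own table.
factor_levels_table <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  pieces <- lapply(names(df)[is_fac], function(col) {
    lv <- levels(df[[col]])
    data.frame(column     = rep(col, length(lv)),
               label      = lv,
               rank       = seq_along(lv),
               is_ordered = rep(is.ordered(df[[col]]), length(lv)),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, pieces)  # NULL when df has no factor columns
}

# Hypothetical helper: rebuild the factors from the metadata table.
restore_factors <- function(df, meta) {
  if (is.null(meta)) return(df)
  for (col in unique(meta$column)) {
    m <- meta[meta$column == col, ]
    m <- m[order(m$rank), ]
    df[[col]] <- factor(df[[col]], levels = m$label,
                        ordered = m$is_ordered[1])
  }
  df
}

# Same data as in the question; stringsAsFactors = TRUE reproduces the
# 2015-era default under which y was a factor.
a3 <- data.frame(
  x = 5:1,
  y = letters[1:5],
  z = ordered(c("NEVER", "ALWAYS", "NEVER", "SOMETIMES", "NEVER"),
              levels = c("NEVER", "SOMETIMES", "ALWAYS")),
  stringsAsFactors = TRUE
)

meta <- factor_levels_table(a3)
# With a live connection one might store it next to the data, for example:
#   dbWriteTable(conn = conR, name = 'a3_levels', value = meta)

# Simulate what comes back from the database: factors become character.
a4 <- a3
a4[] <- lapply(a4, function(v) if (is.factor(v)) as.character(v) else v)

restored <- restore_factors(a4, meta)
restored$z
```

The metadata has one row per level, so it stays small even for large tables, and the rank column makes the level order explicit rather than relying on row order in the database.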

Tags: mysql sql r rmysql factors

【Solution 1】:

I would change the code to:

dbWriteTable(conn = conR, name = 'a3', value = a3, row.names=TRUE)
a4 <- dbReadTable(conn = conR, name = 'a3', row.names=TRUE)

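The row.names of a data.frame are ordered by default. When they are stored in a SQL column, that order is stored with them. A SELECT query can use ORDER BY row_names to get an ordered set.

The row.names argument of dbReadTable() can be changed to NA in case the SQL table does not contain a row_names column. [2]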
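
A minimal sketch of the row.names round trip described above, assuming the DBI and RSQLite packages are installed. An in-memory SQLite database stands in for the MySQL server of the question, so behaviour on RMySQL may differ in detail. It shows row names being written to a row_names column, used in ORDER BY, and restored on read. The factor columns still come back as character, as the question observed.

```r
library(DBI)

# Assumption: in-memory SQLite replaces the MySQL connection conR.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

a3 <- data.frame(
  x = 5:1,
  y = letters[1:5],
  z = ordered(c("NEVER", "ALWAYS", "NEVER", "SOMETIMES", "NEVER"),
              levels = c("NEVER", "SOMETIMES", "ALWAYS")),
  stringsAsFactors = TRUE
)

# row.names = TRUE adds a "row_names" column to the remote table.
dbWriteTable(con, "a3", a3, row.names = TRUE)

# An explicit ORDER BY on that column fixes the order of the result.
sorted <- dbGetQuery(con, "SELECT * FROM a3 ORDER BY row_names")

# row.names = TRUE turns the "row_names" column back into row names.
a4 <- dbReadTable(con, "a3", row.names = TRUE)

rownames(a4)
class(a4$z)

dbDisconnect(con)
```

In this sketch row_names is stored as text, so with more than nine rows the ORDER BY would sort lexicographically ("10" before "2") unless the column is cast to a number.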

[1] Reference: DBI::dbWriteTable

 The interpretation of rownames depends on the ‘row.names’
 argument, see ‘sqlRownamesToColumn()’ for details:

    • If ‘FALSE’ or ‘NULL’, row names are ignored.

    • If ‘TRUE’, row names are converted to a column named
      "row_names", even if the input data frame only has natural
      row names from 1 to ‘nrow(...)’.

    • If ‘NA’, a column named "row_names" is created if the data
      has custom row names, no extra column is created in the case
      of natural row names.

    • If a string, this specifies the name of the column in the
      remote table that contains the row names, even if the input
      data frame only has natural row names.

[2] Reference: DBI::dbReadTable

 The presence of rownames depends on the ‘row.names’ argument, see
 ‘sqlColumnToRownames()’ for details:

    • If ‘FALSE’ or ‘NULL’, the returned data frame doesn't have
      row names.

    • If ‘TRUE’, a column named "row_names" is converted to row
      names.

    • If ‘NA’, a column named "row_names" is converted to row names
      if it exists, otherwise no translation occurs.

    • If a string, this specifies the name of the column in the
      remote table that contains the row names.

【Discussion】: