Combine vectors and tables in R: answers

[Question title]: Combine vectors and tables in R
[Posted]: 2015-10-02 20:07:31
[Question]:

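I've run into a problem with a simple merge task and am looking for a better solution. I'm creating tables from a series of surveys (which I can't merge). The tables contain the same values but have different dimensions.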

The data are below.

Table x

x <- structure(c(44L, 167L), .Dim = 2L, .Dimnames = structure(list(
    c("similar", "compete")), .Names = ""), class = "table")

Table y

y <- structure(c(69L, 213L, 154L, 4L, 29L, 32L), .Dim = c(3L, 2L), .Dimnames = structure(list(
    c("other", "compete", "similar"), c("college", "no college"
    )), .Names = c("", "")), class = "table")

Table z

z <- structure(c(13L, 38L, 43L, 46L, 131L, 172L, 37L, 177L, 122L, 
8L, 34L, 12L, 16L, 114L, 70L, 20L, 17L, 27L), .Dim = c(3L, 6L
), .Dimnames = structure(list(c("other", "compete", "similar"
), c("skipped", "Democrat", "Independent", "Libertarian", "Republican", 
"other")), .Names = c("", "")), class = "table")

My solution was to use cbind and drop the extra rows and columns, like this:

cbind(y[-1,], x,  z[-1,-1])

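Then I learned that row names in R are not reliable for this: cbind() combines rows by position, not by name, so if the rows of the tables are in a different order, the combined table is silently wrong. That makes building these tables very error-prone. I'd like to be able to merge 3 or more tables without worrying that the order of the merge will scramble the data.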
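A minimal sketch of why the positional cbind() is fragile, using the x and y tables from the question. cbind() takes the row names of the first matrix argument and attaches the other arguments' values by position, so x's counts end up on whatever rows y[-1, ] happens to have. The names `pos` and `byname` are just for illustration.

```r
# Tables x and y copied from the question
x <- structure(c(44L, 167L), .Dim = 2L, .Dimnames = structure(list(
  c("similar", "compete")), .Names = ""), class = "table")
y <- structure(c(69L, 213L, 154L, 4L, 29L, 32L), .Dim = c(3L, 2L),
  .Dimnames = structure(list(c("other", "compete", "similar"),
                             c("college", "no college")), .Names = c("", "")),
  class = "table")

# y[-1, ] has rows "compete", "similar", but x is ordered "similar", "compete".
# cbind() pairs them by position, so the "compete" row gets x's "similar" count.
pos <- cbind(y[-1, ], x)
print(pos)

# Indexing x by y's row names lines the values up by label instead
byname <- cbind(y[-1, ], x = x[rownames(y)[-1]])
print(byname)
```

In `pos` the "compete" row carries 44, which is actually the "similar" count; in `byname` it carries 167 as intended.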

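What is a better way to combine tables with different dimensions?

I suspect data.table or dplyr might have a nice way to do this, but I haven't figured it out yet.

Thanks, and let me know if I can make the question clearer.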

[Question discussion]:

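You can safely use cbind(y[rownames(x), ], x, z[rownames(x), -1]). That way you get the correct order.
A table is pretty much a terminal object, not well suited to further manipulation beyond addmargins and prop.table. I think this is one place where row names are fine. If you want to do anything fancy, I'd say do it before calling table().
Also, a general note: it's better not to tag data.table just because you'd like a solution that uses it. As far as I know, the dplyr authors don't care either way.
@Frank I'm trying to compare cross-tab results from different survey populations that were asked the same questions (rather than plotting them). Are you suggesting a different object for looking at the tables? Maybe a data.frame?
Sorry, I don't know how to display tables with more than two dimensions.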
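The name-indexing suggestion from the comments, cbind(y[rownames(x), ], x, z[rownames(x), -1]), can be checked directly. A minimal sketch, assuming the three tables are defined exactly as in the question; `keep` and `combined` are illustrative names. Because every table is subscripted by the same character vector of row labels, the row order inside each table no longer matters.

```r
# Tables x, y and z copied from the question
x <- structure(c(44L, 167L), .Dim = 2L, .Dimnames = structure(list(
  c("similar", "compete")), .Names = ""), class = "table")
y <- structure(c(69L, 213L, 154L, 4L, 29L, 32L), .Dim = c(3L, 2L),
  .Dimnames = structure(list(c("other", "compete", "similar"),
                             c("college", "no college")), .Names = c("", "")),
  class = "table")
z <- structure(c(13L, 38L, 43L, 46L, 131L, 172L, 37L, 177L, 122L,
                 8L, 34L, 12L, 16L, 114L, 70L, 20L, 17L, 27L), .Dim = c(3L, 6L),
  .Dimnames = structure(list(c("other", "compete", "similar"),
                             c("skipped", "Democrat", "Independent",
                               "Libertarian", "Republican", "other")),
                        .Names = c("", "")),
  class = "table")

# Use x's row labels as the reference order and index y and z by name;
# z[keep, -1] also drops the "skipped" column, as in the question
keep <- rownames(x)   # "similar" "compete"
combined <- cbind(y[keep, ], x, z[keep, -1])
print(combined)
```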

Tags: r merge data.table dplyr

[Solution 1]:

I'm not sure whether I'm missing the point here, or how "automated" you need the process to be, but this might help:

x <- structure(c(44L, 167L), .Dim = 2L, .Dimnames = structure(list(
  c("similar", "compete")), .Names = ""), class = "table")

y <- structure(c(69L, 213L, 154L, 4L, 29L, 32L), .Dim = c(3L, 2L), .Dimnames = structure(list(
  c("other", "compete", "similar"), c("college", "no college"
  )), .Names = c("", "")), class = "table")

z <- structure(c(13L, 38L, 43L, 46L, 131L, 172L, 37L, 177L, 122L, 
                 8L, 34L, 12L, 16L, 114L, 70L, 20L, 17L, 27L), .Dim = c(3L, 6L
                 ), .Dimnames = structure(list(c("other", "compete", "similar"
                 ), c("skipped", "Democrat", "Independent", "Libertarian", "Republican", 
                      "other")), .Names = c("", "")), class = "table")

library(dplyr)
library(tidyr)

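# create data frames from tables
# (data.frame() on a table gives columns Var1, [Var2,] Freq)
x = data.frame(x)
names(x) = c("group","x")

# spread() turns each Var2 level into its own column
# (superseded by pivot_wider() in tidyr >= 1.0.0, but still works)
y = data.frame(y) %>% spread(Var2,Freq)
names(y)[1] = "group"

z = data.frame(z) %>% spread(Var2, Freq)
names(z)[1] = "group"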

# join data frames
x %>% inner_join(y, by="group") %>% inner_join(z, by="group")

#     group   x college no college skipped Democrat Independent Libertarian Republican other
# 1 similar  44     154         32      43      172         122          12         70    27
# 2 compete 167     213         29      38      131         177          34        114    17
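For comparison, a sketch of the same inner join in base R with merge(), in case you'd rather not load dplyr and tidyr. `table_to_df` is a hypothetical helper written for this example, not a library function. One difference: merge() sorts the result by the key column, so the rows come out as compete, similar rather than in x's order.

```r
# Tables x, y and z copied from the question
x <- structure(c(44L, 167L), .Dim = 2L, .Dimnames = structure(list(
  c("similar", "compete")), .Names = ""), class = "table")
y <- structure(c(69L, 213L, 154L, 4L, 29L, 32L), .Dim = c(3L, 2L),
  .Dimnames = structure(list(c("other", "compete", "similar"),
                             c("college", "no college")), .Names = c("", "")),
  class = "table")
z <- structure(c(13L, 38L, 43L, 46L, 131L, 172L, 37L, 177L, 122L,
                 8L, 34L, 12L, 16L, 114L, 70L, 20L, 17L, 27L), .Dim = c(3L, 6L),
  .Dimnames = structure(list(c("other", "compete", "similar"),
                             c("skipped", "Democrat", "Independent",
                               "Libertarian", "Republican", "other")),
                        .Names = c("", "")),
  class = "table")

# Hypothetical helper: turn a 1-D or 2-D table into a data frame whose
# 'group' column holds the row labels
table_to_df <- function(tab, value_name = "Freq") {
  if (length(dim(tab)) == 1L) {
    # 1-D table: a single count column named value_name
    d <- data.frame(group = names(tab), as.vector(tab), stringsAsFactors = FALSE)
    names(d)[2] <- value_name
  } else {
    # 2-D table: one column per column label of the table
    d <- as.data.frame.matrix(tab)
    d$group <- rownames(d)
  }
  d
}

# Inner-join all three on 'group'; rows missing from any table are dropped
res <- Reduce(function(a, b) merge(a, b, by = "group"),
              list(table_to_df(x, "x"), table_to_df(y), table_to_df(z)))
print(res)
```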

[Discussion]:

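[Solution 2]:

The code below binds your data by rows and fills the values of missing columns with NA. From there you should be able to continue your analysis.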

library(plyr)

my_list <- list(as.data.frame(x),
                as.data.frame(y),
                as.data.frame(z))


Reduce(x = my_list, f = rbind.fill)

# resulting data.frame

      Var1 Freq        Var2
1  similar   44        <NA>
2  compete  167        <NA>
3    other   69     college
4  compete  213     college
5  similar  154     college
6    other    4  no college
7  compete   29  no college
8  similar   32  no college
9    other   13     skipped
10 compete   38     skipped
11 similar   43     skipped
12   other   46    Democrat
13 compete  131    Democrat
14 similar  172    Democrat
15   other   37 Independent
16 compete  177 Independent
17 similar  122 Independent
18   other    8 Libertarian
19 compete   34 Libertarian
20 similar   12 Libertarian
21   other   16  Republican
22 compete  114  Republican
23 similar   70  Republican
24   other   20       other
25 compete   17       other
26 similar   27       other
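If you'd rather not depend on plyr, the same fill-with-NA row bind can be sketched in base R. This is an illustrative alternative, not how rbind.fill is implemented; `long`, `all_cols`, `padded` and `res` are names chosen for the example. Converting with stringsAsFactors = FALSE keeps Var1 and Var2 as character, which avoids factor-level surprises when rbind() stacks the pieces.

```r
# Tables x, y and z copied from the question
x <- structure(c(44L, 167L), .Dim = 2L, .Dimnames = structure(list(
  c("similar", "compete")), .Names = ""), class = "table")
y <- structure(c(69L, 213L, 154L, 4L, 29L, 32L), .Dim = c(3L, 2L),
  .Dimnames = structure(list(c("other", "compete", "similar"),
                             c("college", "no college")), .Names = c("", "")),
  class = "table")
z <- structure(c(13L, 38L, 43L, 46L, 131L, 172L, 37L, 177L, 122L,
                 8L, 34L, 12L, 16L, 114L, 70L, 20L, 17L, 27L), .Dim = c(3L, 6L),
  .Dimnames = structure(list(c("other", "compete", "similar"),
                             c("skipped", "Democrat", "Independent",
                               "Libertarian", "Republican", "other")),
                        .Names = c("", "")),
  class = "table")

# Long format for each table: Var1, [Var2,] Freq with character labels
long <- lapply(list(x, y, z), as.data.frame, stringsAsFactors = FALSE)

# Union of all column names, in the order they first appear
all_cols <- unique(unlist(lapply(long, names)))

# Add each missing column as NA, then put the columns in a common order
padded <- lapply(long, function(d) {
  for (col in setdiff(all_cols, names(d))) d[[col]] <- NA
  d[all_cols]
})

# Stack the pieces and reset the row names
res <- do.call(rbind, padded)
rownames(res) <- NULL
print(res)
```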

[Discussion]: