Importing an adjacency matrix (csv) containing strings and floats in R [duplicate]: answers

【Question title】: Importing an adjacency matrix (csv) containing strings and floats in R [duplicate]
【Posted】: 2015-10-31 08:14:45
【Question description】:

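I have a co-occurrence adjacency matrix like this one: https://dl.dropboxusercontent.com/u/73950/matrix_added_cats.csv

Its row and column names may contain strings with special characters ("(", "-", " ", etc.).

When I import this data into R to visualize it with ggplot2, I do the following: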

myData <- read.csv("/matrix_added_cats.csv")

                Name  NGO Gov..institutions Industry..farming. Industry..mining. Academia.research Aboriginal.groups
1                NGO 0.00              0.00                  0              0.00              0.01              0.00
2  Gov. institutions 0.00              0.01                  0              0.04              0.03              0.01
3 Industry (farming) 0.00              0.00                  0              0.00              0.00              0.00
4  Industry (mining) 0.00              0.04                  0              0.10              0.25              0.07
5  Academia/research 0.01              0.03                  0              0.25              0.36              0.10
6  Aboriginal groups 0.00              0.01                  0              0.07              0.10              0.02

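We can see that the names of the columns holding the float values no longer match the strings in the Name column (for example, "Gov. institutions" has become "Gov..institutions"). I think this causes several problems in my ggplot2 visualization: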

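library(reshape)       # melt()
library(ggplot2)       # ggplot(), geom_tile(), ...
library(RColorBrewer)  # brewer.pal()

# Long format: one row per (Name, variable) pair
dat <- melt(myData, id.vars = "Name")
# Reversed 9-colour "Spectral" palette, interpolated in Lab space
myPalette <- colorRampPalette(rev(brewer.pal(9, "Spectral")), space = "Lab")

# Heatmap of the co-occurrence values
zp1 <- ggplot(dat, aes(x = variable, y = Name, fill = value))
zp1 <- zp1 + geom_tile()
zp1 <- zp1 + scale_fill_gradientn(colours = myPalette(100), trans = "reverse")
zp1 <- zp1 + scale_x_discrete(expand = c(0, 0))
zp1 <- zp1 + scale_y_discrete(expand = c(0, 0))
zp1 <- zp1 + coord_equal()
zp1 <- zp1 + theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(zp1)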

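1) For a co-occurrence matrix to make sense, the rows and columns should be in the same order (so that identical row/column elements meet on the diagonal), but for some reason ggplot2 orders them differently. Could that be because the row and column strings no longer match after the import?

2) Special characters are replaced with "..", which looks bad.

Is there a way to fix these problems?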

【Question discussion】:

Tags: r csv matrix ggplot2

【Solution 1】:

You can pass check.names = FALSE to read.csv to stop it from replacing the special characters in the column names.

myData <- read.csv("/matrix_added_cats.csv", check.names = FALSE)

names(myData)
# [1] "Name"               "NGO"                "Gov. institutions"  "Industry (farming)"
# [5] "Industry (mining)"  "Academia/research"  "Aboriginal groups"
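
The answer above covers the column names only. As a hedged sketch of the ordering question as well (this is not part of the original answer): by default read.csv runs the header through make.names(), which is where the dots come from, and a discrete ggplot2 axis follows factor levels (or alphabetical order for plain character vectors) rather than the order of the rows in the file. The block below uses an inline three-by-three subset of the values shown in the question in place of the real file, reshapes with base R instead of reshape::melt, and assumes that every entry in Name also appears as a column name. The names csv_text, mangled and cols are illustrative.

```r
# Inline stand-in for the CSV: a 3 x 3 subset of the values shown in the question.
csv_text <- 'Name,NGO,Gov. institutions,Industry (mining)
NGO,0.00,0.00,0.00
Gov. institutions,0.00,0.01,0.04
Industry (mining),0.00,0.04,0.10'

# With the default check.names = TRUE, read.csv() passes the header through
# make.names(), which turns spaces and parentheses into dots.
mangled <- make.names(c("Gov. institutions", "Industry (mining)"))
print(mangled)

# Keep the header exactly as written in the file.
myData <- read.csv(text = csv_text, check.names = FALSE)

# Long format in base R: one row per (row name, column name) pair.
cols <- setdiff(names(myData), "Name")
dat <- data.frame(
  Name     = rep(myData$Name, times = length(cols)),
  variable = rep(cols, each = nrow(myData)),
  value    = unlist(myData[cols], use.names = FALSE)
)

# Pin both axes to the same level order so that matching names line up.
# Assumption: every entry in Name also appears as a column name.
dat$variable <- factor(dat$variable, levels = cols)
dat$Name     <- factor(dat$Name, levels = cols)
```

The resulting dat can be passed to the ggplot call from the question in place of the melted data. With identical levels on both axes the diagonal runs from bottom-left to top-right; using levels = rev(cols) for Name should flip it to the usual top-left to bottom-right matrix layout.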

【Discussion】: