R - How to normalize the values in a matrix using R

【Question title】: R - How to normalize the values in matrix using R
【Posted】: 2015-09-15 14:35:53
【Question description】:

I have a matrix like this.

term        SaS   PaP   WH
affection   3.06  2.76  2.3
jealous     2     1.85  2.04
gossip      1.3   0     1.78
wuthering   0     0     2.58

I want to convert it into a normalized matrix like the following:

term        SaS     PaP     WH
affection   0.789   0.832   0.524
jealous     0.515   0.555   0.465
gossip      0.335   0       0.405
wuthering   0       0       0.588

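I tried to normalize the values using scale and sweep, but I get the errors shown below.

sweep(terms, 2, colSums(terms), FUN ="/" )
Error in colSums(terms) :
  'x' must be an array of at least two dimensions

scale(terms, center = FALSE, scale = colSums(terms))
Error in colSums(terms) :
  'x' must be an array of at least two dimensions

This is the class of the object: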

> class(terms)   
[1] "DocumentTermMatrix"       "simple_triplet_matrix"

Please help.

Update

Following the suggestion from @small_data below, I changed the code as follows:

terms <- DocumentTermMatrix(obama.train.p, control = list(weighting = function(x) weightTfIdf(x, normalize = FALSE)))
inspect(terms[1:2, 1:100])
sweep(terms, 2, colSums(as.matrix(terms)), FUN = "/")
scale(terms, center = FALSE, scale = colSums(as.matrix(terms)))

Fortunately, this does not throw any errors. But it does not normalize the data either.

Docs           93republican94 93son 93stopgap 93surge94 93the 93we 93where 93whi 93you a10  abandon abbottabad
  Obama 1.txt               0     0         0         0     0    0       0     0     0   0 2.321928          0
  Obama 10.txt              0     0         0         0     0    0       0     0     0   0 0.000000

If you look at the word abandon, its value is 2.321928 both before and after normalization. Any help with this would be useful to me.

Thanks

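【Question comments】:

@small_data88 - Thanks. That did not raise any errors, but it did not normalize the data. I have posted an update to the question. Thanks again for your help.
@small_data88 - I thought it was a data frame. However, the class of "terms" is shown as simple_triplet_matrix. I have never heard of that.
@small_data88 - OK, what do you think should be done to normalize the numbers?

Tags: r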

【Solution 1】:

The sweep function does not work because the first column is of class factor. Try this:

data.frame(term=term$term,sweep(term[,-1], 2, colSums(term[,-1]), FUN ="/" ))


       term       SaS       PaP        WH
1 affection 0.4811321 0.5986985 0.2643678
2   jealous 0.3144654 0.4013015 0.2344828
3    gossip 0.2044025 0.0000000 0.2045977
4 wuthering 0.0000000 0.0000000 0.2965517
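
The output above divides each column by its sum, which is not quite what the question's target table shows. The target values look consistent with dividing each column by its Euclidean (L2) norm instead, for example 3.06 / sqrt(3.06^2 + 2^2 + 1.3^2) ≈ 0.789. A minimal sketch under that assumption, with the data frame rebuilt by hand from the question and `col_norms` and `normalized` being illustrative names only:

```r
# Rebuild the example table from the question as a data frame.
term <- data.frame(
  term = c("affection", "jealous", "gossip", "wuthering"),
  SaS  = c(3.06, 2.00, 1.30, 0.00),
  PaP  = c(2.76, 1.85, 0.00, 0.00),
  WH   = c(2.30, 2.04, 1.78, 2.58)
)

# Euclidean (L2) norm of each numeric column (assumed target normalization).
col_norms <- sqrt(colSums(term[, -1]^2))

# Divide every numeric column by its own norm; keep the term labels alongside.
normalized <- data.frame(term = term$term,
                         sweep(term[, -1], 2, col_norms, FUN = "/"))

print(normalized)
```

With this scaling the squared values in each column sum to 1. The SaS and WH columns agree with the question's target to three decimals, and the PaP column agrees to about two.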

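【Discussion】:

Thanks. But now I get a different error: Error in colSums(terms[,-1]) : 'x' must be an array of at least two dimensions. Any ideas on this?
Does it have to be an array for the colSums function to apply?
@Arun Maybe you need to convert term to a data.frame with as.data.frame. Try term<-as.data.frame(term) first, and then run my answer. colSums works on matrix-like objects. "An array of at least two dimensions" means it should have rows and columns, like a matrix.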
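
Since class(terms) in the question is a DocumentTermMatrix rather than a data frame, another option may be to convert it to a dense matrix first and then store what sweep() returns. sweep() gives back a new object and does not modify its input, and the updated code in the question never assigns the result. The sketch below uses a small hand-made matrix as a stand-in for as.matrix(terms); the values and the names `m`, `totals`, and `m_norm` are made up for illustration.

```r
# Stand-in for as.matrix(terms): documents in rows, terms in columns.
# With the tm package loaded, the real call would be: m <- as.matrix(terms)
# The numbers below are toy values, not the asker's data.
m <- matrix(c(2.0, 0, 1.5,
              0.0, 0, 0.5),
            nrow = 2, byrow = TRUE,
            dimnames = list(Docs  = c("doc1", "doc2"),
                            Terms = c("abandon", "abbottabad", "surge")))

# Column totals; swap zeros for 1 so all-zero columns stay 0 instead of NaN.
totals <- colSums(m)
totals[totals == 0] <- 1

# Assign the result: sweep() does not change m in place.
m_norm <- sweep(m, 2, totals, FUN = "/")

print(m_norm)
```

Note that in a DocumentTermMatrix the documents are rows, so normalizing per document rather than per term would mean using margin 1 with rowSums(m) instead.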