R keras: tfidf asks for tf-idf and tf-idf asks for tfidf

【Question title】: R keras tfidf asking for tf-idf and tf-idf asking for tfidf
【Posted】: 2020-11-08 07:12:58
【Question】:

I'm trying to build a fake-news classification model for a class, and I have been trying to implement it with Keras.

library(keras)
library(dplyr)
library(ggplot2)
library(purrr)
library(readr)

#loading data
df <- read_csv("train.csv")
test <- read_csv("test.csv")

df %>% count(label)

#splitting data
training_id <- sample.int(nrow(df), size = nrow(df)*0.8)
training <- df[training_id,]
testing <- df[-training_id,]

num_words <- 10000
max_length <- 50
text_vectorization <- layer_text_vectorization(
  max_tokens = num_words,
  output_mode = "tfidf"
)


#modeling
text_vectorization %>% 
  adapt(df$text)

input <- layer_input(shape = c(1), dtype = "string")

output <- input %>% 
  text_vectorization() %>% 
  layer_embedding(input_dim = num_words + 1, output_dim = 16) %>%
  layer_global_average_pooling_1d() %>%
  layer_dense(units = 16, activation = "relu") %>%
  layer_dropout(0.5) %>% 
  layer_dense(units = 1, activation = "sigmoid")
model <- keras_model(input, output)

model %>% compile(
  optimizer = 'rmsprop',
  loss = 'binary_crossentropy',
  metrics = list('accuracy')
)

history <- model %>% fit(
  training$text,
  as.numeric(training$label == "real"),
  epochs = 30,
  batch_size = 512,
  validation_split = 0.2,
  verbose=2
)

results <- model %>% evaluate(testing$text, as.numeric(testing$label == "real"), verbose = 0)
results

plot(history)

The problem is specifically in this part:

num_words <- 10000
max_length <- 50
text_vectorization <- layer_text_vectorization(
  max_tokens = num_words,
  output_mode = "tfidf"
)

It works with the output modes "count", "int", and "binary", but when I run it with tfidf I get this error:

Error in py_call_impl(callable, dots$args, dots$keywords) : 
  ValueError: TextVectorization's output_mode arg received an invalid value tfidf. Allowed values are `None`, or one of the following values: ('int', 'count', 'binary', 'tf-idf').

When I run it with tf-idf instead, I get this error:

Error in match.arg(output_mode) : 
  'arg' should be one of “int”, “binary”, “count”, “tfidf”

If anyone knows a solution to this problem, I would be very grateful.

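【Comments】:

Welcome to SO. Judging by the error message, changing output_mode = 'tfidf' to output_mode = "tf-idf" might work. Both 'tfidf' and 'tf-idf' appear across the two error messages, and you have already tried 'tfidf'. HTH
Thanks for the welcome and the quick answer, but when I change it to output_mode = "tf-idf" I get the second error. What I mean is: when I use tfidf it tells me to use tf-idf, and when I use tf-idf it tells me to use tfidf.
What is your `packageVersion('keras')`? This looks like a useful project topic, and it would be nice to see the results. I'd say it's cool, though hard.
As you suggested, I updated keras, R, RStudio, and tensorflow. I think it is a bug in the package, and I will report it to the developers.
If you can confirm it is a bug, please write that up here and accept your own answer, since that completes the Q&A process. Looking forward to seeing you crack this.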

Tags: python r keras tf-idf

【Solution 1】:

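I have identified this as a bug and reported it to the R keras team on GitHub. Fortunately, since R is open source, I managed to put together a workaround. It is not perfect, because I still cannot compare it against cross-validation, but since the learning process had already finished, that was not necessary.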
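The two conflicting errors in the question are consistent with a two-stage check: the `match.arg` error comes from the R side and allows "tfidf", while the `ValueError` comes from the Python side and allows "tf-idf". The base-R sketch below only imitates that behaviour. `r_side_check`, `py_side_check`, and `try_mode` are hypothetical stand-ins written for illustration, with the allowed values copied from the two error messages; they are not the actual keras source.

```r
# Hypothetical stand-in for the R-side check: the allowed values are
# copied from the match.arg() error message in the question.
r_side_check <- function(output_mode = c("int", "binary", "count", "tfidf")) {
  match.arg(output_mode)
}

# Hypothetical stand-in for the Python-side check: the allowed values are
# copied from the ValueError message in the question.
py_side_check <- function(output_mode) {
  allowed <- c("int", "count", "binary", "tf-idf")
  if (!output_mode %in% allowed) {
    stop("received an invalid value ", output_mode)
  }
  output_mode
}

# Forward the R-validated value to the second check unchanged and
# report which stage, if any, rejects it.
try_mode <- function(mode) {
  tryCatch(
    py_side_check(r_side_check(mode)),
    error = function(e) paste("error:", conditionMessage(e))
  )
}

try_mode("tfidf")   # passes the R-side stand-in, rejected by the second check
try_mode("tf-idf")  # rejected by the R-side stand-in
try_mode("count")   # accepted by both
```

Under this assumption, no single spelling of the TF-IDF mode can satisfy both checks, which matches the behaviour described in the question.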

All I had to do was run trace("layer_text_vectorization", edit=TRUE)

and put

if (output_mode=="tfidf")
      output_mode <- "tf-idf"

after

output_mode <- match.arg(output_mode)
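To show what the patched lines do in isolation, here is a minimal base-R sketch. `patched_output_mode` is a hypothetical helper written for illustration, not the real `layer_text_vectorization` body; it assumes the R-side choices listed in the `match.arg` error message.

```r
# Hypothetical stand-in for the patched part of the wrapper; it is not
# the real keras source, only the validation step plus the added lines.
patched_output_mode <- function(output_mode = c("int", "binary", "count", "tfidf")) {
  # R-side validation, as in the original function body
  output_mode <- match.arg(output_mode)
  # The added lines: translate the R spelling into the one Python accepts
  if (output_mode == "tfidf")
    output_mode <- "tf-idf"
  output_mode
}

patched_output_mode("tfidf")  # the value that would be forwarded to Python
patched_output_mode("count")  # other modes pass through unchanged
```

An edit made through `trace()` lasts only for the current R session, so it has to be reapplied after R is restarted.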
